CN116662552A - Financial text data classification method, device, terminal equipment and medium - Google Patents

Financial text data classification method, device, terminal equipment and medium

Info

Publication number
CN116662552A
CN116662552A (application CN202310791575.9A)
Authority
CN
China
Prior art keywords
text data
model
training
sample
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310791575.9A
Other languages
Chinese (zh)
Inventor
伏勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310791575.9A priority Critical patent/CN116662552A/en
Publication of CN116662552A publication Critical patent/CN116662552A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/2431 - Multiple classes
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a financial text data classification method, device, terminal equipment and medium, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring target financial text data input by a user; performing feature extraction and classification processing on the target financial text data by using a target classification network in a target large language model to obtain a corresponding classification result. The target large language model is an LLM model composed of the target classification network and a pre-trained language model; the target classification network is obtained by first using the pre-trained language model to generate enhanced samples from the acquired original financial text data samples, and then training the network on both the original samples and the enhanced samples. By generating enhanced samples with the pre-trained language model and training the target classification network on the original financial text data samples together with the enhanced samples, the method improves the accuracy of the target classification network.

Description

Financial text data classification method, device, terminal equipment and medium
Technical Field
The application relates to the field of artificial intelligence, and in particular to a financial text data classification method, a financial text data classification device, terminal equipment and a medium.
Background
The effectiveness of natural language processing (Natural Language Processing, NLP) depends to a large extent on the quality and quantity of training data. In practical applications, however, training data is often scarce because of privacy concerns or the high cost of manual annotation, which makes it challenging to train an accurate NLP model that generalizes well to unseen samples. The problem of insufficient training data is especially pronounced in few-shot learning (also called small sample learning, FSL) scenarios, where a model trained on source-domain data is expected to generalize from only a few examples of the target domain. In such scenarios, the data of the target domain is typically scarce and of lower quality.
Existing FSL methods mainly improve the learning and generalization ability of a model through better architecture design: a language model is used as the backbone and is then fine-tuned with the limited samples using meta-learning or prompt-based methods. However, the performance of these methods is still fundamentally limited by the quality and quantity of the source-domain and target-domain data. Text data enhancement is an effective strategy for overcoming limited sample availability in many natural language processing tasks; the strategy widely used in the prior art is to perform data enhancement on the training data so as to better capture data invariance and increase the sample size.
However, enhanced samples generated by current text data enhancement methods are largely of poor quality and lack reliability, and may also lack diversity and completeness, which ultimately results in poor accuracy of the classification network obtained by training.
Disclosure of Invention
The application provides a financial text data classification method, device, terminal equipment and medium, which are used to solve the prior-art problem of poor model classification accuracy caused by enhanced samples of poor quality and insufficient diversity.
In a first aspect, the present application provides a method for classifying financial text data, including:
acquiring target financial text data input by a user;
performing feature extraction and classification processing on the target financial text data by using a target classification network in a target large language model to obtain a corresponding classification result; wherein the target large language model is an LLM model composed of the target classification network and a pre-trained language model, and the target classification network is obtained by training on the original financial text data samples and the enhanced samples after the pre-trained language model generates the enhanced samples from the acquired original financial text data samples.
In a second aspect, the present application provides a financial text data classification apparatus comprising:
the acquisition module is used for acquiring target financial text data input by a user;
the classification module is used for performing feature extraction and classification processing on the target financial text data by using the target classification network in the target large language model to obtain a corresponding classification result; wherein the target large language model is an LLM model composed of the target classification network and a pre-trained language model, and the target classification network is obtained by training on the original financial text data samples and the enhanced samples after the pre-trained language model generates the enhanced samples from the acquired original financial text data samples.
In a third aspect, the present application provides a terminal device, including: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method as described in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for performing the method according to the first aspect when executed by a processor.
In a fifth aspect, the application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of the first aspect.
The application provides a financial text data classification method, which comprises the following steps: acquiring target financial text data input by a user; performing feature extraction and classification processing on the target financial text data by using a target classification network in a target large language model to obtain a corresponding classification result; wherein the target large language model is an LLM model composed of the target classification network and a pre-trained language model, and the target classification network is obtained by training on the original financial text data samples and the enhanced samples after the pre-trained language model generates the enhanced samples from the acquired original financial text data samples.
According to the application, in a small sample learning scenario, introducing a pre-trained language model makes it possible to generate enhanced samples of higher quality and diversity on the basis of the acquired original financial text data samples, so that more samples are available for training and the classification accuracy of the target classification network is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flow chart of a method for classifying financial text data according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a LLM model according to an embodiment of the present application;
fig. 3 is a schematic diagram of an effect of the text data classification method provided by the embodiment of the application applied to the financial field;
fig. 4 is a schematic diagram of an effect of applying the text data classification method provided by the embodiment of the application to other fields;
fig. 5 is a schematic structural diagram of a financial text data classifying device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that the user information (including but not limited to user equipment information and user personal information) and data (including but not limited to data for analysis, stored data and presented data) involved in the present application are information and data authorized by the user or fully authorized by all parties; the collection, use and processing of such data must comply with relevant laws, regulations and standards, and corresponding operation entries are provided for the user to choose to authorize or refuse.
It should be noted that the method, the device, the terminal device and the medium for classifying the financial text data provided by the application can be used in the field of artificial intelligence and can also be used in other fields except the field of artificial intelligence, so that the application fields of the method, the device, the terminal device and the medium for classifying the financial text data are not limited.
The specific application scenario of the application is a small sample learning scenario, which suffers from insufficient training data, because in such a scenario the data of the target domain is typically scarce and of lower quality. To overcome this problem, a widely used strategy is to perform data enhancement on the training data, thereby better capturing data invariance and increasing the sample size. However, enhanced samples generated by current text data enhancement methods may be of poor quality and lack reliability, and may also lack diversity and completeness, which ultimately results in poor accuracy of the classification network obtained by training. The application provides a financial text data classification method aimed at solving these technical problems in the prior art.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Example 1:
fig. 1 is a flow chart of a method for classifying financial text data according to an embodiment of the present application. As shown in fig. 1, the method of the present embodiment includes the following steps:
s10, acquiring target financial text data input by a user.
In an embodiment of the present application, the target financial text data is data related to banking business, for example: deposits and deposit amounts, withdrawals and withdrawal amounts, applying for bank cards, cancelling bank cards, and the like. As for the acquisition mode, the text data may be converted from the user's voice input, converted from a recording provided by a recording device, or directly entered by the user with an input device; the acquisition mode of the target financial text data is therefore not particularly limited.
S20, performing feature extraction and classification processing on the target financial text data by using the target classification network in the target large language model to obtain a corresponding classification result; wherein the target large language model is an LLM model composed of the target classification network and a pre-trained language model, and the target classification network is obtained by training on the original financial text data samples and the enhanced samples after the pre-trained language model generates the enhanced samples from the acquired original financial text data samples.
In an embodiment of the present application, the large language model (Large Language Model, LLM) is an artificial intelligence model. The LLM model belongs to the family of autoregressive language models and contains a target classification network that uses Transformer decoder blocks as the model backbone. An original financial text data sample may be a sentence; the sentence has semantics, and the semantics correspond to certain labels of the word vectors of the original financial text data sample. After each word vector is processed, certain characteristics of the corresponding sentence are revealed, and the labels mainly serve to mark those characteristics in the sentence. The enhanced sample is an auxiliary sample used in text classification.
The embodiment of the application provides a pre-trained language model for obtaining enhanced samples that are both semantically consistent and diverse. The pre-trained language model may be ChatGPT or another model, and may be obtained through reinforcement learning from human feedback, as specifically described in Embodiment 2 below and not repeated here.
It should be noted that when the method is applied to other fields, the target financial text data used in the financial field is replaced with text data from the other field, and the same effect can be achieved. That is, the ChatGPT-based text data enhancement method (i.e., ChatAug) may be applied to financial text data as well as to text data in other fields, which is not particularly limited in the embodiment of the present application.
According to the application, in a small sample learning scenario, introducing a pre-trained language model makes it possible to generate enhanced samples of higher quality and diversity on the basis of the acquired original financial text data samples, so that more samples are available for training and the classification accuracy of the target classification network is improved.
Example 2:
Based on the above embodiment, the technical solution of the present application is described in more detail below with reference to several specific embodiments.
Fig. 2 is a schematic structural diagram of the LLM model according to an embodiment of the present application. As shown in fig. 2, the LLM model includes: a pre-trained language model, which may be ChatGPT, and a target classification network, which includes a BERT model and a fully connected softmax classifier.
The Transformer-based bidirectional semantic encoding representation model (Bidirectional Encoder Representations from Transformers, BERT) is a language representation model: a deep bidirectional representation model that uses all of the context information simultaneously when representing each word. The specific structure of the BERT model is not particularly limited in the embodiment of the application.
The structure of the LLM model constitutes the overall framework of the large-language-model-based financial text data enhancement method. As shown in fig. 2, this embodiment applies the ChatGPT in the LLM model to perform data enhancement on the original financial text data samples. Samples of all classes are input into the ChatGPT in the LLM model, and the ChatGPT in the LLM model is prompted to generate enhanced samples that remain semantically consistent with the labels of the original financial text data samples. Based on the original financial text data samples and the enhanced samples, an initial classification network based on the BERT model is trained to obtain the target classification network, and the classification performance of the target classification network is evaluated. The training steps are shown in steps S21 to S24 below and mainly comprise: fine-tuning, based on the original financial text data samples and the enhanced samples, an initial classification network derived from the original sample set. The ChatGPT in the LLM model is mainly used for data enhancement, that is, for generating the enhanced samples, as sketched in the example below.
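By way of illustration only, the data enhancement step can be sketched as follows in Python, assuming an OpenAI-compatible chat client; the model name, the prompt wording and the helper function are illustrative assumptions and are not prescribed by the application.

```python
# Minimal sketch of the ChatGPT-based data enhancement step, assuming an
# OpenAI-compatible chat client; the model name, prompt wording and helper
# names are illustrative assumptions, not part of the application.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def augment_sample(text: str, n_variants: int = 6) -> list[str]:
    """Prompt the language model to restate `text` into n_variants sentences
    that preserve its label-relevant semantics (hypothetical prompt)."""
    prompt = (
        f"Rephrase the following financial sentence into {n_variants} "
        f"different sentences that keep exactly the same meaning:\n{text}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    # One variant per line; drop empty lines.
    lines = response.choices[0].message.content.splitlines()
    return [ln.strip() for ln in lines if ln.strip()][:n_variants]

# Enhanced samples inherit the label of the original sample:
# enhanced_set = [(aug, label) for text, label in original_set
#                 for aug in augment_sample(text)]
```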
In a possible implementation manner, a training process of the target classification network includes the following steps:
s21, acquiring an original sample set, and establishing an initial classification network based on the original sample set; wherein the original sample set includes original financial text data samples and their corresponding tags.
In an embodiment of the present application, the original sample set, also called the base dataset, is denoted as $D_b = \{(x_i, y_i)\}$, where the i-th original financial text data sample $x_i$ has a corresponding label $y_i$ in the label space $Y_b$.
S22, for each category of labels, generating a plurality of enhanced samples corresponding to the label by prompting the pre-trained language model based on the corresponding original financial text data samples.
In this step S22, this embodiment implements the data input. In this work, natural language processing of financial scenarios is taken as the task, and the experiments of this embodiment are performed on a preset benchmark (i.e., the accuracy of the target classification network containing the BERT model and the data enhancement effect of ChatGPT). Data enhancement is particularly desirable in natural language processing of financial scenarios, because the significant burden of manual annotation and strict privacy regulations make large-scale data labeling impractical. This embodiment may select a dataset widely used in natural language processing and text mining research as the original sample set. The original sample set consists of approximately 20,000 scientific abstracts from the financial field, all annotated with task-specific labels as original financial text data samples; the labels include, but are not limited to: named entities, relationships between named entities, other semantic roles, and the like. This dataset has been used to develop and evaluate machine learning models for various natural language processing tasks, including but not limited to: named entity recognition, relationship extraction, text classification, and the like.
S23, combining all enhanced samples corresponding to all categories of labels into an enhanced sample set.
In an embodiment of the present application, the enhanced sample set, also called the enhanced dataset or the new dataset, is denoted as $D_n = \{(x_j, y_j)\}$, where the j-th enhanced sample $x_j$ has a corresponding label $y_j$ in the label space $Y_n$. Further, any sample in the new dataset or the base dataset may be referred to as a labeled sample.
The embodiment of the application is given a base dataset and a new dataset. In a small sample learning scenario, the base dataset $D_b$ has a relatively large number of labeled samples, while the new dataset $D_n$ has only a small number of labeled samples. To evaluate the performance of small sample learning on the new dataset, the goal of this embodiment is to train a BERT model with both the base dataset and the new dataset, while enabling the target classification network containing the BERT model to achieve satisfactory generality on the new dataset.
S24, training the initial classification network based on the original sample set and the enhanced sample set until the classification performance of the initial classification network reaches a preset value, and stopping training to obtain the target classification network.
This embodiment trains an initial classification network containing a BERT model to obtain a target classification network capable of accurately classifying enhanced samples. The output features of the top layer of the BERT model can be written as:

$$Z = [z_c, z_1, \ldots, z_n]$$

where $z_c$ is the representation of the class-specific token CLS. For enhanced text classification, $z_c$ is typically input into a task-specific classifier (i.e., a fully connected softmax classifier) for the final prediction, as sketched below. The embodiment of the application can thereby avoid the overfitting and lack of generalization capability caused by small sample learning.
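A minimal sketch of such a classification head, using the Hugging Face transformers library; the checkpoint name, class count and example sentence are illustrative assumptions.

```python
# Sketch of the BERT encoder with a fully connected softmax classifier on the
# CLS token, using the Hugging Face transformers library; the checkpoint name,
# class count and example sentence are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertClassifier(nn.Module):
    def __init__(self, num_classes: int, checkpoint: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(checkpoint)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        z_c = outputs.last_hidden_state[:, 0]   # representation of the CLS token
        logits = self.classifier(z_c)           # W_c^T z_c + b_c
        return torch.softmax(logits, dim=-1)    # final prediction

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertClassifier(num_classes=10)
batch = tokenizer(["办理银行卡"], return_tensors="pt", padding=True)
probs = model(batch["input_ids"], batch["attention_mask"])
```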
In a possible implementation manner, in step S21, an initial classification network is established based on the original sample set, including the following steps:
s211, establishing a network framework comprising the BERT model and a fully-connected softmax classifier.
S212, in the process of pre-training the network framework, extracting features of the original financial text data samples in the original sample set by using the BERT model to obtain a feature set. This embodiment denotes the feature set as $h_N$.
S213, based on the feature set, predicting the label of the original financial text data sample by using the fully-connected softmax classifier to obtain a prediction result.
In step S213, this embodiment obtains the prediction result using the following formula:

$$\mathrm{OUT} = \mathrm{softmax}\left(W_e^{T} h_N\right)$$

where $\mathrm{OUT}$ is the prediction result, $W_e^{T}$ is the transpose of $W_e$, and $W_e$ is the label embedding matrix of the samples.
S214, adjusting the parameters of the network framework according to the deviation between the prediction result and the label corresponding to the original financial text data sample, and, when the ending condition is met, taking the pre-trained network framework as the initial classification network.
The initial classification network is an LLM-based text data classification model. In this embodiment, by executing steps S211 to S214, an LLM-based text data classification model can be established; the specific establishment procedure is shown in steps S1 to S2 and is not repeated here. The initial classification network provides the model framework for the target classification network, so its establishment provides framework support for the subsequent establishment of a target classification network with high classification accuracy.
In one possible implementation, the BERT model includes a plurality of Transformer blocks; a Transformer block is also called a Transformer decoder block. During pre-training, the LLM model performs unsupervised distribution estimation on a set of samples $X = \{x_1, x_2, \ldots, x_n\}$, where each sample $x_i$, consisting of $m$ tokens, is defined as $x_i = (s_1, s_2, \ldots, s_m)$. The goal of pre-training is to maximize the following likelihood:

$$L(x_i) = \sum_{k=1}^{m} \log P\left(s_k \mid s_1, \ldots, s_{k-1}; \theta\right)$$

where $L(x_i)$ is the likelihood, $P$ is the probability of $x_i$ becoming an enhanced sample, which represents the similarity between the enhanced sample and the original sample (the higher the similarity, the more likely $x_i$ is to become an enhanced sample), and $\theta$ denotes the trainable parameters of the LLM model. The tokens of a sample are represented by a token embedding matrix and a position embedding matrix, where:
$$h_0 = x_i W_e + W_p$$

where $h_0$ is the token representation of sample $x_i$, $W_e$ is the token embedding matrix, and $W_p$ is the position embedding matrix.
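A minimal PyTorch sketch of this input representation follows; the vocabulary size, context length, hidden width and token ids are illustrative assumptions.

```python
# Sketch of the input representation h_0 = x_i W_e + W_p; vocabulary size,
# context length, hidden width and token ids are illustrative assumptions.
import torch
import torch.nn as nn

vocab_size, max_len, hidden = 30522, 512, 768
W_e = nn.Embedding(vocab_size, hidden)   # token embedding matrix
W_p = nn.Embedding(max_len, hidden)      # position embedding matrix

token_ids = torch.tensor([[101, 2769, 3341, 102]])         # one tokenized sample
positions = torch.arange(token_ids.size(1)).unsqueeze(0)   # positions 0..m-1
h_0 = W_e(token_ids) + W_p(positions)                      # shape (1, m, hidden)
```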
Step S212 — extracting features of the original financial text data samples in the original sample set by using the BERT model to obtain a feature set — includes the following steps:
s1, extracting first features from original financial text data samples by a first transducer block, and extracting second features from the features extracted from the previous transducer block by the rest of transducer blocks.
If the BERT model includes $N$ Transformer blocks, the features of the samples extracted by the $n$-th Transformer block are:

$$h_n = \mathrm{transformer\_block}(h_{n-1})$$

where $h_n$ denotes the features of the samples extracted by the $n$-th Transformer block (for $n = N$, the output of the top Transformer block, i.e., the top layer of the BERT model), and $h_{n-1}$ denotes the features of the samples extracted by the $(n-1)$-th Transformer block.
S2, forming a feature set by the first features and all the second features.
In the embodiment of the application, the feature set can provide data support for predicting the labels of the original financial text data samples with the fully connected softmax classifier to obtain the prediction result; a sketch of collecting these features follows.
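The per-block features can be gathered with the transformers library as sketched below; output_hidden_states is a real option of the library, while the checkpoint and example sentence are illustrative assumptions.

```python
# Sketch: gather h_1..h_N from every Transformer block of BERT to form the
# feature set; output_hidden_states is a real transformers option, while the
# checkpoint and example sentence are illustrative assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

batch = tokenizer(["存款一千元"], return_tensors="pt")
with torch.no_grad():
    out = bert(**batch, output_hidden_states=True)

# out.hidden_states[0] is h_0 (the embeddings); [1:] are the N block outputs.
feature_set = out.hidden_states[1:]   # (h_1, ..., h_N)
h_N = feature_set[-1]                 # top-layer features fed to the classifier
```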
In a possible implementation, the purpose of steps S31 to S33 is to apply a reinforcement learning from human feedback (Reinforcement Learning from Human Feedback, RLHF) method: after being pre-trained, the pre-trained language model in the LLM model is fine-tuned by applying the RLHF method, which adjusts the model according to human feedback so that its output coincides with the user's intention across various tasks. The RLHF method of the LLM model includes three steps, S31 to S33:
Before step S22 — prompting the pre-trained language model to generate the plurality of enhanced samples corresponding to the label — the method further includes the following steps:
s31, taking the manually marked data as supervision data, and performing supervision fine tuning training on the SFT model.
Specifically, the LLM model is further trained using the labeled data. To facilitate human-computer interaction, artificial intelligence trainers may play both the user and the artificial intelligence assistant, so that the assistant constructs answers according to the prompts. In the embodiment of the application, the answers carrying prompts (i.e., the manually labeled data) are used as supervision data to further train the pre-trained language model. This further training yields a Supervised Fine-Tuning (SFT) model.
S32, establishing a reward model based on the trained SFT model.
This embodiment trains a reward model based on the SFT model: the reward model takes a prompt and a response as input and outputs a scalar reward. The outputs are ranked from best to worst by human labelers to construct a ranked dataset.
S33, training in a reinforcement learning manner based on the ranked dataset returned by the reward model to obtain the pre-trained language model; wherein the ranked dataset includes a plurality of data arranged in order of label score.
In this step S33, using the reward model, this embodiment can fine-tune the LLM model by means of the proximal policy optimization (Proximal Policy Optimization, PPO) method in a reinforcement learning manner.
The pre-trained language model obtained through steps S31 to S33 has a stronger semantic understanding capability and enables the generated enhanced samples to be both diverse and accurate.
In a possible implementation manner, in the process of performing step S32, the method further includes:
s34, acquiring parameters of the rewarding model, and calculating a loss function of the rewarding model based on the parameters of the rewarding model.
The loss function between two outputs is defined as follows:

$$\mathcal{L}(\theta_r) = -E_{(x,\, y_w,\, y_l) \sim D_c}\left[\log \sigma\left(r_{\theta_r}(x, y_w) - r_{\theta_r}(x, y_l)\right)\right]$$

where $\theta_r$ are the parameters of the reward model, $r_{\theta_r}(x, y)$ is the scalar reward for the input prompt $x$ with the corresponding response $y$, $y_w$ and $y_l$ are the preferred and the less-preferred response in a comparison pair, $\sigma$ is the sigmoid function applied to the pair $y_w$ and $y_l$, and $D_c$ is the dataset of human comparisons.
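A minimal PyTorch sketch of this pairwise ranking loss, with illustrative reward values; the reward-model backbone itself is abstracted away.

```python
# Sketch of the reward-model pairwise ranking loss -log(sigmoid(r_w - r_l));
# the reward-model backbone is abstracted away and the values are illustrative.
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_w: torch.Tensor, r_l: torch.Tensor) -> torch.Tensor:
    """r_w / r_l: scalar rewards of the preferred / less-preferred responses."""
    return -F.logsigmoid(r_w - r_l).mean()

# Illustrative batch of comparison pairs:
r_w = torch.tensor([1.3, 0.2, 0.9])
r_l = torch.tensor([0.4, -0.1, 1.0])
loss = reward_ranking_loss(r_w, r_l)   # backpropagated into the reward model
```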
In a possible implementation, step S33 — training in a reinforcement learning manner based on the ranked dataset returned by the reward model to obtain the pre-trained language model — includes the following step:

fine-tuning with a proximal policy optimization algorithm based on the ranked dataset returned by the reward model, so as to obtain the pre-trained language model.
To repair performance regressions on public natural language processing datasets, RLHF mixes pre-training gradients into the PPO gradients; this variant, also called PPO-ptx, has the objective function:

$$\mathrm{objective}(\phi) = E_{(x,y) \sim D_{\pi_\phi^{RL}}}\left[r_{\theta_r}(x, y) - \beta \log \frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)}\right] + \gamma\, E_{x \sim D_{pretrain}}\left[\log \pi_\phi^{RL}(x)\right]$$

where $\gamma$ is the pre-training loss coefficient controlling the strength of the pre-training gradient, $D_{pretrain}$ is the pre-training distribution, $\pi_\phi^{RL}$ is the learned RL policy model, $r_{\theta_r}(x, y)$ is the reward model, $\beta$ is the reward coefficient controlling the strength of the KL-divergence penalty, and $\pi^{SFT}$ is the supervised fine-tuned model.
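The combination of reward, KL penalty and pre-training term can be sketched as follows; the coefficient values and all tensors are illustrative placeholders, and a full PPO loop (rollouts, advantage estimation, clipping) is omitted.

```python
# Sketch of the PPO-ptx objective: reward minus a KL penalty toward the SFT
# model, plus a pre-training log-likelihood term weighted by gamma; all tensor
# values and coefficients are illustrative placeholders.
import torch

def ppo_ptx_objective(reward, logp_rl, logp_sft, logp_pretrain,
                      beta: float = 0.02, gamma: float = 27.8):
    kl_penalty = beta * (logp_rl - logp_sft)   # beta * log(pi_RL / pi_SFT)
    rl_term = (reward - kl_penalty).mean()     # expectation over RL samples
    ptx_term = gamma * logp_pretrain.mean()    # pre-training gradient mixing
    return rl_term + ptx_term                  # maximized w.r.t. policy params

obj = ppo_ptx_objective(
    reward=torch.tensor([0.7, 1.1]),
    logp_rl=torch.tensor([-12.3, -9.8]),
    logp_sft=torch.tensor([-13.0, -10.1]),
    logp_pretrain=torch.tensor([-45.2, -50.7]),
)
```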
In one possible implementation, the classification performance of the initial classification network is represented by an objective function; the objective function is composed of a cross entropy function and a contrastive learning loss function.
The objective functions for learning the enhanced samples generated by data enhancement include a cross entropy function and a contrastive learning loss function. This embodiment inputs $z_c$ into a fully connected layer serving as the fully connected softmax classifier for the final prediction:

$$\hat{y} = \mathrm{softmax}\left(W_c^{T} z_c + b_c\right)$$

where $\hat{y}$ is the classification result, $W_c^{T}$ is the transpose of $W_c$, $W_c$ and $b_c$ are trainable parameters of the initial classification network, and $z_c$ is the representation of the class-specific token CLS.
The cross entropy function has the expression:

$$\mathcal{L}_{ce} = -\sum_{c=1}^{C} y_{dc} \log \hat{y}_{dc}$$

where $C$ is the output dimension, whose value equals the size of the union of the label spaces of the base dataset and the new dataset, and $y_{dc}$ is the true value of original sample $d$ for class $c$. In order to make full use of the prior knowledge in the base dataset to guide learning on the new dataset, this embodiment introduces a contrastive loss function, so that sample representations of the same category become more compact and sample representations of different categories become more separated. The contrastive loss between sample pairs within the same batch is defined as follows:

$$\mathcal{L}_{cl} = -\log \frac{\exp\left(\cos(v_i, v_i')\right)}{\sum_{j \neq i} \exp\left(\cos(v_i, v_j)\right)}$$

where $v_i$ and $v_i'$ are the $z_c$ representations of samples belonging to the same class, $v_i$ and $v_j$ are the $z_c$ representations of samples belonging to different classes, and $\cos$ is the cosine similarity between two samples.
It should be noted that in the fine-tuning stage of the model, this embodiment uses only the cross entropy function as the objective function; in the small sample learning phase, this embodiment uses both the cross entropy function and the contrastive learning loss function as the objective functions, as sketched below.
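A sketch of the combined objective used in the small sample learning phase, following the two formulas above; tensor shapes are illustrative assumptions and, matching the formula, no temperature term is added.

```python
# Sketch of the combined objective: cross entropy on the classifier logits plus
# the contrastive loss over CLS representations; shapes are illustrative.
import torch
import torch.nn.functional as F

def contrastive_loss(z_c: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """z_c: (B, H) CLS representations; labels: (B,) class ids.
    Assumes every sample has at least one same-class partner in the batch."""
    sim = F.cosine_similarity(z_c.unsqueeze(1), z_c.unsqueeze(0), dim=-1)  # (B, B)
    exp_sim = torch.exp(sim)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = (exp_sim * (same & ~eye)).sum(dim=1)   # same-class pairs
    denom = (exp_sim * ~eye).sum(dim=1)          # all pairs except self
    return -torch.log(pos / denom + 1e-12).mean()

def total_objective(logits, z_c, labels):
    # logits: pre-softmax classifier outputs, shape (B, C)
    return F.cross_entropy(logits, labels) + contrastive_loss(z_c, labels)
```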
The embodiment of the application provides a financial text data classification method, which is a financial text data enhancement method based on the LLM model. In this embodiment, experiments are performed on datasets in the financial field and in the general field to test the classification performance of the LLM-model-based text data classification method.
As shown in table 1, the accuracy of BERT without data enhancement on the financial dataset is only 63.6%, whereas after the ChatGPT-based text data enhancement of this embodiment the accuracy reaches 88.9%. These results demonstrate that data enhancement using ChatGPT can effectively improve the performance of machine learning models in various applications in the financial field.
Table 1 results of application of methods to financial data sets
In fig. 3, the five-pointed star represents the ChatGPT-based text data classification method provided in this embodiment, and the circles represent other models. As can be seen from table 1 and fig. 3, compared with the classification performance of existing methods for classifying financial text data, the classification accuracy of financial text data is improved by double digits. Further study of the reliability and completeness of the generated enhanced samples shows that the method can generate more diverse enhanced samples while maintaining their accuracy, so that the labels of the enhanced samples have high semantic similarity with the labels corresponding to the original financial text data samples.
As shown in table 2, the ChatGPT-based text data classification method also obtains higher accuracy on the general dataset. On the general dataset with text data enhancement, the enhanced samples generated by ChatGPT-based data enhancement raise the accuracy of the BERT model to 83.5%, whereas the accuracy of the BERT model without enhanced samples is only 79.2%.
Table 2 results of application of methods to generic data sets
In fig. 4, the five-pointed star represents the ChatGPT-based text data classification method provided in this embodiment, and the circles represent other models.
As can be seen from table 2 and fig. 4, compared with the classification performance of existing text data classification methods, the classification accuracy of text data is improved by double digits. Further study of the reliability and completeness of the generated enhanced samples shows that the method can generate more diverse enhanced samples while maintaining their accuracy, so that the labels of the enhanced samples have high semantic similarity with the labels corresponding to the original text data samples.
In summary, this embodiment introduces ChatGPT as a data enhancement tool for small sample learning text classification. Specifically, each input sentence is restated by ChatGPT into multiple (e.g., 6) additional sentences, thereby increasing the number of samples. Compared with prior data enhancement methods, the LLM model built on ChatGPT is more suitable for data enhancement, for the following reasons: ChatGPT is pre-trained on a large-scale corpus and has a wider semantic expression space, which is beneficial for improving the diversity of data enhancement; and because the fine-tuning phase of ChatGPT introduces a large number of manually annotated samples, the language generated by ChatGPT is more consistent with human expression habits. Therefore, through reinforcement learning, the embodiment of the application can ensure that the different expressions generated by the LLM are of higher quality.
Example 3:
fig. 5 is a schematic structural diagram of a financial text data classifying device according to an embodiment of the present application. The apparatus of this embodiment may be in the form of software and/or hardware. As shown in fig. 5, the apparatus for classifying financial text data provided in this embodiment includes: an acquisition module 51 and a classification module 52. Wherein:
the acquiring module 51 is configured to acquire target financial text data input by a user.
The classification module 52 is configured to perform feature extraction and classification processing on the target financial text data by using the target classification network in the target large language model to obtain a corresponding classification result; wherein the target large language model is an LLM model composed of the target classification network and a pre-trained language model, and the target classification network is obtained by training on the original financial text data samples and the enhanced samples after the pre-trained language model generates the enhanced samples from the acquired original financial text data samples.
In a possible implementation manner, the financial text data classifying device is used for:
acquiring an original sample set, and establishing an initial classification network based on the original sample set; wherein the original sample set includes original financial text data samples and their corresponding labels.
For each category of labels, generating a plurality of enhanced samples corresponding to the label by prompting the pre-trained language model based on the corresponding original financial text data samples.
All enhancement samples corresponding to all types of tags are combined into an enhancement sample set.
Training an initial classification network based on the original sample set and the enhanced sample set until the classification performance of the initial classification network reaches a preset value, and stopping training to obtain a target classification network.
In a possible implementation manner, the financial text data classifying device is further configured to:
a network framework is built that includes a BERT model and a fully connected softmax classifier.
And in the process of pre-training the network framework, extracting features of the original financial text data samples in the original sample set by using the BERT model to obtain a feature set.
Based on the feature set, the label of the original financial text data sample is predicted by using the fully-connected softmax classifier, and a prediction result is obtained.
And adjusting the parameters of the network framework according to the deviation between the prediction result and the label corresponding to the original financial text data sample, and, when the ending condition is met, taking the pre-trained network framework as the initial classification network.
In a possible implementation manner, the financial text data classifying device is further configured to:
using the manually labeled data as supervision data, performing supervised fine-tuning training to obtain an SFT model;
establishing a reward model based on the trained SFT model;
training in a reinforcement learning manner based on the ranked dataset returned by the reward model to obtain a pre-trained language model; wherein the ranked dataset includes a plurality of data arranged in order of label score.
In a possible implementation manner, the financial text data classifying device is further configured to:
parameters of the bonus model are obtained and a loss function of the bonus model is calculated based on the parameters of the bonus model.
In a possible implementation manner, the financial text data classifying device is further configured to:
and fine tuning is carried out by adopting a near-end strategy optimization algorithm based on the sequencing data set returned by the reward model, so as to obtain the pre-training language model.
In one possible implementation, the classification performance of the initial classification network is represented by an objective function; the objective function is composed of a cross entropy function and a contrastive learning loss function.
In one possible implementation, the BERT model includes a plurality of Transformer blocks; the financial text data classification device is further configured to:
the first transducer block extracts a first feature from the original financial text data sample and the remaining transducer blocks each extract a second feature from the features extracted from the previous transducer block.
The first feature and all the second features are formed into a feature set.
The apparatus for classifying financial text data provided in this embodiment may be used to execute the method for classifying financial text data provided in any of the above method embodiments, and its implementation principle and technical effects are similar, and will not be described here again.
According to an embodiment of the present application, the present application also provides a terminal device and a readable storage medium.
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application. The terminal device comprises a receiver 60, a transmitter 61, at least one processor 62 and a memory 63, and the terminal device formed by the above components may be used to implement the above-mentioned specific embodiments of the present application, which are not described here again.
The embodiment of the application also provides a computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, and when the processor executes the computer executable instructions, the steps of the method in the embodiment are realized.
The embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the above embodiments.
Various implementations of the above-described systems and techniques of the application may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or terminal device.
In the context of the present application, a computer-readable storage medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may be a machine readable signal medium or a machine readable storage medium. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer-readable storage medium would include one or more wire-based electrical connections, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the principle of the present application should be included in the protection scope of the present application.

Claims (12)

1. A method of classifying financial text data, comprising:
acquiring target financial text data input by a user;
performing feature extraction and classification processing on the target financial text data by using a target classification network in a target large language model to obtain a corresponding classification result; wherein the target large language model is an LLM model composed of the target classification network and a pre-trained language model, and the target classification network is obtained by training on the original financial text data samples and the enhanced samples after the pre-trained language model generates the enhanced samples from the acquired original financial text data samples.
2. The method of claim 1, wherein the training process of the target classification network comprises:
acquiring an original sample set, and establishing an initial classification network based on the original sample set; wherein the original sample set comprises original financial text data samples and their corresponding labels;
for each category of labels, generating a plurality of enhanced samples corresponding to the label by prompting the pre-trained language model based on the corresponding original financial text data samples;
combining all the enhanced samples corresponding to all categories of labels into an enhanced sample set;
training the initial classification network based on the original sample set and the enhanced sample set until the classification performance of the initial classification network reaches a preset value, and stopping training to obtain the target classification network.
3. The method of claim 2, wherein the establishing an initial classification network based on the original sample set comprises:
establishing a network framework comprising a BERT model and a fully connected softmax classifier;
in the process of pre-training the network framework, extracting features of the original financial text data samples in the original sample set by utilizing the BERT model to obtain a feature set;
based on the feature set, predicting the label of the original financial text data sample by using the fully connected softmax classifier to obtain a prediction result;
and adjusting the parameters of the network framework according to the deviation between the prediction result and the label corresponding to the original financial text data sample, and, when the ending condition is met, taking the pre-trained network framework as the initial classification network.
4. The method of claim 2, wherein before the generating of the plurality of enhanced samples corresponding to the label by prompting the pre-trained language model, the method further comprises:
using the manually labeled data as supervision data, performing supervised fine-tuning training to obtain an SFT model;
establishing a reward model based on the trained SFT model;
training in a reinforcement learning manner based on the ranked dataset returned by the reward model to obtain a pre-trained language model; wherein the ranked dataset includes a plurality of data arranged in order of label score.
5. The method according to claim 4, wherein the method further comprises:
parameters of the reward model are obtained, and a loss function of the reward model is calculated based on the parameters of the reward model.
6. The method of claim 4, wherein the training in a reinforcement learning manner based on the ranked dataset returned by the reward model to obtain the pre-trained language model comprises:
and fine tuning is carried out by adopting a near-end strategy optimization algorithm based on the sequencing data set returned by the reward model, so as to obtain a pre-training language model.
7. The method of claim 2, wherein the classification performance of the initial classification network is represented by an objective function; the objective function is composed of a cross entropy function and a contrastive learning loss function.
8. The method of claim 3, wherein the BERT model comprises a plurality of Transformer blocks;
extracting features of the original financial text data sample in the original sample set by using the BERT model to obtain a feature set, including:
the first Transformer block extracts first features from the original financial text data samples, and each of the remaining Transformer blocks extracts second features from the features extracted by the previous Transformer block;
and forming the first feature and all second features into the feature set.
9. A financial text data classification apparatus, comprising:
the acquisition module is used for acquiring target financial text data input by a user;
the classification module is used for performing feature extraction and classification processing on the target financial text data by using the target classification network in the target large language model to obtain a corresponding classification result; wherein the target large language model is an LLM model composed of the target classification network and a pre-trained language model, and the target classification network is obtained by training on the original financial text data samples and the enhanced samples after the pre-trained language model generates the enhanced samples from the acquired original financial text data samples.
10. A terminal device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 8.
11. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-8.
CN202310791575.9A 2023-06-29 2023-06-29 Financial text data classification method, device, terminal equipment and medium Pending CN116662552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310791575.9A CN116662552A (en) 2023-06-29 2023-06-29 Financial text data classification method, device, terminal equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310791575.9A CN116662552A (en) 2023-06-29 2023-06-29 Financial text data classification method, device, terminal equipment and medium

Publications (1)

Publication Number Publication Date
CN116662552A true CN116662552A (en) 2023-08-29

Family

ID=87722495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310791575.9A Pending CN116662552A (en) 2023-06-29 2023-06-29 Financial text data classification method, device, terminal equipment and medium

Country Status (1)

Country Link
CN (1) CN116662552A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117057413A (en) * 2023-09-27 2023-11-14 珠高智能科技(深圳)有限公司 Reinforcement learning model fine tuning method, apparatus, computer device and storage medium
CN117057413B (en) * 2023-09-27 2024-03-15 传申弘安智能(深圳)有限公司 Reinforcement learning model fine tuning method, apparatus, computer device and storage medium
CN117544508A (en) * 2023-10-13 2024-02-09 北京六方云信息技术有限公司 Network equipment configuration query method and device, terminal equipment and storage medium
CN117391902A (en) * 2023-12-13 2024-01-12 北京师范大学珠海校区 Evaluation method and device for Chinese core literacy education based on large language model
CN117391902B (en) * 2023-12-13 2024-04-26 北京师范大学珠海校区 Evaluation method and device for Chinese core literacy education based on large language model
CN117787241A (en) * 2023-12-27 2024-03-29 人民网股份有限公司 Method and device for controlling length of generated text based on large language model
CN117892799A (en) * 2024-03-15 2024-04-16 中国科学技术大学 Financial intelligent analysis model training method and system with multi-level tasks as guidance
CN117892799B (en) * 2024-03-15 2024-06-04 中国科学技术大学 Financial intelligent analysis model training method and system with multi-level tasks as guidance

Similar Documents

Publication Title
CN116662552A (en) Financial text data classification method, device, terminal equipment and medium
CN110222188B (en) Company notice processing method for multi-task learning and server
CN111581966B (en) Context feature-fused aspect-level emotion classification method and device
CN109947912A (en) A kind of model method based on paragraph internal reasoning and combined problem answer matches
CN107844481B (en) Text recognition error detection method and device
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN114926150B (en) Digital intelligent auditing method and device for transformer technology compliance assessment
CN112883153B (en) Relationship classification method and device based on information enhancement BERT
Zhu et al. Topic-guided attention for image captioning
CN115861995B (en) Visual question-answering method and device, electronic equipment and storage medium
CN111695335A (en) Intelligent interviewing method and device and terminal equipment
CN113988079A (en) Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method
CN114265937A (en) Intelligent classification analysis method and system of scientific and technological information, storage medium and server
CN116341519A (en) Event causal relation extraction method, device and storage medium based on background knowledge
CN111597816A (en) Self-attention named entity recognition method, device, equipment and storage medium
Wang et al. Recognizing handwritten mathematical expressions as LaTex sequences using a multiscale robust neural network
CN111046674B (en) Semantic understanding method and device, electronic equipment and storage medium
CN115496077B (en) Multimode emotion analysis method and device based on modal observation and grading
CN115374281B (en) Session emotion analysis method based on multi-granularity fusion and graph convolution network
CN116595189A (en) Zero sample relation triplet extraction method and system based on two stages
CN115860002A (en) Combat task generation method and system based on event extraction
CN115455144A (en) Data enhancement method of completion type space filling type for small sample intention recognition
CN114565804A (en) NLP model training and recognizing system
CN114358579A (en) Evaluation method, evaluation device, electronic device, and computer-readable storage medium
CN109800763A (en) A kind of handwritten Chinese recognition methods based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination