CN114169316A - Financial market income prediction model construction method and device and electronic equipment - Google Patents

Financial market income prediction model construction method and device and electronic equipment

Info

Publication number
CN114169316A
Authority
CN
China
Prior art keywords
dictionary
prediction
model
data
constructing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111494885.1A
Other languages
Chinese (zh)
Inventor
谢伟
周文泽
刘慕雨
潘玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202111494885.1A priority Critical patent/CN114169316A/en
Publication of CN114169316A publication Critical patent/CN114169316A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/242: Dictionaries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06: Asset management; Financial planning or analysis

Abstract

The disclosure provides a construction method of a financial market income prediction model, which can be applied to the financial field or other fields. The method comprises the following steps: acquiring a prediction dictionary, wherein the prediction dictionary comprises a general dictionary and first text data, and the first text data comprises news data; determining an emotion index using the prediction dictionary; and constructing a prediction model based on the emotion index. The present disclosure also provides a financial market revenue forecasting method, apparatus, device, storage medium and program product.

Description

Financial market income prediction model construction method and device and electronic equipment
Technical Field
The present disclosure relates to the field of finance, more particularly to financial market revenue prediction, and specifically to a method, an apparatus, an electronic device, a medium, and a program product for constructing a financial market revenue prediction model.
Background
Since the advent of the big data era, the amount of text data on the internet has grown, and more and more financial technology companies have begun to study the financial market using text big data methods. The stock market is the most active part of the financial market, and research on its profitability has long been a focus of the industry. In China, stock market investors are mainly retail investors, who tend to obtain investment information from financial news portals. Financial news is a carrier of investment information and mostly exists as text data; the emotional attitude embedded in it can greatly influence investors' investment tendencies and in turn affect stock price fluctuations. Analyzing the text of financial news and mining the emotion it contains is therefore of great significance for research on stock market profitability.
Traditional text analysis mostly uses dictionary methods, and the dictionaries used include the Harvard dictionary, the MPQA subjective emotion dictionary, the SentiWordNet dictionary, the HowNet dictionary (BETA), the Tsinghua commendatory and derogatory dictionary, and the like. These emotion dictionaries are the general dictionaries most commonly used in text analysis at present, but most of them were constructed from literary works, media reports and the like, so their applicability and accuracy in a specific field are limited.
Therefore, when a dictionary method is used for text analysis of a financial market, how to construct an emotion dictionary suitable for the financial field is a technical problem to be solved urgently.
Disclosure of Invention
In view of the above, the present disclosure provides a method, apparatus, electronic device, medium, and program product for constructing a financial market revenue prediction model.
According to a first aspect of the present disclosure, there is provided a method for constructing a financial market profit prediction model, including: acquiring a prediction dictionary, wherein the prediction dictionary comprises a general dictionary and first text data, and the first text data comprises news data; determining an emotion index using the prediction dictionary; and constructing a prediction model based on the emotion index.
According to an embodiment of the present disclosure, the step of obtaining a prediction dictionary includes: obtaining N general dictionaries, where N is greater than or equal to 1; merging the N general dictionaries to obtain a first dictionary; performing a deduplication operation on the first dictionary to obtain a second dictionary; acquiring the first text data within a first preset time; constructing a first lexicon based on the first text data; and constructing a prediction dictionary based on the second dictionary and the first lexicon.
According to an embodiment of the present disclosure, the step of constructing a prediction dictionary includes: extracting the words common to the first lexicon and the second dictionary to form a third dictionary; screening the third dictionary according to a first preset condition to obtain a screened fourth dictionary; merging the first lexicon and the fourth dictionary to obtain a fifth dictionary; and performing a deduplication operation on the fifth dictionary to obtain the prediction dictionary.
According to an embodiment of the present disclosure, the step of performing a deduplication operation on the fifth dictionary includes: calculating cosine similarity of all word vectors in the fifth dictionary; and performing a deduplication operation based on the cosine similarity.
According to an embodiment of the present disclosure, the determining an emotion index using the prediction dictionary includes: acquiring second text data in preset unit time; determining a number of active words and a number of passive words in the second text data using the prediction dictionary; and calculating an emotion index based on the number of active words and the number of passive words.
According to an embodiment of the present disclosure, the formula for calculating the emotion index is:
[Figure BDA0003398393530000021: formula for the emotion index SI]
In the formula, N_p is the number of active (positive) words, N_n is the number of passive (negative) words, and SI is the emotion index.
According to an embodiment of the present disclosure, the step of constructing a prediction model based on the emotion index includes: constructing an initial model by taking the emotion index as an influence factor; acquiring historical income data; and training the initial model based on the historical revenue data.
According to an embodiment of the present disclosure, the step of training the initial model includes: based on a preset classification standard, dividing the historical income data into training data and verification data; constructing a reference model; training the reference model and the initial model respectively by using training data; verifying the trained reference model and the initial model respectively by using the verification data; based on the verification result, scoring the trained reference model and the initial model; and when the difference of the scores of the initial model and the reference model is higher than a preset threshold value, finishing training.
According to an embodiment of the present disclosure, the step of training the initial model further comprises: and when the difference between the scores of the initial model and the reference model is not higher than a preset threshold value, adjusting the parameters of the initial model, and re-training.
According to an embodiment of the present disclosure, the step of constructing an initial model includes: and constructing an initial model by taking the emotion index as an influence factor on the basis of a hidden Markov model.
A second aspect of the present disclosure provides a financial market revenue forecasting method, including the steps of:
acquiring a current emotion index of an investor;
inputting the current emotion index of the investor into a financial market income prediction model, wherein the financial market income prediction model is constructed according to the above method for constructing a financial market income prediction model; and
predicting financial market income according to the output result of the financial market income prediction model.
A third aspect of the present disclosure provides an apparatus for constructing a financial market profit prediction model, including: an acquisition module for acquiring a prediction dictionary, wherein the prediction dictionary comprises a general dictionary and first text data, and the first text data comprises news data; an emotion index determining module for determining an emotion index using the prediction dictionary; and a modeling module for constructing a prediction model based on the emotion index.
A fourth aspect of the present disclosure provides an electronic device, comprising: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above method of constructing a financial market revenue forecasting model and the financial market revenue forecasting method.
A fifth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described construction method of a financial market revenue prediction model and the financial market revenue prediction method.
A sixth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described method of constructing a financial market revenue prediction model and the financial market revenue prediction method.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of the method of constructing a financial market revenue prediction model and of the financial market revenue prediction method, apparatus, device, medium, and program product according to embodiments of the disclosure;
FIG. 2 schematically illustrates a flow chart of a method of constructing a financial market revenue prediction model in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a financial market revenue forecasting method in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of a financial market revenue forecasting system architecture, in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the emotion dictionary generating apparatus 1 shown in FIG. 4 according to the embodiment of the present disclosure;
FIG. 6 schematically shows a schematic diagram of the sentiment index generation apparatus 2 shown in FIG. 4 according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a schematic diagram of the revenue prediction model generation apparatus 3 of FIG. 4, in accordance with an embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic view of the model evaluation device 4 of FIG. 4, in accordance with an embodiment of the present disclosure;
FIG. 9 schematically illustrates an example flow diagram of the financial market revenue forecasting system of FIG. 4 providing a service in accordance with an embodiment of the present disclosure;
FIG. 10 is a block diagram schematically illustrating an apparatus for constructing a financial market revenue prediction model according to an embodiment of the present disclosure; and
FIG. 11 schematically illustrates a block diagram of an electronic device adapted to implement a method of construction of a financial market revenue prediction model and a method of financial market revenue prediction, in accordance with an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).
Analyzing the text of financial news and mining the emotion it contains is of great significance for research on stock market profitability. Constructing a sound emotion dictionary for the financial field, and making full use of the investment information in financial news to predict the direction of stock market returns, is a technical problem that urgently needs to be solved in this field.
The disclosure aims to provide a financial market profitability direction prediction method based on news text big data and emotion analysis technology. The method constructs a financial news emotion dictionary by fusing a universal text analysis dictionary and a financial field emotion dictionary, and then constructs an emotion index and uses the emotion index in a financial market stock earning rate prediction model, so that the accuracy of the prediction model is improved.
The embodiment of the disclosure provides a construction method of a financial market income prediction model, which comprises the following steps: acquiring a prediction dictionary, wherein the prediction dictionary comprises a general dictionary and first text data, and the first text data comprises news data; determining an emotion index using the prediction dictionary; and constructing a prediction model based on the emotion index.
It should be noted that the method and apparatus of the present disclosure can be used for financial market revenue prediction in the financial field, and can also be used for market revenue prediction in any field other than the financial field.
FIG. 1 schematically illustrates an application scenario diagram of the method of constructing a financial market revenue prediction model and of the financial market revenue prediction method, apparatus, device, medium, and program product according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the construction method of the financial market profit prediction model and the financial market profit prediction method provided by the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the device for constructing the financial market profit prediction model provided by the embodiment of the present disclosure may be generally disposed in the server 105. The construction method of the financial market profit prediction model and the financial market profit prediction method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the device for constructing the financial market profit prediction model provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 schematically shows a flow chart of a method of constructing a financial market revenue prediction model according to an embodiment of the present disclosure.
As shown in fig. 2, the method for constructing the financial market profit prediction model according to this embodiment includes operations S201 to S203.
In operation S201, a prediction dictionary is acquired, wherein the prediction dictionary includes a general dictionary and first text data, and the first text data includes news data.
The general dictionary refers to existing general dictionaries, including the Harvard dictionary, the MPQA subjective emotion dictionary, the SentiWordNet dictionary, the HowNet dictionary (BETA), the Tsinghua commendatory and derogatory dictionary, and the like. The first text data is finance-related text data and mainly includes financial news data, such as news obtained from Sina Finance, Cailian Press, Xueqiu, and the like.
In operation S202, an emotion index is determined using the prediction dictionary.
The emotion index (Sentiment Index, SI) can be used to represent investor emotion, and investor emotion has a knock-on effect on stock price fluctuations. Therefore, by determining the emotion index, investor emotion can be judged and financial market income can be predicted.
In operation S203, a prediction model is constructed based on the emotion index.
According to an embodiment of the present disclosure, the step of obtaining a prediction dictionary includes:
1. Acquiring N general dictionaries, where N is greater than or equal to 1. The number of general dictionaries can be determined according to actual conditions, and at least one general dictionary is selected.
2. Merging the N general dictionaries to obtain a first dictionary.
3. Performing a deduplication operation on the first dictionary to obtain a second dictionary. Because the same vocabulary may appear in several general dictionaries, deduplicating the merged dictionary improves the computational efficiency of the later steps.
4. Acquiring the first text data within a first preset time. The first preset time can be determined according to needs and computing power; for example, with the past year selected, financial news of the past year (from data sources such as Sina Finance, Cailian Press and Xueqiu) is acquired through a web crawler as the first text data.
5. Constructing a first lexicon based on the first text data. The first text data is preprocessed (word segmentation, deletion and the like) and then used to train a natural language processing model such as Word2vec or BERT to obtain the first lexicon.
6. Constructing a prediction dictionary based on the second dictionary and the first lexicon. The second dictionary obtained in step 3 and the first lexicon obtained in step 5 are matched and screened (words unrelated to finance and economics are removed) to form the prediction dictionary.
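Steps 1 to 3 above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each general dictionary is a mapping from a word to a polarity label ("active" or "passive") and that the first label seen for a word is kept when dictionaries disagree; the function name and the sample words are hypothetical and do not come from the disclosure.

```python
def merge_general_dictionaries(dictionaries):
    """Merge N >= 1 general dictionaries and remove duplicate words.

    Each dictionary is assumed to map word -> polarity label.
    When a word appears in several dictionaries, the first label wins
    (an illustrative choice, not specified by the disclosure).
    """
    if not dictionaries:
        raise ValueError("at least one general dictionary is required")
    second_dictionary = {}
    for dictionary in dictionaries:
        for word, polarity in dictionary.items():
            # setdefault keeps the existing entry, so repeats are dropped
            second_dictionary.setdefault(word, polarity)
    return second_dictionary


general_a = {"rise": "active", "loss": "passive"}
general_b = {"rise": "active", "slump": "passive"}
second_dictionary = merge_general_dictionaries([general_a, general_b])
print(second_dictionary)
```

Merging and deduplicating in one pass is only a convenience here; the two operations could equally be kept as separate steps, as in the description.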
According to an embodiment of the present disclosure, the step of constructing a prediction dictionary includes:
1. Extracting the words common to the first lexicon and the second dictionary to form a third dictionary.
2. Screening the third dictionary according to a first preset condition to obtain a screened fourth dictionary. The first preset condition can be determined according to actual conditions; for example, to obtain a prediction dictionary oriented toward financial market income prediction, the first preset condition can be that a word is related to financial market income prediction.
3. Merging the first lexicon and the fourth dictionary to obtain a fifth dictionary. After the words common to the first lexicon and the second dictionary have been extracted, the first lexicon still contains words that are not in the second dictionary but can be used for predicting financial market income, so the fourth dictionary can be expanded with them to obtain the fifth dictionary.
4. Performing a deduplication operation on the fifth dictionary to obtain the prediction dictionary.
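The four steps above might be sketched as follows. This is one possible reading rather than the disclosed implementation: the first lexicon is assumed to be a list of words, the second dictionary a mapping from word to polarity label, and the first preset condition a caller-supplied predicate `is_relevant`. Words taken only from the first lexicon receive the label `None`, because the text does not say how their polarity is assigned. All names and sample data are hypothetical.

```python
def build_prediction_dictionary(first_lexicon, second_dictionary, is_relevant):
    """Sketch of steps 1-4 under the assumptions stated in the lead-in."""
    # Step 1: words common to the first lexicon and the second dictionary.
    third = {w: second_dictionary[w] for w in first_lexicon if w in second_dictionary}
    # Step 2: screen the third dictionary with the first preset condition.
    fourth = {w: label for w, label in third.items() if is_relevant(w)}
    # Step 3: expand with first-lexicon words that the second dictionary lacks.
    fifth = dict(fourth)
    for word in first_lexicon:
        if word not in second_dictionary:
            fifth[word] = None  # polarity unknown at this point (assumption)
    # Step 4: exact duplicates collapse because mapping keys are unique.
    return fifth


first_lexicon = ["rally", "default", "bond", "sunny"]
second_dictionary = {
    "rally": "active",
    "default": "passive",
    "sunny": "active",
    "gloomy": "passive",
}
non_financial = {"sunny"}  # hypothetical screening list
prediction_dictionary = build_prediction_dictionary(
    first_lexicon, second_dictionary, lambda w: w not in non_financial
)
print(prediction_dictionary)
```

Under this reading the general dictionaries mainly contribute polarity labels, while the news-derived lexicon contributes the domain vocabulary.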
According to an embodiment of the present disclosure, the step of performing a deduplication operation on the fifth dictionary includes: calculating the cosine similarity of all word vectors in the fifth dictionary; and performing a deduplication operation based on the cosine similarity. Because words with different surface forms but the same meaning may exist, deduplication can be extended by calculating the cosine similarity of word vectors, which improves the accuracy of the dictionary.
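A minimal sketch of similarity-based deduplication is given below. It assumes that each word already has a vector (for example from a trained word-embedding model) and that two words count as duplicates when their cosine similarity reaches a threshold; the value 0.95, the greedy keep-first strategy, and the toy two-dimensional vectors are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def deduplicate_by_similarity(word_vectors, threshold=0.95):
    """Keep a word only if it is not too similar to any word kept so far."""
    kept = []
    for word, vector in word_vectors.items():
        if all(cosine_similarity(vector, word_vectors[k]) < threshold for k in kept):
            kept.append(word)
    return kept


word_vectors = {
    "surge": [1.0, 0.0],
    "soar": [0.99, 0.05],   # nearly parallel to "surge"
    "plunge": [-1.0, 0.1],  # nearly opposite to "surge"
}
kept_words = deduplicate_by_similarity(word_vectors)
print(kept_words)
```

The pairwise comparison is quadratic in the dictionary size, which is acceptable for a sketch but may need an approximate nearest-neighbour index for a large vocabulary.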
According to an embodiment of the present disclosure, the determining an emotion index using the prediction dictionary includes: acquiring second text data in preset unit time; determining a number of active words and a number of passive words in the second text data using the prediction dictionary; and calculating an emotion index based on the number of active words and the number of passive words.
The preset unit time may be determined according to actual conditions and may be one day, one week or one month. For example, if the first preset time is one year (that is, the financial news of the past year is obtained through a web crawler) and the preset unit time is one day, an emotion index is calculated for each day of the past year from that day's financial news. Alternatively, the daily emotion index may be calculated only for the financial news of the past half year. How many preset unit times within the first preset time have their emotion index calculated can be determined according to the actual situation, and the present disclosure does not limit the number of emotion indexes.
It should be noted that if there is a word in the second text data that is not in the prediction dictionary but is consistent with the word meaning representation existing in the prediction dictionary, when the prediction dictionary is used for judgment, whether the word is an active word or a passive word can be judged by calculating the cosine similarity of the word vector.
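The fallback described above can be sketched as a nearest-neighbour lookup. The sketch assumes that dictionary words are available as (vector, label) pairs and that a word is left unlabeled when no dictionary word is similar enough; the minimum similarity of 0.8 and all names are hypothetical.

```python
import math


def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def infer_polarity(vector, labeled_vectors, min_similarity=0.8):
    """Return the label of the most similar dictionary word, or None.

    labeled_vectors is a list of (vector, label) pairs taken from the
    prediction dictionary; min_similarity is an illustrative cut-off.
    """
    best_label, best_similarity = None, min_similarity
    for known_vector, label in labeled_vectors:
        similarity = _cosine(vector, known_vector)
        if similarity >= best_similarity:
            best_label, best_similarity = label, similarity
    return best_label


labeled_vectors = [([1.0, 0.0], "active"), ([-1.0, 0.0], "passive")]
near_active = infer_polarity([0.9, 0.1], labeled_vectors)
unrelated = infer_polarity([0.0, 1.0], labeled_vectors)
print(near_active, unrelated)
```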
According to an embodiment of the present disclosure, the formula for calculating the emotion index is:
[Figure BDA0003398393530000091: formula for the emotion index SI]
In the formula, N_p is the number of active (positive) words, N_n is the number of passive (negative) words, and SI is the emotion index.
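The formula image itself is not available in this text, so the following sketch substitutes a commonly used normalized form, SI = (N_p - N_n) / (N_p + N_n). This is an assumption chosen for illustration and may differ from the formula in the original figure. The word counting relies on a prediction dictionary that maps words to "active" or "passive"; names and sample tokens are hypothetical.

```python
def count_polar_words(tokens, prediction_dictionary):
    """Count active (N_p) and passive (N_n) words in a token list."""
    n_p = sum(1 for t in tokens if prediction_dictionary.get(t) == "active")
    n_n = sum(1 for t in tokens if prediction_dictionary.get(t) == "passive")
    return n_p, n_n


def emotion_index(n_p, n_n):
    """ASSUMED form: SI = (N_p - N_n) / (N_p + N_n), in the range [-1, 1].

    Returns 0.0 when no polar word was found, to avoid dividing by zero.
    """
    total = n_p + n_n
    if total == 0:
        return 0.0
    return (n_p - n_n) / total


prediction_dictionary = {"rally": "active", "default": "passive"}
tokens = ["shares", "rally", "rally", "default"]
n_p, n_n = count_polar_words(tokens, prediction_dictionary)
si = emotion_index(n_p, n_n)
print(n_p, n_n, si)
```

With this form, a unit of time whose news contains only active words gives SI = 1, only passive words gives SI = -1, and a balanced mix gives SI = 0.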
According to an embodiment of the present disclosure, the step of constructing a prediction model based on the emotion index includes: constructing an initial model by taking the emotion index as an influence factor; acquiring historical income data; and training the initial model based on the historical revenue data.
The emotion indexes of all preset unit times within the first preset time are calculated and used as influence factors to construct the initial model. The historical income data also falls within the first preset time, and its time unit is consistent with the preset unit time used for calculating the emotion index: if the preset unit time is one day, the time unit of the historical income data is also one day; if the preset unit time is one week, the time unit of the historical income data is one week. This ensures that the emotion index of each preset unit time has corresponding historical income data for subsequent model training. For example, if the first preset time is one year and the preset unit time is one day, an emotion index is calculated from the financial news of each day in the past year, and all of these emotion indexes are used as influence factors to construct the initial model; the historical income data of each day is then acquired, and the model is trained using each day's emotion index and the corresponding historical income data.
According to an embodiment of the present disclosure, the step of training the initial model includes: based on a preset classification standard, dividing the historical income data into training data and verification data; constructing a reference model; training the reference model and the initial model respectively by using the training data; verifying the trained reference model and the initial model respectively by using the verification data; based on the verification result, scoring the trained reference model and the initial model; and when the difference between the scores of the initial model and the reference model is higher than a preset threshold value, finishing training. In this case, one year of historical income data may be selected, with data from odd-numbered dates (for example, November 11) used as training data and data from even-numbered dates used as verification data. The reference model can be selected according to actual needs, for example a Logistic model.
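The requirement that every emotion index has matching income data for the same unit of time can be sketched as a simple join on the period key. The sketch assumes both series are mappings keyed by an ISO date string; the function name and the sample values are hypothetical.

```python
def align_by_period(emotion_by_period, income_by_period):
    """Pair each emotion index with the income of the same period.

    Periods present in only one of the two series are dropped, so every
    returned row has both an emotion index and an income value.
    """
    shared = sorted(set(emotion_by_period) & set(income_by_period))
    return [(p, emotion_by_period[p], income_by_period[p]) for p in shared]


emotion_by_day = {"2021-11-10": 0.2, "2021-11-11": -0.1, "2021-11-13": 0.4}
income_by_day = {"2021-11-10": 0.012, "2021-11-11": -0.004, "2021-11-12": 0.001}
paired = align_by_period(emotion_by_day, income_by_day)
print(paired)
```

ISO date strings sort chronologically, which is why a plain `sorted` call is enough here; weekly or monthly keys would need a similarly sortable format.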
According to an embodiment of the present disclosure, the step of training the initial model further comprises: and when the difference between the scores of the initial model and the reference model is not higher than a preset threshold value, adjusting the parameters of the initial model, and re-training. And when the difference between the scores of the initial model and the reference model is higher than a preset threshold value, successfully training the model, and stopping training.
The preset threshold can be set and adjusted according to actual needs. The scoring can use an evaluation model based on the logarithmic scoring rule or the like, chosen according to actual needs, which is not limited by the present disclosure. Introducing a reference model and the logarithmic scoring rule to assess the effect of the prediction model, and continuously adjusting and optimizing the model, reduces the evaluation error, yields an optimized model, and improves the accuracy of stock market income prediction.
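The training procedure above (odd/even date split, reference model, scoring, threshold check, parameter adjustment) can be sketched as follows. To stay self-contained, the sketch swaps in two deliberately simple stand-ins: a constant base-rate predictor in place of the reference model, and a one-parameter cutoff model on the emotion index in place of the initial model. Neither is the Logistic model or the hidden Markov model mentioned in the text. Scores are mean logarithmic scores (higher is better); the cutoff candidates, the threshold 0.1, and the sample data are hypothetical.

```python
import math
from datetime import date


def split_odd_even(samples):
    """Odd-numbered dates become training data, even-numbered dates validation."""
    train = [s for s in samples if s[0].day % 2 == 1]
    valid = [s for s in samples if s[0].day % 2 == 0]
    return train, valid


def log_score(predict, samples):
    """Mean log-probability assigned to the realised direction."""
    total = 0.0
    for _, si, up in samples:
        p_up = predict(si)
        total += math.log(p_up if up else 1.0 - p_up)
    return total / len(samples)


def _smoothed_rate(rows):
    # Laplace smoothing keeps probabilities strictly between 0 and 1.
    return (sum(up for _, _, up in rows) + 1) / (len(rows) + 2)


def fit_reference(train):
    p = _smoothed_rate(train)
    return lambda si: p


def fit_initial(train, cutoff):
    p_high = _smoothed_rate([s for s in train if s[1] > cutoff])
    p_low = _smoothed_rate([s for s in train if s[1] <= cutoff])
    return lambda si: p_high if si > cutoff else p_low


def train_until_better(samples, cutoffs, threshold):
    """Try parameter values until the score gap exceeds the threshold."""
    train, valid = split_odd_even(samples)
    reference_score = log_score(fit_reference(train), valid)
    for cutoff in cutoffs:
        model = fit_initial(train, cutoff)
        if log_score(model, valid) - reference_score > threshold:
            return cutoff, model
    return None, None


# (date, emotion index, 1 if income rose else 0): toy data for illustration
values = [(0.5, 1), (0.4, 1), (-0.3, 0), (-0.2, 0), (0.6, 1), (0.1, 1), (-0.5, 0), (-0.4, 0)]
samples = [(date(2021, 11, d + 1), si, up) for d, (si, up) in enumerate(values)]
chosen_cutoff, model = train_until_better(samples, cutoffs=[0.55, 0.0], threshold=0.1)
print(chosen_cutoff)
```

On this toy data the first candidate cutoff does not beat the reference by the required margin, so the parameter is adjusted and the second candidate is accepted, mirroring the retrain-until-threshold loop in the text.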
According to an embodiment of the present disclosure, the step of constructing an initial model includes: and constructing an initial model by taking the emotion index as an influence factor on the basis of a hidden Markov model. It should be noted that the Model of the present disclosure may be a classic Hidden Markov Model (HMM for short), or may select another Model, and any Model that can be used for prediction may be used, which is not limited in the present disclosure.
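As a rough illustration of how an emotion index might feed a hidden Markov model, the sketch below runs the standard forward (filtering) recursion over a sequence of discretized emotion indexes. The two hidden states, the "high"/"low" discretization, and every probability value are hypothetical placeholders; the disclosure does not specify the model structure or its parameters.

```python
def forward_filter(observations, states, start_p, trans_p, emit_p):
    """Return P(hidden state at the last step | all observations so far)."""
    belief = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    norm = sum(belief.values())
    belief = {s: v / norm for s, v in belief.items()}
    for obs in observations[1:]:
        # Predict: propagate the belief through the transition matrix.
        predicted = {s: sum(belief[r] * trans_p[r][s] for r in states) for s in states}
        # Update: weight by how likely each state is to emit the observation.
        belief = {s: predicted[s] * emit_p[s][obs] for s in states}
        norm = sum(belief.values())
        belief = {s: v / norm for s, v in belief.items()}
    return belief


states = ("up", "down")  # hypothetical hidden market regimes
start_p = {"up": 0.5, "down": 0.5}
trans_p = {"up": {"up": 0.7, "down": 0.3}, "down": {"up": 0.3, "down": 0.7}}
# Emission: probability of seeing a "high" or "low" emotion index in each regime
emit_p = {"up": {"high": 0.8, "low": 0.2}, "down": {"high": 0.2, "low": 0.8}}

belief = forward_filter(["high", "high"], states, start_p, trans_p, emit_p)
print(belief)
```

In practice the transition and emission probabilities would be estimated from the historical emotion indexes and income data rather than fixed by hand.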
By the method for constructing the financial market income prediction model, the general dictionary and the financial news text data are fused, the prediction dictionary suitable for the financial field is constructed, and the accuracy of the financial market stock market income prediction model is improved.
FIG. 3 schematically illustrates a flow chart of a financial market revenue forecasting method in accordance with an embodiment of the present disclosure.
As shown in fig. 3, the financial market revenue forecasting method according to this embodiment includes operations S301 to S303.
In operation S301, a current sentiment index of an investor is acquired.
In operation S302, the investor's current emotional index is input into a financial market revenue prediction model, which is constructed according to the above-described method.
In operation S303, a financial market profit is predicted according to an output result of the financial market profit prediction model.
FIG. 4 schematically illustrates a schematic diagram of a financial market revenue forecasting system architecture, according to an embodiment of the present disclosure.
As shown in fig. 4, the financial market revenue forecasting system of the embodiment of the present disclosure includes: an emotion dictionary generation device 1, an emotion index generation device 2, a profit prediction model generation device 3, and a model evaluation device 4. The emotion dictionary generation device 1 is connected with the emotion index generation device 2; the emotion index generation device 2 is connected with the profit prediction model generation device 3; the emotion index generation device 2 is connected with the model evaluation device 4; the profit prediction model generation device 3 is connected with the model evaluation device 4.
The emotion dictionary generation device 1 is used for integrating general dictionaries (including a financial dictionary, a financial-field emotion dictionary and the like), financial news, and a natural language processing model, and for obtaining a financial news emotion dictionary, namely the prediction dictionary, through data cleaning, model training and data refinement.
The emotion index generation device 2 is used for constructing an investor emotion index from the financial news text on a daily basis, based on the prediction dictionary.
The profit prediction model generation device 3 is configured to add the emotion index as an influence factor to a prediction model (an HMM) for the direction of financial market stock index returns, and to train and optimize the model according to the initialized parameters or the adjusted parameters.
An HMM is a statistical model that describes a Markov process with hidden, unknown parameters. In an ordinary Markov model, the states are directly visible to the observer, so the state transition probabilities are the only parameters. In a hidden Markov model, the states are not directly visible, but some variables affected by the states are visible. Each state has a probability distribution over the symbols that may be output, so the sequence of output symbols reveals some information about the state sequence.
Assume the observed sequence is Y:
Y=y(0),y(1),...,y(L-1) (2)
and the hidden state sequence is X:
X=x(0),x(1),...,x(L-1) (3)
both of length L. The probability under the Markov model can be expressed as:
Figure BDA0003398393530000121
From this probabilistic model, it can be seen that the Markov model takes into account information both before and after a given time point.
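The formula referenced above appears only as an image in the original publication. For a standard discrete HMM, the probability of an observed sequence Y is obtained by summing P(Y|X)P(X) over every possible hidden sequence X, and the forward algorithm computes that sum without enumerating the sequences. The following is a minimal sketch of that textbook computation, not necessarily the exact expression in the filing; the function name and the toy probabilities (`initial`, `transition`, `emission`) are illustrative assumptions.

```python
def sequence_probability(observations, initial, transition, emission):
    """Forward algorithm: P(Y) summed over all hidden state sequences X.

    observations: list of symbol indices y(0)..y(L-1), assumed non-empty
    initial[i]: P(x(0) = i)
    transition[i][j]: P(x(t+1) = j | x(t) = i)
    emission[i][k]: P(y(t) = k | x(t) = i)
    """
    n_states = len(initial)
    # alpha[i] = P(y(0..t), x(t) = i)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states)) * emission[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)

# Toy two-state model; symbol 0 = "index down", symbol 1 = "index up" (hypothetical).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.9, 0.1], [0.2, 0.8]]
p_down_up_up = sequence_probability([0, 1, 1], initial, transition, emission)
```

Because the probabilities of all possible observation sequences of a given length sum to 1, summing `sequence_probability` over every length-3 sequence is a quick sanity check on the parameters.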
The model evaluation device 4 is used for evaluating the HMM against a Logistic reference model by means of a logarithmic scoring rule, and for iteratively adjusting the model parameters over multiple rounds of evaluation to finally obtain the optimal HMM.
Fig. 5 schematically shows a schematic diagram of the emotion dictionary generating apparatus 1 shown in fig. 4 according to an embodiment of the present disclosure.
As shown in fig. 5, the emotion dictionary generation device 1 includes a dictionary merging unit 11, a financial news lexicon unit 12, and a financial emotion dictionary unit 13.
The dictionary merging unit 11 is configured to merge and deduplicate general dictionaries, including HowNet, a commendatory and derogatory word dictionary, the Loughran-McDonald financial-domain emotion dictionary, a financial dictionary, a Chinese financial emotion dictionary, and the like (the set of dictionaries may be expanded or reduced as appropriate), to obtain a second dictionary.
The financial news lexicon unit 12 is configured to obtain financial news of the past year as first text data through a web crawler (using data sources such as Sina Finance, Cailian Press, and Xueqiu), perform preprocessing such as word segmentation and duplicate removal, and then train a natural language processing model such as Word2vec or BERT to obtain a first lexicon.
The financial emotion dictionary unit 13 is configured to match and screen the second dictionary against the first lexicon (removing words unrelated to finance) to form a fifth dictionary, and, based on the first lexicon, to expand and deduplicate the fifth dictionary by calculating the cosine similarity of word vectors, finally obtaining the prediction dictionary.
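The work of units 11 to 13 can be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the `min_similarity` threshold, the rule that an added word inherits the polarity of its most similar seed word, and the two-dimensional toy vectors (real Word2vec or BERT vectors have far more dimensions). Screening is simplified to "the word also occurs in the news lexicon".

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_prediction_dictionary(general_dicts, word_vectors, min_similarity=0.9):
    """general_dicts: list of {word: polarity}; word_vectors: {word: vector} from the news corpus."""
    # Merge and deduplicate the general dictionaries (first occurrence of a word wins).
    merged = {}
    for dictionary in general_dicts:
        for word, polarity in dictionary.items():
            merged.setdefault(word, polarity)
    # Screen: keep only dictionary words that also occur in the news lexicon.
    seed = {w: p for w, p in merged.items() if w in word_vectors}
    # Expand: add corpus words whose vector is close enough to some seed word.
    prediction = dict(seed)
    for word, vector in word_vectors.items():
        if word in prediction:
            continue  # already present, so no duplicate entry is created
        best_word, best_sim = None, min_similarity
        for seed_word in seed:
            sim = cosine_similarity(vector, word_vectors[seed_word])
            if sim >= best_sim:
                best_word, best_sim = seed_word, sim
        if best_word is not None:
            prediction[word] = seed[best_word]
    return prediction

general_dicts = [
    {"rally": "positive", "slump": "negative"},
    {"rally": "positive", "happy": "positive"},
]
word_vectors = {
    "rally": [1.0, 0.1], "surge": [0.9, 0.1],
    "slump": [-1.0, 0.2], "plunge": [-0.9, 0.25],
    "index": [0.0, 1.0],
}
prediction_dictionary = build_prediction_dictionary(general_dicts, word_vectors)
```

In this toy run, "happy" is dropped because it never appears in the news lexicon, "surge" and "plunge" are added through similarity to "rally" and "slump", and the neutral word "index" stays out.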
Fig. 6 schematically shows a schematic diagram of the emotion index generation apparatus 2 shown in fig. 4 according to an embodiment of the present disclosure.
As shown in fig. 6, the emotion index generation device 2 includes a financial news query unit 21, an emotion index calculation unit 22, and an emotion index storage unit 23.
The financial news query unit 21 is used for querying the financial news acquired by the web crawler, retrieving news for one or more days according to preset parameters, and storing it in a local cache.
The emotion index calculation unit 22 is configured to read the news for one or more days from the local cache, calculate an emotion index based on the prediction dictionary according to formula (1), taking one day of news as a unit, and store the result as the daily emotion index. After the data in the local cache has been processed, the query unit is triggered to query again for the next round of calculation; after multiple cycles, the emotion indexes of all historical data are obtained.
The emotion index storage unit 23 is used for storing the emotion index calculation results and provides a local cache and persistent storage.
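The per-day calculation can be sketched as below. Formula (1) is not reproduced in this part of the text, so the index function is passed in as a parameter; `assumed_index`, which uses the net share of positive words, is only a placeholder assumption and should be replaced by the actual formula. The polarity labels and sample tokens are likewise hypothetical.

```python
from collections import defaultdict

def daily_sentiment_index(news_items, prediction_dictionary, index_fn):
    """news_items: iterable of (day, tokens). Returns {day: sentiment index}."""
    counts = defaultdict(lambda: [0, 0])  # day -> [positive count, negative count]
    for day, tokens in news_items:
        day_counts = counts[day]  # registers the day even if no sentiment word matches
        for token in tokens:
            polarity = prediction_dictionary.get(token)
            if polarity == "positive":
                day_counts[0] += 1
            elif polarity == "negative":
                day_counts[1] += 1
    return {day: index_fn(pos, neg) for day, (pos, neg) in counts.items()}

def assumed_index(n_pos, n_neg):
    """Placeholder for formula (1): an ASSUMED net share of positive words."""
    total = n_pos + n_neg
    return (n_pos - n_neg) / total if total else 0.0

lexicon = {"rally": "positive", "surge": "positive",
           "slump": "negative", "plunge": "negative"}
news = [
    ("2021-11-11", ["stocks", "rally", "surge", "slump"]),
    ("2021-11-11", ["rally"]),
    ("2021-11-12", ["plunge", "index"]),
]
daily_index = daily_sentiment_index(news, lexicon, assumed_index)
```

Keeping the formula behind `index_fn` means the counting logic does not change if the index definition does.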
FIG. 7 schematically illustrates a schematic diagram of the revenue prediction model generation apparatus 3 shown in FIG. 4, according to an embodiment of the present disclosure.
As shown in fig. 7, the profit prediction model generation apparatus 3 includes a model construction unit 31 and a model training unit 32.
The model construction unit 31 is configured to construct an HMM according to the initialization parameter and the emotion index parameter.
The model training unit 32 is used for training the constructed HMM on selected historical stock return data (such as daily rise and fall data of the CSI 300 index over the most recent year or longer). The historical data is divided into training data and verification data; for example, data from odd-numbered days (days whose date is an odd number, for example, November 11) is taken as training data, and data from even-numbered days is taken as verification data.
Fig. 8 schematically illustrates a schematic diagram of the model evaluation device 4 shown in fig. 4 according to an embodiment of the present disclosure.
As shown in fig. 8, the model evaluation device 4 includes a reference model unit 41, a model scoring unit 42, and a parameter adjusting unit 43.
The reference model unit 41 is used for training a Logistic model on the same data to serve as the benchmark for model evaluation.
The model scoring unit 42 is used for testing the Logistic model and the HMM on the verification data and scoring both with a logarithmic scoring rule.
The parameter adjusting unit 43 is configured to adjust the HMM parameters according to the model scoring result and to drive the profit prediction model generation device 3 to retrain the model.
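A logarithmic scoring rule for direction forecasts can be sketched as follows. This is one common form (the average log of the probability assigned to the outcome that actually occurred); whether the disclosure uses exactly this form is an assumption, and the probability values below are invented for illustration.

```python
import math

def mean_log_score(predicted_up_probs, actual_up, eps=1e-12):
    """Average log-probability assigned to the realized direction (higher is better)."""
    total = 0.0
    for p_up, went_up in zip(predicted_up_probs, actual_up):
        p = p_up if went_up else 1.0 - p_up
        total += math.log(min(max(p, eps), 1.0))  # clip to avoid log(0)
    return total / len(actual_up)

# Hypothetical verification-day outcomes and model forecasts.
actual_up = [True, False, True, True]
hmm_probs = [0.8, 0.3, 0.6, 0.7]
logistic_probs = [0.6, 0.5, 0.5, 0.6]

hmm_score = mean_log_score(hmm_probs, actual_up)
logistic_score = mean_log_score(logistic_probs, actual_up)
score_difference = hmm_score - logistic_score  # compared against the preset threshold
```

The rule rewards confident forecasts that turn out right and penalizes confident forecasts that turn out wrong, which makes it suitable for comparing two probabilistic models on the same verification data.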
FIG. 9 schematically illustrates an example flow diagram of the provision of a service by the financial market revenue forecasting system of FIG. 4 in accordance with an embodiment of the present disclosure.
Step S901: the system is started.
Step S902: the emotion dictionary generation device is started to generate the prediction dictionary.
Step S903: the emotion index generation device is started to generate emotion indexes for the historical data on a daily basis.
Step S904: the profit prediction model generation device is started, generates a model (HMM) based on preset parameters (with the emotion index as a factor), and trains it on historical data.
Step S905: the model evaluation device is started to evaluate the trained model.
Step S906: the parameters are adjusted according to the model evaluation result, and the model is retrained.
Step S907: the final prediction model is obtained through multiple rounds of training and evaluation.
Based on the construction method of the financial market income prediction model, the disclosure also provides a construction device of the financial market income prediction model. The apparatus will be described in detail below with reference to fig. 10.
Fig. 10 is a block diagram schematically illustrating a construction apparatus of a financial market profit prediction model according to an embodiment of the present disclosure.
As shown in fig. 10, the apparatus 1000 for constructing a financial market profit prediction model according to this embodiment includes an obtaining module 1010, an emotion index determination module 1020, and a modeling module 1030.
The obtaining module 1010 is configured to obtain a prediction dictionary, where the prediction dictionary includes a general dictionary and first text data, and the first text data includes news data. In an embodiment, the obtaining module 1010 may be configured to perform the operation S201 described above, which is not described herein again.
The emotion index determination module 1020 is used to determine an emotion index using the prediction dictionary. In an embodiment, the emotion index determination module 1020 may be used to perform the operation S202 described above, which is not described herein again.
The modeling module 1030 is used for constructing a prediction model based on the emotion index. In an embodiment, the modeling module 1030 may be configured to perform the operation S203 described above, which is not described herein again.
According to an embodiment of the present disclosure, any number of the obtaining module 1010, the emotion index determination module 1020, and the modeling module 1030 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 1010, the emotion index determination module 1020, and the modeling module 1030 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the obtaining module 1010, the emotion index determination module 1020, and the modeling module 1030 may be at least partially implemented as a computer program module which, when executed, may perform a corresponding function.
FIG. 11 schematically illustrates a block diagram of an electronic device adapted to implement a method of construction of a financial market revenue prediction model and a method of financial market revenue prediction, in accordance with an embodiment of the present disclosure.
As shown in fig. 11, an electronic device 1100 according to an embodiment of the present disclosure includes a processor 1101, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The processor 1101 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 1101 may also include on-board memory for caching purposes. The processor 1101 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to the embodiments of the present disclosure.
In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are stored. The processor 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. The processor 1101 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1102 and/or the RAM 1103. It is noted that the programs may also be stored in one or more memories other than the ROM 1102 and the RAM 1103. The processor 1101 may also perform various operations of the method flows according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 1100 may also include an input/output (I/O) interface 1105, which is also connected to the bus 1104. The electronic device 1100 may also include one or more of the following components connected to the I/O interface 1105: an input portion 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is installed into the storage section 1108 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1102 and/or the RAM 1103 and/or one or more memories other than the ROM 1102 and the RAM 1103 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. The program code is for causing a computer system to perform the methods of the embodiments of the disclosure when the computer program product is run on the computer system.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 1101. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via the communication part 1109, and/or installed from the removable medium 1111. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. The computer program, when executed by the processor 1101, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, such combinations and/or incorporations may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (15)

1. A construction method of a financial market income prediction model is characterized by comprising the following steps:
acquiring a prediction dictionary, wherein the prediction dictionary comprises a general dictionary and first text data, and the first text data comprises news data;
determining an emotion index using the prediction dictionary; and
constructing a prediction model based on the emotion index.
2. The method of claim 1, wherein the step of obtaining a prediction dictionary comprises:
obtaining N general dictionaries, wherein N is greater than or equal to 1;
merging the N general dictionaries to obtain a first dictionary;
carrying out a deduplication operation on the first dictionary to obtain a second dictionary;
acquiring the first text data within a first preset time;
constructing a first lexicon based on the first text data; and
constructing the prediction dictionary based on the second dictionary and the first lexicon.
3. The method of claim 2, wherein the step of constructing a prediction dictionary comprises:
extracting coincident words of the first lexicon and the second dictionary to form a third dictionary;
screening the third dictionary according to a first preset condition to obtain a screened fourth dictionary;
merging the first lexicon and the fourth dictionary to obtain a fifth dictionary; and
carrying out a deduplication operation on the fifth dictionary to obtain the prediction dictionary.
4. The method of claim 3, wherein the step of de-duplicating the fifth dictionary comprises:
calculating cosine similarity of all word vectors in the fifth dictionary; and
carrying out a deduplication operation based on the cosine similarity.
5. The method of claim 1, wherein the step of determining an emotion index using the prediction dictionary comprises:
acquiring second text data in preset unit time;
determining a number of positive words and a number of negative words in the second text data using the prediction dictionary; and
calculating an emotion index based on the number of positive words and the number of negative words.
6. The method of claim 5, wherein the formula for calculating the sentiment index is:
Figure FDA0003398393520000021
in the formula: n is a radical ofPIndicating the number of active words, NnIndicating the number of negative words and SI the sentiment index.
7. The method of claim 6, wherein the step of constructing a predictive model based on the sentiment index comprises:
constructing an initial model by taking the emotion index as an influence factor;
acquiring historical income data; and
training the initial model based on the historical revenue data.
8. The method of claim 7, wherein the step of training the initial model comprises:
dividing the historical revenue data into training data and verification data based on a preset classification criterion;
constructing a reference model;
training the reference model and the initial model respectively by using the training data;
verifying the trained reference model and the trained initial model respectively by using the verification data;
scoring the trained reference model and the trained initial model based on the verification result; and
finishing training when the difference between the scores of the initial model and the reference model is higher than a preset threshold value.
9. The method of claim 8, further comprising:
when the difference between the scores of the initial model and the reference model is not higher than the preset threshold value, adjusting the parameters of the initial model and retraining.
10. The method of claim 7, wherein the step of constructing an initial model comprises:
constructing the initial model on the basis of a hidden Markov model, taking the emotion index as an influence factor.
11. A financial market revenue forecasting method, comprising the steps of:
acquiring a current emotion index of an investor;
inputting the investor's current emotional index into a financial market gain prediction model, wherein the financial market gain prediction model is constructed according to the method of any one of claims 1-10; and
predicting financial market income according to the output result of the financial market income prediction model.
12. An apparatus for constructing a financial market revenue prediction model, comprising:
an acquisition module for acquiring a prediction dictionary, wherein the prediction dictionary comprises a general dictionary and first text data, and the first text data comprises news data;
an emotion index determination module for determining an emotion index by using the prediction dictionary; and
a modeling module for constructing a prediction model based on the emotion index.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-11.
14. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 11.
15. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 11.
CN202111494885.1A 2021-12-08 2021-12-08 Financial market income prediction model construction method and device and electronic equipment Pending CN114169316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111494885.1A CN114169316A (en) 2021-12-08 2021-12-08 Financial market income prediction model construction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111494885.1A CN114169316A (en) 2021-12-08 2021-12-08 Financial market income prediction model construction method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114169316A true CN114169316A (en) 2022-03-11

Family

ID=80484650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111494885.1A Pending CN114169316A (en) 2021-12-08 2021-12-08 Financial market income prediction model construction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114169316A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination