CN114519613B

CN114519613B - Price data processing method and device, electronic equipment and storage medium

Info

Publication number: CN114519613B
Application number: CN202210160991.4A
Authority: CN
Inventors: 刘羲; 舒畅; 陈又新
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-02-22
Filing date: 2022-02-22
Publication date: 2023-07-25
Anticipated expiration: 2042-02-22
Also published as: WO2023159756A1; CN114519613A

Abstract

The embodiment of the application provides a price data processing method and device, electronic equipment and a storage medium, and belongs to the technical field of artificial intelligence. The method comprises the following steps: acquiring original data to be predicted, wherein the original data includes target report data and target transaction data; constructing index factor features according to the target transaction data; constructing public opinion factor features according to the target report data; screening the index factor features and the public opinion factor features to obtain quantized transaction features; extracting features from a plurality of the quantized transaction features through a preset first neural network model to obtain a plurality of distributed feature vectors; and inputting the plurality of distributed feature vectors into a preset second neural network model for price prediction processing to obtain target price data. The technical scheme can improve the accuracy of price data prediction.

Description

Price data processing method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for processing price data, an electronic device, and a storage medium.

Background

Typically, a merchant or price estimation entity predicts the future prices of products. In the related art, price data is predicted with machine learning models typified by linear regression. However, the original data input into such a model is nonlinear, so the prediction results of the model are prone to inaccuracy. How to improve the accuracy of price data prediction has therefore become a technical problem to be solved urgently.

Disclosure of Invention

The embodiment of the application mainly aims to provide a price data processing method and device, electronic equipment and storage medium, and aims to improve the accuracy of price data prediction.

To achieve the above object, a first aspect of an embodiment of the present application proposes a method for processing price data, the method including:

acquiring original data to be predicted; wherein the raw data includes target report data and target transaction data;

constructing index factor characteristics according to the target transaction data;

constructing public opinion factor characteristics according to the target report data;

screening the index factor features and the public opinion factor features to obtain quantized transaction features;

extracting features from a plurality of the quantized transaction features through a preset first neural network model to obtain a plurality of distributed feature vectors;

and inputting the plurality of distributed feature vectors into a preset second neural network model to perform price prediction processing, so as to obtain target price data.

In some embodiments, the constructing the public opinion factor feature according to the target report data includes:

carrying out emotion classification on the target report data to obtain report emotion types;

extracting text features of the target report data to obtain a first reference value for representing the value of the text information;

performing readability evaluation on the target report data to obtain a second reference value for representing the readability value of the report;

and obtaining the public opinion factor characteristic according to the report emotion category, the first reference value and the second reference value.

In some embodiments, the text feature extraction of the target report data to obtain a first reference value for characterizing the value of the text information includes:

classifying the target report data to obtain a target word segmentation set, a target sentence set, a target paragraph set and a target grammar set;

carrying out statistics scoring processing on the target word segmentation set to obtain word segmentation scoring values;

carrying out statistics scoring processing on the target sentence set to obtain sentence scoring values;

carrying out statistics scoring processing on the target paragraph set to obtain paragraph scoring values;

carrying out statistics scoring processing on the target grammar set to obtain grammar scoring values;

and obtaining the first reference value according to the word segmentation score value, the sentence score value, the paragraph score value and the grammar score value.

In some embodiments, the obtaining the public opinion factor feature according to the report emotion category, the first reference value, and the second reference value includes:

multiplying the first reference value and the second reference value to obtain a target reference value;

and carrying out forward processing or reverse processing on the target reference value according to the report emotion type to obtain the public opinion factor characteristic.

In some embodiments, the first neural network model comprises: an input layer and a convolution layer; the feature extraction is performed on the quantized transaction features through a preset first neural network model to obtain distributed feature vectors, including:

inputting a plurality of the quantized transaction features into the first neural network model;

preprocessing each quantized transaction feature through the input layer to obtain corresponding standardized data;

and carrying out convolution processing on the standardized data through the convolution layer to obtain a plurality of distributed feature vectors.

In some embodiments, the convolving the normalized data with the convolution layer to obtain a plurality of distributed feature vectors includes:

performing format conversion on the standardized data through the convolution layer to obtain standard format data;

and convolving the standard format data through the convolution operation of the convolution layer to obtain a plurality of distributed feature vectors.

In some embodiments, the second neural network model comprises a recurrent neural network; inputting the plurality of distributed feature vectors into a preset second neural network model for price prediction processing to obtain target price data, wherein the method comprises the following steps of:

inputting a plurality of the distributed feature vectors into the recurrent neural network;

and carrying out iterative processing on the distributed feature vectors through a recurrent layer in the recurrent neural network to obtain the target price data.

To achieve the above object, a second aspect of the embodiments of the present application proposes a price data processing device, the device including:

the acquisition module is used for acquiring the original data to be predicted; wherein the raw data includes target report data and target transaction data;

the first construction module is used for constructing index factor characteristics according to the target transaction data;

the second construction module is used for constructing public opinion factor characteristics according to the target report data;

the screening module is used for screening the index factor features and the public opinion factor features to obtain quantized transaction features;

the feature extraction module is used for extracting features of a plurality of quantized transaction features through a preset first neural network model to obtain a plurality of distributed feature vectors;

and the prediction module is used for inputting the plurality of distributed feature vectors into a preset second neural network model to perform price prediction processing so as to obtain target price data.

To achieve the above object, a third aspect of the embodiments of the present application proposes an electronic device, including:

at least one memory;

at least one processor;

at least one program;

the program is stored in the memory, and the processor executes the at least one program to implement:

the method of any of the embodiments of the first aspect.

To achieve the above object, a fourth aspect of the embodiments of the present application proposes a storage medium that is a computer-readable storage medium storing computer-executable instructions for causing a computer to execute:

the method of any of the embodiments of the first aspect.

According to the price data processing method and device, the electronic equipment and the storage medium provided by the embodiment of the application, original data to be predicted is acquired, the original data including target report data and target transaction data. Index factor features are then constructed according to the target transaction data, and public opinion factor features are constructed according to the target report data. The index factor features and the public opinion factor features are screened to obtain quantized transaction features, and features are extracted from the quantized transaction features through a preset first neural network model to obtain distributed feature vectors. Finally, the distributed feature vectors are input into a preset second neural network model for price prediction processing to obtain target price data. Acquiring several kinds of original data improves the accuracy of price data prediction, and the screening mechanism avoids an excess of irrelevant data when features are extracted from the quantized transaction features, which further improves the accuracy of price data prediction.

Drawings

FIG. 1 is a flow chart of a method of processing price data provided by an embodiment of the present application;

FIG. 2 is a flowchart of a specific method of step S300 in FIG. 1;

FIG. 3 is a flowchart of a specific method of step S320 in FIG. 2;

FIG. 4 is a flowchart of a specific method of step S500 in FIG. 1;

FIG. 5 is a flowchart of a specific method of step S520 in FIG. 4;

FIG. 6 is a flowchart of a specific method of step S600 in FIG. 1;

FIG. 7 is a block diagram of a processing device for price data provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.

First, several terms used in this application are explained:

artificial intelligence (artificial intelligence, AI): is a new technical science for researching and developing theories, methods, technologies and application systems for simulating, extending and expanding the intelligence of people; artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce a new intelligent machine that can react in a manner similar to human intelligence, research in this field including robotics, language recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information process of consciousness and thinking of people. Artificial intelligence is also a theory, method, technique, and application system that utilizes a digital computer or digital computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.

Natural language processing (natural language processing, NLP): NLP is a branch of artificial intelligence and an interdisciplinary field of computer science and linguistics, often referred to as computational linguistics. It processes, understands, and applies human languages (e.g., Chinese, English, etc.). Natural language processing includes parsing, semantic analysis, discourse understanding, and the like. It is commonly used in technical fields such as machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, information intent recognition, information extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining, and it involves data mining, machine learning, knowledge acquisition, knowledge engineering, artificial intelligence research, linguistic research related to language computation, and the like.

Convolutional neural network model (Convolutional Neural Networks, CNN): the convolutional neural network is a feed-forward neural network built mainly from convolution layers and pooling layers. The basic structure of a CNN consists of an input layer, convolution layers (convolutional layer), pooling layers (also called sampling layers), a fully connected layer, and an output layer. There are generally several convolution layers and pooling layers, arranged alternately: a convolution layer is followed by a pooling layer, which is followed by another convolution layer, and so on. Each neuron of an output feature map in a convolution layer is locally connected to its input, and the input value of the neuron is obtained by taking the weighted sum of the local inputs with the corresponding connection weights and adding a bias value. This process is equivalent to convolution, which is where the CNN gets its name. The convolutional neural network evolved from the multi-layer perceptron (MLP), and its structural characteristics of local connection, weight sharing and downsampling give it excellent performance in the field of image processing. Compared with other neural networks, the convolutional neural network is distinctive mainly in two respects: weight sharing and local connection. Weight sharing makes the network structure of the convolutional neural network more similar to that of a biological neural network. Local connection means that each neuron of layer n is connected to only a portion of the neurons of layer n-1, rather than to all of them as in a conventional neural network. Both characteristics reduce the complexity of the network model and the number of weights.

Zero padding (zero padding): zero values are used to fill in the edges of the input matrix so that the filter can also be applied at the edges of the input image matrix. One benefit of zero padding is that it allows the size of the feature map to be controlled. Convolution that uses zero padding is also called wide convolution; convolution that does not use zero padding is called narrow (strict) convolution.
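As a minimal sketch of how zero padding controls the size of the feature map, the following one-dimensional example applies the same kernel with and without padding. The function name conv1d and the sample values are illustrative assumptions, not part of the application; as is usual in CNNs, the kernel is slid over the input without being flipped.

```python
def conv1d(signal, kernel, pad=0):
    """Slide `kernel` over `signal` after adding `pad` zeros to each end.

    Hypothetical helper for illustration only. Output length is
    len(signal) + 2 * pad - len(kernel) + 1.
    """
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    k = len(kernel)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0]

without_padding = conv1d(signal, kernel)       # 4 - 3 + 1 = 2 outputs
with_padding = conv1d(signal, kernel, pad=1)   # 4 + 2 - 3 + 1 = 4 outputs

print(len(without_padding), len(with_padding))
```

With one zero on each side, the output keeps the length of the input, so the edge positions also contribute a value to the feature map.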

Recurrent neural network (Recurrent Neural Network, RNN): a type of recursive neural network (recursive neural network) that takes sequence data as input, performs recursion (recursion) along the evolution direction of the sequence, and has all of its nodes (recurrent units) connected in a chain.
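The chained recursion can be sketched with a scalar recurrent unit, in which each hidden state depends on the current input and on the previous hidden state. This is a generic textbook recurrence with a tanh activation, given as an assumption for illustration; the application does not specify the form of the recurrent unit, and the name rnn_forward and its parameters are hypothetical.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a scalar recurrent unit over a sequence (illustrative only).

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    """
    h = h0
    states = []
    for x in inputs:
        # The same weights are reused at every step; only the state changes.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


hidden_states = rnn_forward([0.5, -0.2, 0.1], w_x=1.0, w_h=0.5, b=0.0)
print(len(hidden_states))
```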

Long short-term memory neural network (Long Short-Term Memory, LSTM): a special recurrent neural network (RNN) designed specifically to solve the long-term dependence problem of ordinary recurrent neural networks. All RNNs have the form of a chain of repeating neural network modules. When the original RNN is trained, gradient explosion or gradient vanishing easily occurs as the training time lengthens and the number of network layers increases, so that longer sequence data cannot be processed and information from long-distance data cannot be acquired. The fields in which LSTM is applied include text generation, machine translation, speech recognition, the generation of image descriptions and video tags, and the like.

Word segmentation: word segmentation is the process of automatically adding spaces or other boundary marks between the words in a text. English words are naturally separated by spaces and are easy to split on them, but several words sometimes need to be treated as a single unit; for example, a noun such as 'New York' needs to be treated as one word. Chinese text has no spaces, so word segmentation is a problem that must be solved specifically. The principle of word segmentation is similar for English and Chinese. Automatic Chinese word segmentation means having a computer system automatically add spaces or other boundary marks between the words in Chinese text. A commonly used Chinese word segmentation tool is Jieba.
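To make the idea of inserting word boundaries concrete, the sketch below uses forward maximum matching against a small dictionary. This is a deliberately simple stand-in chosen for illustration; it is not how Jieba works internally and it is not the method of the application. The function name and the toy vocabulary are assumptions.

```python
def forward_max_match(text, vocabulary):
    """Greedy longest-match segmentation (illustrative stand-in only).

    At each position the longest dictionary word is taken; a single
    character is emitted when no dictionary word matches.
    """
    max_len = max(len(word) for word in vocabulary)
    tokens = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Toy vocabulary, assumed for the example.
vocabulary = {"我们", "喜欢", "自然", "语言", "处理", "自然语言", "自然语言处理"}
tokens = forward_max_match("我们喜欢自然语言处理", vocabulary)
print(" / ".join(tokens))
```

Because the longest match wins, "自然语言处理" is kept as one unit instead of being split into "自然", "语言" and "处理", which mirrors the 'New York' case for English.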

With the continuous development of the economy and of science and technology, and the continuous progress of artificial intelligence technology, intelligent price data processing methods have come into wide use.

Based on the above, the embodiment of the application provides a method and a device for processing price data, electronic equipment and a storage medium, and the accuracy of price data prediction can be improved by acquiring various original data to predict price trend.

The method and apparatus for processing price data, electronic device, and storage medium provided in the embodiments of the present application are specifically described through the following embodiments, and the method for processing price data in the embodiments of the present application is described first.

The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

The embodiment of the application provides a price data processing method, and relates to the technical field of artificial intelligence. The price data processing method provided by the embodiment of the application can be applied to a terminal, to a server side, or to software running in the terminal or the server side. In some embodiments, the terminal may be a smart phone, a tablet, a notebook computer, a desktop computer, etc.; the server side may be configured as an independent physical server, as a server cluster or distributed system formed by a plurality of physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software may be an application or the like that implements the price data processing method, but is not limited to the above forms.

The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The following describes in detail the method of processing price data according to the embodiment of the present application with reference to the accompanying drawings.

Fig. 1 is an optional flowchart of a method for processing price data according to an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S100 to S600, and these six steps are described in detail below in connection with fig. 1.

Step S100, obtaining original data to be predicted; wherein the raw data includes target report data and target transaction data;

step S200, constructing index factor characteristics according to target transaction data;

step S300, constructing public opinion factor features according to target report data;

step S400, screening the index factor features and the public opinion factor features to obtain quantized transaction features;

step S500, extracting characteristics of a plurality of quantized transaction characteristics through a preset first neural network model to obtain a plurality of distributed characteristic vectors;

step S600, inputting the distributed feature vectors into a preset second neural network model for price prediction processing to obtain target price data.

According to the price data processing method of the embodiment of the application, original data to be predicted is acquired, the original data including target report data and target transaction data. Index factor features are then constructed according to the target transaction data, and public opinion factor features are constructed according to the target report data. The index factor features and the public opinion factor features are screened to obtain quantized transaction features, and features are extracted from the quantized transaction features through a preset first neural network model to obtain distributed feature vectors. Finally, the distributed feature vectors are input into a preset second neural network model for price prediction processing to obtain target price data. Acquiring several kinds of original data improves the accuracy of price data prediction, and the screening mechanism avoids an excess of irrelevant data when features are extracted from the quantized transaction features, which further improves the accuracy of price data prediction.
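The data flow of steps S100 to S600 can be sketched as a chain of stages. Everything in this example is a placeholder assumption: the dictionary keys, the function predict_price, and the toy lambdas stand in for the factor construction, screening, first neural network model and second neural network model, none of which is specified here in enough detail to implement.

```python
def predict_price(raw_data, build_index_factors, build_opinion_factors,
                  screen, extract_features, predict):
    """Wire the six steps together (hypothetical skeleton)."""
    # S100: raw_data already holds report data and transaction data.
    index_features = build_index_factors(raw_data["transactions"])   # S200
    opinion_features = build_opinion_factors(raw_data["reports"])    # S300
    quantized = screen(index_features + opinion_features)            # S400
    vectors = extract_features(quantized)                            # S500
    return predict(vectors)                                          # S600


# Toy stand-ins so the skeleton runs end to end; they are not real models.
raw = {"transactions": [10.0, 11.0], "reports": ["good quarter"]}
price = predict_price(
    raw,
    build_index_factors=lambda tx: [tx[-1] - tx[0]],
    build_opinion_factors=lambda reports: [float(len(reports))],
    screen=lambda feats: [f for f in feats if f != 0.0],
    extract_features=lambda feats: [[f] for f in feats],
    predict=lambda vecs: sum(v[0] for v in vecs),
)
print(price)
```

Passing each stage in as a callable keeps the ordering of the steps explicit while leaving the choice of models open.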

In step S100 of some embodiments, the original data may be obtained by writing a web crawler, configuring its data sources, and then crawling the data in a targeted manner. The original data may also be obtained by querying public websites. The technical scheme of the embodiment of the application can predict price data for a stock in a stock market, a fund in a fund market, a futures contract in a futures market, and the like; compared with prior-art prediction methods that use a machine learning model, the target price data obtained by the price data processing method of the embodiment of the application is more accurate. Performing price prediction several times on the stock, the fund or the futures contract yields a plurality of corresponding target price data, from which the price trend of the stock, the fund or the futures contract is obtained.

For example, when the object to be predicted is a stock in the stock market, that is, when the technical scheme of the embodiment of the application predicts the price data of a stock, the original data may be obtained by consulting target report data on the website of the company corresponding to the stock to be predicted, and by consulting target report data and target transaction data on a trading platform.

In some embodiments, the target report data includes at least one of: target industry report data, target company report data, target news data, and target comment data.

The target transaction data includes at least one of: open price, close price, maximum price, minimum price, and volume.

In the embodiment of the application, the original data is obtained by acquiring the target report data and the target transaction data, and the influence of both on the future price trend is considered together, so that the accuracy of price data prediction is improved.

For example, in an application scenario of stock price data prediction, to predict the future trend of a certain stock A, the price data of stock A over a period of time needs to be obtained, together with the original data of stock A, which includes target report data and target transaction data. The target report data includes, but is not limited to, at least one of the following: industry report data of the company corresponding to stock A (i.e., target industry report data), report data of the company corresponding to stock A (i.e., target company report data), news data related to the company of stock A (i.e., target news data), and comment data about stock A in stock forums (i.e., target comment data). The target transaction data includes, but is not limited to, at least one of: the historical listing price of stock A, the historical highest price of stock A, the historical lowest price of stock A, the historical trading volume of stock A, and the current price of stock A.

In step S200 of some embodiments, constructing the index factor features includes constructing index factors, the index factors including, but not limited to, at least one of: an OBV (On Balance Volume) factor, a CCI (Commodity Channel Index) factor, a KDJ (stochastic indicator) factor, and the like.
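As one example of an index factor, On Balance Volume can be computed from closing prices and trading volumes with its standard definition: the running total rises by the day's volume when the close is higher than the previous close, falls by it when the close is lower, and is unchanged otherwise. The sketch below assumes a starting value of 0, which is one common convention; the application does not state how its OBV factor is initialised, and the function name is hypothetical.

```python
def on_balance_volume(closes, volumes):
    """Standard OBV series, starting from 0 (assumed convention)."""
    if len(closes) != len(volumes):
        raise ValueError("closes and volumes must have the same length")
    obv = [0.0] if closes else []
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            step = volumes[i]        # price rose: add the day's volume
        elif closes[i] < closes[i - 1]:
            step = -volumes[i]       # price fell: subtract the day's volume
        else:
            step = 0                 # unchanged close: OBV stays flat
        obv.append(obv[-1] + step)
    return obv


closes = [10.0, 11.0, 10.5, 10.5, 12.0]
volumes = [100, 200, 150, 120, 300]
obv_series = on_balance_volume(closes, volumes)
print(obv_series)
```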

Referring to fig. 2, in some embodiments, step S300 may include, but is not limited to, steps S310 to S340, which are described in detail below in conjunction with fig. 2.

Step S310, carrying out emotion classification on target report data to obtain report emotion types;

step S320, extracting text features of the target report data to obtain a first reference value for representing the value of the text information;

step S330, performing readability evaluation on the target report data to obtain a second reference value for representing the readability value of the report;

step S340, according to the reported emotion category, the first reference value and the second reference value, the public opinion factor feature is obtained.
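Step S340 can be sketched from the rule given in the disclosure: the first and second reference values are multiplied to give a target reference value, which is then processed in the forward or reverse direction according to the report emotion category. Reading "forward" as keeping the sign and "reverse" as negating it, and mapping a neutral report to 0, are assumptions made for this example; the function name and the category strings are hypothetical.

```python
def public_opinion_factor(emotion, first_reference, second_reference):
    """Combine the two reference values with the emotion category.

    Assumed interpretation: positive keeps the product, negative negates it,
    neutral contributes nothing.
    """
    target_reference = first_reference * second_reference
    if emotion == "positive":
        return target_reference
    if emotion == "negative":
        return -target_reference
    if emotion == "neutral":
        return 0.0
    raise ValueError(f"unknown emotion category: {emotion!r}")


factor_up = public_opinion_factor("positive", 2.0, 0.5)
factor_down = public_opinion_factor("negative", 2.0, 0.5)
print(factor_up, factor_down)
```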

In step S310 of some embodiments, the target report data is emotion-classified by natural language processing (NLP) to obtain a report emotion category. Report emotion categories include, but are not limited to: negative emotion, positive emotion and neutral emotion.

Referring to fig. 3, in some embodiments, step S320 may include, but is not limited to, steps S321 to S326:

step S321, classifying the target report data to obtain a target word segmentation set, a target sentence set, a target paragraph set and a target grammar set;

step S322, carrying out statistics scoring processing on the target word segmentation set to obtain word segmentation scoring values;

step S323, carrying out statistics scoring processing on the target sentence set to obtain a sentence scoring value;

step S324, carrying out statistics scoring processing on the target paragraph set to obtain paragraph scoring values;

step S325, carrying out statistics scoring processing on the target grammar set to obtain grammar scoring values;

step S326, a first reference value is obtained according to the word segmentation score value, the sentence score value, the paragraph score value and the grammar score value.

Specifically, in step S321 of some embodiments, the target report data is classified according to terms, sentences, paragraphs, and grammar by using a natural language processing NLP model, so as to obtain a target word segmentation set, a target sentence set, a target paragraph set, and a target grammar set. When the target word segmentation set is obtained, word segmentation processing can be performed on each sentence in the obtained target sentence set, and the target word segmentation set is obtained.

In step S322 of some embodiments, statistical scoring is performed on the target word segmentation set to obtain a word segmentation score value. This mainly includes: counting the word frequencies of nouns, conjunctions, function words and other word classes in the target word segmentation set; calculating the proportion of conjunctions and function words in the target word segmentation set; counting the number of four-character idioms in the target word segmentation set; calculating the relative proportion of single characters and multi-character words in the target word segmentation set; calculating the proportion of nouns in the target word segmentation set; and the like. A score is then assigned according to the proportions and counts obtained, giving the word segmentation score value.
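The word-level statistics of step S322 can be sketched over a list of tokens that already carry part-of-speech tags. The tag names, the idiom list and the function word_statistics are assumptions for illustration, and the example stops at the raw statistics because the application does not say how they are turned into a score.

```python
from collections import Counter


def word_statistics(tagged_tokens, idioms):
    """Compute simple proportions and counts over (word, tag) pairs."""
    total = len(tagged_tokens)
    if total == 0:
        return {"noun_ratio": 0.0, "function_word_ratio": 0.0,
                "idiom_count": 0, "single_char_ratio": 0.0}
    tag_counts = Counter(tag for _, tag in tagged_tokens)
    return {
        "noun_ratio": tag_counts["noun"] / total,
        # Conjunctions and function words are counted together.
        "function_word_ratio":
            (tag_counts["conjunction"] + tag_counts["function"]) / total,
        "idiom_count": sum(1 for word, _ in tagged_tokens if word in idioms),
        "single_char_ratio":
            sum(1 for word, _ in tagged_tokens if len(word) == 1) / total,
    }


# Hypothetical tagged tokens; the tags are invented labels for the example.
tagged = [("股价", "noun"), ("和", "conjunction"), ("成交量", "noun"),
          ("的", "function"), ("走势", "noun"), ("高歌猛进", "idiom"),
          ("涨", "verb"), ("了", "function")]
stats = word_statistics(tagged, idioms={"高歌猛进"})
print(stats)
```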

In step S323 of some embodiments, similar to step S322, in this example, the target sentence set is statistically scored to obtain a sentence scoring value, which mainly includes the following steps:

computing the average sentence length over the sentences in the target sentence set, computing the proportion of non-text information in each sentence in the target sentence set, statistically analyzing the structural composition of each sentence in the target sentence set, and the like. The average sentence length, the non-text information proportion and the sentence structure composition obtained in this way are then scored to obtain the sentence score value.
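Two of the sentence-level statistics can be sketched as follows. Treating every non-alphabetic character (digits, punctuation, spaces) as "non-text information" is an assumption made for this example, since the application does not define the term; the function name is hypothetical. Python's str.isalpha also returns True for Chinese characters, so the same code applies to Chinese sentences.

```python
def sentence_statistics(sentences):
    """Return the average sentence length and per-sentence non-text ratios."""
    if not sentences:
        return 0.0, []
    average_length = sum(len(s) for s in sentences) / len(sentences)
    non_text_ratios = [
        (sum(1 for ch in s if not ch.isalpha()) / len(s)) if s else 0.0
        for s in sentences
    ]
    return average_length, non_text_ratios


average_length, non_text_ratios = sentence_statistics(["abcd", "ab12"])
print(average_length, non_text_ratios)
```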

In step S324 of some embodiments, similarly to step S322, statistical scoring is performed on the target paragraph set to obtain a paragraph score value, which mainly includes the following steps:

counting the number of paragraphs in the target paragraph set, counting the number of sentences contained in each paragraph, calculating the average number of sentences per paragraph, calculating the SMOG index of the paragraphs, and so on. The paragraph count, the average sentence count, and the SMOG index are then scored to obtain the paragraph score value.
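The paragraph statistics can be sketched as follows. The SMOG formula used here is the standard published one for English text, based on the number of polysyllabic words (three or more syllables) normalized to 30 sentences; how the embodiment adapts it to the language of the reports is not stated, so the polysyllable count is taken as a given input. The function names are hypothetical.

```python
import math


def smog_index(polysyllable_count, sentence_count):
    """Standard SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 1.0430 * math.sqrt(polysyllable_count * 30 / sentence_count) + 3.1291


def paragraph_statistics(paragraphs):
    """paragraphs: a list of paragraphs, each given as a list of sentences."""
    count = len(paragraphs)
    total_sentences = sum(len(p) for p in paragraphs)
    return {
        "paragraph_count": count,
        "avg_sentences": total_sentences / count if count else 0.0,
    }
```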

In step S325 of some embodiments, similarly to step S322, statistical scoring is performed on the target grammar set to obtain a grammar score value, which mainly includes the following steps:

measuring the height of each grammar tree in the target grammar set, counting the number of nodes of each grammar tree, calculating the proportion of noun phrases, the proportion of verb phrases, and the proportion of adjective phrases in each grammar tree, and so on. The proportions and counts obtained are then scored to obtain the grammar score value.
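The grammar-tree statistics lend themselves to a short recursive sketch. The tree representation, a `(label, children)` tuple, and the reading of "phrase proportion" as the share of tree nodes carrying a given phrase label are assumptions made for illustration.

```python
def tree_height(tree):
    """Height of a (label, children) tree; a leaf has height 1."""
    _, children = tree
    if not children:
        return 1
    return 1 + max(tree_height(child) for child in children)


def node_count(tree):
    """Total number of nodes in the tree."""
    _, children = tree
    return 1 + sum(node_count(child) for child in children)


def phrase_ratio(tree, phrase_label):
    """Share of nodes whose label equals phrase_label (e.g. "NP" or "VP")."""
    def count(node):
        label, children = node
        return int(label == phrase_label) + sum(count(c) for c in children)
    return count(tree) / node_count(tree)
```

For a parse such as `("S", [("NP", [("N", [])]), ("VP", [("V", []), ("NP", [("N", [])])])])`, the tree has 7 nodes, a height of 4, and a noun phrase ratio of 2/7.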

In step S326 of some embodiments, the word segmentation score value, the sentence score value, the paragraph score value, and the grammar score value obtained in steps S322 to S325 are summed to obtain a first reference value.

In step S330 of some embodiments, a readability evaluation may be performed on the target report data through the natural language processing (NLP) model, yielding a second reference value that characterizes the readability value of the report. The second reference value ranges from 0 to 1. If the NLP model judges that the target report data reads smoothly and is of high value, the second reference value can be set to the maximum of 1. Otherwise, the NLP model can deduct from this value according to the actual situation, down to a minimum of 0.

In step S340 of some embodiments, the public opinion factor feature is obtained from the reported emotion type, the first reference value, and the second reference value, specifically through the following step:

performing positive or inverse processing on the target reference value according to the reported emotion type to obtain the public opinion factor feature.

Specifically, if the reported emotion type is neutral emotion or positive emotion, the target reference value is processed positively and the resulting public opinion factor feature is a positive number; if the reported emotion type is negative emotion, the target reference value is processed inversely and the resulting public opinion factor feature is a negative number.

In some embodiments, the reported emotion type may also be converted to a number. Specifically: if the reported emotion type is neutral emotion or positive emotion, it is assigned the value +1; if the reported emotion type is negative emotion, it is assigned the value -1. The assigned value is then multiplied by the target reference value to obtain the public opinion factor feature.

It should be noted that the second reference value is a coefficient between 0 and 1. The first reference value may be a specific value or a vector, which is used to characterize the specific situation of the target word segmentation set, the target paragraph set, the target sentence set and the target grammar set in the target report data.
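The sign-based construction of the public opinion factor can be sketched as below. One point is an assumption rather than something this passage states: the target reference value is taken to be the product of the first reference value and the second reference value, which is consistent with the second reference value being described as a coefficient between 0 and 1. The function names and the scalar form of the first reference value are likewise illustrative.

```python
EMOTION_SIGN = {"positive": 1, "neutral": 1, "negative": -1}


def first_reference_value(word, sentence, paragraph, grammar):
    """Sum of the four score values (step S326)."""
    return word + sentence + paragraph + grammar


def public_opinion_factor(emotion, first_ref, second_ref):
    """Apply the emotion sign to the target reference value."""
    if not 0.0 <= second_ref <= 1.0:
        raise ValueError("second reference value must lie in [0, 1]")
    # Assumption: target reference value = first_ref * second_ref.
    target = first_ref * second_ref
    return EMOTION_SIGN[emotion] * target
```

With score values summing to 10.0 and a readability coefficient of 0.5, a negative report yields -5.0, while a neutral or positive report yields 5.0.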

In step S400 of some embodiments, the obtained target report data includes comment data (target comment data) on the object to be predicted (such as a stock). Such comment data generally has little influence on financial investment, so a large amount of useless comment data needs to be filtered out; otherwise, too much useless data would be input into the first neural network and degrade the accuracy of stock price prediction. In addition, some of the constructed index factors are of limited use, or invalid, in specific fields. To improve the accuracy of stock price prediction, the index factor features and the public opinion factor features therefore need to be screened so that only the effective features are retained, yielding a plurality of quantized transaction features. For example, a validity test can be performed on the index factor features and the public opinion factor features, and those that pass the test are taken as the quantized transaction features.
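The source does not say which validity test is used. Purely as an illustration, the sketch below keeps a feature when the absolute Pearson correlation between its values and the subsequent returns reaches a threshold; the choice of test, the threshold of 0.3, and the function names are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def screen_features(features, future_returns, threshold=0.3):
    """Keep features whose |correlation| with future returns passes the threshold."""
    return {
        name: values
        for name, values in features.items()
        if abs(pearson(values, future_returns)) >= threshold
    }
```

Other validity tests would slot into `screen_features` in the same way, replacing only the criterion inside the comprehension.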

Referring to fig. 4, in some embodiments, the first neural network model includes: an input layer and a convolution layer; step S500 may include, but is not limited to, steps S510 to S530:

step S510, inputting a plurality of quantized transaction features into a first neural network model;

step S520, preprocessing each quantized transaction feature through an input layer to obtain corresponding standardized data;

step S530, performing convolution processing on the standardized data through the convolution layer to obtain a plurality of distributed feature vectors.

Specifically, in this embodiment, the first neural network model is a convolutional neural network (CNN) model. Because a CNN model extracts features very efficiently, performing feature extraction with the CNN model first improves the overall efficiency of predicting financial investment trends.

In step S510 of some embodiments, because the present application is a method for processing price data, the raw data obtained is mainly financial time series data. The usual storage structure of financial time series data, however, has no meaningful two-dimensional spatial relationship of the kind a two-dimensional convolutional neural network requires, and when a two-dimensional convolutional network processes such data, sliding the kernel along the vertical axis may lose time information. The embodiment of the application therefore uses a one-dimensional convolutional neural network model for feature extraction and prediction. Each quantized transaction feature is first preprocessed to obtain standardized data.

Preprocessing is carried out through the input layer of the CNN model, specifically:

after index factor construction and public opinion evaluation are performed on the raw data in the preceding steps, the resulting index factor features and public opinion factor features are screened, and 49 quantized transaction features are finally retained. The input layer then applies (0, 1) normalization to the quantized transaction features to obtain the standardized data. Each piece of standardized data includes the feature data of the N trading days before the trading day to be predicted. For example, in the application scenario of stock price trend prediction, where the stock price on the 6th trading day is predicted from the raw data of the preceding 5 trading days, the quantized transaction features form a 49×5 matrix, which is then preprocessed to obtain the corresponding standardized data.
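A minimal sketch of the preprocessing step follows, assuming that "(0, 1) normalization" means min-max scaling of each feature over its N trading days; the source does not spell out the scheme, and `min_max_normalize` is a hypothetical name.

```python
def min_max_normalize(matrix):
    """Scale each row (one feature over N trading days) into [0, 1]."""
    normalized = []
    for row in matrix:
        low, high = min(row), max(row)
        span = high - low
        # A constant row carries no variation; map it to zeros.
        normalized.append([(v - low) / span if span else 0.0 for v in row])
    return normalized


# Example shape from the text: 49 features over the previous 5 trading days.
example = [[float(f + d) for d in range(5)] for f in range(49)]
standardized = min_max_normalize(example)
```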

Referring to fig. 5, in some embodiments, step S530 may include, but is not limited to, steps S531 to S532:

step S531, format conversion is carried out on the standardized data through a convolution layer, and standard format data are obtained;

step S532, performing convolution processing on the standard format data through the convolution kernels of the convolution layer to obtain a plurality of distributed feature vectors.

Specifically, in this embodiment, the convolution processing performed by the convolution layer of the CNN model is as follows:

The function of the convolution layer is to extract features of a local region, with different convolution kernels acting as different feature extractors. Because this embodiment uses one-dimensional convolution, the standardized data must first undergo format conversion to obtain standard format data. The specific operation is as follows: the 49 features are taken as the width and the N trading days before the trading day to be predicted as the length; the stride of the filter, that is, the time interval of each slide, is set to 1; and the zero-padding parameter is set to 1. The input data is then convolved with one-dimensional convolution kernels of size 3. The number of convolution kernels is 32; that is, each convolution kernel slides horizontally across the input with a window of size 49×3 to extract features, and the 32 trained convolution kernels extract 32 different features in total. After the one-dimensional convolution, 32 distributed feature vectors of size 1×N are obtained.

In the above application scenario of stock price prediction, where the stock price on the 6th trading day is predicted from the raw data of the preceding 5 trading days, format conversion and convolution yield 32 distributed feature vectors of size 1×5.
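The shape arithmetic of the one-dimensional convolution (49 input channels, kernel size 3, stride 1, zero padding 1, 32 kernels) can be checked with a small pure-Python sketch. The kernels here are random rather than trained, bias and activation are omitted, and the function names are hypothetical.

```python
import random


def conv1d(x, kernels, padding=1):
    """Stride-1 1D convolution (cross-correlation, as in common deep learning
    frameworks). x: channels x length; kernels: out x channels x size."""
    channels, length = len(x), len(x[0])
    size = len(kernels[0][0])
    padded = [[0.0] * padding + list(row) + [0.0] * padding for row in x]
    out_length = length + 2 * padding - size + 1
    return [
        [
            sum(kernel[c][j] * padded[c][t + j]
                for c in range(channels) for j in range(size))
            for t in range(out_length)
        ]
        for kernel in kernels
    ]


def random_kernels(out_channels, in_channels, size, seed=0):
    """Untrained placeholder kernels, for shape checking only."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(size)]
             for _ in range(in_channels)]
            for _ in range(out_channels)]


# 49 features x 5 trading days in, 32 vectors of length 5 out.
inputs = [[1.0] * 5 for _ in range(49)]
vectors = conv1d(inputs, random_kernels(32, 49, 3))
```

With padding 1 and kernel size 3, the output length equals the input length N, which is why each of the 32 kernels yields a 1×N vector.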

In some embodiments, the 32 distributed feature vectors of size 1×N may be concatenated directly and the concatenated vector passed through a fully connected layer of the CNN model, which also yields target price data. In the financial field, however, most data is financial time series data, and a CNN model alone has difficulty capturing time series information, so target price data obtained this way is inaccurate.

Referring to fig. 6, in some embodiments, step S600 may include, but is not limited to, steps S610 to S620:

step S610, inputting a plurality of distributed feature vectors into a recurrent neural network;

step S620, performing recurrent iteration processing on the distributed feature vectors through a recurrent layer in the recurrent neural network to obtain target price data.

In particular, in this embodiment, since the distributed feature vector belongs to time-series data, it is more appropriate to use the recurrent neural network to predict the trend, and the obtained target price data is more accurate.

In this embodiment, the second neural network model is an LSTM recurrent neural network model. The specific operation is as follows:

inputting the plurality of distributed feature vectors obtained by the CNN model into the LSTM recurrent neural network model, and performing recurrent iteration processing on them through a recurrent layer in the recurrent neural network to obtain the target price data.

According to the technical scheme, the CNN model and the LSTM recurrent neural network model are combined to predict financial investment trends, so that the interaction features among the raw data can be captured quickly and the time sequence information of the raw data can also be captured, improving both the accuracy and the efficiency of financial investment trend prediction.
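The recurrent stage can be sketched with a single-layer LSTM written in plain Python. Several details are assumptions, since the source does not give them: the 32 vectors of length N are read as N time steps with 32 inputs each, the hidden size of 8 is arbitrary, the weights are random rather than trained, and a linear output head maps the final hidden state to a price. All names are hypothetical.

```python
import math
import random

GATES = ("i", "f", "o", "g")  # input, forget, output, candidate


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(input_size, hidden_size, seed=0):
    """Random (untrained) weights and zero biases for each gate."""
    rng = random.Random(seed)
    width = input_size + hidden_size
    return {
        name: ([[rng.uniform(-0.1, 0.1) for _ in range(width)]
                for _ in range(hidden_size)],
               [0.0] * hidden_size)
        for name in GATES
    }


def lstm_forward(sequence, params):
    """Run the standard LSTM recurrence and return the final hidden state."""
    hidden_size = len(params["i"][1])
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        z = list(x) + h  # concatenate input and previous hidden state
        pre = {
            name: [sum(w * v for w, v in zip(weights[u], z)) + bias[u]
                   for u in range(hidden_size)]
            for name, (weights, bias) in params.items()
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [f[u] * c[u] + i[u] * g[u] for u in range(hidden_size)]
        h = [o[u] * math.tanh(c[u]) for u in range(hidden_size)]
    return h


def predict_price(feature_vectors, params, head):
    """feature_vectors: 32 vectors of length N; head: (weights, bias)."""
    # Assumption: transpose so that each time step carries 32 inputs.
    sequence = [list(step) for step in zip(*feature_vectors)]
    h = lstm_forward(sequence, params)
    weights, bias = head
    return sum(w * v for w, v in zip(weights, h)) + bias
```

Because the hidden state is carried from one trading day to the next, the prediction depends on the order of the days, which is the time sequence information the convolutional stage alone does not model.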

Referring to fig. 7, an embodiment of the present application further provides a processing device for price data, which may implement the method for processing price data, where the device includes: an acquisition module 700, a first construction module 800, a second construction module 900, a screening module 1000, a feature extraction module 1100, and a prediction module 1200.

An acquisition module 700, configured to acquire original data to be predicted; wherein the raw data includes target report data and target transaction data;

a first construction module 800, configured to construct an index factor feature according to the target transaction data;

a second construction module 900, configured to construct a public opinion factor feature according to the target report data;

the screening module 1000 is configured to perform screening processing on the multiple index factor features and the multiple public opinion factor features to obtain multiple quantized transaction features;

the feature extraction module 1100 is configured to perform feature extraction on a plurality of quantized transaction features through a preset first neural network model, so as to obtain a plurality of distributed feature vectors;

the prediction module 1200 is configured to input a plurality of distributed feature vectors into a preset second neural network model to perform price prediction processing, so as to obtain target price data.
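The module structure above can be mirrored in code as a pipeline of six pluggable stages. This is only an architectural sketch: the class name, the field names, and the dictionary keys `"transaction"` and `"report"` are hypothetical, and each stage is left as a callable supplied by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PriceDataProcessor:
    acquire: Callable[[], dict]                    # acquisition module 700
    build_index_factors: Callable[[Any], list]     # first construction module 800
    build_opinion_factors: Callable[[Any], list]   # second construction module 900
    screen: Callable[[list, list], list]           # screening module 1000
    extract_features: Callable[[list], list]       # feature extraction module 1100
    predict: Callable[[list], float]               # prediction module 1200

    def run(self) -> float:
        """Chain the six modules in the order described in the text."""
        raw = self.acquire()
        index_factors = self.build_index_factors(raw["transaction"])
        opinion_factors = self.build_opinion_factors(raw["report"])
        quantized = self.screen(index_factors, opinion_factors)
        vectors = self.extract_features(quantized)
        return self.predict(vectors)
```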

The price data processing device acquires original data to be predicted, the original data including target report data and target transaction data; constructs index factor features from the target transaction data and public opinion factor features from the target report data; screens the index factor features and the public opinion factor features to obtain quantized transaction features; extracts features from the quantized transaction features through a preset first neural network model to obtain distributed feature vectors; and finally inputs the distributed feature vectors into a preset second neural network model for price prediction processing to obtain target price data. Acquiring several kinds of original data improves the accuracy of price data prediction, and the screening mechanism keeps excessive irrelevant data out of the feature extraction, which improves the accuracy further.

Note that, the processing device of the price data in the embodiment of the present application corresponds to the foregoing processing method of price data, and a specific embodiment is substantially the same as the specific embodiment of the foregoing processing method of price data, which is not described herein again.

The embodiment of the application also provides an electronic device, which comprises: a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory, wherein the program, when executed by the processor, implements the above price data processing method. The electronic device can be any intelligent terminal, including a tablet computer, a vehicle-mounted computer, and the like.

The electronic device of the embodiment of the application is configured to execute the foregoing method for processing price data. It acquires original data to be predicted, the original data including target report data and target transaction data; constructs index factor features from the target transaction data and public opinion factor features from the target report data; screens the index factor features and the public opinion factor features to obtain quantized transaction features; extracts features from the quantized transaction features through a preset first neural network model to obtain distributed feature vectors; and finally inputs the distributed feature vectors into a preset second neural network model for price prediction processing to obtain target price data. Acquiring several kinds of original data improves the accuracy of price data prediction, and the screening mechanism keeps excessive irrelevant data out of the feature extraction, which improves the accuracy further.

Referring to fig. 8, fig. 8 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:

the processor 1300, which may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and which executes the relevant programs to implement the technical solutions provided by the embodiments of the present application;

the memory 1400, which may be implemented in the form of read-only memory (ROM), a static storage device, a dynamic storage device, or random access memory (RAM). The memory 1400 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present application are implemented in software or firmware, the relevant program code is stored in the memory 1400 and called by the processor 1300 to execute the price data processing method of the embodiments of the present application;

an input/output interface 1500 for implementing information input and output;

the communication interface 1600 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, Wi-Fi, Bluetooth, etc.);

Bus 1700 transfers information between the various components of the device (e.g., processor 1300, memory 1400, input/output interface 1500, and communication interface 1600);

wherein processor 1300, memory 1400, input/output interface 1500, and communication interface 1600 enable communication connection among each other within a device via bus 1700.

The embodiment of the application also provides a storage medium, which is a computer readable storage medium and is used for computer readable storage, the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to realize the method for processing price data.

The storage medium of the embodiment of the present application is used to execute the foregoing method for processing price data. The method acquires original data to be predicted, the original data including target report data and target transaction data; constructs index factor features from the target transaction data and public opinion factor features from the target report data; screens the index factor features and the public opinion factor features to obtain quantized transaction features; extracts features from the quantized transaction features through a preset first neural network model to obtain distributed feature vectors; and finally inputs the distributed feature vectors into a preset second neural network model for price prediction processing to obtain target price data. Acquiring several kinds of original data improves the accuracy of price data prediction, and the screening mechanism keeps excessive irrelevant data out of the feature extraction, which improves the accuracy further.

The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The embodiments described herein are intended to explain the technical solutions of the embodiments of the present application more clearly and do not limit them. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.

It will be appreciated by those skilled in the art that the solutions shown in fig. 1-8 are not limiting to embodiments of the present application, and may include more or fewer steps than illustrated, or may combine certain steps, or different steps.

The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.

The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, which includes a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing a program.

Preferred embodiments of the present application are described above with reference to the accompanying drawings; this description does not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims

1. A method of processing price data, the method comprising:

performing a validity check on the index factor features and the public opinion factor features, and taking the index factor features and the public opinion factor features which pass the check as quantized transaction features to obtain a plurality of quantized transaction features;

extracting the characteristics of a plurality of quantized transaction characteristics through a preset first neural network model to obtain a plurality of distributed characteristic vectors; the first neural network model is a one-dimensional convolutional neural network model;

inputting a plurality of distributed feature vectors into a preset second neural network model for price prediction processing to obtain target price data; wherein the second neural network model is an LSTM cyclic neural network model;

Wherein, the constructing the public opinion factor feature according to the target report data includes:

carrying out emotion classification on the target report data through natural language processing to obtain report emotion types; wherein the reported emotion categories include negative emotion, positive emotion and neutral emotion;

extracting text characteristics of the target report data to obtain word segmentation score values, sentence score values, paragraph score values and grammar score values;

according to the word segmentation score value, the sentence score value, the paragraph score value and the grammar score value, the first reference value for representing the text information value is obtained;

performing readability evaluation on the target report data through natural language processing to obtain a second reference value for representing the readability value of the research report; wherein the value range of the second reference value is [0, 1];

if the reported emotion type is the positive emotion or the neutral emotion, performing positive processing on the target reference value to obtain the public opinion factor characteristic; and if the reported emotion type is the negative emotion, performing inverse processing on the target reference value to obtain the public opinion factor characteristic.

2. The method of claim 1, wherein the text feature extraction of the target report data to obtain a word score value, a sentence score value, a paragraph score value, and a grammar score value comprises:

carrying out statistical scoring processing on the target word segmentation set to obtain the word segmentation scoring value;

carrying out statistical scoring processing on the target sentence set to obtain the sentence scoring value;

carrying out statistical scoring processing on the target paragraph set to obtain the paragraph scoring value;

and carrying out statistical scoring processing on the target grammar set to obtain the grammar scoring value.

3. The method according to claim 1 or 2, wherein the first neural network model comprises: an input layer and a convolution layer; the feature extraction is performed on the quantized transaction features through a preset first neural network model to obtain distributed feature vectors, including:

4. A method according to claim 3, wherein said convolving said normalized data with said convolution layer to obtain a plurality of said distributed feature vectors, comprising:

5. The method of claim 1 or 2, wherein the second neural network model comprises a recurrent neural network; inputting the plurality of distributed feature vectors into a preset second neural network model for price prediction processing to obtain target price data, wherein the method comprises the following steps of:

6. A price data processing device, the device comprising:

the screening module is used for carrying out validity check on the index factor characteristics and the public opinion factor characteristics, taking the index factor characteristics and the public opinion factor characteristics which pass the check and test as quantized transaction characteristics, and obtaining a plurality of quantized transaction characteristics;

the feature extraction module is used for extracting features of a plurality of quantized transaction features through a preset first neural network model to obtain a plurality of distributed feature vectors; the first neural network model is a one-dimensional convolutional neural network model;

the prediction module is used for inputting a plurality of distributed feature vectors into a preset second neural network model to perform price prediction processing to obtain target price data; wherein the second neural network model is an LSTM cyclic neural network model;

the second construction module is configured to construct a public opinion factor feature according to the target report data, and includes:

7. An electronic device, comprising:

at least one memory;

at least one processor;

at least one program;

the method of any one of claims 1 to 5.

8. A storage medium that is a computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions for causing a computer to perform:

the method of any one of claims 1 to 5.