CN111221938B - Prediction result generation method, terminal and storage medium - Google Patents

Info

Publication number
CN111221938B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911142180.6A
Other languages
Chinese (zh)
Other versions
CN111221938A (en)
Inventor
赵恺
郭健
郭家栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Deep Asset Management Co ltd
Peng Cheng Laboratory
Original Assignee
Hangzhou Deep Asset Management Co ltd
Peng Cheng Laboratory

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a prediction result generation method, a terminal and a storage medium, wherein the prediction result generation method comprises the following steps: acquiring a main body name to be predicted, and acquiring a key interest group corresponding to the main body to be predicted according to the main body name to be predicted; and obtaining a prediction result of the main body to be predicted according to the key interest group. According to the prediction result generation method provided by the invention, a large amount of data are screened according to the names of the main bodies to be predicted, the key interest group containing the characteristics of the main bodies to be predicted is obtained to generate the prediction result of the main bodies to be predicted, the corresponding prediction result can be output only by taking the names of the main bodies to be predicted as input, and the prediction rule is simple and effective.

Description

Prediction result generation method, terminal and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a prediction result generating method, a terminal, and a storage medium.
Background
With the development of the internet, the speed and volume of information propagation have increased greatly. Information affects the psychology and behavior of those who receive it, and thereby affects the development of related subjects. For example, policy information may affect the psychology and behavior of investors in the stock market, and thus the state of the stock market; vegetable yield information may affect the psychology and behavior of vegetable merchants and consumers, thereby causing vegetable prices to fluctuate. However, prior-art methods for predicting the development of a related subject from existing information are often limited to traditional statistical analysis and rule-based classification. They mainly represent text with mature hand-crafted features such as bag-of-words models and TF-IDF (term frequency-inverse document frequency), then judge the sentiment tendency of the text using sentiment dictionaries and language rules built from human prior knowledge, and finally map that sentiment tendency to the related subject to complete the prediction. Such methods have the following defects: feature capture is weak, the rules are complex and rely mainly on manual knowledge, the classification effect depends on the specific sub-field, and the precision of the generated prediction result is low.
Accordingly, there is a need for improvement and development in the art.
Disclosure of Invention
The invention aims to solve the technical problems of complex prediction rules and low precision in the prior art by providing a prediction result generation method, a terminal and a storage medium aiming at the defects in the prior art.
The technical scheme of the invention is as follows:
in a first aspect, an embodiment of the present invention provides a method for generating a prediction result, where the method includes:
acquiring a main body name to be predicted, and acquiring a key interest group corresponding to the main body to be predicted according to the main body name to be predicted;
and obtaining a prediction result of the main body to be predicted according to the key interest group.
The method for generating the prediction result, wherein the step of obtaining the name of the main body to be predicted, before obtaining the key interest group corresponding to the main body to be predicted according to the name of the main body to be predicted, further comprises:
and acquiring text data associated with the main body to be predicted and a known result of the main body to be predicted, and storing the text data and the known result into a preset database.
The method for generating a prediction result, wherein the obtaining text data associated with the main body to be predicted and a known result of the main body to be predicted, and storing the text data and the known result in a preset database specifically includes:
Acquiring information data related to the main body to be predicted, screening, acquiring text data comprising the main body name to be predicted, and establishing an association relationship between the text data and the main body name to be predicted;
acquiring a known result of the main body to be predicted, and establishing an association relation between the known result and the text data;
and storing the text data and the known result into the database according to the preset time unit classification respectively.
The method for generating the prediction result, wherein the obtaining the main body name to be predicted, and obtaining the key interest group corresponding to the main body name to be predicted according to the main body name to be predicted specifically includes:
extracting the text data associated with the subject name to be predicted from the database according to the subject name to be predicted;
and processing the text data to extract the key interest group containing the main body name to be predicted.
The method for generating a prediction result, wherein the obtaining the prediction result of the main body to be predicted according to the key interest group specifically includes:
acquiring a feature vector corresponding to the main body to be predicted according to the key interest group;
And inputting the feature vector into a trained prediction model, and outputting a prediction result corresponding to the feature vector, wherein the trained prediction model is trained according to the feature vector and the known result related to the main body name to be predicted.
The method for generating a prediction result, wherein the obtaining, according to the key interest group, the feature vector corresponding to the main body to be predicted specifically includes:
extracting effective time steps of the key interest group;
and obtaining word vectors of all the effective time steps corresponding to the key interest group, inputting the word vectors into a pre-constructed feature extraction model, and outputting the feature vectors corresponding to the prediction main body.
The method for generating a prediction result, wherein the inputting the feature vector into a trained prediction model, and outputting the prediction result corresponding to the feature vector specifically includes:
inputting the feature vector into the trained prediction model, and outputting the probability of at least one prediction result category corresponding to the feature vector;
and respectively weighting the probabilities of the at least one predicted result category according to a preset weighting rule, and then obtaining the predicted result corresponding to the feature vector.
The method for generating a prediction result, wherein the obtaining the prediction result of the main body to be predicted according to the key interest group further includes:
obtaining a plurality of prediction results corresponding to a plurality of main bodies to be predicted in a main body group to be predicted respectively;
and generating group prediction results corresponding to the main body group to be predicted according to the preset generation rules and the plurality of prediction results.
In a second aspect, an embodiment of the present invention provides a terminal, including: a processor and a storage medium communicatively coupled to the processor, the storage medium being adapted to store a plurality of instructions; the processor is adapted to invoke the instructions in the storage medium to perform the prediction result generation method described in any of the above.
In a third aspect, embodiments of the present invention further provide a storage medium, where the storage medium stores a plurality of instructions adapted to be loaded by a processor and to perform the steps of the prediction result generation method as described in any one of the above.
The invention has the technical effects that: according to the prediction result generation method provided by the invention, a large amount of data are screened according to the names of the main bodies to be predicted, the key interest group containing the characteristics of the main bodies to be predicted is obtained to generate the prediction result of the main bodies to be predicted, the corresponding prediction result can be output only by taking the names of the main bodies to be predicted as input, and the prediction rule is simple and effective.
Drawings
FIG. 1 is a flowchart of a first embodiment of a method for generating a prediction result according to the present invention;
FIG. 2 is a first sub-flowchart of a first embodiment of a method for generating a prediction result according to the present invention;
FIG. 3 is a second sub-flowchart of the first embodiment of a method for generating a prediction result according to the present invention;
FIG. 4 is a functional schematic diagram of a terminal according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clear and obvious, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
The prediction result generation method provided by the invention can be applied to a terminal. The terminal may be, but is not limited to, various personal computers, notebook computers, cell phones, tablet computers, car computers, and portable wearable devices. The terminal of the invention adopts a multi-core processor. The processor of the terminal may be at least one of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a video processing unit (Video Processing Unit, VPU), and the like.
Referring to fig. 1, fig. 1 is a flowchart of a first embodiment of the present invention, where the method for generating a prediction result includes the steps of:
S100, acquiring a subject name to be predicted, and acquiring a key interest group corresponding to the subject to be predicted according to the subject name to be predicted.
Specifically, the key interest group refers to information that is highly correlated with the subject to be predicted and can represent its characteristics. For example, a sentence or paragraph containing the name of the subject to be predicted is more strongly correlated with that subject than one that does not contain it; therefore, in the present invention, sentences or paragraphs containing the name of the subject to be predicted are selected as the key interest group of that subject. In one possible implementation, the key interest group could be obtained by searching the internet directly for the name of the subject to be predicted, but the internet holds a very large amount of information, and processing all of it directly is difficult and cumbersome. Therefore, in this embodiment, all information on the internet is first coarsely screened to obtain information data related to the subject to be predicted.
Specifically, in one implementation manner, a database is pre-established, and before the step S100, the method further includes the steps of obtaining text data associated with the to-be-predicted subject and a known result of the to-be-predicted subject, and storing the text data and the known result in a preset database, as shown in fig. 2, specifically including:
S001, acquiring information data related to the main body to be predicted, screening, acquiring text data comprising the main body name to be predicted, and establishing an association relationship between the text data and the main body name to be predicted.
To improve the efficiency of data acquisition and make it convenient to predict multiple subjects, and because the information data related to subjects of the same type may overlap heavily, the information data is acquired according to the category of the subject to be predicted. The acquired information data can then serve as the information basis for predicting multiple subjects, and there is no need to acquire related information data separately for each individual subject. For example, when the subject to be predicted is a particular stock, such as Ping An Bank, information is acquired not only about Ping An Bank but about the whole stock market, so that the stocks of other listed companies, such as China Merchants Bank, can be predicted at the same time; when the subject to be predicted is a fruit, such as apples, information is acquired not only about apples but about all vegetables and fruits. To improve the reliability of the generated prediction result, the present application preferably selects official information as the information data, for example news, columns by authoritative experts, messages published by official public accounts, and messages published by official microblogs.
The information data may be selected according to the type of the subject to be predicted. For example, when stock prices are predicted, that is, when the subject to be predicted is a stock, news reports from major financial media, information published by official media accounts, and the like are selected as the information data; when vegetable prices are predicted, that is, when the subject to be predicted is a vegetable, agricultural news, weather news, and the like are selected as the information data.
The information data may be obtained by directly accessing an information integration platform. For example, when the subject to be predicted is a stock, a program may be scheduled to access, at regular intervals, the news interface of a financial information integration platform such as Uqer (优矿), so as to obtain the financial news aggregated on the platform.
The information data may also be captured by a crawler, which is a program or script that automatically fetches information according to given rules. In the present invention, a crawling rule may be specified for the crawler, for example a predetermined set of web pages or client pages to crawl, and the crawler may access them at preset intervals to collect the news, comments and other information published there as the information data. The crawler can be built on a common crawler framework such as pyspider or Scrapy.
As described above, the coarse screening of information on the internet yields information data related to the subject to be predicted in various forms. For convenient storage and management, the acquired information data is further screened and processed before being stored in the database.
First, the information data is preprocessed: for each item of information data, the text data is retained and other forms of data, such as pictures, are removed. For a news item, for example, only the text part is kept and other data such as news images are discarded. In general, text carries the most information and occupies little storage space, so only the text data in the information data is retained. The text content is then matched against the names of the subjects to be predicted. Because the goal is to generate prediction results for multiple subjects, the information data was acquired for the whole category of the subject to be predicted; correspondingly, when the information data is screened, the names of all subjects included in that category are used for matching. For example, when the category is the stock market, the company names of all listed companies in the stock market are taken as matching objects and matched against the text data. Specifically, matching means traversing the text data: when a subject name to be predicted appears in the text data, the match is considered successful, the text data is retained, and an association relationship is established between the text data and that subject name.
The association relationship between the text data and the subject name to be predicted may be established by adding the successfully matched subject name to the name of the text data, so that, once a subject name to be predicted is obtained, the text data whose names contain it can be retrieved. Alternatively, the subject name may be written into the first line of the text data; once a subject name to be predicted is obtained, the first line of every text data item is read to find the text data whose first line contains that name.
Clearly, the same text data may match more than one subject name to be predicted; in that case, an association relationship is established between the text data and each of the matched subjects. Text data that does not match any subject name to be predicted is discarded.
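The matching and association described in step S001 can be sketched as follows. This is a minimal illustration only, assuming plain substring matching; the function name `associate_texts`, the company names, and the sample texts are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def associate_texts(texts, subject_names):
    """Map each subject name to the texts that mention it.

    Returns (associations, discarded). A text that matches several
    names is associated with each of them; a text that matches no
    name is discarded.
    """
    associations = defaultdict(list)
    discarded = []
    for text in texts:
        # Plain substring matching stands in for the screening step.
        matched = [name for name in subject_names if name in text]
        if not matched:
            discarded.append(text)
            continue
        for name in matched:
            associations[name].append(text)
    return dict(associations), discarded


# Hypothetical sample data, for illustration only.
texts = [
    "Alpha Bank and Beta Bank both reported higher quarterly profit.",
    "Beta Bank opened a new branch.",
    "Heavy rain is expected over the weekend.",
]
associations, discarded = associate_texts(texts, ["Alpha Bank", "Beta Bank"])
```

Here the first text is associated with both names, and the third text, which mentions neither, is discarded.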
S002, obtaining the known result of the main body to be predicted, and establishing the association relation between the known result and the main body name to be predicted.
Specifically, the known result of the subject to be predicted refers to a result, already available from existing information, of the prediction target corresponding to that subject. For example, when the subject to be predicted is the stock of a listed company and the prediction target is the performance trend of the stock, the known result is the profitability of the stock, so the profitability of the stocks of all listed companies in the stock market can be obtained as the known results; when the subject to be predicted is a fruit and the prediction target is the price trend of the fruit, the retail price fluctuations of the fruit can be obtained as the known result.
Taking the prediction of the stock performance trend of listed companies as an example, the way of obtaining the known result is described in detail. Stock performance is characterized by the profitability of the stock, so the profitability of every listed company in the stock market is obtained, the companies are sorted by profitability in descending order, and the stock performance of all listed companies is divided into three categories: good, general, and bad. The stock performance category of each listed company is the known result corresponding to that company.
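As a rough illustration of the ranking step, the sketch below sorts companies by profitability in descending order and assigns the three labels. The text does not state where the category boundaries lie, so splitting the ranking into equal thirds is an assumption made here; the function name `label_by_rank` and the figures are likewise hypothetical.

```python
def label_by_rank(yields):
    """Label each name "good", "general" or "bad" by descending rank.

    yields: dict mapping a company name to its profitability.
    Assumption: the ranking is cut into equal thirds.
    """
    ranked = sorted(yields, key=yields.get, reverse=True)
    n = len(ranked)
    labels = {}
    for position, name in enumerate(ranked):
        if position < n / 3:
            labels[name] = "good"
        elif position < 2 * n / 3:
            labels[name] = "general"
        else:
            labels[name] = "bad"
    return labels


# Hypothetical profitability figures, for illustration only.
yields = {"A": 0.12, "B": -0.03, "C": 0.05, "D": 0.01, "E": -0.08, "F": 0.20}
labels = label_by_rank(yields)
```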
After the known result of the main body to be predicted is obtained, establishing an association relation between the known result and the main body name to be predicted.
S003, classifying the text data and the known result according to a preset time unit, and storing the classified text data and the known result into the database.
Specifically, each text data item and each known result has a corresponding time attribute: the time attribute of text data is the time at which the corresponding information data was published, and the time attribute of a known result is the time at which the known result occurred. After the text data and the known results are obtained, they are classified according to a preset time unit. For example, with one day as the time unit, text data whose time attributes fall on the same day are stored in the same set, and the known results of all subjects to be predicted whose time attributes fall on the same day are stored in the same set.
The time unit may be set differently according to the prediction target of the subject to be predicted, for example, if it is required to predict the result of the subject to be predicted after several hours, the time unit may be set to one hour.
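The time-unit classification in step S003 might look like the following sketch, which groups timestamped records into day or hour buckets. The bucket key format, the function names `time_bucket` and `group_by_time`, and the sample timestamps are assumptions for illustration; the patent does not specify how the database organizes the sets.

```python
from collections import defaultdict
from datetime import datetime


def time_bucket(timestamp, unit="day"):
    """Return the key of the time unit that a timestamp falls into."""
    if unit == "day":
        return timestamp.strftime("%Y-%m-%d")
    if unit == "hour":
        return timestamp.strftime("%Y-%m-%d %H:00")
    raise ValueError("unsupported time unit: " + unit)


def group_by_time(records, unit="day"):
    """Group (timestamp, payload) pairs by the chosen time unit."""
    buckets = defaultdict(list)
    for timestamp, payload in records:
        buckets[time_bucket(timestamp, unit)].append(payload)
    return dict(buckets)


# Hypothetical records, for illustration only.
records = [
    (datetime(2019, 11, 20, 9, 30), "news A"),
    (datetime(2019, 11, 20, 15, 0), "news B"),
    (datetime(2019, 11, 21, 10, 0), "news C"),
]
by_day = group_by_time(records, unit="day")
by_hour = group_by_time(records, unit="hour")
```

The same function can be applied separately to the text data and to the known results, so that each is stored in its own per-unit set.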
Through the steps, the prediction result generation method provided by the embodiment performs preliminary screening and processing on a large amount of data existing on the internet, and obtains the text data and the known result associated with the main body to be predicted.
The obtaining the main body name to be predicted, and obtaining the key interest group corresponding to the main body to be predicted according to the main body name to be predicted specifically comprises:
S110, extracting the text data associated with the subject name to be predicted from the database according to the subject name to be predicted;
S120, processing the text data, and extracting the key interest group containing the main body name to be predicted.
Specifically, when a prediction result of the main body to be predicted needs to be generated, acquiring the main body name to be predicted, and extracting the text data associated with the main body name to be predicted from the database according to the association relation between the main body name to be predicted and the text data.
After the text data associated with the subject name to be predicted is acquired, the text data is processed and the key interest group containing the subject name is extracted. As described above, the key interest group refers to information that is highly correlated with the subject to be predicted and can represent its characteristics. For subjects to be predicted in the same category, the corresponding text data may overlap; for example, a single paragraph of financial news may comment on the stocks of several listed companies. Therefore, in this embodiment, when the prediction result of a specific subject is generated, the sentences containing the name of that subject are extracted from the text data as the key interest group, so that the key interest group is selected more accurately and the influence of information about other subjects on the prediction result is reduced.
Specifically, the extraction of the key interest group may be performed by a pre-trained naming classifier. First, the names of the subjects to be predicted in the category, including abbreviations, common names and the like, are collected. The naming classifier is then trained on the text data and these names, so that it can identify the subject names to be predicted in the text data and extract the key interest group containing them. The naming classifier extracts the subject name as follows: it traverses the text data, marks text as 1 where the name of the subject to be predicted is found and marks all other text as 0, and extracts the sentences containing the mark 1 as the key interest group.
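The 1/0 marking and sentence extraction can be illustrated with the sketch below. It is not the trained naming classifier itself: exact string matching against a list of aliases is swapped in as a stand-in, and the helper names, the alias list, and the sample text are hypothetical.

```python
import re


def split_sentences(text):
    """Split text into sentences at common end-of-sentence punctuation."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [part.strip() for part in parts if part.strip()]


def mark_names(sentence, aliases):
    """Return one mark per character: 1 inside a subject name, else 0."""
    marks = [0] * len(sentence)
    for alias in aliases:
        start = sentence.find(alias)
        while start != -1:
            for i in range(start, start + len(alias)):
                marks[i] = 1
            start = sentence.find(alias, start + 1)
    return marks


def extract_key_groups(text, aliases):
    """Keep only the sentences that contain at least one mark of 1."""
    return [s for s in split_sentences(text) if 1 in mark_names(s, aliases)]


# Hypothetical text and aliases (full name plus short name).
text = "Alpha Bank rose 3%. Beta Bank fell. Analysts still favor Alpha."
key_groups = extract_key_groups(text, ["Alpha Bank", "Alpha"])
```

Including the short name in the alias list is what lets the third sentence be kept, which mirrors the collection of abbreviations and common names described above.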
After step S100 of acquiring the subject name to be predicted and acquiring the key interest group corresponding to the subject to be predicted according to the subject name, the method includes the step of:
S200, obtaining a prediction result of the main body to be predicted according to the key interest group.
Specifically, as shown in fig. 3, the obtaining the prediction result of the subject to be predicted according to the key interest group includes the steps of:
S210, obtaining the feature vector corresponding to the main body to be predicted according to the key interest group.
Specifically, the key interest group is a sentence, that is, text data, whereas a computer can only operate on numbers; the key interest group therefore cannot be processed directly by the computer and needs to be converted into a numerical form that the computer can handle. In this embodiment, the key interest group is converted into a vector, and this vector is the feature vector corresponding to the main body to be predicted. The step of obtaining the feature vector corresponding to the main body to be predicted according to the key interest group specifically comprises the following steps:
S211, extracting the effective time steps of the key interest groups.
A time step is the minimum unit that makes up the key interest group. The key interest group is a sentence and its minimum unit is a word, so the time steps of the key interest group are generated by segmenting it into words. Many word segmentation models exist in the art, such as pkuseg (an open-source Chinese word segmentation toolkit released by Peking University) and jieba (a Python-based Chinese word segmentation component). Those skilled in the art may select a word segmentation model according to the actual situation, and the present invention does not specifically limit this. After word segmentation is completed, each word corresponds to one time step. For example, the sentence "affected by drought, tomato yield decrease" corresponds to the time steps "affected by", "drought", "affected", "tomato", "yield", "decrease".
The valid time steps are the meaningful words. Not every word in a sentence carries actual meaning, for example modal particles such as "o" and "ya" in Chinese, or words such as "the" in English. To reduce the amount of data to be processed and increase processing speed, in this embodiment the words without actual meaning are discarded and only the meaningful words are kept, that is, the valid time steps are extracted.
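Segmentation followed by the removal of meaningless words can be sketched as below. To keep the example self-contained, a simple regular-expression tokenizer for English text is used in place of a Chinese segmenter such as pkuseg or jieba, and the stop-word list is a small hypothetical one rather than anything specified in the patent.

```python
import re

# Hypothetical stop-word list; a real one would be much larger.
STOP_WORDS = {"the", "a", "an", "by", "of", "oh"}


def segment(sentence):
    """Split a sentence into lowercase word tokens (the time steps)."""
    return re.findall(r"[A-Za-z0-9]+", sentence.lower())


def valid_time_steps(sentence, stop_words=STOP_WORDS):
    """Drop the words that carry no actual meaning and keep the rest."""
    return [word for word in segment(sentence) if word not in stop_words]


steps = valid_time_steps("Affected by the drought, the tomato yield decreased.")
```

With this input, `steps` holds "affected", "drought", "tomato", "yield" and "decreased", in that order.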
It should be noted that the present invention is not limited to extracting the valid time steps from the key interest group after the key interest group has been obtained. Alternatively, when the text data is acquired, its text content may be segmented to generate the time steps corresponding to the text data, and these time steps may then be filtered to extract the valid time steps of the text data. Since the key interest group consists of sentences extracted from the text data, a key interest group generated from text data that has already been filtered in this way retains only valid time steps.
S212, obtaining word vectors of all the effective time steps corresponding to the key interest group, inputting the word vectors into a pre-constructed feature extraction model, and outputting the feature vectors corresponding to the prediction subject.
In one implementation, the feature vector is obtained by converting every word into a word vector and then performing a computation over the word vectors of all valid time steps in the key interest group to obtain the feature vector of the key interest group.
Specifically, the word vectors are obtained by pre-training on a corresponding corpus. In short, every word in the corpus is converted into a vector of floating-point numbers. Word vectors capture the relationships between different words well, so words with similar meanings have word vectors that are close to each other.
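One common way to quantify how "close" two word vectors are is cosine similarity; the patent does not name a particular measure, so its use here is an assumption. The three-dimensional vectors below are invented toy values, not vectors trained on any corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy word vectors, invented for illustration only.
vectors = {
    "rise": [0.9, 0.1, 0.3],
    "increase": [0.8, 0.2, 0.3],
    "drought": [-0.2, 0.9, 0.1],
}
close = cosine_similarity(vectors["rise"], vectors["increase"])
far = cosine_similarity(vectors["rise"], vectors["drought"])
```

In this toy setup the two near-synonyms score higher than the unrelated pair, which is the property the paragraph above relies on.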
As described above, in this embodiment, the official information, such as news and messages issued by official microblogs, is selected as the information data, and then the key interest group is obtained by screening, so in this embodiment, the language expression of the key interest group is written rather than spoken language. Based on this, in this embodiment, when the word vector is obtained, the daily report of people is taken as a corpus, and most of the daily report of people is written language, which is similar to the expression mode of the key meaning group in this embodiment. Specifically, the method adopts skip-grams (skip-mode) algorithm to process all the words appearing in the people daily report and convert the words into corresponding word vectors. Of course, those skilled in the art may select other word vector generation algorithms to generate the word vector, such as CBOW (continuous bags of words, continuous word bag model) to generate the word vector, which is not particularly limited in this regard.
After the effective time steps of the key interest group are obtained, word vectors corresponding to the effective time steps are obtained, the word vectors corresponding to all the effective time steps in the key interest group are used as input, and the feature vectors corresponding to the prediction main body are output by utilizing a pre-built feature extraction model. Specifically, a transducer model (a natural language processing model proposed by google corporation) may be used as the feature extraction model to convert the input word vector and output a vector representing a sentence including the valid time step, i.e., the feature vector of the key intent group. Of course, the present invention is not limited to using a transducer model to output the feature vector, and those skilled in the art may select other natural speech processing models, such as an LSTM (long short-term memory) model, to complete the process of generating the feature vector from the word vector.
Since the word vectors corresponding to the words having the similar meanings are also similar, if two sentences having the similar meanings are provided with the words having the similar meanings, the feature vectors generated from the word vectors of the words having the similar meanings are also similar, that is, the feature vectors corresponding to the two key meaning groups having the similar meanings are also similar.
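The claim that similar sentences yield similar feature vectors can be illustrated with a toy sketch. It is not the Transformer or LSTM feature extractor named in the description: mean pooling is swapped in as a deliberately simple stand-in, and the 3-dimensional word vectors in `WORD_VECTORS` are made-up values rather than vectors trained with skip-gram.

```python
import math

# Made-up 3-dimensional word vectors (assumption, for illustration only).
WORD_VECTORS = {
    "drought": [0.9, 0.1, 0.0],
    "flood": [0.8, 0.2, 0.1],
    "tomato": [0.1, 0.9, 0.2],
    "potato": [0.2, 0.8, 0.3],
    "yield": [0.0, 0.3, 0.9],
    "bank": [-0.8, 0.1, -0.3],
    "profit": [-0.6, -0.5, 0.2],
    "rises": [-0.2, -0.7, -0.4],
}


def feature_vector(valid_time_steps):
    """Mean-pool the word vectors (a simple stand-in for a learned feature extractor)."""
    vectors = [WORD_VECTORS[word] for word in valid_time_steps]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; values near 1 mean 'close'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vec_a = feature_vector(["drought", "tomato", "yield"])
vec_b = feature_vector(["flood", "potato", "yield"])
vec_c = feature_vector(["bank", "profit", "rises"])

sim_ab = cosine_similarity(vec_a, vec_b)  # two sentences about crops
sim_ac = cosine_similarity(vec_a, vec_c)  # crops versus banking
print(sim_ab > sim_ac)
```

A learned model such as a Transformer would also take word order and context into account, which mean pooling ignores; the sketch only shows why close word vectors lead to close sentence-level vectors.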
Returning to fig. 3, the obtaining the prediction result of the main body to be predicted according to the key interest group further includes the steps of:
S220, inputting the feature vector into a trained prediction model, and outputting a prediction result corresponding to the feature vector.
As described above, key interest groups with close meanings have close feature vectors, and the information contained in key interest groups with close meanings is highly likely to influence the thinking and behavior of its recipients in a consistent way. In other words, close feature vectors correspond to close results. The prediction model established in the invention is therefore trained on the feature vectors and the known results associated with the name of the main body to be predicted.
The prediction model is trained based on deep learning. Specifically, the feature vector of the main body to be predicted at time t is obtained according to the foregoing steps, and the known result of the main body to be predicted at time t+n is obtained as well. As described above, close feature vectors correspond with high probability to close results. If a large amount of data shows that when the feature vector of the main body to be predicted at time t is A, the known result at time t+n is X, the prediction model learns this correspondence during training. When a prediction result for the main body to be predicted after n time units is needed, the feature vector corresponding to the current time unit is obtained and input into the trained prediction model; if that feature vector is A, the prediction model outputs X as the prediction result for the current time + n according to the correspondence learned during training. In other words, the prediction model is trained on a large amount of data so that it can "learn" the association between feature vectors and known results, and the trained model is then invoked to apply this association and generate the prediction result corresponding to a feature vector of the main body to be predicted.
Taking stock price prediction as an example, suppose the return of Ping An Bank after 3 days is to be predicted. The prediction model is trained with n = 3: the feature vector corresponding to Ping An Bank for each day in the database is obtained according to the foregoing steps, each day's feature vector is paired with the known result of Ping An Bank 3 days later, and these data are input into the prediction model for training, so that the trained prediction model learns the correspondence between a feature vector and the known result 3 days later. At prediction time, the feature vector corresponding to Ping An Bank for the current day is obtained and input into the trained prediction model, which outputs the prediction result for Ping An Bank after 3 days. Of course, the parameters of the prediction model may be set differently for different prediction targets, for example n = 5 or n = 7, and the time unit may also be set differently, for example to one hour or one day. In addition, the feature vector of the main body to be predicted corresponding to the current time unit may be the feature vector of the key interest group extracted from the text data whose time attribute is the current time unit, or the feature vector of the key interest group extracted from the text data whose time attribute falls within the m time units before the current time unit.
It can be appreciated that, because information data is time-sensitive, the data in the database is updated cumulatively, and the prediction result generation method provided by the invention is likewise carried out cumulatively. That is, generating prediction results with the trained prediction model and training the prediction model may be performed at the same time.
Of course, the foregoing is only a simplified description of the principle of training the prediction model based on deep learning. In practice, because data sources are diverse, the idealized situation in which a given feature vector corresponds to exactly one prediction result does not occur. Therefore, in this embodiment, inputting the feature vector into the trained prediction model and outputting the corresponding prediction result specifically includes: first inputting the feature vector into the trained prediction model and outputting the probability of at least one prediction result category corresponding to the feature vector; and then weighting the probabilities of the at least one prediction result category according to a preset weighting rule to obtain the prediction result corresponding to the feature vector.
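The pairing of a feature vector at time t with the known result at time t+n can be sketched as follows. The function name `build_training_pairs`, the integer day indices and the toy feature values are all assumptions made for illustration; the patent does not specify a data layout.

```python
def build_training_pairs(features_by_day, results_by_day, n):
    """Pair the feature vector of day t with the known result of day t + n.

    Days whose result at t + n is not yet known are skipped, so they can be
    used for prediction instead of training.
    """
    pairs = []
    for t, feature in sorted(features_by_day.items()):
        if t + n in results_by_day:
            pairs.append((feature, results_by_day[t + n]))
    return pairs


# Toy data (assumption): day index -> feature vector / known result category.
features_by_day = {1: [0.1, 0.2], 2: [0.3, 0.1], 3: [0.5, 0.4], 4: [0.2, 0.2]}
results_by_day = {1: "general", 2: "good", 3: "bad", 4: "good", 5: "general"}

training_pairs = build_training_pairs(features_by_day, results_by_day, n=3)
print(training_pairs)
```

Changing `n` or the granularity of the day index corresponds to the different horizons and time units mentioned in the description.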
Specifically, the probability of the at least one prediction result category corresponding to the feature vector may be output by a softmax function (normalized exponential function) in the prediction model. Softmax is a function commonly used in deep learning: it maps each output to a real number between 0 and 1 and normalizes the outputs so that they sum to 1. That is, the prediction model outputs the probability of at least one prediction result category through softmax, and the output probabilities sum to 1. Taking stock prediction as an example again, the prediction results may be divided into three categories as described above, namely good, general and bad. When predicting the stock of Ping An Bank, the feature vector of Ping An Bank corresponding to the current time unit is input into the trained prediction model, and the prediction model outputs the probabilities of the good, general and bad prediction results, which sum to 1. For example, an output data set of (0.6, 0.3, 0.1) corresponds to a probability of 0.6 for good, 0.3 for general and 0.1 for bad.
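For reference, a minimal softmax sketch in plain Python is given below. The input scores `[2.0, 1.0, 0.1]` are arbitrary example values standing in for the raw outputs of a prediction model; they are an assumption and do not come from the patent.

```python
import math


def softmax(scores):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    highest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Arbitrary raw scores for the categories good, general and bad (assumption).
probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities, sum(probabilities))
```

The ordering of the scores is preserved, so the category with the highest raw score also receives the highest probability.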
After the probability of the at least one prediction result category is obtained, the probabilities are weighted according to a preset weighting rule. For example, when the main body to be predicted is a stock, the weight for good may be set to 3, the weight for general to 1, and the weight for bad to -2. Assuming that the data set output for Ping An Bank is (0.6, 0.3, 0.1), the weighting operation gives 0.6 × 3 + 0.3 × 1 + 0.1 × (-2) = 1.9, so the prediction result for Ping An Bank is 1.9. It can be seen that when the weight for good is positive and the weight for bad is negative, a larger result value indicates a better predicted performance of the corresponding stock. As another example, when the main body to be predicted is a vegetable or fruit, assume that the data set output for the price of apples is (0.2, 0.5, 0.3), that is, the probability of rising is 0.2, the probability of holding steady is 0.5, and the probability of falling is 0.3. The output probabilities are then weighted according to a preset weighting rule, for example with the weight for rising set to -1, the weight for holding steady set to 1, and the weight for falling set to 1, which gives a prediction result of 0.2 × (-1) + 0.5 × 1 + 0.3 × 1 = 0.6 for apples. It can be seen that when the weight for rising is negative and the weight for falling is positive, a larger prediction result value indicates that the price of the corresponding produce is more likely to go lower.
The weighting rule may be set according to the actual situation. For stocks, for example, different weights may be set for different investment strategies: when the investment strategy is aggressive, the weight of the "good" prediction result category is set higher, and when the investment strategy is conservative, the weight of the "good" category is set to a smaller value.
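The weighting operation is a plain weighted sum of the category probabilities. The sketch below evaluates it for the two sets of probabilities and weights given in the examples; the function name `weighted_prediction` is hypothetical.

```python
def weighted_prediction(probabilities, weights):
    """Weighted sum of category probabilities under a preset weighting rule."""
    return sum(p * w for p, w in zip(probabilities, weights))


# Stock example: categories (good, general, bad) with weights (3, 1, -2).
stock_score = weighted_prediction((0.6, 0.3, 0.1), (3, 1, -2))

# Produce example: categories (rise, hold, fall) with weights (-1, 1, 1).
apple_score = weighted_prediction((0.2, 0.5, 0.3), (-1, 1, 1))

print(round(stock_score, 2))  # 0.6*3 + 0.3*1 + 0.1*(-2) = 1.9
print(round(apple_score, 2))  # 0.2*(-1) + 0.5*1 + 0.3*1 = 0.6
```

Collapsing the probability distribution to a single number makes the results of different main bodies directly comparable, at the cost of discarding how the probability mass was distributed.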
In this embodiment, the group prediction result of the to-be-predicted subject group including a plurality of to-be-predicted subjects may be further obtained by combining the prediction results of the to-be-predicted subjects, and specifically includes:
obtaining a plurality of prediction results corresponding to a plurality of main bodies to be predicted in a main body group to be predicted respectively;
and generating group prediction results corresponding to the main body group to be predicted according to the preset generation rules and the plurality of prediction results.
The main body group to be predicted is a set that includes at least one main body to be predicted, and the main bodies to be predicted that it includes belong to the same category. For example, when one main body to be predicted is Ping An Bank, the main body group to be predicted may include all listed companies; when one main body to be predicted is tomato, the main body group to be predicted may include all vegetables and fruits.
After the prediction results of the main bodies to be predicted in the main body group to be predicted are obtained, a group prediction result corresponding to the main body group to be predicted may be generated according to a preset generation rule. For example, when the main body group to be predicted includes all listed companies, the prediction results corresponding to the listed companies in the group may be obtained. If the prediction results of most of the main bodies to be predicted, such as Ping An Bank and China Merchants Bank, indicate good stock performance, a group prediction result indicating that the stock market as a whole performs well may be output. Alternatively, the prediction result values corresponding to the main bodies to be predicted in the main body group to be predicted may be obtained and arranged in order to form a performance ranking of all main bodies to be predicted in the group, which serves as the group prediction result.
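Both generation rules mentioned above, an overall outlook and a ranking, can be sketched in a few lines. The majority rule, the threshold of 0.0, the labels "good" and "not good", and the subject names and scores are all assumptions chosen for illustration; the patent leaves the generation rule open.

```python
def rank_subjects(scores):
    """Order subject names from the highest to the lowest prediction result value."""
    return sorted(scores, key=scores.get, reverse=True)


def group_outlook(scores, threshold=0.0):
    """Report 'good' when more than half of the subjects score above the threshold."""
    positive = sum(1 for value in scores.values() if value > threshold)
    return "good" if positive > len(scores) / 2 else "not good"


# Hypothetical prediction result values for three subjects in one group.
scores = {"Bank A": 1.9, "Bank B": 1.2, "Company C": -0.4}

ranking = rank_subjects(scores)
outlook = group_outlook(scores)
print(ranking, outlook)
```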
In summary, the prediction result generation method provided by this embodiment screens a large amount of data according to the name of the main body to be predicted, obtains the key interest group containing the features of the main body to be predicted, and converts the key interest group into a feature vector that a computer can process. The prediction result of the main body to be predicted is then generated by a prediction model trained on the feature vectors and the known results of the main body to be predicted. Throughout the process, only the name of the main body to be predicted is needed as input to output the corresponding prediction result, so the prediction rule is simple and effective.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time and may be performed at different times; nor are they necessarily performed sequentially, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
Example two
In combination with the above embodiment, the present invention also provides a terminal, and a functional block diagram thereof may be shown in fig. 4. The terminal comprises a processor, a memory, a network interface, a display screen and a temperature sensor which are connected through a system bus. Wherein the processor of the terminal is adapted to provide computing and control capabilities. The memory of the terminal includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the terminal is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of generating a prediction result. The display screen of the terminal can be a liquid crystal display screen or an electronic ink display screen, and the temperature sensor of the terminal is preset in the intelligent terminal and is used for detecting the current running temperature of the internal equipment.
It will be appreciated by those skilled in the art that the functional block diagram shown in fig. 4 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the terminal to which the present inventive arrangements may be applied, and that a particular terminal may include more or less components than those shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a terminal is provided, including a memory and a processor, the memory storing a computer program, the processor executing the computer program to perform at least the following steps:
acquiring a main body name to be predicted, and acquiring a key interest group corresponding to the main body to be predicted according to the main body name to be predicted;
and obtaining a prediction result of the main body to be predicted according to the key interest group.
The method for generating the prediction result, wherein the step of obtaining the name of the main body to be predicted, before obtaining the key interest group corresponding to the main body to be predicted according to the name of the main body to be predicted, further comprises:
and acquiring text data associated with the main body to be predicted and a known result of the main body to be predicted, and storing the text data and the known result into a preset database.
The method for generating a prediction result, wherein the obtaining text data associated with the main body to be predicted and a known result of the main body to be predicted, and storing the text data and the known result in a preset database specifically includes:
acquiring information data related to the main body to be predicted, screening, acquiring text data comprising the main body name to be predicted, and establishing an association relationship between the text data and the main body name to be predicted;
acquiring a known result of the main body to be predicted, and establishing an association relation between the known result and the text data;
and storing the text data and the known result into the database according to the preset time unit classification respectively.
The method for generating the prediction result, wherein the obtaining the main body name to be predicted, and obtaining the key interest group corresponding to the main body name to be predicted according to the main body name to be predicted specifically includes:
extracting the text data associated with the subject name to be predicted from the database according to the subject name to be predicted;
and processing the text data to extract the key interest group containing the main body name to be predicted.
The method for generating a prediction result, wherein the obtaining the prediction result of the main body to be predicted according to the key interest group specifically includes:
acquiring a feature vector corresponding to the main body to be predicted according to the key interest group;
and inputting the feature vector into a trained prediction model, and outputting a prediction result corresponding to the feature vector, wherein the trained prediction model is trained according to the feature vector and the known result related to the main body name to be predicted.
The method for generating a prediction result, wherein the obtaining, according to the key interest group, the feature vector corresponding to the main body to be predicted specifically includes:
extracting effective time steps of the key interest group;
and obtaining word vectors of all the effective time steps corresponding to the key interest group, inputting the word vectors into a pre-constructed feature extraction model, and outputting the feature vectors corresponding to the main body to be predicted.
The method for generating a prediction result, wherein the inputting the feature vector into a trained prediction model, and outputting the prediction result corresponding to the feature vector specifically includes:
inputting the feature vector into the trained prediction model, and outputting the probability of at least one prediction result category corresponding to the feature vector;
And respectively weighting the probabilities of the at least one predicted result category according to a preset weighting rule, and then obtaining the predicted result corresponding to the feature vector.
The method for generating a prediction result, wherein the obtaining the prediction result of the main body to be predicted according to the key interest group further includes:
obtaining a plurality of prediction results corresponding to a plurality of main bodies to be predicted in a main body group to be predicted respectively;
and generating group prediction results corresponding to the main body group to be predicted according to the preset generation rules and the plurality of prediction results.
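The storage and extraction steps recited above (storing text data by time unit, then extracting the sentences that contain the name of the main body to be predicted) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the dictionary `database`, the functions `store_text` and `key_interest_groups`, and the English sentence splitting on ".", "!" and "?" are all hypothetical and are not the data model of the patent.

```python
import re
from collections import defaultdict

# (subject name, time unit) -> list of stored texts (hypothetical in-memory database).
database = defaultdict(list)


def store_text(subject_name, time_unit, text):
    """Store the text under the subject and time unit only if it mentions the subject."""
    if subject_name in text:
        database[(subject_name, time_unit)].append(text)


def key_interest_groups(subject_name, time_unit):
    """Return the sentences of the stored texts that contain the subject name."""
    groups = []
    for text in database.get((subject_name, time_unit), []):
        # Naive sentence split on ., ! or ? followed by whitespace (assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if subject_name in sentence:
                groups.append(sentence.strip())
    return groups


store_text(
    "tomato",
    "2019-11-20",
    "Rain was scarce this year. Affected by drought, tomato yield decreased. "
    "Prices may change.",
)
store_text("tomato", "2019-11-20", "Apple exports rose.")  # not stored: no match

groups = key_interest_groups("tomato", "2019-11-20")
print(groups)
```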
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims (6)

1. A prediction result generation method, characterized in that the prediction result generation method comprises:
acquiring a main body name to be predicted, acquiring a key interest group corresponding to the main body to be predicted according to the main body name to be predicted, and selecting sentences or paragraphs containing the main body name to be predicted as the key interest group corresponding to the main body to be predicted;
obtaining a prediction result of the main body to be predicted according to the key interest group;
the obtaining the main body name to be predicted, before obtaining the key interest group corresponding to the main body to be predicted according to the main body name to be predicted, further comprises:
acquiring text data associated with the main body to be predicted and a known result of the main body to be predicted, and storing the text data and the known result into a preset database;
the obtaining text data associated with the main body to be predicted and the known result of the main body to be predicted, and storing the text data and the known result into a preset database specifically includes:
Acquiring information data related to the main body to be predicted, preprocessing each information data, reserving text data in the information data, and removing other data forms except the text data in the information data, wherein the information data are all related information under the category of the main body to be predicted and are official data so as to acquire the text data comprising the main body name to be predicted, and establishing an association relation between the text data and the main body name to be predicted;
matching the text content according to the name of the main body to be predicted;
acquiring a known result of the main body to be predicted, and establishing an association relation between the known result and the text data;
the text data and the known result are respectively classified and stored into the database according to a preset time unit;
the obtaining the prediction result of the main body to be predicted according to the key interest group specifically includes:
acquiring a feature vector corresponding to the main body to be predicted according to the key interest group;
inputting the feature vector into a trained prediction model, and outputting a prediction result corresponding to the feature vector, wherein the trained prediction model is trained according to the feature vector and the known result related to the main body name to be predicted;
The obtaining the feature vector corresponding to the main body to be predicted according to the key interest group specifically includes:
extracting effective time steps of the key intention group;
and obtaining word vectors of all the effective time steps corresponding to the key interest group, inputting the word vectors into a pre-constructed feature extraction model, and outputting the feature vectors corresponding to the prediction main body.
2. The method for generating a prediction result according to claim 1, wherein the obtaining the subject name to be predicted, and obtaining the key interest group corresponding to the subject name to be predicted according to the subject name to be predicted specifically includes:
extracting the text data associated with the subject name to be predicted from the database according to the subject name to be predicted;
and processing the text data to extract the key interest group containing the main body name to be predicted.
3. The prediction result generation method according to claim 1, wherein the inputting the feature vector into the trained prediction model, and outputting the prediction result corresponding to the feature vector specifically includes:
inputting the feature vector into the trained prediction model, and outputting the probability of at least one prediction result category corresponding to the feature vector;
And respectively weighting the probabilities of the at least one predicted result category according to a preset weighting rule, and then obtaining the predicted result corresponding to the feature vector.
4. The method for generating a prediction result according to claim 1, wherein the obtaining the prediction result of the subject to be predicted according to the key interest group further comprises:
obtaining a plurality of prediction results corresponding to a plurality of main bodies to be predicted in a main body group to be predicted respectively;
and generating group prediction results corresponding to the main body group to be predicted according to the preset generation rules and the plurality of prediction results.
5. A terminal, the terminal comprising: a processor, a storage medium communicatively coupled to the processor, the storage medium adapted to store a plurality of instructions; the processor is adapted to invoke instructions in the storage medium to perform a method of generating a prediction result implementing any of the preceding claims 1-4.
6. A storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the prediction result generation method of any of claims 1-4.
CN201911142180.6A 2019-11-20 2019-11-20 Prediction result generation method, terminal and storage medium Active CN111221938B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911142180.6A CN111221938B (en) 2019-11-20 2019-11-20 Prediction result generation method, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911142180.6A CN111221938B (en) 2019-11-20 2019-11-20 Prediction result generation method, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN111221938A CN111221938A (en) 2020-06-02
CN111221938B true CN111221938B (en) 2024-02-23

Family

ID=70832086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911142180.6A Active CN111221938B (en) 2019-11-20 2019-11-20 Prediction result generation method, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111221938B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222149A (en) * 2019-05-17 2019-09-10 华中科技大学 A kind of Time Series Forecasting Methods based on news public sentiment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Chen et al. Sentiment orientation analysis of news text based on dictionaries and rules. Shandong Science. 2017, Vol. 30, see pages 115-121 of the text. *

Also Published As

Publication number Publication date
CN111221938A (en) 2020-06-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant