CN109300031A - Data mining method and device based on stock comment data - Google Patents

Data mining method and device based on stock comment data

Info

Publication number
CN109300031A
Authority
CN
China
Prior art keywords
stock
comment data
commentator
viewpoint
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810942719.5A
Other languages
Chinese (zh)
Inventor
王浩
张晨
庞旭林
杜长营
杨康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201810942719.5A
Publication of CN109300031A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a data mining method and device based on stock comment data. The method comprises: obtaining stock comment data, where one item of stock comment data is a single comment by a single stock commentator on a single stock; mining viewpoint polarity distribution information of the stock commentators based on the obtained stock comment data; and mining viewpoint reliability distribution information of the stock commentators based on the obtained stock comment data. The invention fuses several heterogeneous information sources, such as stock price time series, the text content of stock comments, and the historical behavior of the stock commentators who publish them. On this multi-source heterogeneous big data, data mining techniques are used to analyze the data in depth and extract key features, and these features are used to measure the reliability of stock comments, so that high-quality stock comments can be selected from massive information. This helps investors understand market trends and stock movements more accurately, for use by investors or quantitative traders.

Description

Data mining method and device based on stock comment data
Technical field
The present invention relates to the fields of artificial intelligence and big data, and in particular to a data mining method, apparatus, electronic device and computer-readable storage medium based on stock comment data.
Background art
Investors usually use search engines to find valuable information that supports their final decisions, and most of these decision processes rely on human analysis, judgment and experience. In fact, stock comment data on the Internet contains rich and valuable semantic information that can help investors understand market trends and stock movements. Existing stock comment analysis methods usually focus only on capturing the sentiment polarity of stock comments in order to understand their macro-level effect on market trends. However, stock comments on the Internet usually contain a large amount of noise, such as paid posters, personal subjective bias and herd mentality, which seriously affects investors' judgment. It is therefore of great significance to use artificial intelligence techniques to perform fine-grained authority analysis on stock comment information and thereby automatically select high-quality stock comments from massive information for retail investors and stock analysts.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a data mining method, apparatus, electronic device and computer-readable storage medium based on stock comment data that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a data mining method based on stock comment data is provided. The method includes:
obtaining stock comment data, where one item of stock comment data is a single comment by a single stock commentator on a single stock;
mining viewpoint polarity distribution information of the stock commentators based on the obtained stock comment data;
and mining viewpoint reliability distribution information of the stock commentators based on the obtained stock comment data.
Optionally, after the step of obtaining stock comment data, the method further includes a step of cleaning the stock comment data, which specifically includes:
deleting stock comment data whose viewpoint polarity is neutral;
and/or
deleting the stock comment data belonging to any stock comment sequence whose length is less than a preset threshold, where a stock comment sequence is the set of stock comment data items in which the same commentator comments on the same stock at different times.
Optionally, one item of stock comment data includes:
a stock commentator identifier, a comment time, a target stock, and content that carries a viewpoint polarity.
Optionally, mining the viewpoint polarity distribution information of the stock commentators based on the obtained stock comment data includes one or more of the following:
based on all historical stock comment data of the same stock commentator on the same stock in the obtained stock comment data, determining the probability that the stock commentator publishes bullish stock comment data on that stock, and the probability that the stock commentator publishes bearish stock comment data on that stock;
based on all historical stock comment data of the same stock commentator on different stocks in the obtained stock comment data, determining the probability that the stock commentator publishes bullish stock comment data, and the probability that the stock commentator publishes bearish stock comment data;
based on all historical stock comment data of different stock commentators on the same stock in the obtained stock comment data, determining the probability that stock commentators publish bullish stock comment data on that stock, and the probability that stock commentators publish bearish stock comment data on that stock;
based on all historical stock comment data of different stock commentators on different stocks in the obtained stock comment data, determining the probability that bullish stock comment data is published, and the probability that bearish stock comment data is published.
Optionally, mining the viewpoint reliability distribution information of the stock commentators based on the obtained stock comment data includes:
determining, according to the price time-series information of the different stocks, the correctness of each item among all historical stock comment data of the same stock commentator on different stocks in the obtained stock comment data;
determining the viewpoint reliability distribution of a stock commentator from that commentator's number of correct stock comment data items and number of incorrect stock comment data items.
Optionally, the method further comprises:
extracting stock comment data pairs from each pair of adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock;
based on the extracted stock comment data pairs, computing the probability that the stock commentator keeps a viewpoint and the probability that the stock commentator changes a viewpoint.
Optionally, the method further comprises:
extracting stock comment data pairs from each pair of adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock;
based on the extracted stock comment data pairs, determining the probability TSRatio that the stock commentator changes viewpoint given that the viewpoint was correct, and the probability FSRatio that the stock commentator changes viewpoint given that the viewpoint was wrong.
Optionally, the method further comprises:
extracting stock comment data pairs from each pair of adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock;
based on the extracted stock comment data pairs, determining the probability TCTRatio that, given that the viewpoint was correct, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability TSTRatio that, given that the viewpoint was correct, the stock commentator changes the viewpoint and the changed viewpoint is correct;
based on the extracted stock comment data pairs, determining the probability FCTRatio that, given that the viewpoint was wrong, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability FSTRatio that, given that the viewpoint was wrong, the stock commentator changes the viewpoint and the changed viewpoint is correct.
Optionally, the method further comprises:
receiving a query request for specified viewpoint information about a stock commentator;
outputting result data corresponding to the query request.
According to another aspect of the present invention, a data mining device based on stock comment data is provided. The device includes:
an acquiring unit, configured to obtain stock comment data, where one item of stock comment data is a single comment by a single stock commentator on a single stock;
a mining unit, configured to mine viewpoint polarity distribution information of the stock commentators based on the obtained stock comment data, and configured to mine viewpoint reliability distribution information of the stock commentators based on the obtained stock comment data.
Optionally, the device further includes:
a data cleaning unit, configured to delete, from the obtained stock comment data, stock comment data whose viewpoint polarity is neutral; and/or configured to delete, from the obtained stock comment data, the stock comment data belonging to any stock comment sequence whose length is less than a preset threshold;
where a stock comment sequence is the set of stock comment data items in which the same commentator comments on the same stock at different times.
Optionally, one item of stock comment data includes:
a stock commentator identifier, a comment time, a target stock, and content that carries a viewpoint polarity.
Optionally, the mining unit is configured to perform one or more of the following:
based on all historical stock comment data of the same stock commentator on the same stock in the obtained stock comment data, determining the probability that the stock commentator publishes bullish stock comment data on that stock, and the probability that the stock commentator publishes bearish stock comment data on that stock;
based on all historical stock comment data of the same stock commentator on different stocks in the obtained stock comment data, determining the probability that the stock commentator publishes bullish stock comment data, and the probability that the stock commentator publishes bearish stock comment data;
based on all historical stock comment data of different stock commentators on the same stock in the obtained stock comment data, determining the probability that stock commentators publish bullish stock comment data on that stock, and the probability that stock commentators publish bearish stock comment data on that stock;
based on all historical stock comment data of different stock commentators on different stocks in the obtained stock comment data, determining the probability that bullish stock comment data is published, and the probability that bearish stock comment data is published.
Optionally,
the mining unit is configured to determine, according to the price time-series information of the different stocks, the correctness of each item among all historical stock comment data of the same stock commentator on different stocks in the obtained stock comment data; and to determine the viewpoint reliability distribution of a stock commentator from that commentator's number of correct stock comment data items and number of incorrect stock comment data items.
Optionally,
the mining unit is further configured to extract stock comment data pairs from each pair of adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock; and, based on the extracted stock comment data pairs, to compute the probability that the stock commentator keeps a viewpoint and the probability that the stock commentator changes a viewpoint.
Optionally,
the mining unit is further configured to extract stock comment data pairs from each pair of adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock; and, based on the extracted stock comment data pairs, to determine the probability TSRatio that the stock commentator changes viewpoint given that the viewpoint was correct, and the probability FSRatio that the stock commentator changes viewpoint given that the viewpoint was wrong.
Optionally,
the mining unit is further configured to extract stock comment data pairs from each pair of adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock; based on the extracted stock comment data pairs, to determine the probability TCTRatio that, given that the viewpoint was correct, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability TSTRatio that, given that the viewpoint was correct, the stock commentator changes the viewpoint and the changed viewpoint is correct; and, based on the extracted stock comment data pairs, to determine the probability FCTRatio that, given that the viewpoint was wrong, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability FSTRatio that, given that the viewpoint was wrong, the stock commentator changes the viewpoint and the changed viewpoint is correct.
Optionally, the device further comprises:
a query processing unit, configured to receive a query request for specified viewpoint information about a stock commentator, and to output result data corresponding to the query request.
According to a further aspect of the present invention, an electronic device is provided. The electronic device includes a processor and a memory storing a computer program that can run on the processor;
wherein the processor performs any of the methods described above when executing the computer program in the memory.
According to a further aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements any of the methods described above.
According to the technical solution of the present invention, stock comment data is obtained, where one item of stock comment data is a single comment by a single stock commentator on a single stock; viewpoint polarity distribution information of the stock commentators is mined from the obtained stock comment data; and viewpoint reliability distribution information of the stock commentators is mined from the obtained stock comment data. The invention fuses several heterogeneous information sources, such as stock price time series, the text content of stock comments, and the historical behavior of the stock commentators who publish them. On this multi-source heterogeneous big data, data mining techniques are used to analyze the data in depth and extract key features, and these features are used to measure the reliability of stock comments, so that high-quality stock comments can be selected from massive information. This helps investors understand market trends and stock movements more accurately, for use by investors or quantitative traders.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flowchart of a data mining method based on stock comment data according to an embodiment of the invention;
Fig. 2 is a schematic diagram of one item of stock comment data;
Fig. 3 is a schematic diagram of another representation of one item of stock comment data;
Fig. 4 is a schematic diagram of the stock comment data volume before and after cleaning;
Fig. 5 is a schematic diagram of the profit obtained after selecting stocks with intelligent stock-selection method c;
Fig. 6 is a schematic diagram of a data mining device based on stock comment data according to an embodiment of the invention;
Fig. 7 is a schematic diagram of another data mining device based on stock comment data according to an embodiment of the invention;
Fig. 8 is a structural schematic diagram of the electronic device in an embodiment of the invention;
Fig. 9 is a structural schematic diagram of a computer-readable storage medium in an embodiment of the invention.
Detailed description of the embodiments
Terms used in the present invention:
TSRatio: the Ratio of True-then-Shift, i.e. the ratio of changing a correct viewpoint. It characterizes how likely a stock commentator is to change viewpoint given that the commentator's viewpoint on a stock was correct.
FSRatio: the Ratio of False-then-Shift, i.e. the ratio of changing a wrong viewpoint. It characterizes how likely a stock commentator is to change viewpoint given that the commentator's viewpoint on a stock was wrong.
TCTRatio: the Reliability Ratio of True-then-Constant, i.e. the reliability ratio of keeping a correct viewpoint. It characterizes the reliability of a viewpoint that a stock commentator keeps after the commentator's viewpoint on a stock was correct.
TSTRatio: the Reliability Ratio of True-then-Shift, i.e. the reliability ratio of changing a correct viewpoint. It characterizes the reliability of the changed viewpoint when a stock commentator changes viewpoint after the commentator's viewpoint on a stock was correct.
FCTRatio: the Reliability Ratio of False-then-Constant, i.e. the reliability ratio of keeping a wrong viewpoint. It characterizes the reliability of a viewpoint that a stock commentator keeps after the commentator's viewpoint on a stock was wrong.
FSTRatio: the Reliability Ratio of False-then-Shift, i.e. the reliability ratio of changing a wrong viewpoint. It characterizes the reliability of the changed viewpoint when a stock commentator changes viewpoint after the commentator's viewpoint on a stock was wrong.
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
Fig. 1 is a flowchart of a data mining method based on stock comment data according to an embodiment of the invention. As shown in Fig. 1, the method includes:
Step S11: obtain stock comment data, where one item of stock comment data is a single comment by a single stock commentator on a single stock.
One item of stock comment data includes a stock commentator identifier, a comment time, a target stock, and content that carries a viewpoint polarity, as shown in Fig. 2, which is a schematic diagram of one item of stock comment data.
After the step of obtaining stock comment data, the method further includes a step of cleaning the stock comment data, which specifically includes:
deleting stock comment data whose viewpoint polarity is neutral; and/or deleting the stock comment data belonging to any stock comment sequence whose length is less than a preset threshold;
where a stock comment sequence is the set of stock comment data items in which the same stock commentator comments on the same stock at different times.
Step S12: mine viewpoint polarity distribution information of the stock commentators based on the obtained stock comment data.
This step includes one or more of the following:
based on all historical stock comment data of the same stock commentator on the same stock in the obtained stock comment data, determining the probability that the stock commentator publishes bullish stock comment data on that stock, and the probability that the stock commentator publishes bearish stock comment data on that stock;
based on all historical stock comment data of the same stock commentator on different stocks in the obtained stock comment data, determining the probability that the stock commentator publishes bullish stock comment data, and the probability that the stock commentator publishes bearish stock comment data;
based on all historical stock comment data of different stock commentators on the same stock in the obtained stock comment data, determining the probability that stock commentators publish bullish stock comment data on that stock, and the probability that stock commentators publish bearish stock comment data on that stock;
based on all historical stock comment data of different stock commentators on different stocks in the obtained stock comment data, determining the probability that bullish stock comment data is published, and the probability that bearish stock comment data is published.
Step S13: mine viewpoint reliability distribution information of the stock commentators based on the obtained stock comment data.
This step includes:
determining, according to the price time-series information of the different stocks, the correctness of each item among all historical stock comment data of the same stock commentator on different stocks in the obtained stock comment data;
determining the probability that a stock commentator's viewpoint is correct from that commentator's number of correct stock comment data items and number of incorrect stock comment data items.
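Step S13 can be sketched as follows. The text does not state how a comment is judged correct from the price time series, so the rule used here is an assumption: a bullish comment counts as correct if the closing price `horizon` trading days later is higher than on the comment day, and a bearish comment if it is lower. The function names, the `horizon` parameter and the list-of-closing-prices layout are all hypothetical.

```python
def is_correct(polarity, prices, day, horizon=5):
    """Judge one comment against a price series (assumed rule, not from the patent).

    prices: list of closing prices indexed by trading day (hypothetical layout).
    Returns None when the outcome cannot be observed yet.
    """
    future = day + horizon
    if future >= len(prices):
        return None
    change = prices[future] - prices[day]
    if polarity == "bullish":
        return change > 0
    return change < 0  # bearish; an unchanged price counts as incorrect


def reliability(comments, price_series, horizon=5):
    """Share of correct comments for one commentator.

    comments: iterable of (stock, day_index, polarity) tuples.
    price_series: dict mapping stock code to its list of closing prices.
    """
    right = wrong = 0
    for stock, day, polarity in comments:
        outcome = is_correct(polarity, price_series[stock], day, horizon)
        if outcome is None:
            continue  # skip comments whose outcome is not yet known
        if outcome:
            right += 1
        else:
            wrong += 1
    total = right + wrong
    return right / total if total else None
```

The unreliable share is one minus the returned value, which gives the two-point reliable/unreliable distribution.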
In this method, stock comment data is obtained, where one item of stock comment data is a single comment by a single stock commentator on a single stock; viewpoint polarity distribution information of the stock commentators is mined from the obtained stock comment data; and viewpoint reliability distribution information of the stock commentators is mined from the obtained stock comment data. The invention fuses several heterogeneous information sources, such as stock price time series, the text content of stock comments, and the historical behavior of the stock commentators who publish them. On this multi-source heterogeneous big data, data mining techniques are used to analyze the data in depth and extract key features, and these features are used to measure the reliability of stock comments, so that high-quality stock comments can be selected from massive information. This helps investors understand market trends and stock movements more accurately, for use by investors or quantitative traders.
In some embodiments, the method of Fig. 1 further includes:
extracting stock comment data pairs from each pair of adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock;
based on the extracted stock comment data pairs, computing the probability that the stock commentator keeps a viewpoint and the probability that the stock commentator changes a viewpoint.
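A minimal sketch of the pair extraction and the keep/change statistics, assuming the comment sequence has already been reduced to a time-ordered list of polarity labels. The function names and the string labels are illustrative only.

```python
def adjacent_pairs(sequence):
    """Return every pair of adjacent items in a time-ordered sequence."""
    return list(zip(sequence, sequence[1:]))


def keep_change_probabilities(sequence):
    """Estimate how often one commentator keeps or changes viewpoint on one stock.

    sequence: time-ordered list of polarity labels, e.g. "bullish" / "bearish".
    """
    pairs = adjacent_pairs(sequence)
    if not pairs:
        return None  # fewer than two comments: no pair to count
    kept = sum(1 for first, second in pairs if first == second)
    keep = kept / len(pairs)
    return {"keep": keep, "change": 1 - keep}
```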
In some embodiments, the method of Fig. 1 further includes:
extracting stock comment data pairs from each pair of adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock;
based on the extracted stock comment data pairs, determining the probability TSRatio that the stock commentator changes viewpoint given that the viewpoint was correct, and the probability FSRatio that the stock commentator changes viewpoint given that the viewpoint was wrong.
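One way to estimate TSRatio and FSRatio from the extracted pairs is sketched below. It assumes each comment has already been labelled correct or incorrect, and it conditions on the correctness of the first comment in each pair. The function name and the tuple layout are hypothetical.

```python
def shift_ratios(sequence):
    """Estimate TSRatio and FSRatio for one commentator on one stock.

    sequence: time-ordered list of (polarity, was_correct) tuples.
    """
    # was_correct of the first comment -> [number of pairs, number of shifts]
    counts = {True: [0, 0], False: [0, 0]}
    for (pol_a, ok_a), (pol_b, _) in zip(sequence, sequence[1:]):
        counts[ok_a][0] += 1
        counts[ok_a][1] += pol_a != pol_b  # True adds 1 when the viewpoint shifts

    def ratio(ok):
        pairs, shifts = counts[ok]
        return shifts / pairs if pairs else None

    return {"TSRatio": ratio(True), "FSRatio": ratio(False)}
```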
In some embodiments, the method of Fig. 1 further includes:
extracting stock comment data pairs from each pair of adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock;
based on the extracted stock comment data pairs, determining the probability TCTRatio that, given that the viewpoint was correct, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability TSTRatio that, given that the viewpoint was correct, the stock commentator changes the viewpoint and the changed viewpoint is correct;
based on the extracted stock comment data pairs, determining the probability FCTRatio that, given that the viewpoint was wrong, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability FSTRatio that, given that the viewpoint was wrong, the stock commentator changes the viewpoint and the changed viewpoint is correct.
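The four reliability ratios can be sketched in the same way. The wording above leaves the exact conditioning open; this sketch assumes each ratio is the share of pairs in which the second comment is correct, among pairs with the given first-comment correctness (T or F) and the given behavior (C for kept, S for shifted). Other readings are possible, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def transition_reliability(sequence):
    """Estimate TCTRatio, TSTRatio, FCTRatio and FSTRatio (assumed conditioning).

    sequence: time-ordered list of (polarity, was_correct) tuples
    for one commentator on one stock.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for (pol_a, ok_a), (pol_b, ok_b) in zip(sequence, sequence[1:]):
        # Build the ratio name from first-comment correctness and keep/shift.
        key = ("T" if ok_a else "F") + ("C" if pol_a == pol_b else "S") + "TRatio"
        totals[key] += 1
        hits[key] += ok_b  # counts pairs whose second comment is correct
    return {
        name: (hits[name] / totals[name] if totals[name] else None)
        for name in ("TCTRatio", "TSTRatio", "FCTRatio", "FSTRatio")
    }
```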
In some embodiments, the method of Fig. 1 further includes:
receiving a query request for specified viewpoint information about a stock commentator;
outputting result data corresponding to the query request.
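The query step might look like the sketch below, assuming the mined statistics are kept in an in-memory dictionary keyed by commentator identifier. The `profiles` structure, the function name and the field names are hypothetical; the text does not describe the storage or request format.

```python
def handle_query(profiles, commentator_id, fields=None):
    """Return the requested viewpoint statistics for one commentator.

    profiles: dict mapping commentator id to a dict of mined statistics.
    fields: names of the statistics to return; None returns all of them.
    """
    profile = profiles.get(commentator_id)
    if profile is None:
        return {}  # unknown commentator: empty result
    if fields is None:
        return dict(profile)
    return {name: profile[name] for name in fields if name in profile}
```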
The present invention proposes a solution for modeling the reliability of stock comment data. The solution is a unified framework that fuses several heterogeneous information sources, such as stock price time series, the text content of stock comments, and the historical behavior of the stock commentators who publish them. It can effectively filter out noise and select valuable, reliable stock comment information for use by investors or quantitative traders. It applies not only to reliability analysis of stock comment information but also to other areas of finance, such as economic situation analysis, precise stock recommendation, portfolio management and automated trading. The specific implementation is as follows:
One: stock comment data cleaning. Data cleaning provides a first pass at removing noise from the stock comment data obtained from the Internet, and includes:
(1) Delete stock comment data whose viewpoint polarity is neutral.
(2) Delete stock comment sequences of length less than 5, together with the stock comment data belonging to them.
Fig. 2 is a schematic diagram of one item of stock comment data. As shown in Fig. 2, a stock comment text includes the stock commentator 201 (allan), the time 202 (8 days ago), the viewpoint polarity 203 (BUY, Bullish), the target stock 204 (IBM), and the comment content 205 (I think there is a support at 173.11).
Because a neutral viewpoint polarity is difficult to identify automatically, deleting stock comment data with neutral polarity requires manual screening. A "stock comment sequence of length less than 5" means that the same stock commentator has commented on the same stock fewer than 5 times.
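The two cleaning rules can be sketched as follows. The sketch assumes polarity labels are already attached to each comment (for example after the manual screening mentioned above) and that sequence length is counted after neutral comments are removed; the text does not fix that order. The tuple layout and names are hypothetical.

```python
from collections import defaultdict

MIN_SEQUENCE_LENGTH = 5  # threshold used in this embodiment


def clean_comments(comments, min_len=MIN_SEQUENCE_LENGTH):
    """Apply both cleaning rules and return the surviving comment sequences.

    comments: iterable of (commentator, stock, time, polarity) tuples.
    Returns a dict mapping (commentator, stock) to a time-ordered list.
    """
    # Rule 1: drop comments whose viewpoint polarity is neutral.
    kept = [c for c in comments if c[3] != "neutral"]

    # Group the rest into one sequence per (commentator, stock).
    sequences = defaultdict(list)
    for comment in kept:
        sequences[(comment[0], comment[1])].append(comment)

    # Rule 2: drop sequences shorter than the threshold.
    return {
        key: sorted(seq, key=lambda c: c[2])
        for key, seq in sequences.items()
        if len(seq) >= min_len
    }
```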
Fig. 3 is a schematic diagram of another representation of one item of stock comment data. As the figure shows, the target stock belongs to the A-share category; an asker asks whether to buy sh60000, and stock commentator Liu Anlin answers. The comment time is 2016-12-29, the viewpoint polarity is bullish, and the content carrying the viewpoint polarity is: "The share price has met support at the annual line, so buying can be considered; this view is for reference only."
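The four fields of a stock comment data item can be held in a small record type. The sketch below is populated with the Fig. 3 values; the class names, field names and the three-valued polarity enum are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Polarity(Enum):
    BULLISH = "bullish"
    BEARISH = "bearish"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class StockComment:
    commentator_id: str   # stock commentator identifier
    comment_time: date    # comment time
    target_stock: str     # target stock code
    polarity: Polarity    # viewpoint polarity carried by the content
    content: str          # comment text


# Values taken from the Fig. 3 example.
example = StockComment(
    commentator_id="Liu Anlin",
    comment_time=date(2016, 12, 29),
    target_stock="sh60000",
    polarity=Polarity.BULLISH,
    content="The share price has met support at the annual line, "
            "so buying can be considered.",
)
```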
Fig. 4 is a schematic diagram of the stock comment data volume before and after cleaning; the data source is the Sina financial planner website. As the figure shows, the volume drops sharply after cleaning: a large amount of noise is removed from the stock comment data, which in turn reduces the computation needed for subsequent data processing.
Two: mining the viewpoint polarity and reliability distribution patterns of stock commentators. A stock commentator's historical stock comment information can be used to mine that commentator's polarity tendency and reliability distribution, including:
(1) Count the stock commentator's comment polarity distribution from the commentator's historical stock comment information, i.e. the probability distribution of publishing bullish and bearish comments. Mining the viewpoint polarity distribution information of stock commentators uses one or more of four modes, which can be summarized as one-to-one, one-to-many, many-to-one and many-to-many. Specifically:
based on all historical stock comment data of the same stock commentator on the same stock in the obtained stock comment data, determining the probability that the stock commentator publishes bullish stock comment data on that stock, and the probability that the stock commentator publishes bearish stock comment data on that stock;
based on all historical stock comment data of the same stock commentator on different stocks in the obtained stock comment data, determining the probability that the stock commentator publishes bullish stock comment data, and the probability that the stock commentator publishes bearish stock comment data;
based on all historical stock comment data of different stock commentators on the same stock in the obtained stock comment data, determining the probability that stock commentators publish bullish stock comment data on that stock, and the probability that stock commentators publish bearish stock comment data on that stock;
based on all historical stock comment data of different stock commentators on different stocks in the obtained stock comment data, determining the probability that bullish stock comment data is published, and the probability that bearish stock comment data is published.
(2) reliability point is commented on by the stock that stock commentator's historical stock comment information counts stock commentator Cloth, i.e. stock comment on reliable and unreliable probability distribution.
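The four counting modes differ only in which comments are pooled, so they can be sketched with a single filter-and-count helper. This is a minimal Python sketch under stated assumptions, not the patent's implementation: the function name `binary_distribution` and the sample data are hypothetical, the dictionary keys `a`, `s`, `o` and `r` borrow the cell notation ci = {d, a, s, t, o, r} that the description uses for structured comment data, and encoding expected-to-rise (or reliable) as 1 and expected-to-fall (or unreliable) as 0 is assumed.

```python
def binary_distribution(comments, field, commentator=None, stock=None):
    """Return (P(field == 1), P(field == 0)) over the selected comments.

    field is "o" (viewpoint polarity) or "r" (reliability). Passing None for
    commentator or stock pools over all commentators or all stocks.
    """
    selected = [
        c for c in comments
        if (commentator is None or c["a"] == commentator)
        and (stock is None or c["s"] == stock)
    ]
    if not selected:
        raise ValueError("no comments match the given commentator/stock")
    ones = sum(1 for c in selected if c[field] == 1)
    p_one = ones / len(selected)
    return p_one, 1.0 - p_one


# Hypothetical sample data: a = commentator, s = stock, o = polarity, r = reliability.
comments = [
    {"a": "allan", "s": "IBM", "o": 1, "r": 1},
    {"a": "allan", "s": "IBM", "o": 0, "r": 0},
    {"a": "allan", "s": "AAPL", "o": 1, "r": 1},
    {"a": "bob", "s": "IBM", "o": 1, "r": 0},
]

one_to_one = binary_distribution(comments, "o", commentator="allan", stock="IBM")
one_to_many = binary_distribution(comments, "o", commentator="allan")
many_to_one = binary_distribution(comments, "o", stock="IBM")
many_to_many = binary_distribution(comments, "o")
allan_reliability = binary_distribution(comments, "r", commentator="allan")
```

Passing `field="r"` reuses the same helper for the reliability distribution in (2), which keeps the two statistics consistent with each other.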
Three, mining of stock commentator viewpoint consistency patterns: the probability distribution of a stock commentator's viewpoint consistency is mined from the commentator's historical stock comment sequence data, comprising:
(1) Based on each pair of adjacent stock comment data in the same stock commentator's stock comment sequence for the same stock, extract stock comment data pairs, i.e. 2-gram data pairs; these are data pairs of viewpoint polarity;
(2) Based on the extracted stock comment data pairs, count the probability that the stock commentator keeps a viewpoint and the probability that the stock commentator changes a viewpoint.
For example, suppose the successive stock comments in the same stock commentator's stock comment sequence for the same stock are: expected to rise, expected to fall, expected to fall, expected to rise, expected to rise. From these data, the viewpoint polarity 2-gram data pairs are: (expected to rise, expected to fall); (expected to fall, expected to fall); (expected to fall, expected to rise); (expected to rise, expected to rise). From these 2-gram data pairs, the probability that the stock commentator keeps a viewpoint, i.e. the viewpoint consistency probability, is 0.5, and the probability of changing a viewpoint is 0.5.
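The 2-gram extraction and the keep/change statistics in this example can be sketched as follows. This is an illustrative Python sketch; the function names and the string labels "rise" and "fall" are hypothetical stand-ins for the expected-to-rise and expected-to-fall polarities.

```python
def polarity_bigrams(sequence):
    """Pair each comment's polarity with the polarity of the next comment."""
    return list(zip(sequence, sequence[1:]))


def keep_change_probabilities(sequence):
    """Return (P(keep viewpoint), P(change viewpoint)) over adjacent pairs."""
    pairs = polarity_bigrams(sequence)
    if not pairs:
        raise ValueError("need at least two comments to form a 2-gram pair")
    kept = sum(1 for previous, current in pairs if previous == current)
    p_keep = kept / len(pairs)
    return p_keep, 1.0 - p_keep


# The comment sequence from the example in the text.
sequence = ["rise", "fall", "fall", "rise", "rise"]
pairs = polarity_bigrams(sequence)
p_keep, p_change = keep_change_probabilities(sequence)
```

For the five-comment sequence of the example this yields four pairs, two of which keep the viewpoint, matching the stated probabilities of 0.5 and 0.5.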
Four, mining of stock commentator viewpoint change patterns: a stock commentator's viewpoint change pattern is mined from the commentator's historical stock comment sequence data, comprising:
(1) Based on each pair of adjacent stock comment data in the same stock commentator's stock comment sequence for the same stock, extract stock comment data pairs; that is, from the stock commentator's comment sequence data for the same stock, extract two kinds of 2-gram data pairs: viewpoint polarity pairs and viewpoint correctness pairs;
(2) Based on the extracted stock comment data pairs, determine the probability TSRatio that the stock commentator changes viewpoint when the previous viewpoint was correct, and the probability FSRatio that the stock commentator changes viewpoint when the previous viewpoint was wrong; both are counted from the viewpoint polarity data pairs;
(3) Based on the extracted stock comment data pairs, determine the probability TCTRatio that, the previous viewpoint having been correct, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability TSTRatio that, the previous viewpoint having been correct, the stock commentator changes the viewpoint and the changed viewpoint is correct. That is, the data pairs are used to count the reliability TCTRatio of keeping a correct viewpoint (the stock commentator's viewpoint was correct at the previous moment, and at the next moment the commentator still holds that viewpoint and it is correct) and the reliability TSTRatio of changing a correct viewpoint;
(4) Based on the extracted stock comment data pairs, determine the probability FCTRatio that, the previous viewpoint having been wrong, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability FSTRatio that, the previous viewpoint having been wrong, the stock commentator changes the viewpoint and the changed viewpoint is correct. That is, the data pairs are used to count the reliability FCTRatio of keeping a wrong viewpoint (the stock commentator's viewpoint was wrong at the previous moment, and at the next moment the commentator still holds that viewpoint and it is correct) and the reliability FSTRatio of changing a wrong viewpoint.
For example, suppose the successive stock comments in the same stock commentator's stock comment sequence for the same stock are: expected to rise, expected to fall, expected to fall, expected to rise, expected to rise. From these data, the viewpoint polarity 2-gram data pairs are: (expected to rise, expected to fall); (expected to fall, expected to fall); (expected to fall, expected to rise); (expected to rise, expected to rise). The corresponding viewpoint correctness 2-gram data pairs are, respectively: (correct, correct); (wrong, correct); (correct, wrong); (correct, correct).
Counted from the viewpoint polarity data pairs, the probability TSRatio of changing viewpoint when the viewpoint was correct is 0.5, and the probability FSRatio of changing viewpoint when the viewpoint was wrong is 0. Counted from the data pairs, the reliability TCTRatio of keeping a correct viewpoint is 0.25 and the reliability TSTRatio of changing a correct viewpoint is 0.25; the reliability FCTRatio of keeping a wrong viewpoint is 0.25 and the reliability FSTRatio of changing a wrong viewpoint is 0.
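The six ratios can be checked against the worked example with a short sketch. One point is an inference rather than something the text states: the example's values (0.5, 0, 0.25, 0.25, 0.25, 0) are only reproduced when each count is divided by the total number of pairs (4 here), so the sketch assumes that normalization. The function name and the boolean encoding of correct/wrong are hypothetical.

```python
RATIO_NAMES = ("TSRatio", "FSRatio", "TCTRatio", "TSTRatio", "FCTRatio", "FSTRatio")


def change_pattern_ratios(polarity_pairs, correctness_pairs):
    """Count the six viewpoint-change ratios, each divided by the pair count."""
    if len(polarity_pairs) != len(correctness_pairs) or not polarity_pairs:
        raise ValueError("need two non-empty pair lists of equal length")
    counts = dict.fromkeys(RATIO_NAMES, 0)
    for (p_prev, p_next), (c_prev, c_next) in zip(polarity_pairs, correctness_pairs):
        changed = p_prev != p_next
        prefix = "T" if c_prev else "F"  # was the earlier viewpoint correct?
        if changed:
            counts[prefix + "SRatio"] += 1  # TSRatio or FSRatio
        if c_next:  # the later viewpoint turned out to be correct
            counts[prefix + ("S" if changed else "C") + "TRatio"] += 1
    total = len(polarity_pairs)
    return {name: count / total for name, count in counts.items()}


# The pairs listed in the example; True means correct, False means wrong.
polarity_pairs = [("rise", "fall"), ("fall", "fall"), ("fall", "rise"), ("rise", "rise")]
correctness_pairs = [(True, True), (False, True), (True, False), (True, True)]
ratios = change_pattern_ratios(polarity_pairs, correctness_pairs)
```

The pairs are passed in directly rather than derived from a single correctness sequence, because the example lists them as given.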
Five, stock comment viewpoint polarity detection (o(ci)): an FM model is trained on the collected historical stock comment text data, and viewpoint polarity classification is performed on stock comment data with the trained FM model. The FM model is a machine learning model and an existing algorithm model, but the present invention applies special processing to it and uses it for stock viewpoint polarity detection, specifically comprising:
(1) Obtain a training set and a verification set composed of stock comment texts, and annotate the viewpoint polarity of every stock comment text in both sets; that is, determine the training set, development set and test set of stock comment texts, where the development set and the test set are similar and are collectively called the verification set. The development set is used to optimize the model parameters during training so as to obtain the best model, and the test set is used to test the model after training. Viewpoint polarity is annotated manually, i.e. the sentiment polarity (expected to rise or expected to fall) of every stock comment text in the training set and the test set is labeled by hand.
(2) Perform word segmentation on the training set texts and collect the resulting words into a dictionary. For example, "I think the stock will rise tomorrow" can be segmented into "I", "think", "tomorrow", "stock", "will", "rise"; the dictionary is obtained by segmenting all texts in this way and collecting the words.
(3) Based on the dictionary, determine the TF-IDF feature of every stock comment text in the training set. The feature is a vector of dictionary size, each dimension being the TF-IDF value of the corresponding word for that text.
TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. TF stands for term frequency and IDF for inverse document frequency, and TF-IDF is simply TF*IDF. The main idea of TF-IDF is: if a word or phrase occurs with high frequency in one article and rarely in other articles, it is considered to have good class discrimination ability and to be suitable for classification. TF denotes the frequency with which an entry occurs in document d. The main idea of IDF is: the fewer the documents containing entry t, i.e. the smaller n, the larger the IDF, which indicates that entry t has good class discrimination ability. If the number of documents containing entry t in a certain class C is m, and the total number of documents containing t in the other classes is k, then the number of all documents containing t is n = m + k; when m is large, n is also large, and the IDF value given by the IDF formula is small, which indicates that entry t has weak class discrimination ability.
In simple terms, everyday words in the training set that also occur often in other documents, such as " ", " " and the like, have low importance, whereas viewpoint polarity words such as "expected to rise" and "expected to fall" that occur in stock comment texts have higher importance. TF-IDF is the feature that evaluates the importance of each word in the dictionary.
The statement that the TF-IDF feature is a vector of dictionary size, each dimension being the TF-IDF value of the corresponding word for the text, can be understood as follows. For example, if 100 sentences contain 1000 distinct words in total, then the vector of each sentence has 1000 dimensions. Suppose the initial vector is [1, 0, 0, ..., 1], where 1 means the target word occurs in the sentence and 0 means it does not. Multiplying the 1s and 0s in the initial vector by the TF-IDF values of the corresponding words for that stock comment text, i.e. by the weights of the words, gives the TF-IDF feature of the stock comment text.
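The dictionary-sized TF-IDF vector described here can be sketched in plain Python. The text does not give the exact TF or IDF formula, so this sketch assumes the common definitions TF = (count of the word in the text) / (length of the text) and IDF = log(N / n), where N is the number of texts and n is the number of texts containing the word. The function names and the tiny tokenized corpus are hypothetical.

```python
import math
from collections import Counter


def build_dictionary(tokenized_docs):
    """Collect every distinct word of the segmented training texts."""
    return sorted({word for doc in tokenized_docs for word in doc})


def tfidf_vectors(tokenized_docs):
    """Return (dictionary, one TF-IDF vector of dictionary size per text)."""
    vocab = build_dictionary(tokenized_docs)
    n_docs = len(tokenized_docs)
    # Document frequency: in how many texts each word occurs.
    doc_freq = Counter(word for doc in tokenized_docs for word in set(doc))
    vectors = []
    for doc in tokenized_docs:
        term_count = Counter(doc)
        vectors.append([
            (term_count[word] / len(doc)) * math.log(n_docs / doc_freq[word])
            if word in term_count else 0.0
            for word in vocab
        ])
    return vocab, vectors


# Hypothetical, already segmented comment texts.
docs = [["stock", "will", "rise"], ["stock", "will", "fall"]]
vocab, vectors = tfidf_vectors(docs)
```

In this toy corpus "stock" and "will" occur in every text and receive weight 0, while "rise" and "fall" receive positive weight in the text that contains them, which mirrors the point that polarity words carry more importance than everyday words.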
(4) Extract features from the stock comment texts of the training set, take the extracted features as the input of the machine learning model, and take the viewpoint polarity classification information of the stock comment texts as its output. That is, the TF-IDF features of the training set stock comment texts are the model input features, and the output is the stock comment sentiment polarity, expected to rise or expected to fall, i.e. 1 or 0.
(5) Based on the viewpoint polarity classification information output by the machine learning model and the viewpoint polarity annotated for the corresponding stock comment text, compute the loss of the machine learning model and learn the model's parameters from the computed loss. That is, on the training set, the FM model parameters are learned by stochastic gradient descent with adaptive regularization, and the value of the hyperparameter k of the FM model, which is specified manually, is tuned by cross validation.
(6) Evaluate the FM model on the verification set. Specifically: extract features from the stock comment texts of the verification set, input the extracted features into the machine learning model, and obtain the viewpoint polarity classification information it outputs for those texts; then evaluate the model by comparing that output with the viewpoint polarity annotated for the corresponding stock comment texts.
(7) Repeat (4), (5) and (6) until the FM model's performance meets the requirement (for example, accuracy greater than 95%), at which point training of the FM model is complete.
(8) Based on the trained FM model, perform viewpoint polarity classification on stock comment texts to obtain the o(ci) attribute.
(9) Calculate the reliability r(ci) of each stock comment according to (formula 1):
Wherein the symbols of formula 1 denote, in order: the date, the stock price on that date, the stock price on the following day, and a quantity that is 0 or 1.
(10) Generate corresponding structured data for each stock comment text, the structured data comprising: stock commentator identifier, comment time, target stock, viewpoint polarity and reliability index. That is, construct the stock comment cell data ci = {d(ci), a(ci), s(ci), t(ci), o(ci), r(ci)}, where d(ci) is the comment content, a(ci) is the stock commentator identifier, s(ci) is the target stock, t(ci) is the comment time, o(ci) is the viewpoint polarity, and r(ci) is the reliability index.
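The cell data ci can be sketched as a small record type together with a reliability label. Formula 1 is not reproduced in this text, so the `reliability` function below is only one plausible reading of it, assumed for illustration: a comment is reliable (1) when its polarity agrees with the direction of the next-day price move. The class name, the integer day index used for t, and the treatment of an unchanged price as "did not rise" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StockComment:
    d: str                    # comment content
    a: str                    # stock commentator identifier
    s: str                    # target stock
    t: int                    # comment time (assumed here: trading-day index)
    o: int                    # viewpoint polarity: 1 = expected to rise, 0 = expected to fall
    r: Optional[int] = None   # reliability index, filled in once prices are known


def reliability(o, price_on_day, price_next_day):
    """Assumed reading of formula 1: 1 if the polarity matches the price move."""
    rose = price_next_day > price_on_day  # an unchanged price counts as "did not rise"
    return 1 if (o == 1) == rose else 0


# Content and names taken from the Fig. 2 example; the prices are hypothetical.
comment = StockComment(
    d="I think there is a support at 173.11", a="allan", s="IBM", t=0, o=1
)
comment.r = reliability(comment.o, price_on_day=173.11, price_next_day=174.50)
```

Keeping r optional reflects that the reliability of a comment can only be filled in after the following day's price is observed.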
Six, stock comment information reliability scoring method, i.e. scoring the reliability of a given stock comment by a given stock commentator. Key features are extracted from the stock comment sequence, the share price sequence and the stock commentator's historical behavior data, and the reliability of the stock comment is scored with an ensemble learning framework that combines a classification model and a time series analysis model, specifically comprising:
(1) Extract feature vectors based on the stock comment data set and the share price sequence set. First, for each stock comment in at least part of the stock comment data set, extract one or more of the following features to compose a feature vector:
The expected-to-rise or expected-to-fall viewpoint polarity information of this stock comment; how this viewpoint polarity information is determined has been elaborated in step five and is not repeated here.
Among all stock comment data for stock s published on day t, the number of expected-to-rise stock comments and the number of expected-to-fall stock comments;
Among all stock comment data for stock s published within a first preset length of time before day t, the number of expected-to-rise stock comments, the number of expected-to-fall stock comments, the number of stock comments with a correct viewpoint and the number of stock comments with a wrong viewpoint;
The price sequence of stock s within a second preset length of time before day t;
The price of stock s on the next trading day as predicted by the machine learning model for predicting stock prices, and the standard deviation output by that model;
Among all stock comment data published by stock commentator a within a third preset length of time before day t, the number of expected-to-rise stock comments, the number of expected-to-fall stock comments, the number of stock comments with a correct viewpoint and the number of stock comments with a wrong viewpoint;
Among the stock comment data for stock s published by stock commentator a within a fourth preset length of time before day t, the number of expected-to-rise stock comments, the number of expected-to-fall stock comments, the number of stock comments with a correct viewpoint and the number of stock comments with a wrong viewpoint;
One or more of the following, determined from the stock comment sequence published by stock commentator a within a fifth preset length of time before day t: the viewpoint change probability OSRatio of stock commentator a; the probability TSRatio of changing viewpoint when the viewpoint was correct; the probability FSRatio of changing viewpoint when the viewpoint was wrong; the probability TCTRatio of keeping the viewpoint when it was correct, with the kept viewpoint being correct; the probability TSTRatio of changing the viewpoint when it was correct, with the changed viewpoint being correct; the probability FCTRatio of keeping the viewpoint when it was wrong, with the kept viewpoint being correct; and the probability FSTRatio of changing the viewpoint when it was wrong, with the changed viewpoint being correct;
Here, the stock commentator of this stock comment is a, the stock commented on is s, and the publication date is t.
How the viewpoint polarity distribution information of stock commentator a is determined has been elaborated in step three and is not repeated here.
For example, key features are extracted from the stock comment sequence, the share price sequence and the stock commentator's historical behavior data; the key features comprise viewpoint polarity, historical stock state, price time series and stock commentator historical behavior. Viewpoint polarity is whether the current comment is expected to rise or expected to fall. Historical stock state covers two cases: first, regardless of the time of day, the numbers of expected-to-rise and expected-to-fall stock comments among all stock comment data for stock s published on the current day; second, among all stock comment data for stock s in the past 7 days, the numbers of expected-to-rise stock comments, expected-to-fall stock comments, stock comments with a correct viewpoint and stock comments with a wrong viewpoint. The price time series comprises the price sequence of stock s over the past 25 days, and the next-day price predicted with the ARMA model together with the standard deviation it outputs. Stock commentator historical behavior comprises: the numbers of expected-to-rise / expected-to-fall / correct / wrong stock comments made by a stock commentator a in the past 7/30/90 days; the numbers of expected-to-rise / expected-to-fall / correct / wrong stock comments made by that commentator on the current stock in the past 7/30/90 days; and one or more of OSRatio, TSRatio, FSRatio, TCTRatio and TSTRatio determined from the stock comment sequence published by stock commentator a in the past 7/30/90 days.
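The windowed count features (expected-to-rise / expected-to-fall / correct / wrong over the past 7/30/90 days) can be sketched as below. This is an illustrative Python sketch, not the patent's feature extractor: the function names are hypothetical, days are represented as integer indices, and a window of `days` before day t is assumed to mean t - days <= day < t. The keys `a`, `s`, `t`, `o`, `r` follow the cell notation ci = {d, a, s, t, o, r}.

```python
def window_counts(comments, t, days, stock=None, commentator=None):
    """Count rise/fall/correct/wrong comments published in [t - days, t)."""
    counts = {"rise": 0, "fall": 0, "correct": 0, "wrong": 0}
    for c in comments:
        if not (t - days <= c["t"] < t):
            continue
        if stock is not None and c["s"] != stock:
            continue
        if commentator is not None and c["a"] != commentator:
            continue
        counts["rise" if c["o"] == 1 else "fall"] += 1
        counts["correct" if c["r"] == 1 else "wrong"] += 1
    return counts


def history_features(comments, t, stock, commentator, windows=(7, 30, 90)):
    """Commentator-behavior counts: all stocks, then the current stock, per window."""
    features = []
    for days in windows:
        for filters in ({"commentator": commentator},
                        {"commentator": commentator, "stock": stock}):
            counts = window_counts(comments, t, days, **filters)
            features.extend(counts[k] for k in ("rise", "fall", "correct", "wrong"))
    return features


# Hypothetical history; t is a day index, the current comment is published on day 100.
history = [
    {"a": "allan", "s": "IBM", "t": 95, "o": 1, "r": 1},
    {"a": "allan", "s": "IBM", "t": 80, "o": 0, "r": 0},
    {"a": "allan", "s": "AAPL", "t": 98, "o": 1, "r": 0},
    {"a": "bob", "s": "IBM", "t": 99, "o": 0, "r": 1},
]
stock_state_7d = window_counts(history, t=100, days=7, stock="IBM")
behavior = history_features(history, t=100, stock="IBM", commentator="allan")
```

The price sequence, the ARMA prediction and the change-pattern ratios would be appended to the same vector; they are left out here to keep the sketch short.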
(2) Using the extracted feature vectors, train a support vector machine (SVM) model based on the radial basis kernel function (formula 2):
Let the radial basis kernel function be:
$$K(x_1, x_2) = \exp\left(-\gamma \lVert x_1 - x_2 \rVert^2\right) = \phi(x_1)^{\mathsf T}\phi(x_2) \qquad \text{(formula 2)}$$
where x1 and x2 are two feature vectors (or variables); γ is the parameter of the radial basis kernel function, generally set to 1 divided by the total number of features, so that with 10000 features, for example, γ is set to 0.0001; and φ(·) maps the original features into a high-dimensional kernel space so that the optimal decision hyperplane (formula 3) can be computed;
The SVM model is
$$\omega^{\mathsf T}\phi(c_i) + b \qquad \text{(formula 3)}$$
The principle of the SVM is to find the separating hyperplane that correctly divides the training data set with the maximum geometric margin. The input is a set of feature sample points, and the model determines two things: 1. it learns a hyperplane that ideally divides all data points into two classes, the first class with output 1 (reliable stock comments) and the second class with output 0 (unreliable stock comments); 2. all data points should lie as far from the hyperplane as possible.
If the feature sample points are not linearly separable in the original space (which is the case most of the time), they are mapped into a higher-dimensional space by a mapping that makes the problem linearly separable; that mapping is realized by the kernel function.
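As a small illustration of the kernel described here, the following Python sketch evaluates the standard radial basis kernel K(x1, x2) = exp(-γ‖x1 - x2‖²), with γ defaulting to 1 divided by the number of features as the text suggests. The function name `rbf_kernel` is hypothetical, and the sketch only evaluates the kernel; it does not train an SVM.

```python
import math


def rbf_kernel(x1, x2, gamma=None):
    """Radial basis kernel between two equal-length feature vectors."""
    if len(x1) != len(x2) or not x1:
        raise ValueError("feature vectors must be non-empty and of equal length")
    if gamma is None:
        gamma = 1.0 / len(x1)  # 1 divided by the number of features
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0])  # identical vectors
far = rbf_kernel([0.0, 0.0], [1.0, 1.0])   # gamma = 0.5, squared distance = 2
```

The kernel equals 1 for identical vectors and decays toward 0 as they move apart, which is what lets the SVM separate points that are not linearly separable in the original feature space.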
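(3) Compute the parameters ω and b by solving the optimization problem (formula 4):
$$\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\,\omega^{\mathsf T}\omega + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t.}\quad y_i\left(\omega^{\mathsf T}\phi(c_i) + b\right) \ge 1 - \xi_i,$$
$$\xi_i \ge 0,\quad i = 1, \dots, N \qquad \text{(formula 4)}$$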
Wherein C is the parameter that trades off noise in the training samples against simplicity of the classifying hyperplane, and yi is the label indicating whether the stock comment's viewpoint is correct. ω, b and ξ are all parameters that model training must learn, of which ω and b are the two parameters the SVM model uses at prediction time. "s.t." indicates that what follows constrains what precedes it, i.e. the last two rows are the constraints on the objective function in the first row. The term beginning with yi in the constraint is the margin of the objective function; the larger this margin, the better.
(4) Train a machine learning model for predicting stock prices, such as an ARMA model, on the share price sequence set, comprising:
A. Determine the stock price sequence data used as the model's training set and test set. Each item in the training set or test set comprises the stock prices of several consecutive days, used as the model input, and the stock price of the following day, used as the label;
B. Train the ARMA model on the training set and verify its prediction performance on the verification set. That is, estimate the ARMA model parameters on the training set by maximum likelihood estimation, tune the parameters p and q by the BIC criterion, use the trained ARMA model to predict the next day's share price from a stock's historical share price data, and verify the prediction performance on the verification set.
In summary, share price prediction based on a time series analysis model trains an ARMA model on a stock's historical price sequence and predicts the stock's price for the following day with the trained ARMA model.
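Step A, building samples whose input is several consecutive days of prices and whose label is the following day's price, can be sketched as below. The sketch covers only the data preparation; fitting the ARMA model itself (maximum likelihood, BIC selection of p and q) is not shown. The function names, the chronological split and the sample prices are hypothetical.

```python
def make_price_windows(prices, n_inputs):
    """Return (inputs, label) samples: n_inputs consecutive prices, then the next one."""
    if n_inputs < 1:
        raise ValueError("n_inputs must be at least 1")
    return [
        (prices[i:i + n_inputs], prices[i + n_inputs])
        for i in range(len(prices) - n_inputs)
    ]


def split_train_test(samples, train_fraction=0.8):
    """Chronological split, so the test samples come after the training samples."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Hypothetical closing prices of one stock on five consecutive trading days.
prices = [10.0, 10.5, 10.2, 10.8, 11.0]
samples = make_price_windows(prices, n_inputs=3)
train, test = split_train_test(samples, train_fraction=0.5)
```

Splitting chronologically rather than at random avoids training on prices that lie in the future relative to the test samples.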
(5) Integrate the SVM model with the machine learning model for predicting stock prices to obtain a classification model for evaluating stock comment reliability; that is, construct a classification equation based on the share price prediction result, as in formula 5:
Wherein the symbols of formula 5 denote, in order: the share price at the comment time, the predicted value of the next day's share price, and the viewpoint sentiment polarity of the stock comment; err(ci) is the standard deviation of the share price sequence data, i.e. the error, or in other words the confidence value, of the share price prediction currently output by the model.
(6) Integrate the SVM model and the ARMA model to obtain the final classification function, as in formula 6:
When h(ci) is 1, the stock comment is reliable; when h(ci) is -1, the stock comment is unreliable. The quantity that h(ci) is computed from is calculated as in formula 7:
u ∈ [0,1] in formula 7 is the weighting coefficient of the SVM and ARMA model prediction results; experiments determined that u = 0.59 gives the best results.
The stock comment reliability classification exact value can be calculated according to formula 8:
The higher rυ(ci) is, the more reliable the classification result for the stock comment. (Formula 8) is the absolute value of the output of (formula 7).
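The integration step can be sketched as follows. Formulas 5 to 7 are not reproduced in this text, so the combination below is an assumption: the SVM output and the ARMA-based classification output are blended as u * svm + (1 - u) * arma with u in [0, 1], h(ci) is taken as the sign of that blend, and rυ(ci) as its absolute value (the last point is stated in the text for formula 8). The function names and the treatment of a zero score as reliable are hypothetical.

```python
def combined_score(svm_output, arma_output, u=0.59):
    """Assumed form of formula 7: weighted blend of the two model outputs."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    return u * svm_output + (1.0 - u) * arma_output


def classify_reliability(svm_output, arma_output, u=0.59):
    """Return (h, rv): h is 1 (reliable) or -1 (unreliable), rv = |score|."""
    score = combined_score(svm_output, arma_output, u)
    h = 1 if score >= 0 else -1  # a score of exactly 0 is treated as reliable here
    return h, abs(score)


# The two models disagree; with u = 0.59 the SVM vote outweighs the ARMA vote.
h, rv = classify_reliability(svm_output=1.0, arma_output=-1.0)
```

With u = 0.59 the blend leans slightly toward the SVM, so when the two models disagree with equal strength the SVM decides the class, but the small rυ value signals low confidence.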
Seven, calculation of the probability that a stock rises or falls: the probability is calculated from the relevant features extracted, and the measurement results obtained, during stock comment reliability measurement, comprising:
(1) Calculate the rise-or-fall probability cf(sj) of the stock according to formula 9:
Wherein the first symbol denotes the number of stock comment data in the stock comment data set, i.e. the total number of stock comments; ci denotes one stock comment; the next two symbols denote the viewpoint polarity and the reliability index of that stock comment; and rυ(ci) is the exact value of the reliability classification of that stock comment.
(2) Predict the rise or fall according to formula 10:
(3) Calculate the probability that the stock rises or falls according to formula 11:
w(sj) = |cf(sj)| (formula 11)
When cf(sj) ≥ 0, a larger w(sj) indicates a higher probability that the stock rises; when cf(sj) < 0, a larger w(sj) indicates a higher probability that the stock falls.
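Given a value of cf(sj), the direction and probability described here follow directly from its sign and magnitude. The sketch below implements only that last step (formula 11 and the sign rule stated in the text); how cf(sj) itself is aggregated from the comments (formula 9) is not reproduced in this text and is not attempted. The function name and the string labels are hypothetical.

```python
def rise_or_fall(cf):
    """Return (direction, w) where w = |cf| and the sign of cf gives the direction."""
    direction = "rise" if cf >= 0 else "fall"
    return direction, abs(cf)


up = rise_or_fall(0.7)
down = rise_or_fall(-0.3)
```

Because w(sj) discards the sign, it must always be read together with the direction: a large w(sj) means a likely rise only when cf(sj) is non-negative.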
Eight, the stock comment reliability model is complete: when a query request for specified viewpoint information about a stock commentator is received, result data corresponding to the query request is output.
Nine, equity investment based on the stock comment reliability model: reliable stock comments are identified with the stock comment data reliability model, and investment is made accordingly, comprising:
(1) Calculate the probability w(sj) of rising or falling for every stock in the stock pool, where sj is a single stock;
(2) Several intelligent stock selection methods:
A. Choose a predetermined number of stocks that are predicted to rise and have the highest rise probability as the investment suggestion, and allocate funds with equal weights. That is, select the K stocks with the highest rise index as the investment suggestion and invest equally in each, i.e. G/K yuan per stock, where G is the total investment amount;
B. Choose a predetermined number of stocks that are predicted to rise and have the highest rise probability as the investment suggestion, and allocate funds weighted by rise probability. That is, select the K stocks with the highest rise index as the investment suggestion and allocate funds weighted by rise index, so that the amount invested in stock sj depends on its rise index;
C. Choose from each stock sector the stock that is predicted to rise and has the highest rise probability, and allocate funds with equal weights. That is, select the stock with the highest rise index in each sector as the investment suggestion, with M (M = 10) sectors in total (see Table 1 below), and invest equally in each, i.e. G/M yuan per stock.
Table 1: Sectors of stock symbols
Table 1 gives the stock sector information: Category is the sector name, and #Covered Symbols is the number of stocks in the sector.
D. Choose from each stock sector the stock that is predicted to rise and has the highest rise probability, and allocate funds weighted by rise probability. That is, select the stock with the highest rise index in each sector as the investment suggestion, with M (M = 10) sectors in total, and allocate funds weighted by rise index, so that the amount invested in each stock sj depends on its rise index.
E. Choose from each stock sector one or more stocks that are predicted to rise and have the highest rise probability, allocate funds between sectors with equal weights, and allocate funds between the chosen stocks of each sector weighted by rise probability. That is, combine the above methods: for example, first select the Km stocks with the highest rise index from each sector, then invest in each stock either with equal weights or weighted by rise index. The total investment in each sector can likewise be allocated with equal weights or weighted by rise index.
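Selection methods A and B can be sketched as below. The per-stock formula for method B is not reproduced in this text, so the sketch assumes the natural reading of "weighted by rise index": stock sj receives G * w(sj) / Σ w over the selected stocks. The function names, the dictionary of cf values and the equal-weight fallback when all indices are zero are hypothetical.

```python
def top_k_rising(cf_by_stock, k):
    """Pick the k stocks predicted to rise (cf >= 0) with the largest w = |cf|."""
    rising = {stock: abs(cf) for stock, cf in cf_by_stock.items() if cf >= 0}
    return sorted(rising, key=rising.get, reverse=True)[:k]


def equal_weight(selected, total):
    """Method A: G/K per stock."""
    return {stock: total / len(selected) for stock in selected}


def index_weight(selected, cf_by_stock, total):
    """Method B (assumed form): funds proportional to each stock's rise index."""
    weights = {stock: abs(cf_by_stock[stock]) for stock in selected}
    norm = sum(weights.values())
    if norm == 0:
        return equal_weight(selected, total)  # fall back when every index is zero
    return {stock: total * w / norm for stock, w in weights.items()}


# Hypothetical cf values for a four-stock pool and a total budget G of 10000 yuan.
cf_by_stock = {"A": 0.6, "B": 0.2, "C": -0.9, "D": 0.4}
picked = top_k_rising(cf_by_stock, k=2)
plan_a = equal_weight(picked, total=10000)
plan_b = index_weight(picked, cf_by_stock, total=10000)
```

Stock C has the largest |cf| but is predicted to fall, so it is excluded before ranking. Methods C to E apply the same two allocation rules per sector instead of over the whole pool.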
Fig. 5 is a schematic diagram of the profit obtained after selecting stocks with intelligent stock selection method C. From January 2016 to December 2016, intelligent stock selection method C was used for simulated investment, choosing K stocks to invest in on each trading day; the profit is shown in Fig. 5, with 10000 yuan invested in total, K = M, and 10000/M yuan per stock.
Fig. 6 shows a schematic diagram of a data mining device based on stock comment data according to an embodiment of the invention. The device 60 comprises:
Acquiring unit 601, adapted to acquire stock comment data, where one item of stock comment data is a single comment by a single stock commentator on a single stock and comprises: stock commentator identifier, comment time, target stock, and content containing a viewpoint polarity;
Mining unit 602, adapted to mine the viewpoint polarity distribution information of stock commentators based on the acquired stock comment data, and adapted to mine the viewpoint reliability distribution information of stock commentators based on the acquired stock comment data.
Stock comment data is acquired; based on the acquired stock comment data, the viewpoint polarity distribution information of stock commentators is mined, and the viewpoint reliability distribution information of stock commentators is mined. The present invention merges multiple heterogeneous information sources, such as stock price time series, stock comment text content, and the historical behavior of the stock commentators who publish the stock comments. Based on this multi-source heterogeneous big data, key features are analyzed in depth and extracted by data mining techniques, and these features are used to measure stock comment reliability, so that high-quality stocks can be chosen from massive information. This helps investors understand market trends and stock dynamics more accurately, for use by investors or in quantitative trading.
In one embodiment of the invention, unit 602 is excavated, is adapted for carrying out one of following or a variety of:
All historical stocks of same stock are directed to based on the same stock commentator in acquired stock comment data Comment data, determines the probability for the stock comment data that stock commentator is expected to rise for stock publication, and determines the stock Ticket commentator issues the probability of stock comment data expected to fall for the stock;
All historical stocks of different stocks are directed to based on the same stock commentator in acquired stock comment data Comment data, determines the probability for the stock comment data that stock commentator publication is expected to rise, and determines stock commentator hair The probability of cloth stock comment data expected to fall;
All historical stocks of same stock are directed to based on the different stock commentators in acquired stock comment data Comment data, determines the probability for the stock comment data that stock commentator is expected to rise for stock publication, and determines that stock is commented The probability of stock comment data expected to fall is issued for the stock by member;
All historical stocks of different stocks are directed to based on the different stock commentators in acquired stock comment data Comment data determines the probability for the stock comment data that publication is expected to rise, and determines and issue the general of stock comment data expected to fall Rate.
In one embodiment of the invention, unit 602 is excavated, suitable for the price timing information according to different stocks, really The same stock commentator in fixed acquired stock comment data is directed to all historical stock comment datas of different stocks In each stock comment data correctness;And according to the correct stock comment data quantity of a stock commentator With the stock comment data quantity of mistake, the viewpoint reliability distribution of stock commentator is determined.
In one embodiment of the invention, the mining unit 602 is further adapted to extract stock comment data pairs from each two adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock; and, based on the extracted stock comment data pairs, to count the probability that the stock commentator keeps a viewpoint and the probability that the stock commentator changes a viewpoint.
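The pair extraction and the keep/change statistics can be illustrated with a short sketch. It assumes a stock comment sequence is already sorted by time and reduced to its polarity labels; the function name `keep_change_probabilities` is hypothetical.

```python
def keep_change_probabilities(sequence):
    """sequence: time-ordered polarities of one commentator on one stock.

    Adjacent items form the comment data pairs. Returns
    (P(keep viewpoint), P(change viewpoint)), or None if there are no pairs.
    """
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return None
    kept = sum(1 for earlier, later in pairs if earlier == later)
    return kept / len(pairs), (len(pairs) - kept) / len(pairs)

# Made-up sequence: keep, keep, change, keep.
sequence = ["bullish", "bullish", "bullish", "bearish", "bearish"]
keep_p, change_p = keep_change_probabilities(sequence)
```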
In one embodiment of the invention, the mining unit 602 is further adapted to extract stock comment data pairs from each two adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock; and, based on the extracted stock comment data pairs, to determine the probability TSRatio that the stock commentator changes a viewpoint given that the viewpoint was correct, and the probability FSRatio that the stock commentator changes a viewpoint given that the viewpoint was wrong.
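A minimal sketch of TSRatio and FSRatio, assuming each item in the sequence has already been labeled as correct or wrong and that the condition refers to the earlier item of each pair. The item layout `(polarity, was_correct)` and the function name `switch_ratios` are assumptions made for illustration.

```python
def switch_ratios(sequence):
    """sequence: time-ordered (polarity, was_correct) items of one
    commentator on one stock.

    Returns (TSRatio, FSRatio): the share of pairs in which the viewpoint
    changes, among pairs whose earlier viewpoint was correct / wrong.
    A ratio is None when its condition never occurs.
    """
    seen = {True: 0, False: 0}
    switched = {True: 0, False: 0}
    for (polarity, was_correct), (next_polarity, _) in zip(sequence, sequence[1:]):
        seen[was_correct] += 1
        if polarity != next_polarity:
            switched[was_correct] += 1
    ts_ratio = switched[True] / seen[True] if seen[True] else None
    fs_ratio = switched[False] / seen[False] if seen[False] else None
    return ts_ratio, fs_ratio

# Made-up labeled sequence for illustration only.
labeled = [
    ("bullish", True),
    ("bullish", False),
    ("bearish", True),
    ("bearish", True),
    ("bullish", False),
]
ts_ratio, fs_ratio = switch_ratios(labeled)
```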
In one embodiment of the invention, the mining unit 602 is further adapted to extract stock comment data pairs from each two adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock; based on the extracted stock comment data pairs, to determine the probability TCTRatio that, given that a viewpoint was correct, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability TSTRatio that, given that a viewpoint was correct, the stock commentator changes the viewpoint and the changed viewpoint is correct; and, based on the extracted stock comment data pairs, to determine the probability FCTRatio that, given that a viewpoint was wrong, the stock commentator keeps the viewpoint and the kept viewpoint is correct, and the probability FSTRatio that, given that a viewpoint was wrong, the stock commentator changes the viewpoint and the changed viewpoint is correct.
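One possible reading of the four ratios is sketched below: each pair is grouped by whether the earlier viewpoint was correct and whether the viewpoint was kept, and the ratio is the share of pairs in that group whose later viewpoint is correct. This conditional interpretation, the item layout `(polarity, was_correct)`, and the function name `outcome_ratios` are assumptions; the text could also be read as a joint probability.

```python
from collections import defaultdict

# Maps (earlier viewpoint correct?, viewpoint kept?) to the ratio name.
RATIO_NAMES = {
    (True, True): "TCTRatio",
    (True, False): "TSTRatio",
    (False, True): "FCTRatio",
    (False, False): "FSTRatio",
}

def outcome_ratios(sequence):
    """sequence: time-ordered (polarity, was_correct) items of one
    commentator on one stock. A ratio is None when its group is empty."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for (polarity, ok), (next_polarity, next_ok) in zip(sequence, sequence[1:]):
        key = (ok, polarity == next_polarity)
        totals[key] += 1
        if next_ok:
            hits[key] += 1
    return {
        name: (hits[key] / totals[key] if totals[key] else None)
        for key, name in RATIO_NAMES.items()
    }

# Made-up labeled sequence for illustration only.
labeled = [
    ("bullish", True),
    ("bullish", False),
    ("bearish", True),
    ("bearish", True),
    ("bullish", False),
]
ratios = outcome_ratios(labeled)
```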
Fig. 7 shows a schematic diagram of another data mining device based on stock comment data according to one embodiment of the invention. The device includes: an acquisition unit 601; a mining unit 602; a data cleaning unit 701; and a query processing unit 702. The acquisition unit 601 and the mining unit 602 have been described in detail in the embodiment shown in Fig. 6 and are not described again here.
The data cleaning unit 701 is adapted to delete, from the acquired stock comment data, stock comment data whose viewpoint polarity is neutral; and/or to delete, from the acquired stock comment data, the stock comment data corresponding to stock comment sequences whose length is less than a preset threshold;
Here, a stock comment sequence refers to the set of stock comment data items in which the same stock commentator comments on the same stock at different times.
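The two cleaning rules can be sketched as follows. The text allows either rule alone or both; this sketch applies both, drops neutral items before measuring sequence length (an assumed order), and uses a hypothetical record layout `(commentator_id, stock_code, time, polarity)` and function name `clean`.

```python
from collections import defaultdict

def clean(comments, min_length=2):
    """comments: (commentator_id, stock_code, time, polarity) records.

    Step 1: drop records whose polarity is "neutral".
    Step 2: group the rest into comment sequences per (commentator, stock)
            and drop sequences shorter than min_length (the preset threshold).
    Returns the surviving records, each sequence sorted by time.
    """
    sequences = defaultdict(list)
    for record in comments:
        who, stock, _, polarity = record
        if polarity != "neutral":
            sequences[(who, stock)].append(record)
    kept = []
    for sequence in sequences.values():
        if len(sequence) >= min_length:
            kept.extend(sorted(sequence, key=lambda r: r[2]))
    return kept

# Made-up records for illustration only.
raw = [
    ("A", "X", 3, "bearish"),
    ("A", "X", 1, "bullish"),
    ("A", "X", 2, "neutral"),
    ("A", "Y", 1, "bullish"),
    ("B", "X", 1, "neutral"),
    ("B", "X", 2, "bearish"),
]
cleaned = clean(raw, min_length=2)
```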
The query processing unit 702 is adapted to receive a query request for specified viewpoint information about a stock commentator, and to output result data corresponding to the query request.
In summary, the technical solution of the invention obtains stock comment data; mines the viewpoint polarity distribution information of stock commentators based on the acquired stock comment data; and mines the viewpoint reliability distribution information of stock commentators based on the acquired stock comment data. The invention fuses multiple heterogeneous information sources, such as stock price time series, the text content of stock comments, and the historical behavior of the stock commentators who publish them. Based on this multi-source heterogeneous big data, data mining techniques are used to analyze the data in depth and extract key features, and these features are used to measure the reliability of stock comments. This can effectively filter out noise, screen valuable and reliable stock comment information out of massive information, and help select high-quality stocks; it can also help investors understand market trends and stock dynamics more accurately, for use by investors or in quantitative trading. The method is applicable not only to analyzing the reliability of stock comment information but also to other areas of finance, such as economic situation analysis, precise stock recommendation, portfolio management, and automated trading.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual device, or other apparatus. Various general-purpose devices may also be used with the teachings herein. From the description above, the structure required to construct such devices is apparent. Moreover, the invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In the specification provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the data mining device based on stock comment data, the electronic device, and the computer-readable storage medium according to embodiments of the invention. The invention may also be implemented as a device or device program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Fig. 8 is a schematic structural diagram of an electronic device in an embodiment of the invention. The electronic device 800 includes a processor 810 and a memory 820 storing a computer program that can run on the processor 810. The processor 810 is configured to perform each step of the method of the invention when executing the computer program in the memory 820. The memory 820 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 820 has a storage space 830 storing a computer program 831 for performing any of the method steps described above. The computer program 831 can be read from, or written to, one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a computer-readable storage medium such as the one shown in Fig. 9.
Fig. 9 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the invention. The computer-readable storage medium 900 stores a computer program 831 for performing the steps of the method according to the invention, which can be read by the processor 810 of the electronic device 800. When the computer program 831 is run by the electronic device 800, it causes the electronic device 800 to perform each step of the method described above; specifically, the computer program 831 stored on the computer-readable storage medium can perform the method shown in any of the above embodiments. The computer program 831 may be compressed in a suitable form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order. These words may be interpreted as names.

Claims (10)

1. A data mining method based on stock comment data, wherein the method comprises:
obtaining stock comment data; wherein one piece of stock comment data refers to a single comment by a single stock commentator on a single stock;
mining viewpoint polarity distribution information of stock commentators based on the acquired stock comment data;
and mining viewpoint reliability distribution information of stock commentators based on the acquired stock comment data.
2. The method of claim 1, wherein, after the step of obtaining stock comment data, the method further comprises a step of cleaning the stock comment data, which specifically comprises:
deleting stock comment data whose viewpoint polarity is neutral;
and/or
deleting the stock comment data corresponding to stock comment sequences whose length is less than a preset threshold; wherein a stock comment sequence refers to the set of stock comment data items in which the same stock commentator comments on the same stock at different times.
3. The method of claim 1, wherein one piece of stock comment data comprises:
a stock commentator identifier, a target stock, a comment time, and content containing a viewpoint polarity.
4. The method of claim 1, wherein mining the viewpoint polarity distribution information of stock commentators based on the acquired stock comment data comprises one or more of the following:
based on all historical stock comment data issued by the same stock commentator for the same stock in the acquired stock comment data, determining the probability that the stock commentator issues bullish stock comment data for that stock, and determining the probability that the stock commentator issues bearish stock comment data for that stock;
based on all historical stock comment data issued by the same stock commentator for different stocks in the acquired stock comment data, determining the probability that the stock commentator issues bullish stock comment data, and determining the probability that the stock commentator issues bearish stock comment data;
based on all historical stock comment data issued by different stock commentators for the same stock in the acquired stock comment data, determining the probability that stock commentators issue bullish stock comment data for that stock, and determining the probability that stock commentators issue bearish stock comment data for that stock;
based on all historical stock comment data issued by different stock commentators for different stocks in the acquired stock comment data, determining the probability that bullish stock comment data is issued, and determining the probability that bearish stock comment data is issued.
5. The method of claim 1, wherein mining the viewpoint reliability distribution information of stock commentators based on the acquired stock comment data comprises:
determining, according to the price time-series information of different stocks, the correctness of each piece of stock comment data among all historical stock comment data issued by the same stock commentator for different stocks in the acquired stock comment data;
determining the probability that the viewpoints of a stock commentator are correct according to the number of correct stock comment data items and the number of incorrect stock comment data items of that stock commentator.
6. The method of claim 1, wherein the method further comprises:
extracting stock comment data pairs from each two adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock;
counting, based on the extracted stock comment data pairs, the probability that the stock commentator keeps a viewpoint and the probability that the stock commentator changes a viewpoint.
7. The method of claim 1, wherein the method further comprises:
extracting stock comment data pairs from each two adjacent stock comment data items in the same stock commentator's stock comment sequence for the same stock;
determining, based on the extracted stock comment data pairs, the probability TSRatio that the stock commentator changes a viewpoint given that the viewpoint was correct, and the probability FSRatio that the stock commentator changes a viewpoint given that the viewpoint was wrong.
8. A data mining device based on stock comment data, wherein the device comprises:
an acquisition unit, adapted to obtain stock comment data; wherein one piece of stock comment data refers to a single comment by a single stock commentator on a single stock;
a mining unit, adapted to mine viewpoint polarity distribution information of stock commentators based on the acquired stock comment data; and adapted to mine viewpoint reliability distribution information of stock commentators based on the acquired stock comment data.
9. An electronic device, characterized in that the electronic device comprises: a processor, and a memory storing a computer program that can run on the processor;
wherein the processor is configured to perform the method of any one of claims 1-7 when executing the computer program in the memory.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1-7.
CN201810942719.5A 2018-08-17 2018-08-17 Data digging method and device based on stock comment data Pending CN109300031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810942719.5A CN109300031A (en) 2018-08-17 2018-08-17 Data digging method and device based on stock comment data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810942719.5A CN109300031A (en) 2018-08-17 2018-08-17 Data digging method and device based on stock comment data

Publications (1)

Publication Number Publication Date
CN109300031A true CN109300031A (en) 2019-02-01

Family

ID=65165256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810942719.5A Pending CN109300031A (en) 2018-08-17 2018-08-17 Data digging method and device based on stock comment data

Country Status (1)

Country Link
CN (1) CN109300031A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400225A (en) * 2019-07-29 2019-11-01 北京北信源软件股份有限公司 A kind of market value of stock management method
CN115168646A (en) * 2022-05-19 2022-10-11 深圳格隆汇信息科技有限公司 Historical video analysis method and system of financial anchor

Similar Documents

Publication Publication Date Title
McNabb et al. Survey of Surveys (SoS)‐mapping the landscape of survey papers in information visualization
CN101408886B (en) Selecting tags for a document by analyzing paragraphs of the document
US9411892B2 (en) System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
CN109299252A (en) The viewpoint polarity classification method and device of stock comment based on machine learning
CN101408887B (en) Recommending terms to specify body space
CN107301171A (en) A kind of text emotion analysis method and system learnt based on sentiment dictionary
CN107291688A (en) Judgement document's similarity analysis method based on topic model
US10083263B2 (en) Automatic modeling farmer
CN105512285B (en) Adaptive network reptile method based on machine learning
CN109035025A (en) The method and apparatus for evaluating stock comment reliability
CN109684627A (en) A kind of file classification method and device
WO2007078814A2 (en) Apparatus and method for strategy map validation and visualization
CN102194013A (en) Domain-knowledge-based short text classification method and text classification system
CN105550216A (en) Searching method and device of academic research information and excavating method and device of academic research information
CN110008309A (en) A kind of short phrase picking method and device
CN113256383B (en) Recommendation method and device for insurance products, electronic equipment and storage medium
CN112215696A (en) Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis
CN110276382A (en) Listener clustering method, apparatus and medium based on spectral clustering
CN109300031A (en) Data digging method and device based on stock comment data
CN108228546A (en) A kind of text feature, device, equipment and readable storage medium storing program for executing
CN107977454A (en) The method, apparatus and computer-readable recording medium of bilingual corpora cleaning
CN109300030A (en) Realize the method and apparatus that equity investment is recommended
CN107908649A (en) A kind of control method of text classification
Fritsche et al. Deciphering professional forecasters' stories: Analyzing a corpus of textual predictions for the German economy
Klosterman Data Science Projects with Python: A case study approach to gaining valuable insights from real data with machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190201
