CN108764981A - Server, text data processing method, and storage medium

Info

Publication number
CN108764981A
Authority
CN
China
Prior art keywords
text data
market
personal share
time point
share entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810469419.XA
Other languages
Chinese (zh)
Inventor
李海疆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201810469419.XA (patent CN108764981A)
Priority to PCT/CN2018/102135 (patent WO2019218517A1)
Publication of CN108764981A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls

Abstract

The present invention relates to a server, a text data processing method, and a storage medium. The method includes: dividing financial text data of various kinds into corresponding text object types; analyzing the financial text data of each text object type for each individual stock entity at each time point to obtain an evaluation grade for each piece of financial text data; counting the number of financial text data items at each evaluation grade under each text object type, and calculating the proportion of each evaluation grade from those counts; obtaining the attribute score of each evaluation grade, and calculating the market evaluation index of the individual stock entity at that time point from the attribute score and the proportion of each evaluation grade; and obtaining the market evaluation index of the individual stock entity at each time point, and arranging these indices in chronological order to generate the market evaluation index sequence corresponding to that entity. The invention can fully mine financial text data to obtain accurate market information.

Description

Server, text data processing method, and storage medium
Technical field
The present invention relates to the technical field of data analysis, and in particular to a server, a text data processing method, and a storage medium.
Background
At present, every listed company has various kinds of text data at each cross-section in time, such as performance forecasts, financing reports, analyst forecasts, and corporate governance disclosures. The prior art typically analyzes only a single text to obtain a corresponding market evaluation. However, because these text data contain a large amount of market information, analyzing a single text in isolation cannot fully mine them for accurate market information and cannot provide effective guidance for a company or an industry. How to fully mine these text data to obtain accurate market information has therefore become a technical problem to be solved.
Summary of the invention
The object of the present invention is to provide a server, a text data processing method, and a storage medium, with the aim of fully mining financial text data to obtain accurate market information.
To achieve the above object, the present invention provides a server comprising a memory and a processor connected to the memory. The memory stores a processing system that can run on the processor, and the processing system implements the following steps when executed by the processor:
Dividing financial text data of various kinds into corresponding text object types according to a preset classification rule, wherein the text object types include a performance type, a financing type, a corporate governance type, an analyst type, and an other type;
Analyzing, using a predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point, to obtain the evaluation grade corresponding to each piece of financial text data;
Counting the number of financial text data items at each evaluation grade under each text object type, and calculating the proportion of each evaluation grade from the counted numbers;
Obtaining the attribute score corresponding to each evaluation grade, and calculating the market evaluation index of the individual stock entity at that time point from the attribute score and the proportion of each evaluation grade;
Obtaining the market evaluation index of the individual stock entity at each time point, and arranging the market evaluation indices of the time points in chronological order to generate the market evaluation index sequence corresponding to that entity.
Preferably, the evaluation grades include a first grade, a second grade, and a third grade. The attribute score of the first grade is 1, the attribute score of the second grade is 0, and the attribute score of the third grade is -1. The market evaluation index = 100 * [proportion of the first grade * 1 + proportion of the second grade * 0 + proportion of the third grade * (-1)].
Preferably, the step of analyzing, using a predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point to obtain the evaluation grade corresponding to each piece of financial text data specifically includes:
Segmenting each piece of financial text data into words using a predetermined word segmentation model, to obtain the segmented words corresponding to each piece of financial text data;
Inputting the segmented words of each piece of financial text data to a predetermined conversion model, to obtain the output word vectors corresponding to each piece of financial text data;
Inputting the word vectors of each piece of financial text data to a predetermined sentiment analysis model, to obtain the output sentiment analysis result for each sentence in that financial text data;
Counting the sentiment analysis results of the sentences in the financial text data, and obtaining the evaluation grade corresponding to that financial text data from the counted results.
Preferably, when the processing system is executed by the processor, the following steps are also implemented:
Assigning each individual stock entity to its corresponding industry category according to a predetermined industry classification method, obtaining the latest total market capitalization of each individual stock entity, and calculating the total market capitalization of each industry category from the latest total market capitalization of the individual stock entities;
Calculating the market value proportion of each individual stock entity from its latest total market capitalization and the total market capitalization of the industry category to which it belongs;
Calculating the industry evaluation index of the individual stock entity at the time point from its market evaluation index at that time point and its market value proportion;
Obtaining the industry evaluation index of the individual stock entity at each time point, and arranging the industry evaluation indices of the time points in chronological order to generate the industry evaluation index sequence corresponding to that entity.
To achieve the above object, the present invention also provides a text data processing method, which includes:
S1: dividing financial text data of various kinds into corresponding text object types according to a preset classification rule, wherein the text object types include a performance type, a financing type, a corporate governance type, an analyst type, and an other type;
S2: analyzing, using a predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point, to obtain the evaluation grade corresponding to each piece of financial text data;
S3: counting the number of financial text data items at each evaluation grade under each text object type, and calculating the proportion of each evaluation grade from the counted numbers;
S4: obtaining the attribute score corresponding to each evaluation grade, and calculating the market evaluation index of the individual stock entity at that time point from the attribute score and the proportion of each evaluation grade;
S5: obtaining the market evaluation index of the individual stock entity at each time point, and arranging the market evaluation indices of the time points in chronological order to generate the market evaluation index sequence corresponding to that entity.
Preferably, the evaluation grades include a first grade, a second grade, and a third grade. The attribute score of the first grade is 1, the attribute score of the second grade is 0, and the attribute score of the third grade is -1. The market evaluation index = 100 * [proportion of the first grade * 1 + proportion of the second grade * 0 + proportion of the third grade * (-1)].
Preferably, step S2 specifically includes:
Segmenting each piece of financial text data into words using a predetermined word segmentation model, to obtain the segmented words corresponding to each piece of financial text data;
Inputting the segmented words of each piece of financial text data to a predetermined conversion model, to obtain the output word vectors corresponding to each piece of financial text data;
Inputting the word vectors of each piece of financial text data to a predetermined sentiment analysis model, to obtain the output sentiment analysis result for each sentence in that financial text data;
Counting the sentiment analysis results of the sentences in the financial text data, and obtaining the evaluation grade corresponding to that financial text data from the counted results.
Preferably, the text data processing method further includes:
Assigning each individual stock entity to its corresponding industry category according to a predetermined industry classification method, obtaining the latest total market capitalization of each individual stock entity, and calculating the total market capitalization of each industry category from the latest total market capitalization of the individual stock entities;
Calculating the market value proportion of each individual stock entity from its latest total market capitalization and the total market capitalization of the industry category to which it belongs;
After step S4, the method further includes:
Calculating the industry evaluation index of the individual stock entity at the time point from its market evaluation index at that time point and its market value proportion;
Obtaining the industry evaluation index of the individual stock entity at each time point, and arranging the industry evaluation indices of the time points in chronological order to generate the industry evaluation index sequence corresponding to that entity.
Preferably, the text data processing method further includes:
Adding together the industry evaluation indices, at the same time point, of the individual stock entities belonging to the same industry category, to obtain the market index of that industry category at that time point;
Obtaining the market index of the industry category at each time point, and arranging the market indices of the time points in chronological order to generate the market index sequence corresponding to that industry category.
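The aggregation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout, the function name `sector_market_index`, and the sample values are all assumptions introduced here.

```python
from collections import defaultdict


def sector_market_index(records):
    """Sum industry evaluation indices per (industry category, time point).

    `records` is an iterable of (industry, time_point, industry_evaluation_index)
    tuples; this layout is a hypothetical choice for the sketch.
    """
    totals = defaultdict(float)
    for industry, time_point, value in records:
        totals[(industry, time_point)] += value
    return dict(totals)


# Hypothetical sample data: three banking entities at one time point.
records = [
    ("banking", "2018-05-01", 18.0),
    ("banking", "2018-05-01", -3.0),
    ("banking", "2018-05-01", 5.0),
    ("steel", "2018-05-01", 7.5),
]
print(sector_market_index(records))
```

Sorting the resulting keys by time point for one industry category would then give that category's market index sequence.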
The present invention also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the text data processing method described above.
The beneficial effects of the present invention are as follows. For each individual stock entity, the financial text data of the different text object types at each cross-section in time are analyzed using a predetermined text analysis method to obtain an evaluation of each piece of financial text data. The number of items at each evaluation grade under each text object type is counted and the proportion of each grade is calculated, and the market evaluation index of the individual stock entity at that time point is calculated from the attribute score and the proportion of each grade. From this index, the market's evaluation of the company at that time point can be derived. By dividing financial text data into different text object types and analyzing them with a predetermined text analysis method, the invention can fully mine the data to obtain accurate market information. By generating the market evaluation index sequence in chronological order, the changes and trend in the market's evaluation of the company can be derived for use in market analysis.
Description of the drawings
Fig. 1 is a schematic diagram of the hardware architecture of an embodiment of the server of the present invention;
Fig. 2 is a flowchart of a first embodiment of the text data processing method of the present invention;
Fig. 3 is a detailed flowchart of step S2 shown in Fig. 2;
Fig. 4 is a flowchart of a second embodiment of the text data processing method of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with one another, but only on the basis that those of ordinary skill in the art can implement the combination. Where a combination of technical solutions is contradictory or cannot be implemented, that combination should be regarded as not existing and as falling outside the scope of protection claimed by the present invention.
Referring to Fig. 1, which is a schematic diagram of the hardware architecture of an embodiment of the server of the present invention, the server 1 is a device that can automatically perform numerical computation and/or information processing according to instructions set or stored in advance. The server 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud composed of a large number of hosts or network servers, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one super virtual computer.
In this embodiment, the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can communicate with one another via a system bus. The memory 11 stores a processing system that can run on the processor 12. It should be noted that Fig. 1 shows only the server 1 with components 11-13; it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
The memory 11 includes internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the server 1. The readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as the hard disk of the server 1. In other embodiments, it may be an external storage device of the server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the server 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the server 1, for example the program code of the processing system in an embodiment of the present invention. In addition, the memory 11 may be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is generally used to control the overall operation of the server 1, for example to perform control and processing related to data exchange or communication with other devices. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the processing system.
The network interface 13 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the server 1 and other electronic devices.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction can be executed by the processor 12 to implement the methods of the embodiments of the present application, and can be divided into different logic modules according to the functions implemented by its parts.
In one embodiment, the following steps are implemented when the processing system is executed by the processor 12:
Dividing financial text data of various kinds into corresponding text object types according to a preset classification rule, wherein the text object types include a performance type, a financing type, a corporate governance type, an analyst type, and an other type;
The preset classification rule is as follows: financial text data related to performance is classified as the performance type, financial text data related to financing as the financing type, financial text data related to corporate governance as the corporate governance type, financial text data related to analysts as the analyst type, and financial text data outside these four types as the other type, as shown in Table 1 below:
Table 1
The performance type includes performance reports, performance announcements, and reports of performance exceeding expectations. The financing type includes private placements and private placements that fall below their issue price. The corporate governance type includes increases and reductions in executive shareholdings, reductions in major shareholders' holdings, equity incentives, and employee stock ownership. The analyst type includes sharp upward revisions of earnings forecasts and sudden analyst attention. The other type includes high stock dividends and capitalization issues, adjustments to index constituent stocks, early disclosure of annual reports, and announcements issued after a long interval.
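The classification rule can be pictured as a lookup from document content to one of the five text object types. The sketch below assumes a simple keyword match; the keyword lists and the name `classify_text` are hypothetical and do not reproduce the contents of Table 1.

```python
# Hypothetical keyword rules for illustration; these are NOT the contents of Table 1.
TYPE_KEYWORDS = {
    "performance": ("performance report", "performance announcement"),
    "financing": ("private placement",),
    "corporate_governance": ("equity incentive", "employee stock ownership"),
    "analyst": ("earnings forecast", "analyst"),
}


def classify_text(text: str) -> str:
    """Return the text object type of a piece of financial text."""
    lowered = text.lower()
    for text_type, keywords in TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return text_type
    # Anything outside the four named types falls into the "other" type.
    return "other"


print(classify_text("Company A publishes its annual performance report"))
print(classify_text("Adjustment of index constituent stocks"))
```

A production rule set would more likely key on the disclosure category supplied by the data source than on free-text keywords; the keyword match here only keeps the example self-contained.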
Analyzing, using a predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point, to obtain the evaluation grade corresponding to each piece of financial text data;
An individual stock entity is a listed company. Each individual stock entity generates financial text data at each time point, and a time point may be each minute, each hour, each day, and so on.
In one embodiment, the step of analyzing, using a predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point to obtain the evaluation grade corresponding to each piece of financial text data specifically includes:
Segmenting each piece of financial text data into words using a predetermined word segmentation model, to obtain the segmented words corresponding to each piece of financial text data; inputting the segmented words of each piece of financial text data to a predetermined conversion model, to obtain the output word vectors corresponding to each piece of financial text data; inputting the word vectors of each piece of financial text data to a predetermined sentiment analysis model, to obtain the output sentiment analysis result for each sentence in that financial text data; and counting the sentiment analysis results of the sentences in the financial text data, and obtaining the evaluation grade corresponding to that financial text data from the counted results.
The text of the financial text data is segmented by a pre-trained word segmentation model. The segmentation model is a trained neural network segmentation model, preferably a long short-term memory (LSTM) recurrent neural network. The process of training the neural network segmentation model includes: 1. Extracting a large number of words from a corpus, where model training uses a predetermined segmented corpus, for example the Microsoft Research segmented corpus from the classic bakeoff2005. 2. Training on the training portion of the corpus and using the test portion for the final test. 3. Measuring the error of the model by comparing the results of the model's input and output (a sequence labeling method is used); if the test result reaches 0.95 or higher, training of the neural network segmentation model is complete.
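To make the sequence labeling idea concrete, the sketch below shows how per-character tags can be turned back into words and how a test score might be compared with the 0.95 threshold. The B/M/E/S tag scheme and the use of per-tag accuracy as the metric are assumptions made for this illustration; the text does not specify either.

```python
def decode_bmes(chars, tags):
    """Rebuild words from per-character tags (B=begin, M=middle, E=end, S=single)."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            if current:
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":
            if current:
                words.append(current)
            current = ch
        elif tag == "M":
            current += ch
        else:  # "E" closes the current word
            words.append(current + ch)
            current = ""
    if current:
        words.append(current)
    return words


def tag_accuracy(predicted, gold):
    """Fraction of characters whose predicted tag matches the gold tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


sentence = "业绩预告超预期"
gold_tags = list("BEBESBE")
print(decode_bmes(sentence, gold_tags))

# Assumed stopping rule: training is treated as finished once the score reaches 0.95.
training_finished = tag_accuracy(gold_tags, gold_tags) >= 0.95
```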
The predetermined conversion model is a word2vec model. The word2vec model includes a three-layer neural network and can represent a word as a word vector, thereby converting text into numerical form. The segmented words of the financial text data are input to the word2vec model to obtain the word vectors corresponding to that financial text data.
The predetermined sentiment analysis model is a short-text sentiment analysis model based on deep convolutional neural networks (Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts). The main structure of the model is as follows: the sentence vector corresponding to one sentence of text is input and, after passing through two layers of convolutional neural network (CNN), is converted into a sentence-level vector; this vector is then input to a three-layer neural network, and the correct sentiment analysis result for the sentence is obtained through training.
In one embodiment, there are three sentiment analysis results, for example [-1, 0, 1], where -1 indicates that the sentiment expressed by the sentence is negative or pessimistic, 0 indicates that the sentiment expressed is neutral, and 1 indicates that the view expressed is positive.
In addition, the output dimension of the three-layer neural network can be adjusted as needed. It may be the three-dimensional [-1, 0, 1] above, or the two-dimensional [-1, 1] with values ranging from -1 to 1, where a value closer to 1 indicates that the view expressed by the sentence is positive and a value closer to -1 indicates that the sentiment expressed is negative or pessimistic, and so on.
The evaluation grades of financial text data include a first grade, a second grade, and a third grade, which may be a positive, a neutral, and a negative review respectively. The first grade corresponds to sentiment analysis result 1 above, the second grade to result 0, and the third grade to result -1. Finally, the output results of all sentences are combined and the total number of each sentiment analysis result is calculated: if result 1 has the largest total, the financial text data is of the first grade; if result 0 has the largest total, it is of the second grade; and if result -1 has the largest total, it is of the third grade.
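The majority rule above can be sketched in a few lines. The function name `grade_document` and the tie-breaking order are assumptions for this example; the text does not say how ties between sentiment results are resolved.

```python
from collections import Counter

# Mapping from sentence-level sentiment result to evaluation grade, as described above.
GRADE_BY_SENTIMENT = {1: "first", 0: "second", -1: "third"}


def grade_document(sentence_sentiments):
    """Return the evaluation grade whose sentiment result occurs most often."""
    if not sentence_sentiments:
        raise ValueError("at least one sentence result is required")
    counts = Counter(sentence_sentiments)
    # Ties are not covered by the text; this sketch resolves them in the
    # order 1, 0, -1 (an assumption).
    most_common = max((1, 0, -1), key=lambda label: counts[label])
    return GRADE_BY_SENTIMENT[most_common]


print(grade_document([1, 1, 0, -1]))
print(grade_document([-1, -1, 0]))
```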
Counting the number of financial text data items at each evaluation grade under each text object type, and calculating the proportion of each evaluation grade from the counted numbers;
Counting the number of financial text data items at each evaluation grade under each text object type includes counting the numbers of first-grade, second-grade, and third-grade financial text data under each text object type. The data of company A are used as an example, as shown in Table 2:
Company A             | First grade | Second grade | Third grade | Total
Performance           | 3           | 1            | 0           | -
Financing             | 0           | 2            | 0           | -
Corporate governance  | 1           | 1            | 3           | -
Analyst               | 4           | 1            | 1           | -
Other                 | 2           | 0            | 0           | -
Total                 | 10          | 5            | 4           | 19
Weight                | 0.526316    | 0.263158     | 0.210526    | -
Table 2
In Table 2, the counted numbers of first-grade, second-grade, and third-grade items are 10, 5, and 4 respectively, so the total number of evaluations = 10 + 5 + 4 = 19, the proportion of the first grade = 10/19 * 100% = 52.63%, the proportion of the second grade = 5/19 * 100% = 26.32%, and the proportion of the third grade = 4/19 * 100% = 21.05%.
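The arithmetic for Table 2 can be reproduced with a short sketch. The counts are the ones given for company A; the dictionary layout and the name `grade_proportions` are choices made for this illustration.

```python
# Counts per text object type as (first grade, second grade, third grade), from Table 2.
COMPANY_A_COUNTS = {
    "performance": (3, 1, 0),
    "financing": (0, 2, 0),
    "corporate_governance": (1, 1, 3),
    "analyst": (4, 1, 1),
    "other": (2, 0, 0),
}


def grade_proportions(counts_by_type):
    """Return per-grade totals and each grade's share of all graded items."""
    totals = [sum(column) for column in zip(*counts_by_type.values())]
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no graded financial text data")
    return totals, [total / grand_total for total in totals]


totals, proportions = grade_proportions(COMPANY_A_COUNTS)
print(totals)
print([round(p, 6) for p in proportions])
```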
Obtaining the attribute score corresponding to each evaluation grade, and calculating the market evaluation index of the individual stock entity at that time point from the attribute score and the proportion of each evaluation grade;
The attribute score of the first grade is 1, the attribute score of the second grade is 0, and the attribute score of the third grade is -1. The market evaluation index = 100 * [proportion of the first grade * 1 + proportion of the second grade * 0 + proportion of the third grade * (-1)]. Taking company A above as an example, the market evaluation index = 100 * (52.63% * 1 + 26.32% * 0 + 21.05% * (-1)) = 31.58. From this market evaluation index, the market's evaluation of the company at that time point can be derived for use in market analysis.
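As a check on the worked example, the formula can be written as a small function. The name `market_evaluation_index` is an assumption; the scores 1, 0, and -1 and the company A proportions come from the text.

```python
# Attribute scores for the first, second, and third grades, as stated in the text.
ATTRIBUTE_SCORES = (1, 0, -1)


def market_evaluation_index(proportions, scores=ATTRIBUTE_SCORES):
    """100 times the score-weighted sum of the grade proportions."""
    return 100 * sum(p * s for p, s in zip(proportions, scores))


# Company A: 10, 5, and 4 items out of 19 in the first, second, and third grades.
index_a = market_evaluation_index([10 / 19, 5 / 19, 4 / 19])
print(round(index_a, 2))
```

With these scores the index ranges from -100 (every item in the third grade) to 100 (every item in the first grade).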
Obtaining the market evaluation index of the individual stock entity at each time point, and arranging the market evaluation indices of the time points in chronological order to generate the market evaluation index sequence corresponding to that entity.
In this embodiment, the market evaluation indices of the individual stock entity are arranged in chronological order to generate the corresponding market evaluation index sequence, that is, the market evaluation index sequence of the company represented by that entity. From this sequence, the changes and trend in the market's evaluation of the company can be derived for use in market analysis.
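Generating the sequence amounts to ordering the per-time-point indices by time. The sketch below assumes daily time points and invented index values apart from the 31.58 computed for company A; `build_index_sequence` is a hypothetical name.

```python
from datetime import date


def build_index_sequence(index_by_time):
    """Return (time point, index) pairs sorted in chronological order."""
    return [(t, index_by_time[t]) for t in sorted(index_by_time)]


# Hypothetical observations, deliberately out of order.
observations = {
    date(2018, 5, 3): 12.5,
    date(2018, 5, 1): 31.58,
    date(2018, 5, 2): -4.0,
}
sequence = build_index_sequence(observations)
for time_point, value in sequence:
    print(time_point.isoformat(), value)
```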
Compared with the prior art, the present invention analyzes, for each individual stock entity, the financial text data of the different text object types at each cross-section in time using a predetermined text analysis method to obtain an evaluation of each piece of financial text data. The number of items at each evaluation grade under each text object type is counted and the proportion of each grade is calculated, and the market evaluation index of the individual stock entity at that time point is calculated from the attribute score and the proportion of each grade. From this index, the market's evaluation of the company at that time point can be derived. By dividing financial text data into different text object types and analyzing them with a predetermined text analysis method, the invention can fully mine the data to obtain accurate market information. By generating the market evaluation index sequence in chronological order, the changes and trend in the market's evaluation of the company can be derived for use in market analysis.
In one embodiment, on the basis of the above embodiment, the following steps are also implemented when the processing system is executed by the processor:
Assigning each individual stock entity to its corresponding industry category according to a predetermined industry classification method, obtaining the latest total market capitalization of each individual stock entity, and calculating the total market capitalization of each industry category from the latest total market capitalization of the individual stock entities; calculating the market value proportion of each individual stock entity from its latest total market capitalization and the total market capitalization of the industry category to which it belongs; calculating the industry evaluation index of the individual stock entity at the time point from its market evaluation index at that time point and its market value proportion; and obtaining the industry evaluation index of the individual stock entity at each time point, and arranging the industry evaluation indices of the time points in chronological order to generate the industry evaluation index sequence corresponding to that entity.
The predetermined industry classification method is, for example, the Shenwan industry classification method. In one embodiment, all personal share entities on the Shanghai and Shenzhen markets can be divided into the following 28 industry categories: mining, chemicals, steel, non-ferrous metals, building materials, building decoration, electrical equipment, machinery, national defence and military industry, automobiles, household appliances, textiles and apparel, light manufacturing, commerce and trade, agriculture, forestry, animal husbandry and fishery, food and beverage, leisure services, pharmaceuticals and biotechnology, public utilities, transportation, real estate, electronics, computers, media, communications, banking, non-bank finance, and conglomerates.
Taking the industry category to which banks belong as an example, calculating the industry evaluation index includes the following. First, extract the latest total market capitalisation of each personal share entity and add these values together to obtain the industry total market capitalisation. Second, calculate the market value proportion that each personal share entity's latest total market capitalisation represents of the industry total market capitalisation: market value proportion = latest total market capitalisation of the personal share entity / industry total market capitalisation * 100%. Then, calculate the industry evaluation index of the personal share entity at the time point from its market assessment index at that time point and its market value proportion: industry evaluation index = market value proportion of the personal share entity * market assessment index. Finally, calculate the industry evaluation index at each time point by the same method, and arrange the industry evaluation indices of the time points in chronological order to generate the industry evaluation index sequence corresponding to the personal share entity.
The industry evaluation index of the personal share entity indicates the industry's evaluation of that personal share entity, and the industry evaluation index sequence of the personal share entity reveals the change and trend in the industry's evaluation of it, for use in market analysis.
In one embodiment, on the basis of the above embodiment, the processing system further implements the following steps when executed by the processor: add together the industry evaluation indices, at the same time point, of the personal share entities belonging to the same industry category to obtain the market index of that industry category at that time point; obtain the market index of the industry category at each time point, and arrange the market indices of the time points in chronological order to generate the market index sequence corresponding to the industry category.
The market index of an industry category indicates the market's evaluation of that industry, and the market index sequence corresponding to the industry category reveals the change and trend in the market's evaluation of the industry, for use in market analysis.
In addition, aggregating the market indices of all industries at the same time point yields the market's sentiment and view regarding the capital market as a whole, for use in market analysis.
As shown in Fig. 2, which is a flow diagram of one embodiment of the text data processing method of the present invention, the text data processing method includes the following steps:
Step S1: divide the various financial text data into corresponding text object types according to preset classifying rules;
The preset classifying rules are as follows: financial text data related to performance is classified as the performance type, financial text data related to financing is classified as the financing type, financial text data related to company governance is classified as the company governance type, financial text data related to analysts is classified as the analyst type, and financial text data outside these four types is classified as the other type, as shown in Table 1 above.
In Table 1, the performance type includes performance reports, performance announcements, and better-than-expected performance reports; the financing type includes directed additional issuance (private placement) and directed-issuance price breaks; the company governance type includes increases and decreases in executive shareholdings, reductions in major-shareholder holdings, equity incentives, and employee stock ownership; the analyst type includes substantial upward revisions of profit forecasts and sudden analyst attention; and the other type includes high stock dividends and capitalisation transfers, index constituent adjustments, early disclosure of annual reports, and long periods without announcements.
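As an illustration only, the classifying rule of step S1 could be sketched in Python as a lookup from an announcement subtype to its text object type. The text does not say how the subtype of a document is detected, so this sketch assumes it is already known; the English labels and the names `TEXT_OBJECT_TYPES` and `classify_text_object` are hypothetical and do not appear in the original disclosure.

```python
# Hypothetical lookup table built from the Table 1 examples listed above.
# The keys are illustrative English labels, not identifiers from the patent.
TEXT_OBJECT_TYPES = {
    "performance report": "performance",
    "performance announcement": "performance",
    "better-than-expected performance report": "performance",
    "directed additional issuance": "financing",
    "directed-issuance price break": "financing",
    "executive shareholding change": "company governance",
    "major-shareholder reduction": "company governance",
    "equity incentive": "company governance",
    "employee stock ownership": "company governance",
    "profit forecast raised substantially": "analyst",
    "sudden analyst attention": "analyst",
}


def classify_text_object(subtype: str) -> str:
    """Map an announcement subtype to one of the five text object types.

    Anything outside the four named types falls into "other", which mirrors
    the rule that remaining financial text data is classified as other.
    """
    return TEXT_OBJECT_TYPES.get(subtype.strip().lower(), "other")
```

For example, `classify_text_object("Equity incentive")` gives `"company governance"`, while a subtype that is not in the table, such as an index constituent adjustment, falls through to `"other"`.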
Step S2: analyse, using a predetermined text analysis method, the financial text data of each text object type of each personal share entity at each time point, and obtain the opinion rating corresponding to each item of financial text data;
A personal share entity is a listed company. Each personal share entity generates some financial text data at each time point, where a time point may be each minute, each hour, each day, and so on.
In one embodiment, as shown in Fig. 3, the step of analysing, using a predetermined text analysis method, the financial text data of each text object type of each personal share entity at each time point and obtaining the opinion rating corresponding to each item of financial text data specifically includes:
Step S21: segment each item of financial text data using a predetermined word segmentation model to obtain the word segments corresponding to each item of financial text data;
Step S22: input the word segments corresponding to each item of financial text data into a predetermined conversion model, and obtain the output word vectors corresponding to each item of financial text data;
Step S23: input the word vectors corresponding to each item of financial text data into a predetermined sentiment analysis model, and obtain the output sentiment analysis result of each sentence in the financial text data;
Step S24: count the sentiment analysis results of the sentences in the financial text data, and obtain the opinion rating corresponding to the financial text data from the counted sentiment analysis results.
The text of the financial text data is segmented by a trained word segmentation model. This word segmentation model is a trained neural network word segmentation model, preferably a long short-term memory (LSTM) recurrent neural network. The process of training the neural network word segmentation model includes:
1. Extract a large number of words from a corpus, where model training uses a predetermined segmentation corpus, such as the Microsoft Research segmentation corpus from the classic bakeoff2005.
2. Train on the training portion of the corpus, and use the test portion for the final test.
3. Score the error of the model by comparing the input and output results of the neural network word segmentation model (using a sequence labelling method); if the test result reaches 0.95 or above, training of the neural network word segmentation model is complete.
The predetermined conversion model is a word2vec model. The word2vec model comprises a three-layer neural network and can represent a word as a word vector, converting text into numerical form. The word segments corresponding to the financial text data are input into the word2vec model to obtain the word vectors corresponding to the financial text data.
The predetermined sentiment analysis model is a short-text sentiment analysis model based on deep convolutional neural networks (Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts). The main structure of the model is as follows: the sentence vector corresponding to a one-sentence text is input; after passing through two convolutional neural network (CNN) layers, it is converted into a sentence-level vector; this vector is then input into a three-layer neural network, which is trained to produce the correct sentiment analysis result for the sentence.
In one embodiment, the sentiment analysis result takes one of three values, for example [-1, 0, 1], where -1 indicates that the sentiment expressed by the sentence is negative or pessimistic, 0 indicates that the sentiment expressed by the sentence is neutral, and 1 indicates that the view expressed by the sentence is positive.
In addition, the output dimension of the three-layer neural network can be adjusted as needed. It can be the three-valued output [-1, 0, 1] described above, or a two-valued output [-1, 1] with values ranging from -1 to 1, where a value closer to 1 indicates that the view expressed by the sentence is positive, a value closer to -1 indicates that the sentiment expressed by the sentence is negative or pessimistic, and so on.
The opinion rating of financial text data includes a first grade, a second grade, and a third grade, which may be positive, neutral, and negative respectively. The first grade corresponds to sentiment analysis result 1 above, the second grade corresponds to sentiment analysis result 0, and the third grade corresponds to sentiment analysis result -1. Finally, the output results of all sentences are merged and the total number of each sentiment analysis result is calculated: if result 1 has the largest total, the financial text data is of the first grade; if result 0 has the largest total, the financial text data is of the second grade; and if result -1 has the largest total, the financial text data is of the third grade.
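The merging rule of step S24 amounts to a majority vote over the per-sentence results. A minimal Python sketch follows, assuming the sentence labels have already been produced by the sentiment analysis model. The text does not say how ties are resolved, so the tie-breaking order used here (1, then 0, then -1) is an assumption, as are the names `RATING_BY_LABEL` and `opinion_rating`.

```python
from collections import Counter

# Sentiment label -> opinion rating, following the correspondence in the text.
RATING_BY_LABEL = {1: "first", 0: "second", -1: "third"}


def opinion_rating(sentence_labels):
    """Return the opinion rating of one document from its sentence labels.

    The label that occurs most often decides the rating. Ties are not
    covered by the text; this sketch prefers 1, then 0, then -1 (assumption).
    """
    if not sentence_labels:
        raise ValueError("at least one sentence label is required")
    counts = Counter(sentence_labels)
    # max() keeps the first maximal element, so the tuple order sets tie-breaks.
    winner = max((1, 0, -1), key=lambda label: counts[label])
    return RATING_BY_LABEL[winner]
```

For a document whose sentences are labelled `[1, 1, 0, -1]`, result 1 occurs most often, so the document is of the first grade.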
Step S3: count the number of items of financial text data at each opinion rating under each text object type, and calculate the proportion of each opinion rating from the counted numbers;
Counting the number of items of financial text data at each opinion rating under each text object type includes counting the numbers of first-grade, second-grade, and third-grade items of financial text data under each text object type. The data of company A is used as an example, as shown in Table 2 above.
In Table 2, the numbers of first-grade, second-grade, and third-grade items are 10, 5, and 4 respectively, so the total number of evaluations = 10 + 5 + 4 = 19, the proportion of the first grade = 10/19 * 100% = 52.63%, the proportion of the second grade = 5/19 * 100% = 26.32%, and the proportion of the third grade = 4/19 * 100% = 21.05%.
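The proportion calculation of step S3 can be reproduced with a few lines of Python. This is a sketch using the company A counts from Table 2; the function name `rating_proportions` and the dictionary keys are hypothetical.

```python
def rating_proportions(counts):
    """Return each opinion rating's share of all rated documents (0..1)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rated documents to compute proportions from")
    return {grade: number / total for grade, number in counts.items()}


# Company A counts from Table 2: 10 first-grade, 5 second-grade, 4 third-grade.
company_a_counts = {"first": 10, "second": 5, "third": 4}
proportions = rating_proportions(company_a_counts)
for grade, share in proportions.items():
    print(f"{grade}: {share:.2%}")  # first: 52.63%, second: 26.32%, third: 21.05%
```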
Step S4: obtain the attribute score value of each opinion rating, and calculate the market assessment index of the personal share entity at the time point from the attribute score value and proportion of each opinion rating.
The attribute score value of the first grade is 1, that of the second grade is 0, and that of the third grade is -1. Market assessment index = 100 * [proportion of the first grade * 1 + proportion of the second grade * 0 + proportion of the third grade * (-1)]. Taking company A above as an example, market assessment index = 100 * (52.63% * 1 + 26.32% * 0 + 21.05% * (-1)) = 31.58. The market assessment index indicates the market's evaluation of the company at that time point, for use in market analysis.
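The formula of step S4 can be checked against the company A figures with a short Python sketch. The attribute score values are those stated in the text; the names `ATTRIBUTE_SCORES` and `market_assessment_index` are hypothetical.

```python
# Attribute score values of the three opinion ratings, as given in the text.
ATTRIBUTE_SCORES = {"first": 1, "second": 0, "third": -1}


def market_assessment_index(counts):
    """Compute 100 * sum(proportion of rating * attribute score of rating)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rated documents at this time point")
    return 100 * sum(
        ATTRIBUTE_SCORES[grade] * number / total
        for grade, number in counts.items()
    )


index_a = market_assessment_index({"first": 10, "second": 5, "third": 4})
print(round(index_a, 2))  # 31.58
```

With these score values the index always lies between -100 (every document rated third grade) and 100 (every document rated first grade).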
Step S5: obtain the market assessment index of the personal share entity at each time point, and arrange the market assessment indices of the time points in chronological order to generate the market assessment index sequence corresponding to the personal share entity.
In this embodiment, the market assessment indices of the personal share entity are arranged in chronological order to generate the market assessment index sequence of the company to which the personal share entity belongs. From this sequence, the change and trend in the market's evaluation of the company can be derived, for use in market analysis.
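Step S5 is essentially a chronological sort. The Python sketch below assumes daily time points; apart from the value 31.58 taken from the company A example, the dates and index values are made up for illustration, and `build_index_sequence` is a hypothetical name.

```python
from datetime import date


def build_index_sequence(index_by_time_point):
    """Return (time point, index) pairs ordered from earliest to latest."""
    return sorted(index_by_time_point.items(), key=lambda pair: pair[0])


# Hypothetical daily indices for one company; only 31.58 comes from the text.
company_a = {
    date(2018, 5, 3): 12.50,
    date(2018, 5, 1): 31.58,
    date(2018, 5, 2): -5.00,
}
sequence = build_index_sequence(company_a)
for time_point, index in sequence:
    print(time_point.isoformat(), index)
```

Reading the resulting sequence from start to end is what allows a change or trend in the market's evaluation of the company to be observed.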
Compared with the prior art, the present invention applies a predetermined text analysis method to the financial text data of each text object type of each personal share entity at each time point, obtaining an evaluation of each item of financial text data. It counts the number of items at each opinion rating under each text object type, calculates the proportion of each opinion rating, and calculates the market assessment index of the personal share entity at that time point from the attribute score value and proportion of each opinion rating. The market assessment index indicates the market's evaluation of the company at that time point. By dividing the financial text data into different text object types and analysing it with the predetermined text analysis method, the invention can mine the data thoroughly to obtain accurate market information. Generating the market assessment index sequence in chronological order then reveals the change and trend in the market's evaluation of the company, for use in market analysis.
In one embodiment, as shown in Fig. 4, on the basis of the above embodiment, the text data processing method further includes:
Step S6: assign each personal share entity to its corresponding industry category according to a predetermined industry classification method, obtain the latest total market capitalisation of each personal share entity, and calculate the total market capitalisation of each industry category from the latest total market capitalisation of each personal share entity;
Step S7: calculate the market value proportion of each personal share entity from its latest total market capitalisation and the total market capitalisation of the industry category to which it belongs;
Step S8: calculate the industry evaluation index of the personal share entity at a time point from its market assessment index at that time point and its market value proportion;
Step S9: obtain the industry evaluation index of the personal share entity at each time point, and arrange the industry evaluation indices of the time points in chronological order to generate the industry evaluation index sequence corresponding to the personal share entity.
The predetermined industry classification method is, for example, the Shenwan industry classification method. In one embodiment, all personal share entities on the Shanghai and Shenzhen markets can be divided into the following 28 industry categories: mining, chemicals, steel, non-ferrous metals, building materials, building decoration, electrical equipment, machinery, national defence and military industry, automobiles, household appliances, textiles and apparel, light manufacturing, commerce and trade, agriculture, forestry, animal husbandry and fishery, food and beverage, leisure services, pharmaceuticals and biotechnology, public utilities, transportation, real estate, electronics, computers, media, communications, banking, non-bank finance, and conglomerates.
Taking the industry category to which banks belong as an example, calculating the industry evaluation index includes the following. First, extract the latest total market capitalisation of each personal share entity and add these values together to obtain the industry total market capitalisation. Second, calculate the market value proportion that each personal share entity's latest total market capitalisation represents of the industry total market capitalisation: market value proportion = latest total market capitalisation of the personal share entity / industry total market capitalisation * 100%. Then, calculate the industry evaluation index of the personal share entity at the time point from its market assessment index at that time point and its market value proportion: industry evaluation index = market value proportion of the personal share entity * market assessment index. Finally, calculate the industry evaluation index at each time point by the same method, and arrange the industry evaluation indices of the time points in chronological order to generate the industry evaluation index sequence corresponding to the personal share entity.
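The bank example above can be sketched in Python as follows. The three banks, their market capitalisations, and their market assessment indices are invented for illustration (only 31.58 echoes the company A example), and the function names are hypothetical.

```python
def market_value_proportions(latest_caps):
    """Share of each entity's latest total market cap in the industry total."""
    industry_total = sum(latest_caps.values())
    if industry_total <= 0:
        raise ValueError("industry total market capitalisation must be positive")
    return {name: cap / industry_total for name, cap in latest_caps.items()}


def industry_evaluation_indices(latest_caps, market_assessment):
    """Industry evaluation index = market value proportion * market assessment index."""
    proportions = market_value_proportions(latest_caps)
    return {name: proportions[name] * market_assessment[name] for name in latest_caps}


# Hypothetical banking-industry data at one time point (not from the patent).
caps = {"bank_a": 600.0, "bank_b": 300.0, "bank_c": 100.0}
assessment = {"bank_a": 31.58, "bank_b": -10.0, "bank_c": 50.0}
industry_indices = industry_evaluation_indices(caps, assessment)
```

Because each market assessment index is scaled by the entity's market value proportion, a large bank moves the industry-level figures more than a small one with the same market assessment index.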
The industry evaluation index of the personal share entity indicates the industry's evaluation of that personal share entity, and the industry evaluation index sequence of the personal share entity reveals the change and trend in the industry's evaluation of it, for use in market analysis.
In one embodiment, on the basis of the above embodiment, the text data processing method further includes: adding together the industry evaluation indices, at the same time point, of the personal share entities belonging to the same industry category to obtain the market index of that industry category at that time point; and obtaining the market index of the industry category at each time point and arranging the market indices of the time points in chronological order to generate the market index sequence corresponding to the industry category.
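The summation and ordering described above can be sketched in Python as below. The record layout (industry category, time point, industry evaluation index), the ISO date strings, and all values are assumptions made for illustration; `industry_market_index_sequences` is a hypothetical name.

```python
from collections import defaultdict


def industry_market_index_sequences(records):
    """Sum industry evaluation indices per (industry, time point), then order by time.

    `records` is an iterable of (industry, time_point, industry_evaluation_index)
    tuples; time points must be mutually comparable (ISO date strings here).
    """
    totals = defaultdict(float)
    for industry, time_point, value in records:
        totals[(industry, time_point)] += value

    sequences = defaultdict(list)
    # Sorting the (industry, time point) keys yields chronological order per industry.
    for (industry, time_point), total in sorted(totals.items()):
        sequences[industry].append((time_point, total))
    return dict(sequences)


# Hypothetical records: two banks on 2018-05-01, one bank entry on 2018-05-02.
records = [
    ("bank", "2018-05-02", 5.0),
    ("bank", "2018-05-01", 18.0),
    ("bank", "2018-05-01", -3.0),
    ("steel", "2018-05-01", 2.5),
]
market_sequences = industry_market_index_sequences(records)
```

Here the two bank entries for 2018-05-01 sum to a market index of 15.0 for that day, and the bank sequence lists 2018-05-01 before 2018-05-02.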
The market index of an industry category indicates the market's evaluation of that industry, and the market index sequence corresponding to the industry category reveals the change and trend in the market's evaluation of the industry, for use in market analysis.
In addition, aggregating the market indices of all industries at the same time point yields the market's sentiment and view regarding the capital market as a whole, for use in market analysis.
The present invention also provides a computer readable storage medium on which a processing system is stored; when executed by a processor, the processing system implements the steps of the text data processing method described above.
The above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A server, characterized in that the server comprises a memory and a processor connected to the memory, the memory storing a processing system that can run on the processor, the processing system implementing the following steps when executed by the processor:
dividing various financial text data into corresponding text object types according to preset classifying rules, wherein the text object types include a performance type, a financing type, a company governance type, an analyst type, and an other type;
analysing, using a predetermined text analysis method, the financial text data of each text object type of each personal share entity at each time point, and obtaining the opinion rating corresponding to each item of financial text data;
counting the number of items of financial text data at each opinion rating under each text object type, and calculating the proportion of each opinion rating from the counted numbers;
obtaining the attribute score value corresponding to each opinion rating, and calculating the market assessment index of the personal share entity at the time point from the attribute score value corresponding to each opinion rating and the proportion of each opinion rating;
obtaining the market assessment index of the personal share entity at each time point, and arranging the market assessment indices of the time points in chronological order to generate the market assessment index sequence corresponding to the personal share entity.
2. The server according to claim 1, characterized in that the opinion rating includes a first grade, a second grade, and a third grade, the attribute score value of the first grade is 1, the attribute score value of the second grade is 0, the attribute score value of the third grade is -1, and the market assessment index = 100 * [proportion of the first grade * 1 + proportion of the second grade * 0 + proportion of the third grade * (-1)].
3. The server according to claim 1 or 2, characterized in that the step of analysing, using a predetermined text analysis method, the financial text data of each text object type of each personal share entity at each time point and obtaining the opinion rating corresponding to each item of financial text data specifically includes:
segmenting each item of financial text data using a predetermined word segmentation model to obtain the word segments corresponding to each item of financial text data;
inputting the word segments corresponding to each item of financial text data into a predetermined conversion model, and obtaining the output word vectors corresponding to each item of financial text data;
inputting the word vectors corresponding to each item of financial text data into a predetermined sentiment analysis model, and obtaining the output sentiment analysis result of each sentence in the financial text data;
counting the sentiment analysis results of the sentences in the financial text data, and obtaining the opinion rating corresponding to the financial text data from the counted sentiment analysis results.
4. The server according to claim 1 or 2, characterized in that the processing system further implements the following steps when executed by the processor:
assigning each personal share entity to its corresponding industry category according to a predetermined industry classification method, obtaining the latest total market capitalisation of each personal share entity, and calculating the total market capitalisation of each industry category from the latest total market capitalisation of each personal share entity;
calculating the market value proportion of each personal share entity from its latest total market capitalisation and the total market capitalisation of the industry category to which it belongs;
calculating the industry evaluation index of the personal share entity at a time point from its market assessment index at that time point and its market value proportion;
obtaining the industry evaluation index of the personal share entity at each time point, and arranging the industry evaluation indices of the time points in chronological order to generate the industry evaluation index sequence corresponding to the personal share entity.
5. A text data processing method, characterized in that the text data processing method includes:
S1: dividing various financial text data into corresponding text object types according to preset classifying rules, wherein the text object types include a performance type, a financing type, a company governance type, an analyst type, and an other type;
S2: analysing, using a predetermined text analysis method, the financial text data of each text object type of each personal share entity at each time point, and obtaining the opinion rating corresponding to each item of financial text data;
S3: counting the number of items of financial text data at each opinion rating under each text object type, and calculating the proportion of each opinion rating from the counted numbers;
S4: obtaining the attribute score value corresponding to each opinion rating, and calculating the market assessment index of the personal share entity at the time point from the attribute score value corresponding to each opinion rating and the proportion of each opinion rating;
S5: obtaining the market assessment index of the personal share entity at each time point, and arranging the market assessment indices of the time points in chronological order to generate the market assessment index sequence corresponding to the personal share entity.
6. The text data processing method according to claim 5, characterized in that the opinion rating includes a first grade, a second grade, and a third grade, the attribute score value of the first grade is 1, the attribute score value of the second grade is 0, the attribute score value of the third grade is -1, and the market assessment index = 100 * [proportion of the first grade * 1 + proportion of the second grade * 0 + proportion of the third grade * (-1)].
7. The text data processing method according to claim 5 or 6, characterized in that step S2 specifically includes:
segmenting each item of financial text data using a predetermined word segmentation model to obtain the word segments corresponding to each item of financial text data;
inputting the word segments corresponding to each item of financial text data into a predetermined conversion model, and obtaining the output word vectors corresponding to each item of financial text data;
inputting the word vectors corresponding to each item of financial text data into a predetermined sentiment analysis model, and obtaining the output sentiment analysis result of each sentence in the financial text data;
counting the sentiment analysis results of the sentences in the financial text data, and obtaining the opinion rating corresponding to the financial text data from the counted sentiment analysis results.
8. The text data processing method according to claim 5 or 6, characterized in that the text data processing method further includes:
assigning each personal share entity to its corresponding industry category according to a predetermined industry classification method, obtaining the latest total market capitalisation of each personal share entity, and calculating the total market capitalisation of each industry category from the latest total market capitalisation of each personal share entity;
calculating the market value proportion of each personal share entity from its latest total market capitalisation and the total market capitalisation of the industry category to which it belongs;
and, after step S4, further includes:
calculating the industry evaluation index of the personal share entity at a time point from its market assessment index at that time point and its market value proportion;
obtaining the industry evaluation index of the personal share entity at each time point, and arranging the industry evaluation indices of the time points in chronological order to generate the industry evaluation index sequence corresponding to the personal share entity.
9. The text data processing method according to claim 8, characterized in that the text data processing method further includes:
adding together the industry evaluation indices, at the same time point, of the personal share entities belonging to the same industry category to obtain the market index of that industry category at that time point;
obtaining the market index of the industry category at each time point, and arranging the market indices of the time points in chronological order to generate the market index sequence corresponding to the industry category.
10. A computer readable storage medium, characterized in that a processing system is stored on the computer readable storage medium, and the processing system, when executed by a processor, implements the steps of the text data processing method according to any one of claims 5 to 9.
CN201810469419.XA 2018-05-16 2018-05-16 Server, the processing method of text data and storage medium Withdrawn CN108764981A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810469419.XA CN108764981A (en) 2018-05-16 2018-05-16 Server, the processing method of text data and storage medium
PCT/CN2018/102135 WO2019218517A1 (en) 2018-05-16 2018-08-24 Server, method for processing text data and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810469419.XA CN108764981A (en) 2018-05-16 2018-05-16 Server, the processing method of text data and storage medium

Publications (1)

Publication Number Publication Date
CN108764981A true CN108764981A (en) 2018-11-06

Family

ID=64007935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810469419.XA Withdrawn CN108764981A (en) 2018-05-16 2018-05-16 Server, the processing method of text data and storage medium

Country Status (2)

Country Link
CN (1) CN108764981A (en)
WO (1) WO2019218517A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754352A (en) * 2020-06-22 2020-10-09 平安资产管理有限责任公司 Method, device, equipment and storage medium for judging correctness of viewpoint statement
CN114386433A (en) * 2022-01-12 2022-04-22 中国农业银行股份有限公司 Data processing method, device and equipment based on emotion analysis and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022725B (en) * 2015-07-10 2018-04-20 河海大学 Text sentiment trend analysis method applied to the financial Web field
CN106022522A (en) * 2016-05-20 2016-10-12 南京大学 Method and system for predicting stocks based on big data published by internet
CN106202181A (en) * 2016-06-27 2016-12-07 苏州大学 Sentiment classification method, apparatus and system
US20180053255A1 (en) * 2016-08-19 2018-02-22 Noonum System and Method for end to end investment and portfolio management using machine driven analysis of the market against qualifying factors
CN106779149A (en) * 2016-11-21 2017-05-31 洪志令 Visual presentation method for stock trend prediction results

Also Published As

Publication number Publication date
WO2019218517A1 (en) 2019-11-21

Similar Documents

Publication Publication Date Title
CN104781837B (en) System and method for forming predictions using event-based sentiment analysis
CN110443458A (en) Risk assessment method, device, computer equipment and storage medium
CN109409677A (en) Enterprise Credit Risk Evaluation method, apparatus, equipment and storage medium
CN104115178A (en) Methods and systems for predicting market behavior based on news and sentiment analysis
Pooley Surveillance publishing
CN107704512A (en) Financial product recommendation method based on social data, electronic device and medium
CN109859052A (en) Intelligent recommendation method, apparatus, storage medium and server for investment strategies
Li et al. Stock price prediction incorporating market style clustering
CN113342972B (en) Public opinion recognition model training method and system and public opinion risk monitoring method and system
CN112204610A (en) Neural network based electronic content
CN108572988A (en) Real estate appraisal data generation method and device
CN110109902A (en) E-commerce platform recommender system based on an ensemble learning method
CN108241867A (en) Classification method and device
Yuan et al. Mining emotions of the public from social media for enhancing corporate credit rating
Chen The impact of trade and financial expansion on volatility of real exchange rate
Song et al. Incorporating research reports and market sentiment for stock excess return prediction: a case of mainland china
CN108764981A (en) Server, the processing method of text data and storage medium
Ramzan et al. Impact of asset preferences on firm performance over its life cycle: Is agency theory or neo‐classical theory more relevant?
CN113220885A (en) Text processing method and system
CN110213239B (en) Suspicious transaction message generation method and device and server
Taguchi et al. Constructing equity investment strategies using analyst reports and regime switching models
CN111859946A (en) Method and device for ranking comments and machine-readable storage medium
Yang Can the green credit policy enhance firm export quality? Evidence from China based on the DID model
CN109242690A (en) Financial product recommendation method, device, computer equipment and readable storage medium
CN108805603A (en) Marketing activity quality evaluation method, server and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20181106
