WO2019218517A1 - Server, method for processing text data and storage medium - Google Patents


Info

Publication number
WO2019218517A1
Authority
WO
WIPO (PCT)
Prior art keywords
text data
entity
level
industry
time point
Application number
PCT/CN2018/102135
Inventor
李海疆
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2019218517A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls

Definitions

  • The present application relates to the field of data analysis technology, and in particular to a server, a method for processing text data, and a storage medium.
  • Each listed company produces various kinds of text data, such as performance forecasts, financing reports, analyst forecasts and corporate governance disclosures.
  • These text data contain a large amount of market information. Simple analysis of individual texts cannot fully extract accurate market information or effectively guide decisions about a company or an industry, so fully mining text data to obtain accurate market information has become a technical problem to be solved.
  • The purpose of the present application is to provide a server, a method for processing text data, and a storage medium that fully mine financial text data to obtain accurate market information.
  • The present application provides a server including a memory and a processor connected to the memory. The memory stores a processing system that can run on the processor, and when executed by the processor, the processing system implements the following steps:
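  • classifying various financial text data into corresponding text object types according to a preset classification rule, the text object types including a performance type, a financing type, a corporate governance type, an analyst type and an other type;
  • analyzing the financial text data of each text object type of each stock entity at each time point with a predetermined text analysis method to obtain the evaluation level of each piece of financial text data;
  • counting the number of each evaluation level of the financial text data under each text object type, and calculating the proportion of each evaluation level from these counts;
  • obtaining the attribute score of each evaluation level, and calculating the market evaluation index of the stock entity at the time point from the attribute score and proportion of each evaluation level;
  • obtaining the market evaluation index of the stock entity at each time point, and arranging these indices in chronological order to generate the market evaluation index sequence of the stock entity.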
  • The present application further provides a method for processing text data, comprising:
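  • S1: classifying various financial text data into corresponding text object types according to a preset classification rule, the text object types including a performance type, a financing type, a corporate governance type, an analyst type and an other type;
  • S2: analyzing the financial text data of each text object type of each stock entity at each time point with a predetermined text analysis method to obtain the evaluation level of each piece of financial text data;
  • S3: counting the number of each evaluation level of the financial text data under each text object type, and calculating the proportion of each evaluation level from these counts;
  • S4: obtaining the attribute score of each evaluation level, and calculating the market evaluation index of the stock entity at the time point from the attribute score and proportion of each evaluation level;
  • S5: obtaining the market evaluation index of the stock entity at each time point, and arranging these indices in chronological order to generate the market evaluation index sequence of the stock entity.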
  • The present application also provides a computer-readable storage medium storing a processing system that, when executed by a processor, implements the steps of the text data processing method described above.
  • By dividing financial text data into different text object types and analyzing them with a predetermined text analysis method, the application can fully extract accurate market information and generate market evaluation index sequences in chronological order.
  • From these sequences, changes and trends in the market's evaluation of a company can be derived for market analysis.
  • FIG. 1 is a schematic diagram of a hardware architecture of an embodiment of a server according to the present application.
  • FIG. 2 is a schematic flowchart of a first embodiment of the method for processing text data according to the present application;
  • FIG. 3 is a schematic flowchart of the detailed steps of step S2 shown in FIG. 2;
  • FIG. 4 is a schematic flowchart of a second embodiment of the method for processing text data according to the present application.
  • The server 1 is a device capable of automatically performing numerical calculation and/or information processing according to preset or pre-stored instructions.
  • The server 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • The server 1 may include, but is not limited to, a memory 11, a processor 12 and a network interface 13 communicatively connected to one another through a system bus, and the memory 11 stores a processing system that can run on the processor 12. It should be noted that FIG. 1 shows only the server 1 with components 11 to 13; not all illustrated components are required, and more or fewer components may be implemented instead.
  • The memory 11 includes an internal memory and at least one type of readable storage medium.
  • The internal memory provides a cache for the operation of the server 1;
  • The readable storage medium may be, for example, a flash memory, a hard disk, a multimedia card, a card-type memory (such as SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), or a non-volatile storage medium such as a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk or an optical disk.
  • In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as a hard disk of the server 1; in other embodiments, it may be an external storage device of the server 1, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card or a flash card provided on the server 1.
  • The readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the server 1, such as the program code of the processing system in an embodiment of the present application. The memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip.
  • The processor 12 is generally used to control the overall operation of the server 1, for example to perform control and processing related to data interaction or communication with other devices.
  • The processor 12 is used to run the program code stored in the memory 11 or to process the data stored there, for example to run the processing system.
  • The network interface 13 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the server 1 and other electronic devices.
  • The processing system is stored in the memory 11 and includes at least one computer-readable instruction that can be executed by the processor 12 to implement the methods of the embodiments of the present application.
  • The at least one computer-readable instruction can be divided into different logical modules according to the functions implemented by its parts.
  • The various financial text data are classified into corresponding text object types according to a preset classification rule, the text object types including a performance type, a financing type, a corporate governance type, an analyst type and an other type.
  • The preset classification rule assigns performance-related financial text data to the performance type, financing-related financial text data to the financing type, financial text data related to corporate governance to the corporate governance type, analyst-related financial text data to the analyst type, and all financial text data outside these four types to the other type, as shown in Table 1 below:
  • Table 1 (text object types and example financial texts):
      • Performance type: performance reports; performance exceeding expectations.
      • Financing type: private placements; targeted breaks.
      • Corporate governance type: executive shareholding increases and decreases; shareholder reductions; equity incentives; employee shareholdings.
      • Analyst type: sharp increases in earnings forecasts; sudden analyst attention.
      • Other type: high delivery; index constituent adjustments; early disclosure of annual reports; long-term announcements.
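  • As an illustration only, the preset classification rule could be approximated by keyword matching. This sketch is an assumption, not the rule used in the application: the keyword lists are hypothetical phrases drawn from the Table 1 examples, the type names are made up for the code, and the first matching type wins.

```python
from typing import Dict, List

# Hypothetical keyword lists built from the Table 1 examples; a real preset
# classification rule may use any topic-detection method.
TYPE_KEYWORDS: Dict[str, List[str]] = {
    "performance": ["performance report", "exceeding expectations"],
    "financing": ["private placement", "targeted break"],
    "corporate_governance": ["executive", "shareholder reduction",
                             "equity incentive", "employee holding"],
    "analyst": ["earnings forecast", "analyst"],
}


def classify_text(text: str) -> str:
    """Return the first text object type whose keywords appear in the text.

    Types are checked in dictionary order (an assumed precedence); text that
    matches none of them falls into the "other" type.
    """
    lowered = text.lower()
    for object_type, keywords in TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return object_type
    return "other"
```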
  • Each stock entity generates financial text data at each time point.
  • A time point can be every minute, every hour, every day, and so on.
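  • A minimal sketch of grouping texts by stock entity and time point at minute, hour or day granularity, assuming each piece of financial text arrives as a hypothetical (stock_id, timestamp, text) record:

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Functions that truncate a timestamp to the start of its time point.
TRUNCATE = {
    "minute": lambda t: t.replace(second=0, microsecond=0),
    "hour": lambda t: t.replace(minute=0, second=0, microsecond=0),
    "day": lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
}


def group_by_time_point(
    records: Iterable[Tuple[str, datetime, str]], granularity: str = "day"
) -> Dict[Tuple[str, datetime], List[str]]:
    """Group (stock_id, timestamp, text) records by (stock_id, time point)."""
    truncate = TRUNCATE[granularity]
    groups: Dict[Tuple[str, datetime], List[str]] = defaultdict(list)
    for stock_id, timestamp, text in records:
        groups[(stock_id, truncate(timestamp))].append(text)
    return dict(groups)
```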
  • Analyzing the financial text data of each text object type of each stock entity at each time point with a predetermined text analysis method to obtain the evaluation level of each piece of financial text data specifically includes:
  • Segmenting each piece of financial text data with a predetermined word segmentation model to obtain its word segments; inputting the word segments of each piece of financial text data into a predetermined conversion model to obtain its word vectors; inputting the word vectors of each piece of financial text data into a predetermined sentiment analysis model to obtain the sentiment analysis result of each sentence in the financial text data; and counting the sentiment analysis results of the sentences in the financial text data and obtaining the evaluation level of the financial text data from the counted results.
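  • The first three stages above compose naturally as pluggable functions. In this sketch the segmentation, conversion and sentiment models are passed in as callables standing in for the trained models; the sentence-splitting rule and the names `split_sentences` and `sentence_results` are assumptions for illustration. The final counting stage then turns the returned list into an evaluation level.

```python
import re
from typing import Callable, List

Segmenter = Callable[[str], List[str]]                  # sentence -> word segments
Vectorizer = Callable[[List[str]], List[List[float]]]   # segments -> word vectors
SentimentModel = Callable[[List[List[float]]], int]     # vectors -> -1, 0 or 1

# Assumed sentence boundaries: Chinese/Western end punctuation, or a period
# followed by whitespace or end of text (so decimals such as 12.5 survive).
SENTENCE_BOUNDARY = re.compile(r"[。！？!?]+|\.(?=\s|$)")


def split_sentences(text: str) -> List[str]:
    """Split one piece of financial text into non-empty sentences."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


def sentence_results(text: str, segment: Segmenter, vectorize: Vectorizer,
                     classify: SentimentModel) -> List[int]:
    """Segment, vectorize and classify every sentence of one piece of text."""
    return [classify(vectorize(segment(s))) for s in split_sentences(text)]
```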
  • The financial text data is segmented by an already trained word segmentation model.
  • The word segmentation model is a trained neural network segmentation model, preferably a long short-term memory (LSTM) recurrent neural network.
  • Training the neural network segmentation model includes: 1. Extracting a large amount of pre-segmented text from a corpus; model training uses a predetermined segmentation corpus, such as the classic Microsoft Research segmentation corpus from Bakeoff 2005. 2. Training on the training portion and using the test portion for the final test. 3. Judging the model's error by comparing its output with the reference segmentation (using the sequence labeling method); if the test score reaches 0.95 or above, training of the neural network segmentation model is complete.
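  • The sequence labeling method and the 0.95 stopping criterion can be made concrete as follows. This is a sketch under assumptions: it uses the common B/M/E/S character tags and takes the "test score" to be word-level F1 on the test portion; the source names neither the tag set nor the metric.

```python
from typing import List, Set, Tuple


def to_bmes(words: List[str]) -> List[str]:
    """Sequence labeling: tag each character as Begin/Middle/End of a word or Single."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def from_bmes(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and (predicted) BMES tags."""
    words: List[str] = []
    current = ""
    for char, tag in zip(chars, tags):
        current += char
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without an E/S tag
        words.append(current)
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character offsets (start, end) of each word in the sentence."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold: List[List[str]], predicted: List[List[str]]) -> float:
    """Word-level F1 over a test set: a word counts only if both boundaries match."""
    correct = gold_total = predicted_total = 0
    for gold_words, predicted_words in zip(gold, predicted):
        gold_spans, predicted_spans = word_spans(gold_words), word_spans(predicted_words)
        correct += len(gold_spans & predicted_spans)
        gold_total += len(gold_spans)
        predicted_total += len(predicted_spans)
    if correct == 0:
        return 0.0
    precision = correct / predicted_total
    recall = correct / gold_total
    return 2 * precision * recall / (precision + recall)


def training_complete(gold: List[List[str]], predicted: List[List[str]],
                      threshold: float = 0.95) -> bool:
    """Stopping rule: the test score must reach the threshold."""
    return segmentation_f1(gold, predicted) >= threshold
```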
  • The predetermined conversion model is the word2vec model.
  • The word2vec model includes a three-layer neural network that represents each word as a word vector, thereby digitizing the text.
  • The word segments of the financial text data are input into the word2vec model to obtain the word vectors of the financial text data.
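  • A sketch of turning one sentence's word segments into the fixed-size input that a CNN expects. Here `embeddings` is a plain dictionary standing in for a trained word2vec model (in practice the vectors could come from, for example, gensim's `Word2Vec`); the zero vector for unknown words and the padding length are assumptions.

```python
from typing import Dict, List, Sequence


def sentence_matrix(words: Sequence[str], embeddings: Dict[str, List[float]],
                    max_len: int, dim: int) -> List[List[float]]:
    """Stack word vectors into a max_len x dim matrix.

    Unknown words map to a zero vector; short sentences are zero-padded and
    long ones truncated, so every sentence has the same shape.
    """
    zero = [0.0] * dim
    rows = []
    for word in words[:max_len]:
        vector = embeddings.get(word, zero)
        if len(vector) != dim:
            raise ValueError(f"vector for {word!r} has length {len(vector)}, expected {dim}")
        rows.append(list(vector))
    rows.extend([list(zero) for _ in range(max_len - len(rows))])
    return rows
```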
  • The predetermined sentiment analysis model is the model of "Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts".
  • In this model, the vectors of a sentence are passed through two convolutional neural network (CNN) layers and transformed into a sentence-level vector, which is then input into a three-layer neural network; through training, the network learns to output the correct sentiment analysis result for the sentence.
  • CNN: convolutional neural network.
  • The sentiment analysis result takes one of three values, for example [-1, 0, 1], where -1 indicates that the sentence expresses negative sentiment, 0 that it is neutral, and 1 that it is positive.
  • The output dimension of the three-layer neural network can be set as needed: it can be three-dimensional, as in [-1, 0, 1] above, or two-dimensional, as in [-1, 1], with values ranging from -1 to 1.
  • A value closer to 1 means that the sentence expresses positive sentiment, and a value closer to -1 means that it expresses negative sentiment.
  • The evaluation levels of the financial text data are a first level, a second level and a third level; they may also be expressed as good, medium and bad ratings.
  • The first level corresponds to the sentiment analysis result 1.
  • The second level corresponds to the sentiment analysis result 0, and the third level to the sentiment analysis result -1.
  • The results of all sentences are pooled and the occurrences of each sentiment analysis result are counted. If 1 is the most frequent result, the financial text data is at the first level; if 0 is the most frequent, it is at the second level; and if -1 is the most frequent, it is at the third level.
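  • The majority rule above fits in a few lines. The source does not say how ties are resolved, so treating a tie as the second (neutral) level is an assumption of this sketch:

```python
from collections import Counter
from typing import Iterable

# Sentence result -> evaluation level: 1 -> first, 0 -> second, -1 -> third.
LEVEL_BY_RESULT = {1: 1, 0: 2, -1: 3}


def evaluation_level(sentence_results: Iterable[int]) -> int:
    """Return the evaluation level of the most frequent sentence result."""
    counts = Counter(sentence_results)
    if not counts:
        raise ValueError("the financial text has no sentence results")
    unknown = set(counts) - set(LEVEL_BY_RESULT)
    if unknown:
        raise ValueError(f"unexpected sentiment results: {sorted(unknown)}")
    best = max(counts.values())
    winners = [result for result, count in counts.items() if count == best]
    # Assumption (ties are not covered by the source): a tie counts as neutral.
    return LEVEL_BY_RESULT[winners[0]] if len(winners) == 1 else 2
```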
  • The number of each evaluation level of the financial text data under each text object type is counted, and the proportion of each evaluation level is calculated from these counts.
  • Counting the evaluation levels of the financial text data under each text object type includes counting the number of first-level, second-level and third-level financial text data under each text object type. Taking the data of company A as an example, as shown in Table 2 below:
  • The first level has an attribute score of 1.
  • The second level has an attribute score of 0.
  • The third level has an attribute score of -1.
  • The market evaluation index = 100 × [proportion of the first level × 1 + proportion of the second level × 0 + proportion of the third level × (-1)].
  • A market evaluation index sequence is generated from the market evaluation indices of the stock entity, giving the market evaluation index sequence of the company to which the stock entity belongs; from this sequence, changes and trends in the market's evaluation of the company can be obtained.
  • The present application analyzes the financial text data of the different text object types of each stock entity at each time point with a predetermined text analysis method to obtain the evaluation level of each piece of financial text data, counts the number of each evaluation level under each text object type and calculates the proportion of each evaluation level, and then calculates the market evaluation index of the stock entity at the time point from the attribute score and proportion of each evaluation level.
  • The market evaluation index reflects the market's evaluation of the company at that time point.
  • By dividing the financial text data into different text object types and analyzing them with a predetermined text analysis method, the application can fully extract accurate market information.
  • By generating the market evaluation index sequence in chronological order, it can obtain changes and trends in the market's evaluation of the company for market analysis.
  • Each stock entity is assigned to its industry category according to a predetermined industry classification method; the latest total market value of each stock entity is obtained, and the total market value of each industry category is calculated from the latest total market values of its stock entities; the market value ratio of each stock entity is calculated from its latest total market value and the total market value of the industry category to which it belongs; the industry evaluation index of the stock entity at a time point is calculated from its market evaluation index and market value ratio at that time point; and the industry evaluation indices of the stock entity at each time point are arranged in chronological order to generate the industry evaluation index sequence of the stock entity.
  • The predetermined industry classification method is, for example, the Shenwan industry classification.
  • All stock entities on the Shanghai and Shenzhen stock exchanges can be divided into the following 28 industry categories: mining, chemicals, steel, non-ferrous metals, building materials, architectural decoration, electrical equipment, mechanical equipment, national defense and military, automobiles, household appliances, textiles and garments, light industry manufacturing, commerce and trade, agriculture, forestry, animal husbandry and fishery, food and beverage, leisure services, medicine and biology, public utilities, transportation, real estate, electronics, computers, media, communications, banking, non-bank finance, and comprehensive.
  • Taking the banking industry as an example, calculating the industry evaluation index includes: first, extracting the latest total market value of each stock entity in the industry and adding these values to obtain the total market value of the industry; second, calculating each stock entity's share of the industry's total market value, that is, market value ratio = latest total market value of the stock entity / total market value of the industry × 100%; then, calculating the industry evaluation index of the stock entity at the time point from its market evaluation index at that time point and its market value ratio, and arranging these indices in chronological order to generate the industry evaluation index sequence of the stock entity.
  • When the processing system is executed by the processor, the following steps are further implemented: adding the industry evaluation indices of the stock entities belonging to the same industry category at the same time point to obtain the market index of that industry category at that time point; and obtaining the market index of the industry category at each time point and arranging these indices in chronological order to generate the market index sequence of the industry category.
  • FIG. 2 is a schematic flowchart of an embodiment of a method for processing text data according to the present application.
  • The method for processing text data includes the following steps:
  • Step S1: classify various financial text data into corresponding text object types according to a preset classification rule.
  • The preset classification rule assigns performance-related financial text data to the performance type, financing-related financial text data to the financing type, financial text data related to corporate governance to the corporate governance type, analyst-related financial text data to the analyst type, and all financial text data outside these four types to the other type, as shown in Table 1 above.
  • The performance type includes performance reports and performance exceeding expectations; the financing type includes private placements and targeted breaks; the corporate governance type includes executive shareholding increases and decreases, major shareholder reductions, equity incentives and employee shareholdings; the analyst type includes sharp increases in earnings forecasts and sudden analyst attention; and the other type includes high delivery, index constituent adjustments, early disclosure of annual reports and long-term announcements.
  • Step S2: analyze the financial text data of each text object type of each stock entity at each time point with a predetermined text analysis method to obtain the evaluation level of each piece of financial text data.
  • Each stock entity generates financial text data at each time point.
  • A time point can be every minute, every hour, every day, and so on.
  • Analyzing the financial text data of each text object type of each stock entity at each time point with a predetermined text analysis method to obtain the evaluation level of each piece of financial text data includes the following steps:
  • Step S21: segment each piece of financial text data with a predetermined word segmentation model to obtain its word segments.
  • Step S22: input the word segments of each piece of financial text data into a predetermined conversion model to obtain its word vectors.
  • Step S23: input the word vectors of each piece of financial text data into a predetermined sentiment analysis model to obtain the sentiment analysis result of each sentence in the financial text data.
  • Step S24: count the sentiment analysis results of the sentences in the financial text data, and obtain the evaluation level of the financial text data from the counted results.
  • The financial text data is segmented by an already trained word segmentation model, which is a trained neural network segmentation model, preferably a long short-term memory (LSTM) recurrent neural network.
  • Training the neural network segmentation model includes: 1. Extracting a large amount of pre-segmented text from a corpus; model training uses a predetermined segmentation corpus, such as the classic Microsoft Research segmentation corpus from Bakeoff 2005. 2. Training on the training portion and using the test portion for the final test. 3. Judging the model's error by comparing its output with the reference segmentation (using the sequence labeling method); if the test score reaches 0.95 or above, training of the neural network segmentation model is complete.
  • The predetermined conversion model is the word2vec model.
  • The word2vec model includes a three-layer neural network that represents each word as a word vector, thereby digitizing the text.
  • The word segments of the financial text data are input into the word2vec model to obtain the word vectors of the financial text data.
  • The predetermined sentiment analysis model is the model of "Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts".
  • In this model, the vectors of a sentence are passed through two convolutional neural network (CNN) layers and transformed into a sentence-level vector, which is then input into a three-layer neural network; through training, the network learns to output the correct sentiment analysis result for the sentence.
  • CNN: convolutional neural network.
  • The sentiment analysis result takes one of three values, for example [-1, 0, 1], where -1 indicates that the sentence expresses negative sentiment, 0 that it is neutral, and 1 that it is positive.
  • The output dimension of the three-layer neural network can be set as needed: it can be three-dimensional, as in [-1, 0, 1] above, or two-dimensional, as in [-1, 1], with values ranging from -1 to 1.
  • A value closer to 1 means that the sentence expresses positive sentiment, and a value closer to -1 means that it expresses negative sentiment.
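  • When the output layer is three-dimensional, the sentence result can be read off as the largest output; when the model yields a single value in [-1, 1], some cut-off is needed to separate neutral from positive and negative. Both mappings below are a sketch: the class order `RESULTS` and the `neutral_band` threshold are hypothetical, not values given in the source.

```python
from typing import Sequence

# Class order assumed for a three-dimensional output layer.
RESULTS = (-1, 0, 1)


def from_three_way(outputs: Sequence[float]) -> int:
    """Pick the result whose output unit is largest (argmax)."""
    if len(outputs) != len(RESULTS):
        raise ValueError("expected one output per result in RESULTS")
    best_index = max(range(len(RESULTS)), key=lambda i: outputs[i])
    return RESULTS[best_index]


def from_score(score: float, neutral_band: float = 1 / 3) -> int:
    """Map a scalar output in [-1, 1] to -1, 0 or 1.

    neutral_band is a hypothetical threshold: scores within +/- neutral_band
    of 0 are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside [-1, 1]")
    if score > neutral_band:
        return 1
    if score < -neutral_band:
        return -1
    return 0
```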
  • The evaluation levels of the financial text data are a first level, a second level and a third level; they may also be expressed as good, medium and bad ratings.
  • The first level corresponds to the sentiment analysis result 1.
  • The second level corresponds to the sentiment analysis result 0, and the third level to the sentiment analysis result -1.
  • The results of all sentences are pooled and the occurrences of each sentiment analysis result are counted. If 1 is the most frequent result, the financial text data is at the first level; if 0 is the most frequent, it is at the second level; and if -1 is the most frequent, it is at the third level.
  • Step S3: count the number of each evaluation level of the financial text data under each text object type, and calculate the proportion of each evaluation level from these counts.
  • Counting the evaluation levels of the financial text data under each text object type includes counting the number of first-level, second-level and third-level financial text data under each text object type, taking the data of company A as an example, as shown in Table 2 above.
  • Step S4: obtain the attribute score of each evaluation level, and calculate the market evaluation index of the stock entity at the time point from the attribute score and proportion of each evaluation level.
  • The first level has an attribute score of 1.
  • The second level has an attribute score of 0.
  • The third level has an attribute score of -1.
  • The market evaluation index = 100 × [proportion of the first level × 1 + proportion of the second level × 0 + proportion of the third level × (-1)].
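  • Steps S3 and S4 for one stock entity at one time point, as a minimal sketch. It assumes the proportions are taken over all financial text data of the stock entity at that time point, pooling the per-type counts; this is one reading of the description, which does not state how the per-type counts are combined. Levels are encoded as 1, 2 and 3 for the first, second and third level.

```python
from collections import Counter
from typing import Dict, List

# Attribute scores from the text: first level 1, second level 0, third level -1.
ATTRIBUTE_SCORE = {1: 1, 2: 0, 3: -1}


def count_levels(levels_by_type: Dict[str, List[int]]) -> Dict[str, Counter]:
    """Step S3: count each evaluation level under each text object type."""
    return {object_type: Counter(levels) for object_type, levels in levels_by_type.items()}


def market_evaluation_index(levels_by_type: Dict[str, List[int]]) -> float:
    """Step S4: 100 * sum over levels of (proportion of level * attribute score)."""
    totals: Counter = Counter()
    for counts in count_levels(levels_by_type).values():
        totals.update(counts)  # pool the per-type counts (assumption)
    n = sum(totals.values())
    if n == 0:
        raise ValueError("no evaluation levels at this time point")
    return 100 * sum(totals[level] / n * score for level, score in ATTRIBUTE_SCORE.items())
```

Because the attribute scores are 1, 0 and -1, the index reduces to 100 × (proportion of the first level - proportion of the third level) and ranges from -100 (all third level) to 100 (all first level). For example, two first-level texts, one second-level text and one third-level text give 100 × (0.5 - 0.25) = 25.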
  • Step S5: obtain the market evaluation index of the stock entity at each time point, and arrange these indices in chronological order to generate the market evaluation index sequence of the stock entity.
  • A market evaluation index sequence is generated from the market evaluation indices of the stock entity, giving the market evaluation index sequence of the company to which the stock entity belongs; from this sequence, changes and trends in the market's evaluation of the company can be obtained.
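  • Step S5 amounts to ordering the indices by time point. The sketch below also computes the change between consecutive time points as one simple, assumed way to expose the trend mentioned in the text; the function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Series = List[Tuple[datetime, float]]


def index_sequence(index_by_time: Dict[datetime, float]) -> Series:
    """Order one stock entity's market evaluation indices chronologically."""
    return sorted(index_by_time.items())


def index_changes(sequence: Series) -> Series:
    """Change of the index since the previous time point."""
    return [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(sequence, sequence[1:])]
```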
  • By dividing the financial text data into different text object types and analyzing them with a predetermined text analysis method, the application can fully extract accurate market information, and by generating the market evaluation index sequence in chronological order it can obtain changes and trends in the market's evaluation of the company for market analysis.
  • The method for processing text data further includes:
  • Step S6: according to a predetermined industry classification method, assign each stock entity to its industry category, obtain the latest total market value of each stock entity, and calculate the total market value of each industry category from the latest total market values of its stock entities.
  • Step S7: calculate the market value ratio of each stock entity from its latest total market value and the total market value of the industry category to which it belongs.
  • Step S8: calculate the industry evaluation index of the stock entity at the time point from its market evaluation index and market value ratio at that time point.
  • Step S9: obtain the industry evaluation index of the stock entity at each time point, and arrange these indices in chronological order to generate the industry evaluation index sequence of the stock entity.
  • The predetermined industry classification method is, for example, the Shenwan industry classification.
  • All stock entities on the Shanghai and Shenzhen stock exchanges can be divided into the following 28 industry categories: mining, chemicals, steel, non-ferrous metals, building materials, architectural decoration, electrical equipment, mechanical equipment, national defense and military, automobiles, household appliances, textiles and garments, light industry manufacturing, commerce and trade, agriculture, forestry, animal husbandry and fishery, food and beverage, leisure services, medicine and biology, public utilities, transportation, real estate, electronics, computers, media, communications, banking, non-bank finance, and comprehensive.
  • Taking the banking industry as an example, calculating the industry evaluation index includes: first, extracting the latest total market value of each stock entity in the industry and adding these values to obtain the total market value of the industry; second, calculating each stock entity's share of the industry's total market value, that is, market value ratio = latest total market value of the stock entity / total market value of the industry × 100%; then, calculating the industry evaluation index of the stock entity at the time point from its market evaluation index at that time point and its market value ratio, and arranging these indices in chronological order to generate the industry evaluation index sequence of the stock entity.
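  • The description defines the market value ratio but does not spell out how it combines with the market evaluation index in step S8. The sketch below assumes the industry evaluation index is the product of the two, with the ratio kept as a fraction (the percentage divided by 100); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def market_value_ratios(latest_market_value: Dict[str, float],
                        industry_of: Dict[str, str]) -> Dict[str, float]:
    """Steps S6-S7: each stock's share (0..1) of its industry's total market value."""
    industry_total: Dict[str, float] = defaultdict(float)
    for stock, value in latest_market_value.items():
        industry_total[industry_of[stock]] += value
    return {stock: value / industry_total[industry_of[stock]]
            for stock, value in latest_market_value.items()}


def industry_evaluation_index(market_evaluation_index: float, ratio: float) -> float:
    """Step S8 under an assumed rule: weight the market index by the ratio."""
    return market_evaluation_index * ratio
```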
  • The method for processing text data further includes: adding the industry evaluation indices of the stock entities belonging to the same industry category at the same time point to obtain the market index of that industry category at that time point; and obtaining the market index of the industry category at each time point and arranging these indices in chronological order to generate the market index sequence of the industry category.
  • the present application also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the text data processing method described above.
  • the methods of the foregoing embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, but in many cases the former is the preferred implementation.
  • based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.

Abstract

The present application relates to a server, a method for processing text data, and a storage medium, the method comprising: sorting various financial text data into corresponding text object types; analyzing financial text data of each text object type of each stock entity at each time point to obtain an evaluation grade of each piece of financial text data; counting the number of each evaluation grade of the financial text data under each text object type, and calculating the proportion of each evaluation grade on the basis of the counted number of each evaluation grade; obtaining attribute scores of each evaluation grade, and calculating a market evaluation index of the stock entities at the time points according to the attribute score corresponding to each evaluation grade and the proportion of each evaluation grade; obtaining a market evaluation index of the stock entities at each time point, and sorting the market evaluation index at each time point in chronological order to generate a market evaluation index sequence corresponding to the stock entities. The present application may fully mine financial text data to obtain accurate market information.

Description

Server, text data processing method and storage medium
Priority claim
This application claims priority under the Paris Convention to Chinese Patent Application No. CN201810469419X, entitled "Server, Text Data Processing Method and Storage Medium" and filed on May 16, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of data analysis, and in particular to a server, a method for processing text data, and a storage medium.
Background
At present, at every point in time, each listed company has a variety of text data, such as earnings forecasts, financing reports, analyst forecasts, and corporate governance disclosures. In the prior art, a single text is generally analyzed in isolation to obtain a corresponding market evaluation. However, because this text data contains a large amount of market information, simply analyzing individual texts cannot fully extract accurate market information or provide effective guidance for a company or an industry. Fully mining this text data to obtain accurate market information has therefore become a technical problem to be solved.
Summary
The purpose of the present application is to provide a server, a method for processing text data, and a storage medium, so as to fully mine financial text data and obtain accurate market information.
To achieve the above object, the present application provides a server including a memory and a processor connected to the memory, the memory storing a processing system executable on the processor, and the processing system, when executed by the processor, implementing the following steps:
classifying various financial text data into corresponding text object types according to a preset classification rule, wherein the text object types include a performance type, a financing type, a corporate governance type, an analyst type, and an other type;
analyzing, by a predetermined text analysis method, the financial text data of each text object type for each stock entity at each time point, to obtain an evaluation level for each piece of financial text data;
counting the number of financial text data at each evaluation level under each text object type, and calculating the proportion of each evaluation level based on the counted numbers;
obtaining an attribute score for each evaluation level, and calculating the market evaluation index of the stock entity at the time point from the attribute score and the proportion of each evaluation level;
obtaining the market evaluation index of the stock entity at each time point, and arranging the market evaluation indices in chronological order to generate the market evaluation index sequence of the stock entity.
To achieve the above object, the present application further provides a method for processing text data, the method including:
S1: classifying various financial text data into corresponding text object types according to a preset classification rule, wherein the text object types include a performance type, a financing type, a corporate governance type, an analyst type, and an other type;
S2: analyzing, by a predetermined text analysis method, the financial text data of each text object type for each stock entity at each time point, to obtain an evaluation level for each piece of financial text data;
S3: counting the number of financial text data at each evaluation level under each text object type, and calculating the proportion of each evaluation level based on the counted numbers;
S4: obtaining an attribute score for each evaluation level, and calculating the market evaluation index of the stock entity at the time point from the attribute score and the proportion of each evaluation level;
S5: obtaining the market evaluation index of the stock entity at each time point, and arranging the market evaluation indices in chronological order to generate the market evaluation index sequence of the stock entity.
The present application also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the text data processing method described above.
The beneficial effects of the present application are as follows: by dividing financial text data into different text object types and analyzing it with a predetermined text analysis method, the application can fully mine accurate market information, and by generating a market evaluation index sequence in chronological order, it can reveal changes and trends in the market's evaluation of a company for use in market analysis.
Brief description of the drawings
FIG. 1 is a schematic diagram of the hardware architecture of an embodiment of the server of the present application;
FIG. 2 is a schematic flowchart of a first embodiment of the method for processing text data of the present application;
FIG. 3 is a detailed flowchart of step S2 shown in FIG. 2;
FIG. 4 is a schematic flowchart of a second embodiment of the method for processing text data of the present application.
Detailed description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, but only where a person of ordinary skill in the art can implement the combination; where a combination of technical solutions is contradictory or cannot be implemented, that combination shall be deemed not to exist and falls outside the scope of protection claimed by the present application.
Referring to FIG. 1, which is a schematic diagram of the hardware architecture of an embodiment of the server of the present application, the server 1 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance. The server 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 communicatively connected to each other through a system bus, the memory 11 storing a processing system executable on the processor 12. It should be noted that FIG. 1 shows only the server 1 with components 11 to 13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the server 1; the readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as its hard disk; in other embodiments, the non-volatile storage medium may be an external storage device of the server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the server 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the server 1, such as the program code of the processing system in an embodiment of the present application. In addition, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is generally used to control the overall operation of the server 1, for example to perform control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 is used to run program code stored in the memory 11 or to process data, for example to run the processing system.
The network interface 13 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the server 1 and other electronic devices.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11; the at least one computer-readable instruction is executable by the processor 12 to implement the methods of the embodiments of the present application, and may be divided into different logical modules according to the functions implemented by its parts.
In an embodiment, the processing system, when executed by the processor 12, implements the following steps:
classifying various financial text data into corresponding text object types according to a preset classification rule, wherein the text object types include a performance type, a financing type, a corporate governance type, an analyst type, and an other type;
Here, the preset classification rule assigns performance-related financial text data to the performance type, financing-related financial text data to the financing type, corporate-governance-related financial text data to the corporate governance type, and analyst-related financial text data to the analyst type; financial text data that belongs to none of these four types is assigned to the other type, as shown in Table 1 below:
Figure PCTCN2018102135-appb-000001
Table 1
The performance type includes earnings reports, preliminary earnings reports, and reports of earnings exceeding expectations; the financing type includes private placements and private-placement shares falling below the issue price; the corporate governance type includes share increases and reductions by executives, share reductions by major shareholders, equity incentives, and employee stock ownership; the analyst type includes sharp upward revisions of earnings forecasts and sudden analyst attention; and the other type includes high bonus-share and capitalization issues, index constituent adjustments, early disclosure of annual reports, and long periods without announcements.
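The preset classification rule amounts to a lookup from document subtype to text object type, with everything unlisted falling into the other type. A minimal Python sketch, assuming the subtypes have already been tagged upstream; the subtype strings below are hypothetical labels, not identifiers from the source:

```python
from typing import Dict

# Hypothetical subtype labels for the document kinds listed above (assumption:
# the source names the kinds but defines no identifiers for them).
SUBTYPE_TO_OBJECT_TYPE: Dict[str, str] = {
    "earnings_report": "performance",
    "preliminary_earnings_report": "performance",
    "earnings_beat_report": "performance",
    "private_placement": "financing",
    "placement_below_issue_price": "financing",
    "executive_share_change": "corporate_governance",
    "major_shareholder_reduction": "corporate_governance",
    "equity_incentive": "corporate_governance",
    "employee_stock_ownership": "corporate_governance",
    "forecast_sharp_upgrade": "analyst",
    "sudden_analyst_attention": "analyst",
}


def classify_text_object_type(subtype: str) -> str:
    """Return the text object type; any subtype outside the four named types is 'other'."""
    return SUBTYPE_TO_OBJECT_TYPE.get(subtype, "other")


print(classify_text_object_type("equity_incentive"))  # corporate_governance
print(classify_text_object_type("index_constituent_adjustment"))  # other
```

The other-type subtypes (high bonus-share issues, index constituent adjustments, and so on) need no entries in this sketch, because the default already routes them to the other type.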
analyzing, by a predetermined text analysis method, the financial text data of each text object type for each stock entity at each time point, to obtain an evaluation level for each piece of financial text data;
Here, a stock entity is a listed company. Each stock entity generates financial text data at various time points, and the time granularity may be per minute, per hour, per day, and so on.
In an embodiment, the step of analyzing, by a predetermined text analysis method, the financial text data of each text object type for each stock entity at each time point to obtain the evaluation level of each piece of financial text data specifically includes:
segmenting each piece of financial text data with a predetermined word segmentation model to obtain its word segments; inputting the word segments of each piece of financial text data into a predetermined conversion model to obtain the corresponding word vectors; inputting the word vectors of each piece of financial text data into a predetermined sentiment analysis model to obtain a sentiment analysis result for each sentence in that financial text data; and counting the sentiment analysis results of the sentences in the financial text data and determining the evaluation level of the financial text data from these counts.
The text of the financial text data is segmented by an already-trained word segmentation model, namely a trained neural network segmentation model, preferably a long short-term memory (LSTM) recurrent neural network. Training this neural network segmentation model includes: 1. extracting a large number of annotated words from a corpus, where the model is trained on a predetermined segmentation corpus, for example the Microsoft Research segmentation corpus from the classic SIGHAN Bakeoff 2005; 2. training on the training portion of the corpus and using the test portion for the final test; 3. evaluating the model error by comparing the model's outputs with the reference labels (using sequence labeling); if the test score reaches 0.95 or above, training of the neural network segmentation model is complete.
The predetermined conversion model is the word2vec model, a three-layer neural network that represents each word as a word vector, thereby converting text into numbers. The word segments of the financial text data are input into the word2vec model to obtain the corresponding word vectors.
The predetermined sentiment analysis model is a short-text sentiment analysis model based on deep convolutional neural networks (Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts). The model takes as input the sentence vector of a sentence, passes it through two convolutional neural network (CNN) layers to obtain a sentence-level vector, and then feeds that vector into a three-layer neural network, which is trained to output the correct sentiment analysis result for the sentence.
In an embodiment, there are three possible sentiment analysis results, for example [-1, 0, 1], where -1 indicates that the sentence expresses negative sentiment, 0 indicates that it is neutral, and 1 indicates that it expresses a positive view.
In addition, the output dimension of the three-layer neural network can be adjusted: it may be the three-valued [-1, 0, 1] described above, or a two-valued [-1, 1] with outputs ranging from -1 to 1, where values closer to 1 indicate that the sentence expresses a positive view and values closer to -1 indicate that it expresses negative sentiment.
The evaluation levels of financial text data include a first level, a second level, and a third level, which may also be described as positive, neutral, and negative ratings. The first level corresponds to sentiment analysis result 1, the second level to result 0, and the third level to result -1. Finally, the outputs for all sentences are combined and the number of each sentiment analysis result is counted: if result 1 occurs most often, the financial text data is at the first level; if result 0 occurs most often, it is at the second level; and if result -1 occurs most often, it is at the third level.
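The final aggregation is a majority vote over per-sentence sentiment labels. A minimal Python sketch, assuming the sentiment model has already produced one label in {-1, 0, 1} per sentence; the source does not say how ties are resolved, so the tie-breaking order below (neutral, then positive, then negative) is an assumption:

```python
from collections import Counter
from typing import Iterable

# Mapping from sentence-level sentiment label to document evaluation level.
LABEL_TO_LEVEL = {1: "first", 0: "second", -1: "third"}
# Assumed tie-break preference (not in the source): lower rank wins on equal counts.
TIE_RANK = {0: 0, 1: 1, -1: 2}


def evaluation_level(sentence_labels: Iterable[int]) -> str:
    """Return the evaluation level of a document from its sentence labels."""
    counts = Counter(sentence_labels)
    if not counts:
        raise ValueError("no sentence labels to aggregate")
    unknown = set(counts) - set(LABEL_TO_LEVEL)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    # Most frequent label wins; ties go to the label with the lowest TIE_RANK.
    winner = max(counts, key=lambda label: (counts[label], -TIE_RANK[label]))
    return LABEL_TO_LEVEL[winner]


print(evaluation_level([1, 1, 0, -1]))  # first
```

With the two-valued continuous output described above, each score would first have to be mapped to a discrete label before voting; the source does not specify how.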
counting the number of financial text data at each evaluation level under each text object type, and calculating the proportion of each evaluation level based on the counted numbers;
Counting the number at each evaluation level under each text object type means counting how many pieces of financial text data under each text object type are at the first, second, and third levels. Company A is used as an example, as shown in Table 2 below:
Company A              First level   Second level   Third level   Total
Performance            3             1              0             -
Financing              0             2              0             -
Corporate governance   1             1              3             -
Analyst                4             1              1             -
Other                  2             0              0             -
Total                  10            5              4             19
Weight                 0.526316      0.263158       0.210526      -
Table 2
In Table 2, the first, second, and third levels occur 10, 5, and 4 times respectively, so the total number of evaluations is 10 + 5 + 4 = 19, the proportion of the first level is 10/19 × 100% = 52.63%, the proportion of the second level is 5/19 × 100% = 26.32%, and the proportion of the third level is 4/19 × 100% = 21.05%.
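The counting step reduces to column sums over a type-by-level count table. A minimal Python sketch that reproduces the Table 2 proportions for Company A; the type keys are illustrative names:

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (first, second, third) level counts


def level_proportions(counts_by_type: Dict[str, Counts]) -> Tuple[float, ...]:
    """Pool counts over all text object types and return each level's share."""
    # zip(*...) turns the per-type rows into per-level columns.
    level_totals = [sum(column) for column in zip(*counts_by_type.values())]
    grand_total = sum(level_totals)
    if grand_total == 0:
        raise ValueError("no evaluated financial text data")
    return tuple(total / grand_total for total in level_totals)


company_a = {
    "performance": (3, 1, 0),
    "financing": (0, 2, 0),
    "corporate_governance": (1, 1, 3),
    "analyst": (4, 1, 1),
    "other": (2, 0, 0),
}
print([round(p, 6) for p in level_proportions(company_a)])
# [0.526316, 0.263158, 0.210526]
```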
obtaining the attribute score of each evaluation level, and calculating the market evaluation index of the stock entity at the time point from the attribute score and the proportion of each evaluation level;
Here, the attribute score of the first level is 1, that of the second level is 0, and that of the third level is -1, and market evaluation index = 100 × [proportion of first level × 1 + proportion of second level × 0 + proportion of third level × (-1)]. Because the proportions sum to 1, the index lies between -100 and 100. For Company A above, market evaluation index = 100 × (52.63% × 1 + 26.32% × 0 + 21.05% × (-1)) = 31.58. This index reflects the market's evaluation of the company at that time point, for use in market analysis.
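The index formula is a score-weighted sum of the level proportions, scaled by 100. A minimal Python sketch using the attribute scores given in the text and the Company A proportions:

```python
from typing import Sequence

# Attribute scores from the text: first level 1, second level 0, third level -1.
ATTRIBUTE_SCORES = (1.0, 0.0, -1.0)


def market_evaluation_index(
    proportions: Sequence[float], scores: Sequence[float] = ATTRIBUTE_SCORES
) -> float:
    """100 * sum(proportion * attribute score) over the evaluation levels."""
    if len(proportions) != len(scores):
        raise ValueError("need one proportion per evaluation level")
    return 100 * sum(p * s for p, s in zip(proportions, scores))


print(round(market_evaluation_index((10 / 19, 5 / 19, 4 / 19)), 2))  # 31.58
```

Because the second level scores 0, neutral documents pull the index toward 0 without pushing it in either direction.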
obtaining the market evaluation index of the stock entity at each time point, and arranging the market evaluation indices in chronological order to generate the market evaluation index sequence of the stock entity.
In this embodiment, the market evaluation indices of the stock entity are arranged in chronological order to generate the market evaluation index sequence of the company to which the stock entity corresponds; from this sequence, changes and trends in the market's evaluation of the company can be derived for market analysis.
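Building the sequence is a chronological sort of per-time-point index values. A minimal Python sketch, assuming daily time points keyed by `datetime.date`; the dates and index values are made up for illustration, and the consecutive-difference helper is one simple way to read the trend, not a step named in the source:

```python
from datetime import date
from typing import Dict, List, Tuple

IndexSequence = List[Tuple[date, float]]


def index_sequence(index_by_time: Dict[date, float]) -> IndexSequence:
    """Arrange per-time-point index values in chronological order."""
    return sorted(index_by_time.items())


def period_changes(sequence: IndexSequence) -> List[float]:
    """Change of the index between consecutive time points (illustrative helper)."""
    return [later - earlier for (_, earlier), (_, later) in zip(sequence, sequence[1:])]


# Hypothetical observations, deliberately out of order.
observed = {date(2018, 5, 3): 10.0, date(2018, 5, 1): 31.58, date(2018, 5, 2): 20.0}
seq = index_sequence(observed)
print([d.isoformat() for d, _ in seq])  # ['2018-05-01', '2018-05-02', '2018-05-03']
```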
Compared with the prior art, the present application analyzes, with a predetermined text analysis method, the financial text data of different text object types for each stock entity at each point in time to obtain an evaluation of each piece of financial text data; counts the evaluation levels of the financial text data under each text object type and calculates the proportion of each level; and calculates the market evaluation index of the stock entity at that point in time from the attribute score and proportion of each level, which reflects the market's evaluation of the company at that point. By dividing financial text data into different text object types and analyzing them with a predetermined text analysis method, the application can fully mine accurate market information, and by generating the market evaluation index sequence in chronological order, it can reveal changes and trends in the market's evaluation of the company for market analysis.
In an embodiment, on the basis of the above embodiment, the processing system, when executed by the processor, further implements the following steps:
assigning each stock entity to its industry category according to a predetermined industry classification method, obtaining the latest total market value of each stock entity, and calculating the total market value of each industry category from the latest total market values of its stock entities; calculating the market value ratio of each stock entity from its latest total market value and the total market value of the industry category to which it belongs; calculating the industry evaluation index of the stock entity at the time point from its market evaluation index at that time point and its market value ratio; and obtaining the industry evaluation index of the stock entity at each time point and arranging these in chronological order to generate the industry evaluation index sequence of the stock entity.
The predetermined industry classification method is, for example, the Shenwan industry classification. In an embodiment, all stock entities listed on the Shanghai and Shenzhen stock exchanges can be divided into the following 28 industry categories: mining, chemicals, steel, non-ferrous metals, building materials, building decoration, electrical equipment, machinery and equipment, defense and military industry, automobiles, household appliances, textiles and apparel, light manufacturing, commerce and trade, agriculture, forestry, animal husbandry and fishery, food and beverage, leisure services, pharmaceuticals and biotechnology, public utilities, transportation, real estate, electronics, computers, media, communications, banking, non-bank finance, and conglomerates.
Taking the banking industry as an example, calculating the industry evaluation index includes: first, extracting the latest total market value of each stock entity and adding them to obtain the total market value of the industry; second, calculating each stock entity's share of the industry total: market value ratio = latest total market value of the stock entity / total market value of the industry × 100%; then, calculating the industry evaluation index of the stock entity at that time point from its market evaluation index and market value ratio: industry evaluation index = market value ratio of the stock entity × market evaluation index; finally, calculating the industry evaluation index at each time point in the same way and arranging these in chronological order to generate the industry evaluation index sequence of the stock entity.
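The banking walk-through applies to every industry: the market value ratio is each stock's share of its industry's total market value, and the industry evaluation index is that share times the stock's market evaluation index. A minimal Python sketch at a single time point; the stock codes, market values, and index values are hypothetical, and the ratio is kept as a fraction (the × 100% in the text only expresses it as a percentage):

```python
from collections import defaultdict
from typing import Dict


def industry_evaluation_indices(
    industry_of: Dict[str, str],
    market_value: Dict[str, float],
    market_eval_index: Dict[str, float],
) -> Dict[str, float]:
    """Industry evaluation index per stock = market value ratio * market evaluation index."""
    # Total market value of each industry category.
    industry_total: Dict[str, float] = defaultdict(float)
    for stock, industry in industry_of.items():
        industry_total[industry] += market_value[stock]

    result: Dict[str, float] = {}
    for stock, industry in industry_of.items():
        ratio = market_value[stock] / industry_total[industry]  # market value ratio (fraction)
        result[stock] = ratio * market_eval_index[stock]
    return result


scores = industry_evaluation_indices(
    industry_of={"BANK_A": "banking", "BANK_B": "banking", "AUTO_C": "automobiles"},
    market_value={"BANK_A": 300.0, "BANK_B": 100.0, "AUTO_C": 50.0},
    market_eval_index={"BANK_A": 31.58, "BANK_B": -20.0, "AUTO_C": 10.0},
)
print({k: round(v, 3) for k, v in scores.items()})
# {'BANK_A': 23.685, 'BANK_B': -5.0, 'AUTO_C': 10.0}
```

Within one industry the ratios sum to 1, so the industry evaluation indices of its stocks add up to a market-value-weighted average of their market evaluation indices.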
The industry evaluation index of an individual stock entity reflects changes in the industry's evaluation of that entity, and its industry evaluation index sequence reveals how that evaluation changes and trends over time, for use in market analysis.
In one embodiment, building on the above embodiment, the processing system, when executed by the processor, further implements the following steps: summing, at the same time point, the industry evaluation indexes of the individual stock entities that belong to the same industry category to obtain the market index of that industry category at that time point; and obtaining the market index of the industry category at each time point and arranging these market indexes in chronological order to generate the market index sequence of the industry category.
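The aggregation step can be sketched as follows: industry evaluation indexes are summed per (industry, time point), and each industry's totals are then ordered by time. The record layout `(industry, time_point, industry_evaluation_index)` and the ISO date strings are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def industry_market_index_sequences(
    records: Iterable[Tuple[str, str, float]],
) -> Dict[str, List[Tuple[str, float]]]:
    """records: hypothetical (industry, time_point, industry_evaluation_index) rows,
    one per stock entity and time point.

    Returns, for each industry, its market index at each time point in time order.
    """
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for industry, time_point, value in records:
        totals[industry][time_point] += value  # sum over stock entities in the industry
    # Time points are unique per industry, so sorting the items sorts by time
    # (assuming ISO-8601 strings).
    return {industry: sorted(by_time.items()) for industry, by_time in totals.items()}
```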
The market index of an industry category reflects changes in the market's evaluation of that industry, and the market index sequence of the industry category reveals how that evaluation changes and trends, for use in market analysis.
Furthermore, by aggregating the market indexes of all industries at the same time point, the market's overall sentiment toward and views on the capital market as a whole can be obtained for market analysis.
As shown in FIG. 2, which is a schematic flowchart of an embodiment of the text data processing method of the present application, the method includes the following steps:
Step S1: classify the various pieces of financial text data into corresponding text object types according to a preset classification rule.
The preset classification rule assigns performance-related financial text data to the performance type, financing-related financial text data to the financing type, corporate-governance-related financial text data to the corporate governance type, and analyst-related financial text data to the analyst type; financial text data that belongs to none of these four types is assigned to the other type, as shown in Table 1 above.
In Table 1, the performance type includes earnings reports, preliminary earnings announcements, and reports of earnings beating expectations; the financing type includes private placements and private placements whose share price has fallen below the issue price; the corporate governance type includes executives increasing or reducing their holdings, major shareholders reducing their holdings, equity incentives, and employee stock ownership; the analyst type includes sharp upward revisions of earnings forecasts and sudden analyst attention; and the other type includes high stock-dividend and capitalization issues, index constituent adjustments, early disclosure of annual reports, and prolonged absence of announcements.
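One way to encode the classification rule of Table 1 is a lookup from event subtype to text object type, with every unlisted subtype falling into the other type. This sketch assumes that each text has already been tagged with an event subtype upstream (the text does not say how this is done), and the English subtype labels are hypothetical names chosen for illustration.

```python
# Hypothetical English labels for the event subtypes listed in Table 1.
SUBTYPES_BY_TYPE = {
    "performance": ["earnings report", "preliminary earnings announcement",
                    "earnings beat report"],
    "financing": ["private placement", "private placement below issue price"],
    "corporate governance": ["executive holding change", "major shareholder reduction",
                             "equity incentive", "employee stock ownership"],
    "analyst": ["sharp earnings forecast upgrade", "sudden analyst attention"],
}

# Invert the table once: subtype -> text object type.
TYPE_BY_SUBTYPE = {sub: t for t, subs in SUBTYPES_BY_TYPE.items() for sub in subs}


def text_object_type(subtype: str) -> str:
    """Return the text object type for an (assumed, pre-tagged) event subtype.

    Anything not listed above falls into "other", which covers the Table 1
    "other" subtypes (high stock dividends, index constituent adjustments,
    early annual reports, long silence) as well as any unlisted subtype.
    """
    return TYPE_BY_SUBTYPE.get(subtype, "other")
```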
Step S2: using a predetermined text analysis method, analyze the financial text data of each text object type for each individual stock entity at each time point, to obtain the evaluation level of each piece of financial text data.
Here, an individual stock entity is a listed company. Each stock entity produces financial text data at various time points, and the time points may be, for example, every minute, every hour, or every day.
In one embodiment, as shown in FIG. 3, the step of analyzing, using the predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point to obtain the evaluation level of each piece of financial text data specifically includes:
Step S21: segment each piece of financial text data into words using a predetermined word segmentation model, to obtain the tokens of each piece of financial text data;
Step S22: input the tokens of each piece of financial text data into a predetermined conversion model, and obtain the output word vectors of each piece of financial text data;
Step S23: input the word vectors of each piece of financial text data into a predetermined sentiment analysis model, and obtain the output sentiment analysis result of each sentence in that financial text data;
Step S24: tally the sentiment analysis results of the sentences in the financial text data, and determine the evaluation level of the financial text data from the tallied results.
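Steps S21 to S24 compose three pluggable models. The sketch below wires them together for one piece of financial text data and stops at the tally of per-sentence results. The callable signatures and the sentence splitter are assumptions; in practice the three callables would be the trained segmentation, word2vec, and sentiment models.

```python
import re
from collections import Counter
from typing import Callable, List

# Hypothetical interfaces for the three predetermined models (assumptions).
Segmenter = Callable[[str], List[str]]               # S21: sentence -> tokens
Embedder = Callable[[List[str]], List[List[float]]]  # S22: tokens -> word vectors
SentimentModel = Callable[[List[List[float]]], int]  # S23: word vectors -> -1, 0 or 1


def split_sentences(text: str) -> List[str]:
    # Assumption: sentences end with the Chinese full stop, a Chinese or Western
    # "!" or "?", or a newline ("." is not used, to keep decimals intact).
    return [s for s in re.split(r"[。！？!?\n]+", text) if s.strip()]


def tally_sentence_sentiments(
    text: str, segment: Segmenter, embed: Embedder, model: SentimentModel
) -> Counter:
    """Run S21 to S23 on every sentence and tally the results (first half of S24)."""
    results = []
    for sentence in split_sentences(text):
        tokens = segment(sentence)       # S21: word segmentation
        vectors = embed(tokens)          # S22: tokens -> word vectors
        results.append(model(vectors))   # S23: per-sentence sentiment result
    return Counter(results)
```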
The financial text data is segmented by an already trained word segmentation model, namely a trained neural network segmentation model, preferably a long short-term memory (LSTM) recurrent neural network. Training this segmentation model includes: (1) extracting a large number of annotated words from a corpus, the model being trained on a predetermined segmented corpus, for example the well-known Microsoft Research segmented corpus from the SIGHAN Bakeoff 2005; (2) training on the training portion of the corpus and using the test portion for the final evaluation; and (3) measuring the model's error by comparing its output with the reference annotation (using sequence labeling). If the test score reaches 0.95 or higher, training of the segmentation model is complete.
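The sequence-labeling formulation in item (3) can be illustrated with the common BMES scheme. Each character of a segmented training sentence is tagged B (begins a word), M (inside a word), E (ends a word), or S (single-character word); the LSTM learns to predict these tags, and the predicted tags are decoded back into words. The sketch covers only tagging, decoding, and scoring; the LSTM itself is not shown. Treating word-level F1 as the "test score" compared with 0.95 is an assumption, since the text does not name the metric.

```python
from typing import List, Set, Tuple


def words_to_tags(words: List[str]) -> List[str]:
    """BMES labels for a gold segmentation (used as LSTM training targets)."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Decode predicted BMES tags back into words."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate an ill-formed tail that ends on B or M
        words.append(current)
    return words


def _spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character offsets (start, end) of each word."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def word_f1(gold: List[str], pred: List[str]) -> float:
    """Word-level F1 (assumed metric): a word counts only if both boundaries match."""
    g, p = _spans(gold), _spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)
```

Under this assumption, training would be considered complete once `word_f1` on the test portion reaches 0.95.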
The predetermined conversion model is a word2vec model. The word2vec model is a three-layer neural network that represents each word as a word vector, thereby converting text into numbers. The tokens of the financial text data are input into the word2vec model to obtain the word vectors of that financial text data.
The predetermined sentiment analysis model is a short-text sentiment analysis model based on deep convolutional neural networks (Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts). In outline, the model takes as input the sentence vector of a sentence, passes it through two convolutional neural network (CNN) layers to obtain a sentence-level vector, and then feeds this vector into a three-layer neural network, which after training outputs the correct sentiment analysis result for the sentence.
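To show the shape of the sentiment model's computation, here is a deliberately simplified forward pass in plain Python. One convolutional layer over word windows, with ReLU and max-pooling over positions, produces a sentence-level vector, and a single linear layer with argmax maps it to a label in [-1, 0, 1]. This is a simplification of the described architecture, which uses two convolutional layers and a three-layer network. All filters are assumed to share one window width, and the weights would come from training rather than being set by hand.

```python
from typing import List

Matrix = List[List[float]]  # rows = positions (words), columns = vector dimensions


def conv_max_features(word_vectors: Matrix, filters: List[Matrix]) -> List[float]:
    """Convolve each filter over word windows, apply ReLU, then max-pool over positions.

    Assumes a non-empty sentence and that all filters share the same window width.
    """
    width, dim = len(filters[0]), len(word_vectors[0])
    # Pad short sentences with zero vectors so at least one full window exists.
    padded = word_vectors + [[0.0] * dim for _ in range(max(0, width - len(word_vectors)))]
    features = []
    for f in filters:
        window_scores = [
            max(0.0, sum(f[i][j] * padded[start + i][j]
                         for i in range(width) for j in range(dim)))
            for start in range(len(padded) - width + 1)
        ]
        features.append(max(window_scores))  # max-pooling: one feature per filter
    return features


def predict_sentiment(word_vectors: Matrix, filters: List[Matrix],
                      out_weights: Matrix, out_bias: List[float]) -> int:
    """Sentence-level features -> linear layer -> argmax over the labels [-1, 0, 1]."""
    h = conv_max_features(word_vectors, filters)
    logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(out_weights, out_bias)]
    labels = [-1, 0, 1]
    return labels[max(range(len(labels)), key=lambda k: logits[k])]
```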
In one embodiment, the sentiment analysis result takes one of three values, for example [-1, 0, 1], where -1 indicates that the sentence expresses negative sentiment, 0 indicates that it expresses roughly neutral sentiment, and 1 indicates that it expresses a positive view.
In addition, the output dimension of the three-layer neural network can be adjusted as needed. It may be the three-class [-1, 0, 1] described above, or the two-class [-1, 1], with values ranging from -1 to 1: values closer to 1 indicate that the sentence expresses a more positive view, and values closer to -1 indicate that it expresses more negative sentiment.
The evaluation levels of financial text data comprise a first level, a second level, and a third level, which may also be called positive, neutral, and negative ratings. The first level corresponds to sentiment analysis result 1, the second level to result 0, and the third level to result -1. Finally, the outputs for all sentences are combined and the number of sentences with each sentiment analysis result is counted: if result 1 occurs most often, the financial text data is assigned the first level; if result 0 occurs most often, the second level; and if result -1 occurs most often, the third level.
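The majority rule for turning per-sentence results into an evaluation level can be sketched as follows. The text does not say how ties are resolved, so this sketch assumes, purely for illustration, that ties favor neutral, then positive, then negative.

```python
from collections import Counter
from typing import Iterable

# Sentiment result -> evaluation level (1 = first, 2 = second, 3 = third), as in the text.
LEVEL_BY_RESULT = {1: 1, 0: 2, -1: 3}
# Assumed tie-break order (not specified in the text): neutral, then positive, then negative.
TIE_PRIORITY = {0: 0, 1: 1, -1: 2}


def evaluation_level(sentence_results: Iterable[int]) -> int:
    """Assign the level of the most frequent per-sentence sentiment result."""
    counts = Counter(sentence_results)
    if not counts:
        raise ValueError("the financial text contains no scored sentences")
    # Highest count wins; ties are broken by the assumed TIE_PRIORITY.
    most_frequent = min(counts, key=lambda r: (-counts[r], TIE_PRIORITY[r]))
    return LEVEL_BY_RESULT[most_frequent]
```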
Step S3: count the number of financial text items at each evaluation level under each text object type, and calculate the proportion of each evaluation level from these counts.
Counting the number at each evaluation level under each text object type means counting the first-level, second-level, and third-level financial text items under each text object type. This is illustrated with the data of Company A, as shown in Table 2 above.
In Table 2, the counts of the first, second, and third levels are 10, 5, and 4 respectively. The total number of evaluations is therefore 10 + 5 + 4 = 19. The proportion of the first level is 10/19 × 100% = 52.63%, the proportion of the second level is 5/19 × 100% = 26.32%, and the proportion of the third level is 4/19 × 100% = 21.05%.
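The proportion calculation for Company A can be reproduced in a few lines of Python. As an assumption for illustration, levels are keyed 1, 2, and 3 for the first, second, and third level, and proportions are returned as fractions (multiply by 100 for percentages).

```python
from typing import Dict


def level_proportions(counts: Dict[int, int]) -> Dict[int, float]:
    """Share of each evaluation level among all evaluations (fractions summing to 1)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no evaluations to weigh")
    return {level: n / total for level, n in counts.items()}


# Company A from Table 2: 10 first-level, 5 second-level, 4 third-level evaluations.
shares = level_proportions({1: 10, 2: 5, 3: 4})
print({level: round(100 * p, 2) for level, p in shares.items()})
# prints {1: 52.63, 2: 26.32, 3: 21.05}
```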
Step S4: obtain the attribute score of each evaluation level, and calculate the market evaluation index of the stock entity at that time point from the attribute scores and the proportions of the evaluation levels.
The attribute score of the first level is 1, that of the second level is 0, and that of the third level is -1, and market evaluation index = 100 × [proportion of the first level × 1 + proportion of the second level × 0 + proportion of the third level × (-1)]. For Company A above, market evaluation index = 100 × (52.63% × 1 + 26.32% × 0 + 21.05% × (-1)) = 31.58. This index reflects the market's evaluation of the company at that time point, for use in market analysis.
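The market evaluation index is the attribute-score-weighted sum of the level proportions, scaled by 100. When the proportions sum to 1, it therefore lies between -100 (all negative) and 100 (all positive). Here is a minimal sketch that uses the same 1/2/3 level keys as an assumption:

```python
from typing import Dict

# Attribute scores per evaluation level (assumed keys): first = 1, second = 0, third = -1.
ATTRIBUTE_SCORES = {1: 1.0, 2: 0.0, 3: -1.0}


def market_evaluation_index(proportions: Dict[int, float]) -> float:
    """100 x sum(attribute score x proportion) over the evaluation levels."""
    return 100 * sum(ATTRIBUTE_SCORES[level] * p for level, p in proportions.items())


# Company A: proportions 10/19, 5/19 and 4/19.
print(round(market_evaluation_index({1: 10 / 19, 2: 5 / 19, 3: 4 / 19}), 2))  # prints 31.58
```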
Step S5: obtain the market evaluation index of the stock entity at each time point, and arrange these indexes in chronological order to generate the market evaluation index sequence of the stock entity.
In this embodiment, the market evaluation indexes of the individual stock entity are arranged in chronological order to generate its market evaluation index sequence, which is the sequence for the company the stock entity represents. This sequence shows changes and trends in the market's evaluation of the company, for use in market analysis.
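Generating the sequence amounts to ordering the per-time-point indexes chronologically. One simple way to read the trend is to take the change between consecutive points. The first-difference helper is an illustrative addition, not part of the described method, and `datetime` keys are assumed for the time points.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def index_sequence(index_by_time: Dict[datetime, float]) -> List[Tuple[datetime, float]]:
    """Arrange per-time-point indexes in chronological order."""
    return sorted(index_by_time.items(), key=lambda item: item[0])


def period_changes(sequence: List[Tuple[datetime, float]]) -> List[Tuple[datetime, float]]:
    """Illustrative trend reading: change of the index versus the previous time point."""
    return [(t, v - prev) for (_, prev), (t, v) in zip(sequence, sequence[1:])]
```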
By dividing financial text data into different text object types and analyzing it with a predetermined text analysis method, the present application can thoroughly mine accurate market information. By generating the market evaluation index sequence in chronological order, it reveals changes and trends in the market's evaluation of the company for market analysis.
In one embodiment, as shown in FIG. 4, building on the above embodiment, the text data processing method further includes:
Step S6: assign each individual stock entity to its industry category according to a predetermined industry classification method, obtain the latest total market capitalization of each stock entity, and calculate the total market capitalization of each industry category from the latest total market capitalizations of its stock entities;
Step S7: calculate the market-cap weight of each stock entity from its latest total market capitalization and the total market capitalization of the industry category to which it belongs;
Step S8: calculate the industry evaluation index of the stock entity at the time point from its market evaluation index at that time point and its market-cap weight;
Step S9: obtain the industry evaluation index of the stock entity at each time point, and arrange these indexes in chronological order to generate the industry evaluation index sequence of the stock entity.
The predetermined industry classification method is, for example, the Shenwan industry classification. In one embodiment, all individual stock entities listed on the Shanghai and Shenzhen stock exchanges can be divided into the following 28 industry categories: mining, chemicals, steel, non-ferrous metals, building materials, building decoration, electrical equipment, machinery and equipment, defense and military industry, automobiles, household appliances, textiles and apparel, light manufacturing, commerce and trade, agriculture, forestry, animal husbandry and fishery, food and beverage, leisure services, pharmaceuticals and biotechnology, utilities, transportation, real estate, electronics, computers, media, communications, banking, non-bank finance, and conglomerates.
Taking the banking industry category as an example, the industry evaluation index is calculated as follows. First, the latest total market capitalization of each individual stock entity in the category is extracted, and these values are summed to obtain the total market capitalization of the industry. Second, the market-cap weight of each stock entity is calculated as its share of the industry total: market-cap weight = latest total market capitalization of the stock entity / total market capitalization of the industry × 100%. Then, the industry evaluation index of the stock entity at a given time point is calculated from its market evaluation index at that time point and its market-cap weight: industry evaluation index = market-cap weight of the stock entity × market evaluation index. Finally, the industry evaluation index is calculated in the same way for every time point, and these indexes are arranged in chronological order to form the industry evaluation index sequence of the stock entity.
The industry evaluation index of an individual stock entity reflects changes in the industry's evaluation of that entity, and its industry evaluation index sequence reveals how that evaluation changes and trends over time, for use in market analysis.
In one embodiment, building on the above embodiment, the text data processing method further includes: summing, at the same time point, the industry evaluation indexes of the individual stock entities that belong to the same industry category to obtain the market index of that industry category at that time point; and obtaining the market index of the industry category at each time point and arranging these market indexes in chronological order to generate the market index sequence of the industry category.
The market index of an industry category reflects changes in the market's evaluation of that industry, and the market index sequence of the industry category reveals how that evaluation changes and trends, for use in market analysis.
Furthermore, by aggregating the market indexes of all industries at the same time point, the market's overall sentiment toward and views on the capital market as a whole can be obtained for market analysis.
The present application further provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the text data processing method described above.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by software together with a necessary general-purpose hardware platform. They may also be implemented in hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied as a software product. This software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present application, and any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A server, characterized in that the server comprises a memory and a processor connected to the memory, the memory storing a processing system executable on the processor, and the processing system, when executed by the processor, implementing the following steps:
    classifying various pieces of financial text data into corresponding text object types according to a preset classification rule, wherein the text object types include a performance type, a financing type, a corporate governance type, an analyst type, and an other type;
    analyzing, using a predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point, to obtain the evaluation level of each piece of financial text data;
    counting the number of financial text items at each evaluation level under each text object type, and calculating the proportion of each evaluation level from these counts;
    obtaining the attribute score of each evaluation level, and calculating the market evaluation index of the stock entity at the time point from the attribute scores and the proportions of the evaluation levels;
    obtaining the market evaluation index of the stock entity at each time point, and arranging these indexes in chronological order to generate the market evaluation index sequence of the stock entity.
  2. The server according to claim 1, wherein the evaluation levels comprise a first level, a second level, and a third level, the attribute score of the first level being 1, that of the second level being 0, and that of the third level being -1, and the market evaluation index = 100 × [proportion of the first level × 1 + proportion of the second level × 0 + proportion of the third level × (-1)].
  3. The server according to claim 1, wherein the step of analyzing, using the predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point to obtain the evaluation level of each piece of financial text data specifically comprises:
    segmenting each piece of financial text data into words using a predetermined word segmentation model, to obtain the tokens of each piece of financial text data;
    inputting the tokens of each piece of financial text data into a predetermined conversion model, and obtaining the output word vectors of each piece of financial text data;
    inputting the word vectors of each piece of financial text data into a predetermined sentiment analysis model, and obtaining the output sentiment analysis result of each sentence in the financial text data;
    tallying the sentiment analysis results of the sentences in the financial text data, and obtaining the evaluation level of the financial text data from the tallied results.
  4. The server according to claim 2, wherein the step of analyzing, using the predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point to obtain the evaluation level of each piece of financial text data specifically comprises:
    segmenting each piece of financial text data into words using a predetermined word segmentation model, to obtain the tokens of each piece of financial text data;
    inputting the tokens of each piece of financial text data into a predetermined conversion model, and obtaining the output word vectors of each piece of financial text data;
    inputting the word vectors of each piece of financial text data into a predetermined sentiment analysis model, and obtaining the output sentiment analysis result of each sentence in the financial text data;
    tallying the sentiment analysis results of the sentences in the financial text data, and obtaining the evaluation level of the financial text data from the tallied results.
  5. The server according to claim 1, wherein the processing system, when executed by the processor, further implements the following steps:
    assigning each individual stock entity to its industry category according to a predetermined industry classification method, obtaining the latest total market capitalization of each stock entity, and calculating the total market capitalization of each industry category from the latest total market capitalizations of the stock entities;
    calculating the market-cap weight of each stock entity from its latest total market capitalization and the total market capitalization of the industry category to which it belongs;
    calculating the industry evaluation index of the stock entity at the time point from its market evaluation index at that time point and its market-cap weight;
    obtaining the industry evaluation index of the stock entity at each time point, and arranging these indexes in chronological order to generate the industry evaluation index sequence of the stock entity.
  6. The server according to claim 2, wherein the processing system, when executed by the processor, further implements the following steps:
    assigning each individual stock entity to its industry category according to a predetermined industry classification method, obtaining the latest total market capitalization of each stock entity, and calculating the total market capitalization of each industry category from the latest total market capitalizations of the stock entities;
    calculating the market-cap weight of each stock entity from its latest total market capitalization and the total market capitalization of the industry category to which it belongs;
    calculating the industry evaluation index of the stock entity at the time point from its market evaluation index at that time point and its market-cap weight;
    obtaining the industry evaluation index of the stock entity at each time point, and arranging these indexes in chronological order to generate the industry evaluation index sequence of the stock entity.
  7. A text data processing method, characterized in that the text data processing method comprises:
    S1: classifying various pieces of financial text data into corresponding text object types according to a preset classification rule, wherein the text object types include a performance type, a financing type, a corporate governance type, an analyst type, and an other type;
    S2: analyzing, using a predetermined text analysis method, the financial text data of each text object type for each individual stock entity at each time point, to obtain the evaluation level of each piece of financial text data;
    S3: counting the number of financial text items at each evaluation level under each text object type, and calculating the proportion of each evaluation level from these counts;
    S4: obtaining the attribute score of each evaluation level, and calculating the market evaluation index of the stock entity at the time point from the attribute scores and the proportions of the evaluation levels;
    S5: obtaining the market evaluation index of the stock entity at each time point, and arranging these indexes in chronological order to generate the market evaluation index sequence of the stock entity.
  8. The text data processing method according to claim 7, wherein the evaluation levels comprise a first level, a second level, and a third level, the attribute score of the first level being 1, that of the second level being 0, and that of the third level being -1, and the market evaluation index = 100 × [proportion of the first level × 1 + proportion of the second level × 0 + proportion of the third level × (-1)].
  9. The method for processing text data according to claim 7, wherein step S2 specifically comprises:
    segmenting each piece of financial text data into words by using a predetermined word segmentation model, to obtain the segmented words corresponding to each piece of financial text data;
    inputting the segmented words corresponding to each piece of financial text data into a predetermined conversion model, and obtaining the output word vectors corresponding to each piece of financial text data;
    inputting the word vectors corresponding to each piece of financial text data into a predetermined sentiment analysis model, and obtaining the output sentiment analysis result of each sentence in the financial text data;
    counting the sentiment analysis results of the sentences in the financial text data, and obtaining the evaluation level corresponding to the financial text data according to the counted sentiment analysis results.
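Claim 9 leaves the three models (word segmentation, word-vector conversion, sentiment analysis) and the counting rule unspecified. The Python sketch below treats them as pluggable functions. The following are all assumptions for illustration, not the patented models:

- the sentence splitter;
- the toy lexicon-based stand-ins;
- the majority-vote rule, with ties mapped to `neutral`.

For simplicity, the sketch segments each sentence separately rather than the whole text at once.

```python
import re
from collections import Counter
from typing import Callable, List

# Hypothetical signatures for the three "predetermined" models of claim 9.
Segmenter = Callable[[str], List[str]]                 # sentence -> words
Embedder = Callable[[List[str]], List[List[float]]]    # words -> word vectors
SentimentModel = Callable[[List[List[float]]], str]    # word vectors -> sentence label

# Assumed sentence boundaries: CJK/Western terminators, or "." before whitespace/end.
_SENTENCE_END = re.compile(r"[。！？!?]+|\.(?=\s|$)")


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def evaluation_level(text: str, segment: Segmenter, embed: Embedder,
                     classify: SentimentModel) -> str:
    """Label every sentence, then map the label counts to one evaluation level."""
    labels = [classify(embed(segment(s))) for s in split_sentences(text)]
    if not labels:
        return "neutral"
    ranked = Counter(labels).most_common()
    # Assumed counting rule: the majority label wins; a tie for first place is neutral.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]


# Toy stand-ins for demonstration only (whitespace tokens, 1-d lexicon "vectors").
_LEXICON = {"rose": 1.0, "beat": 1.0, "fell": -1.0, "loss": -1.0}


def toy_segment(sentence: str) -> List[str]:
    return sentence.lower().split()


def toy_embed(words: List[str]) -> List[List[float]]:
    return [[_LEXICON.get(w, 0.0)] for w in words]


def toy_classify(vectors: List[List[float]]) -> str:
    score = sum(v[0] for v in vectors)
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"
```

Real models can be substituted by passing different functions with the same signatures, for example a Chinese word segmenter in place of `toy_segment`.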
  10. The method for processing text data according to claim 8, wherein step S2 specifically comprises:
    segmenting each piece of financial text data into words by using a predetermined word segmentation model, to obtain the segmented words corresponding to each piece of financial text data;
    inputting the segmented words corresponding to each piece of financial text data into a predetermined conversion model, and obtaining the output word vectors corresponding to each piece of financial text data;
    inputting the word vectors corresponding to each piece of financial text data into a predetermined sentiment analysis model, and obtaining the output sentiment analysis result of each sentence in the financial text data;
    counting the sentiment analysis results of the sentences in the financial text data, and obtaining the evaluation level corresponding to the financial text data according to the counted sentiment analysis results.
  11. The method for processing text data according to claim 7, further comprising:
    assigning each stock entity to a corresponding industry category according to a predetermined industry classification method, obtaining the latest total market value of each stock entity, and calculating the total market value of each industry category from the latest total market values of its stock entities;
    calculating the market value proportion of each stock entity from its latest total market value and the total market value of the industry category to which it belongs;
    after step S4, the method further comprises:
    calculating the industry evaluation index of the stock entity at the time point according to the market evaluation index of the stock entity at that time point and its market value proportion;
    obtaining the industry evaluation index of the stock entity at each time point, and arranging the industry evaluation indexes of all time points in chronological order to generate the industry evaluation index sequence corresponding to the stock entity.
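Claim 11 computes each stock's market value proportion within its industry. It then derives an industry evaluation index from that proportion and the market evaluation index, without stating how the two are combined. The sketch below assumes the natural reading, a product (market-value weighting); the function names and sample data are hypothetical. Because the claim uses the latest total market value, one weight per stock is applied at every time point.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def market_value_weights(industry_of: Dict[str, str],
                         market_cap: Dict[str, float]) -> Dict[str, float]:
    """Share of each stock's latest total market value within its industry category."""
    industry_total: Dict[str, float] = defaultdict(float)
    for stock, industry in industry_of.items():
        industry_total[industry] += market_cap[stock]
    return {
        stock: (market_cap[stock] / industry_total[industry]
                if industry_total[industry] else 0.0)   # guard against a zero total
        for stock, industry in industry_of.items()
    }


def industry_evaluation_index(market_eval_index: float, weight: float) -> float:
    # Assumption: the claim's "according to" is read as a simple product.
    return market_eval_index * weight


def industry_evaluation_sequence(mei_by_time: Dict[str, float],
                                 weight: float) -> List[Tuple[str, float]]:
    """Apply the single (latest) weight at every time point; ISO dates sort chronologically."""
    return [(t, industry_evaluation_index(mei_by_time[t], weight))
            for t in sorted(mei_by_time)]
```

With market values of 300 and 100 for two banks, the weights are 0.75 and 0.25. A bank with a market evaluation index of 40 therefore contributes an industry evaluation index of 30.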
  12. The method for processing text data according to claim 8, further comprising:
    assigning each stock entity to a corresponding industry category according to a predetermined industry classification method, obtaining the latest total market value of each stock entity, and calculating the total market value of each industry category from the latest total market values of its stock entities;
    calculating the market value proportion of each stock entity from its latest total market value and the total market value of the industry category to which it belongs;
    after step S4, the method further comprises:
    calculating the industry evaluation index of the stock entity at the time point according to the market evaluation index of the stock entity at that time point and its market value proportion;
    obtaining the industry evaluation index of the stock entity at each time point, and arranging the industry evaluation indexes of all time points in chronological order to generate the industry evaluation index sequence corresponding to the stock entity.
  13. The method for processing text data according to claim 11 or 12, further comprising:
    adding up the industry evaluation indexes, at the same time point, of the stock entities belonging to the same industry category to obtain the market index of that industry category at that time point;
    obtaining the market index of the industry category at each time point, and arranging the market indexes of all time points in chronological order to generate the market index sequence corresponding to the industry category.
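Claim 13 sums the industry evaluation indexes of an industry's stocks at each time point. Suppose a stock's industry evaluation index is its market evaluation index times its market value proportion (an assumed reading of claim 11), and the claim 8 scoring applies. The sum is then a market-value-weighted average of the stocks' market evaluation indexes, so it stays within -100 to 100.

Below is a minimal sketch with hypothetical names. It also assumes that a stock with no value at a time point simply contributes nothing to that point.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def industry_market_index_sequences(
    industry_eval: Dict[str, Dict[str, float]],  # stock -> {time point -> industry evaluation index}
    industry_of: Dict[str, str],                 # stock -> industry category
) -> Dict[str, List[Tuple[str, float]]]:
    """Sum per industry and time point, then order each industry's series chronologically."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for stock, series in industry_eval.items():
        industry = industry_of[stock]
        for t, value in series.items():
            sums[industry][t] += value
    # ISO-8601 date strings are assumed, so sorted() gives chronological order.
    return {ind: [(t, by_time[t]) for t in sorted(by_time)]
            for ind, by_time in sums.items()}
```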
  14. A computer-readable storage medium, wherein a processing system is stored on the computer-readable storage medium, and the processing system, when executed by a processor, implements the following steps:
    classifying various financial text data into corresponding text object types according to a preset classification rule, wherein the text object types include a performance type, a financing type, a corporate governance type, an analyst type and an other type;
    analyzing, by using a predetermined text analysis method, the financial text data of each text object type for each stock entity at each time point, to obtain the evaluation level corresponding to each piece of financial text data;
    counting the number of occurrences of each evaluation level among the financial text data under the text object types, and calculating the proportion of each evaluation level based on the counted numbers;
    obtaining the attribute score corresponding to each evaluation level, and calculating the market evaluation index of the stock entity at the time point according to the attribute scores and the proportions of the evaluation levels;
    obtaining the market evaluation index of the stock entity at each time point, and arranging the market evaluation indexes of all time points in chronological order to generate the market evaluation index sequence corresponding to the stock entity.
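Claim 14 refers only to a "preset classification rule" for assigning texts to the five text object types. One simple possibility is keyword matching, sketched below in Python. The keyword lists, the first-match-wins order and the type labels are hypothetical. A production system might instead use a trained classifier.

```python
from typing import Dict, List

# Hypothetical keyword rules (English and Chinese). Order matters: first match wins.
RULES: Dict[str, List[str]] = {
    "performance": ["revenue", "net profit", "earnings", "业绩", "营收", "净利润"],
    "financing": ["private placement", "bond issuance", "ipo", "融资", "增发", "发债"],
    "corporate_governance": ["board", "shareholder meeting", "executive",
                             "董事会", "股东大会", "高管"],
    "analyst": ["analyst", "rating", "target price", "分析师", "评级", "研报"],
}


def classify_text_object_type(text: str, rules: Dict[str, List[str]] = RULES) -> str:
    """Return the first text object type whose keywords occur in the text, else "other"."""
    lowered = text.lower()  # case-insensitive for Latin script; no effect on Chinese
    for type_name, keywords in rules.items():
        if any(keyword in lowered for keyword in keywords):
            return type_name
    return "other"
```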
  15. The computer-readable storage medium according to claim 14, wherein the evaluation levels comprise a first level, a second level and a third level; the attribute score of the first level is 1, the attribute score of the second level is 0, and the attribute score of the third level is -1; and the market evaluation index = 100*[proportion of the first level*1 + proportion of the second level*0 + proportion of the third level*(-1)].
  16. The computer-readable storage medium according to claim 14, wherein the step of analyzing, by using a predetermined text analysis method, the financial text data of each text object type for each stock entity at each time point to obtain the evaluation level corresponding to each piece of financial text data specifically comprises:
    segmenting each piece of financial text data into words by using a predetermined word segmentation model, to obtain the segmented words corresponding to each piece of financial text data;
    inputting the segmented words corresponding to each piece of financial text data into a predetermined conversion model, and obtaining the output word vectors corresponding to each piece of financial text data;
    inputting the word vectors corresponding to each piece of financial text data into a predetermined sentiment analysis model, and obtaining the output sentiment analysis result of each sentence in the financial text data;
    counting the sentiment analysis results of the sentences in the financial text data, and obtaining the evaluation level corresponding to the financial text data according to the counted sentiment analysis results.
  17. The computer-readable storage medium according to claim 15, wherein the step of analyzing, by using a predetermined text analysis method, the financial text data of each text object type for each stock entity at each time point to obtain the evaluation level corresponding to each piece of financial text data specifically comprises:
    segmenting each piece of financial text data into words by using a predetermined word segmentation model, to obtain the segmented words corresponding to each piece of financial text data;
    inputting the segmented words corresponding to each piece of financial text data into a predetermined conversion model, and obtaining the output word vectors corresponding to each piece of financial text data;
    inputting the word vectors corresponding to each piece of financial text data into a predetermined sentiment analysis model, and obtaining the output sentiment analysis result of each sentence in the financial text data;
    counting the sentiment analysis results of the sentences in the financial text data, and obtaining the evaluation level corresponding to the financial text data according to the counted sentiment analysis results.
  18. The computer-readable storage medium according to claim 14, wherein the processing system, when executed by the processor, further implements the following steps:
    assigning each stock entity to a corresponding industry category according to a predetermined industry classification method, obtaining the latest total market value of each stock entity, and calculating the total market value of each industry category from the latest total market values of its stock entities;
    calculating the market value proportion of each stock entity from its latest total market value and the total market value of the industry category to which it belongs;
    calculating the industry evaluation index of the stock entity at the time point according to the market evaluation index of the stock entity at that time point and its market value proportion;
    obtaining the industry evaluation index of the stock entity at each time point, and arranging the industry evaluation indexes of all time points in chronological order to generate the industry evaluation index sequence corresponding to the stock entity.
  19. The computer-readable storage medium according to claim 15, wherein the processing system, when executed by the processor, further implements the following steps:
    assigning each stock entity to a corresponding industry category according to a predetermined industry classification method, obtaining the latest total market value of each stock entity, and calculating the total market value of each industry category from the latest total market values of its stock entities;
    calculating the market value proportion of each stock entity from its latest total market value and the total market value of the industry category to which it belongs;
    calculating the industry evaluation index of the stock entity at the time point according to the market evaluation index of the stock entity at that time point and its market value proportion;
    obtaining the industry evaluation index of the stock entity at each time point, and arranging the industry evaluation indexes of all time points in chronological order to generate the industry evaluation index sequence corresponding to the stock entity.
  20. The computer-readable storage medium according to claim 18 or 19, wherein the processing system, when executed by the processor, further implements the following steps: adding up the industry evaluation indexes, at the same time point, of the stock entities belonging to the same industry category to obtain the market index of that industry category at that time point; and obtaining the market index of the industry category at each time point, and arranging the market indexes of all time points in chronological order to generate the market index sequence corresponding to the industry category.
PCT/CN2018/102135 2018-05-16 2018-08-24 Server, method for processing text data and storage medium WO2019218517A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810469419.XA CN108764981A (en) 2018-05-16 2018-05-16 Server, the processing method of text data and storage medium
CN201810469419.X 2018-05-16

Publications (1)

Publication Number Publication Date
WO2019218517A1 (en) 2019-11-21

Family

ID=64007935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102135 WO2019218517A1 (en) 2018-05-16 2018-08-24 Server, method for processing text data and storage medium

Country Status (2)

Country Link
CN (1) CN108764981A (en)
WO (1) WO2019218517A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754352A (en) * 2020-06-22 2020-10-09 平安资产管理有限责任公司 Method, device, equipment and storage medium for judging correctness of viewpoint statement
CN114386433A (en) * 2022-01-12 2022-04-22 中国农业银行股份有限公司 Data processing method, device and equipment based on emotion analysis and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022725A (en) * 2015-07-10 2015-11-04 河海大学 Text emotional tendency analysis method applied to field of financial Web
CN106022522A (en) * 2016-05-20 2016-10-12 南京大学 Method and system for predicting stocks based on big data published by internet
CN106202181A (en) * 2016-06-27 2016-12-07 苏州大学 A kind of sensibility classification method, Apparatus and system
CN106779149A (en) * 2016-11-21 2017-05-31 洪志令 The visual presentation method that a kind of shares changing tendency predicts the outcome
US20180053255A1 (en) * 2016-08-19 2018-02-22 Noonum System and Method for end to end investment and portfolio management using machine driven analysis of the market against qualifying factors

Also Published As

Publication number Publication date
CN108764981A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN104915879B (en) The method and device that social relationships based on finance data are excavated
US10614073B2 (en) System and method for using data incident based modeling and prediction
Oliveira et al. Some experiments on modeling stock market behavior using investor sentiment analysis and posting volume from Twitter
US20130018892A1 (en) Visually Representing How a Sentiment Score is Computed
CN106611375A (en) Text analysis-based credit risk assessment method and apparatus
US20180173790A1 (en) Modifying data structures to indicate derived relationships among entity data objects
CN110263233B (en) Enterprise public opinion library construction method and device, computer equipment and storage medium
He et al. Myopic loss aversion, reference point, and money illusion
CN112634056A (en) Method, equipment and storage medium for rapidly calculating and updating enterprise share right structure
Vidal-Tomás An investigation of cryptocurrency data: The market that never sleeps
WO2019218517A1 (en) Server, method for processing text data and storage medium
Hwang et al. A logistic regression point of view toward loss given default distribution estimation
Corbet et al. Investigating the academic response to cryptocurrencies: Insights from research diversification as separated by journal ranking
Lehecka Have food and financial markets integrated?
TWI814707B (en) Method and system for facilitating financial transactions
WO2019095569A1 (en) Financial analysis method based on financial and economic event on microblog, application server, and computer readable storage medium
Mengelkamp et al. Corporate credit risk analysis utilizing textual user generated content-A Twitter based feasibility study
US11880394B2 (en) System and method for machine learning architecture for interdependence detection
JP2020013229A (en) Device, method and program for calculating default probability
Swartz et al. Relative Value Hedge Funds: A behavioral modeling of hedge fund risk and return factors
Lorca et al. Nonparametric Quantile Regression‐Based Classifiers for Bankruptcy Forecasting
Park et al. Value at risk forecasting for volatility index
Han et al. Investigation of listed companies credit risk assessment based on different learning schemes of BP neural network
CN114138976A (en) Data processing and model training method and device, electronic equipment and storage medium
Bartual et al. Default prediction of Spanish companies. A logistic analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18919088

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18919088

Country of ref document: EP

Kind code of ref document: A1