WO2021012684A1 - Method and system for establishing market sentiment monitoring system - Google Patents

Publication number
WO2021012684A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentiment
emotion
label
degree
market
Application number
PCT/CN2020/078384
Other languages
French (fr)
Chinese (zh)
Inventor
郑晴晓 (Zheng Qingxiao)
于洋 (Yu Yang)
程国艮 (Cheng Guogen)
Original Assignee
中译语通科技股份有限公司 (Global Tone Communication Technology Co., Ltd.)
Application filed by 中译语通科技股份有限公司
Publication of WO2021012684A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Definitions

  • the invention relates to the technical field of text analysis, in particular to a method and system for establishing a market sentiment monitoring system.
  • the embodiment of the present invention provides a method and system for establishing a market sentiment monitoring system to solve the problems of poor objectivity, low accuracy, and poor timeliness of the existing manual analysis of market sentiment.
  • an embodiment of the present invention provides a method for establishing a market sentiment monitoring system, including:
  • constructing a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity; the market sentiment monitoring system is used to monitor market sentiment.
  • an embodiment of the present invention provides a system for establishing a market sentiment monitoring system, including:
  • the sample acquisition unit is configured to acquire a sample text set including multiple sample texts;
  • the corpus labeling unit is configured to perform corpus labeling on each sample text to obtain its sentiment label and sentiment degree label;
  • the sentiment monitoring degree acquisition unit is configured to obtain the sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text;
  • the system establishment unit is configured to construct a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity; the market sentiment monitoring system is used to monitor market sentiment.
  • an embodiment of the present invention provides an electronic device that includes a processor, a communication interface, a memory, and a bus.
  • the processor, the communication interface, and the memory communicate with one another through the bus, and the processor can call logic instructions in the memory to perform the steps of the method provided in the first aspect.
  • an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method provided in the first aspect are implemented.
  • in the method and system for establishing a market sentiment monitoring system, corpus labeling is performed on sample texts to obtain the corresponding sentiment labels and sentiment degree labels, and sentiment monitoring degrees are then selected from the sentiment labels of a large number of sample texts to construct a market sentiment monitoring system. This helps achieve objective and accurate market sentiment monitoring, helps investors reduce misjudgments, and improves investment accuracy.
  • FIG. 1 is a schematic flowchart of a method for establishing a market sentiment monitoring system provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a system for establishing a market sentiment monitoring system provided by an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • Fig. 1 is a schematic flowchart of a method for establishing a market sentiment monitoring system provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
  • Step 110: obtain a sample text set including a plurality of sample texts.
  • a sample text may be a news report, a market report, a comment on a social platform, or the like; the embodiment of the present invention does not specifically limit this.
  • the sample text set contains a large number of sample texts.
  • Step 120: perform corpus labeling on each sample text to obtain its sentiment label and sentiment degree label.
  • sentiment analysis is performed on each clause in the sample text, producing a sentiment label and a sentiment degree label for each clause.
  • the sentiment label is the sentiment type obtained by sentiment analysis of the clause.
  • the sentiment degree label is the intensity of that sentiment type. The sentiment label and sentiment degree label of a sample text are therefore, in fact, the sentiment labels and sentiment degree labels of the clauses it contains.
  • a corpus labeling model can also be pre-trained for corpus labeling, for example as follows. First, a large number of training texts and their corresponding sentiment labels and sentiment degree labels are collected; the training texts are the texts used to train the corpus labeling model, and their sentiment labels and sentiment degree labels are annotated by professionals. Then, an initial model is trained on the training texts and their sentiment labels and sentiment degree labels to obtain the corpus labeling model.
  • the initial model may be a single neural network model, or a combination of multiple neural network models, and the embodiment of the present invention does not specifically limit the type and structure of the initial model.
  • Step 130: obtain the sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text.
  • the sentiment labels and sentiment degree labels of all sample texts can be counted, and sentiment monitoring degrees can then be selected from among the sentiment labels, thereby classifying market sentiment.
  • a sentiment monitoring degree is a representative sentiment label selected from a large number of sentiment labels; in essence, each sentiment monitoring degree is a type of market sentiment.
  • Step 140: construct a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity; the market sentiment monitoring system is used to monitor market sentiment.
  • the preset sentiment acuity is a preset indicator of the intensity of the sentiment corresponding to a sentiment monitoring degree.
  • the preset sentiment acuity can be divided into six levels, 0 to 5, with the intensity of the corresponding sentiment increasing from one level to the next.
  • the market sentiment monitoring system thus constructed comprises the sentiment monitoring degrees and the preset sentiment acuity.
  • when the market sentiment monitoring system is applied to market sentiment monitoring, the sentiment acuity of each sentiment monitoring degree can be obtained from a set of texts collected in real time.
  • the sentiment acuity value of any sentiment monitoring degree lies within the preset sentiment acuity range.
  • by performing corpus labeling on sample texts to obtain the corresponding sentiment labels and sentiment degree labels, and then selecting sentiment monitoring degrees from the sentiment labels of a large number of sample texts to construct a market sentiment monitoring system, the method provided by the embodiment of the present invention helps achieve objective and accurate market sentiment monitoring, helping investors reduce misjudgments and improve investment accuracy.
  • step 110 specifically includes: determining a data collection method according to preset collection objects, collection frequency, collection content, and object level; and, based on the data collection method, collecting sample texts from the collection objects through web crawlers to construct the sample text set.
  • the collection object is a text collection object, and the collection object may be a market-related institution website, a financial news website, a social media or a market report published by a research institution, etc., which is not specifically limited in the embodiment of the present invention.
  • the collection frequency is how often texts are collected. The collection content includes the title, body (including pictures and videos), release time, release platform, and so on.
  • the object level characterizes the importance of a collection object and can be determined by the object's popularity or credibility. Each collection object is assigned its own collection frequency, collection content, and object level.
  • a web crawler (also known as a web spider or web robot) is a program or script that automatically crawls information on the World Wide Web according to preset rules. Once the data collection method is determined, sample texts can be crawled from each collection object accordingly. Further, the object level can be used to set the weight given to the sentiment labels and sentiment degree labels of texts crawled from the corresponding collection object when the sentiment monitoring degree is selected.
  • the method further includes: preprocessing the sample text set, the preprocessing including data cleaning and data governance.
  • the sample texts captured from the collection objects may contain many typos, grammatical errors, and stop words, as well as invalid, outdated, or only weakly market-related data, all of which can interfere with the accuracy of sentiment monitoring. Data cleaning finds and corrects identifiable errors in the sample texts, for example by checking data consistency, removing duplicates, and handling invalid and missing values, which ensures the accuracy and validity of the sample texts.
  • sample texts captured from different collection objects differ widely in expression style, so they need to be unified through standardization to facilitate sentiment analysis and corpus labeling.
  • data governance is the process of moving from scattered data to unified master data. Sample texts can be governed in six aspects (metadata, master data, data standards, data models, data quality, and data security) to standardize the sample text data.
  • the preprocessed sample texts are then used for corpus labeling, which facilitates the subsequent selection of sentiment monitoring degrees and the establishment of the market sentiment monitoring system.
  • the method further includes: performing market relevance screening on the sample text set, and deleting sample texts whose market relevance is lower than a preset relevance threshold.
  • the sample texts in the sample text set cover information on many topics, and some of them may be unrelated to the market. Before the sample texts are used to select sentiment monitoring degrees, those unrelated to the market need to be filtered out so that only market-related sample texts are retained.
  • the market relevance measures the degree of relevance between a sample text and the market.
  • the preset relevance threshold may be a preset threshold for measuring whether the sample text is relevant to the market.
  • any sample text can be input into a relevance model, which outputs the market relevance of that text.
  • the relevance model can be trained on sample texts and their corresponding market relevance values.
  • the method further includes: constructing an emotion dictionary based on the emotion label and the emotion degree label of the sample text.
  • an initial sentiment dictionary and corpus can be constructed from the sentiment labels and sentiment degree labels of the sample texts. Using them as seed words and a seed corpus, techniques such as word similarity, new word discovery, and text similarity can then expand the initial sentiment dictionary and corpus to obtain the sentiment dictionary.
  • step 130 specifically includes: counting, based on the sentiment label and sentiment degree label of each sample text, the number of clauses for each sentiment label and sentiment degree label; and obtaining the sentiment monitoring degree based on these clause counts.
  • the sentiment label and sentiment degree label of each clause in the sample texts are counted to obtain the number of clauses for each sentiment label and sentiment degree label.
  • the clause counts are used to select sentiment monitoring degrees from among the sentiment labels, thereby classifying market sentiment.
  • the method further includes: performing result back-testing and data correction on the emotion monitoring degree.
  • the type and degree of market sentiment can be back-tested to determine whether the current sentiment monitoring degree accurately reflects the sentiment of the sample texts.
  • the sentiment monitoring degree is then corrected so that it comes closer to the actual sentiment of the texts.
  • the sentiment monitoring degrees include worry, fear, disappointment, panic, despair, hope, optimism, peace of mind, excitement, and exhilaration.
  • the method for establishing a market sentiment monitoring system specifically includes the following steps:
  • the collection objects include, but are not limited to, domestic and foreign government agency websites, financial news websites, public reports from research institutions, and social media.
  • the object level is divided according to the platform influence of the collected objects.
  • the collected content can cover information such as title, body (including pictures and videos involved in the text), release time, and release platform.
  • sample text is collected.
  • a web crawler goes to each collection object to collect sample text.
  • the sample texts span 65 languages and more than 200 countries and regions worldwide, and the final sample size exceeds 50 million.
  • the data processing here includes data cleaning, data governance, text screening, and corpus labeling.
  • in text screening, professionals with backgrounds in finance, economics, psychology, linguistics, and other fields check the cleaned and governed sample texts one by one and retain those related to the market.
  • Corpus labeling is to perform sentiment labeling on each sample text, obtain the sentiment label and sentiment degree label of the sentence in the sample text, and build a corpus labeling model through machine learning.
  • based on the sentiment labels and sentiment degree labels of the clauses in the sample texts, the related characters, words, fixed collocations, and phrases in those clauses are sorted out, and techniques such as word similarity, new word discovery, and text similarity are used to expand the existing sentiment dictionary and corpus.
  • the market sentiment monitoring system mainly comprises ten sentiment monitoring degrees (worry, fear, disappointment, panic, despair, hope, optimism, peace of mind, excitement, and exhilaration) and six levels of sentiment acuity.
  • FIG. 2 is a schematic structural diagram of a system for establishing a market sentiment monitoring system provided by an embodiment of the present invention. As shown in FIG. 2, the system includes a sample acquisition unit 210, a corpus labeling unit 220, a sentiment monitoring degree acquisition unit 230, and a system establishment unit 240.
  • the sample acquisition unit 210 is configured to obtain a sample text set including a plurality of sample texts;
  • the corpus labeling unit 220 is configured to perform corpus labeling on each sample text to obtain its sentiment label and sentiment degree label;
  • the sentiment monitoring degree acquisition unit 230 is configured to obtain the sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text;
  • the system establishment unit 240 is configured to construct a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity; the market sentiment monitoring system is used to monitor market sentiment.
  • by performing corpus labeling on sample texts to obtain the corresponding sentiment labels and sentiment degree labels, and then selecting sentiment monitoring degrees from the sentiment labels of a large number of sample texts to construct a market sentiment monitoring system, the system provided by the embodiment of the present invention helps achieve objective and accurate market sentiment monitoring, helping investors reduce misjudgments and improve investment accuracy.
  • the sample acquisition unit 210 is specifically configured to:
  • the data collection method is determined according to the preset collection object, collection frequency, collection content and object level;
  • the sample text is collected from the collection object through a web crawler to construct the sample text set.
  • the system further includes a preprocessing unit; the preprocessing unit is used to:
  • preprocess the sample text set, the preprocessing including data cleaning and data governance.
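  • the system further includes a relevance screening unit; the relevance screening unit is used to:
  • perform market relevance screening on the sample text set and delete sample texts whose market relevance is lower than the preset relevance threshold.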
  • the system further includes an emotion dictionary unit; the emotion dictionary unit is specifically used for:
  • An emotion dictionary is constructed based on the emotion label and the emotion degree label of the sample text.
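  • the sentiment monitoring degree acquisition unit 230 is specifically configured to:
  • count, based on the sentiment label and sentiment degree label of each sample text, the number of clauses for each sentiment label and sentiment degree label, and obtain the sentiment monitoring degree based on these clause counts.
  • the system further includes a correction unit; the correction unit is used for:
  • performing result back-testing and data correction on the sentiment monitoring degree.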
  • the sentiment monitoring degrees include worry, fear, disappointment, panic, despair, hope, optimism, peace of mind, excitement, and exhilaration.
  • FIG. 3 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention.
  • the electronic device may include a processor 301, a communications interface 302, a memory 303, and a communication bus 304, where the processor 301, the communications interface 302, and the memory 303 communicate with one another through the communication bus 304.
  • the processor 301 can call a computer program stored in the memory 303 and runnable on the processor 301 to execute the method for establishing a market sentiment monitoring system provided by the foregoing embodiments, for example including: obtaining a sample text set including multiple sample texts; performing corpus labeling on each sample text to obtain its sentiment label and sentiment degree label; obtaining the sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text; and constructing a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity, the market sentiment monitoring system being used to monitor market sentiment.
  • if the logic instructions in the memory 303 are implemented as software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present invention, in essence, or the part contributing to the prior art, or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention.
  • the aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
  • the embodiment of the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored.
  • when executed by a processor, the computer program implements the method for establishing a market sentiment monitoring system provided by the foregoing embodiments, for example including: obtaining a sample text set including a plurality of sample texts; performing corpus labeling on each sample text to obtain its sentiment label and sentiment degree label; obtaining the sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text; and constructing a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity, the market sentiment monitoring system being used to monitor market sentiment.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.
  • each implementation manner can be implemented by software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
  • the above technical solutions can be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.
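The data collection method of step 110 gives each collection object a collection frequency, collection content, and object level, and the object level may also weight the labels of texts crawled from that object. A minimal Python sketch of that configuration follows; the class, its field names, the linear level-to-weight mapping, and the example.com URLs are all hypothetical illustrations, not details from the patent.

```python
from dataclasses import dataclass

@dataclass
class CollectionObject:
    """One crawl target. All field names are hypothetical, not taken from the patent."""
    name: str
    url: str
    frequency_hours: int  # collection frequency: crawl every N hours
    content_fields: tuple = ("title", "body", "release_time", "release_platform")
    level: int = 1        # object level, 1 (least) .. max_level (most important)

def level_weight(obj: CollectionObject, max_level: int = 3) -> float:
    """Assumed linear mapping from object level to a weight in (0, 1]."""
    if not 1 <= obj.level <= max_level:
        raise ValueError(f"level must be between 1 and {max_level}")
    return obj.level / max_level

def is_due(obj: CollectionObject, hours_since_last_crawl: float) -> bool:
    """True when the collection frequency says the object should be crawled again."""
    return hours_since_last_crawl >= obj.frequency_hours

# Hypothetical collection objects; the URLs are placeholders.
SOURCES = [
    CollectionObject("financial-news", "https://news.example.com", frequency_hours=1, level=3),
    CollectionObject("social-media", "https://social.example.com", frequency_hours=6, level=1),
]
```

Under these assumptions, a crawler loop would check `is_due` for each source and pass the `level_weight` values on to the clause-counting step.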
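The data cleaning described for preprocessing (removing duplicates, handling invalid and missing values, dropping outdated data and stop words) might look like the following sketch. The record layout, the date cutoff, and the tiny stop-word list are hypothetical; the typo and grammar correction that the description also mentions is not attempted here.

```python
import re
from datetime import date

STOP_WORDS = {"the", "a", "an", "of", "and"}  # hypothetical toy stop-word list

def clean(records, cutoff):
    """Clean crawled records.

    records: list of dicts with a 'text' string and a 'release_date' datetime.date.
    cutoff: records released before this date are treated as outdated.
    """
    seen = set()
    cleaned = []
    for rec in records:
        text = (rec.get("text") or "").strip()
        released = rec.get("release_date")
        if not text or released is None or released < cutoff:
            continue  # missing, invalid, or outdated record
        normalized = re.sub(r"\s+", " ", text).lower()
        if normalized in seen:
            continue  # duplicate after whitespace/case normalization
        seen.add(normalized)
        tokens = [t for t in normalized.split(" ") if t not in STOP_WORDS]
        cleaned.append({**rec, "text": " ".join(tokens)})
    return cleaned
```

Deduplication is done on the normalized text before stop words are removed, so two records that differ only in spacing or case count as the same record.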
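Market relevance screening keeps only sample texts whose relevance reaches the preset relevance threshold. The patent uses a trained relevance model; so that the sketch below runs on its own, it substitutes a hypothetical keyword-overlap score, and any other scoring function (such as a trained model's predict function) can be passed through the `score` parameter. The term list and the 0.2 threshold are assumptions.

```python
MARKET_TERMS = {"stock", "stocks", "market", "index", "bond", "rate", "earnings"}  # hypothetical

def market_relevance(text: str) -> float:
    """Hypothetical stand-in for the trained relevance model: share of tokens that are market terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in MARKET_TERMS for t in tokens) / len(tokens)

def screen(texts, threshold=0.2, score=market_relevance):
    """Delete texts whose market relevance is lower than the preset threshold."""
    return [t for t in texts if score(t) >= threshold]
```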
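The corpus labeling model of step 120 is trained on professionally annotated texts, and the patent leaves the model type open (a single neural network or a combination of several). Purely as a stand-in, the sketch below uses a multinomial Naive Bayes classifier with add-one smoothing over whitespace tokens; the class name and the toy training clauses are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTagger:
    """Multinomial Naive Bayes with add-one smoothing; a stand-in for the patent's unspecified model."""

    def fit(self, clauses, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for clause, label in zip(clauses, labels):
            tokens = clause.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(labels)
        return self

    def predict(self, clause):
        tokens = clause.lower().split()
        v = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            logp = math.log(n / self.total)  # class prior
            denom = sum(self.word_counts[label].values()) + v
            for t in tokens:
                logp += math.log((self.word_counts[label][t] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Hypothetical expert-annotated training clauses.
train_clauses = ["markets fear crash", "fear of crash grows",
                 "earnings optimism rises", "optimism on earnings"]
train_labels = ["fear", "fear", "optimism", "optimism"]
label_model = NaiveBayesTagger().fit(train_clauses, train_labels)
```

The same class could be fitted a second time on sentiment degree labels to produce the degree tag for each clause.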
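The sentiment dictionary is built from seed words and expanded with techniques such as word similarity. The sketch below shows only the word-similarity part: a candidate word inherits the labels of its most similar seed when the cosine similarity of their vectors reaches a threshold. The embedding vectors, the 0.9 threshold, and the function names are hypothetical; new word discovery and text similarity are not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_dictionary(seeds, vectors, threshold=0.9):
    """seeds: {word: (sentiment_label, sentiment_degree)}; vectors: {word: list[float]} (hypothetical)."""
    expanded = dict(seeds)
    for word, vec in vectors.items():
        if word in expanded:
            continue
        best_seed, best_sim = None, threshold
        for seed in seeds:
            if seed in vectors:
                sim = cosine(vec, vectors[seed])
                if sim >= best_sim:
                    best_seed, best_sim = seed, sim
        if best_seed is not None:
            expanded[word] = seeds[best_seed]  # inherit the seed's labels
    return expanded

# Toy example: "meltdown" is close to the seed "panic", "picnic" is not.
seeds = {"panic": ("panic", 3)}
vectors = {"panic": [1.0, 0.0], "meltdown": [0.95, 0.1], "picnic": [0.0, 1.0]}
expanded = expand_dictionary(seeds, vectors)
```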
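Step 130 counts the clauses carrying each sentiment label and sentiment degree label and selects representative labels as sentiment monitoring degrees. The patent does not state the selection rule, so the sketch assumes a simple top-k by clause count, with optional per-text weights (for example, hypothetically derived from object level).

```python
from collections import Counter

def count_clauses(texts, weights=None):
    """texts: one list of (sentiment_label, sentiment_degree) pairs per sample text.
    weights: optional per-text weights (hypothetically derived from object level)."""
    if weights is None:
        weights = [1.0] * len(texts)
    per_pair, per_label = Counter(), Counter()
    for clauses, w in zip(texts, weights):
        for label, degree in clauses:
            per_pair[(label, degree)] += w
            per_label[label] += w
    return per_pair, per_label

def select_monitoring_degrees(per_label, k=10):
    """Hypothetical rule: keep the k labels with the largest (weighted) clause counts."""
    return [label for label, _ in per_label.most_common(k)]
```

With equal weights the most frequent label wins; a large enough weight on one text can change the selection, which is how object level would influence the result under this assumption.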
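At monitoring time, each sentiment monitoring degree receives a sentiment acuity on the preset 0 to 5 scale. The patent gives no formula; the sketch below assumes sentiment degrees normalized to [0, 1], scores a label by the sum of its clause degrees divided by the total number of clauses, and buckets that score into six levels.

```python
import math

ACUITY_LEVELS = 6  # preset sentiment acuity levels 0..5, as in the description

def acuity(clauses, label):
    """clauses: (sentiment_label, sentiment_degree) pairs from texts collected in real time.
    Degrees are assumed to be normalized to [0, 1]."""
    if not clauses:
        return 0
    score = sum(deg for lab, deg in clauses if lab == label) / len(clauses)
    # Bucket the [0, 1] score into levels 0..5; a score of exactly 1.0 is clamped to 5.
    return min(ACUITY_LEVELS - 1, math.floor(score * ACUITY_LEVELS))

def monitor(clauses, monitoring_degrees):
    """Map every sentiment monitoring degree to its current acuity level."""
    return {m: acuity(clauses, m) for m in monitoring_degrees}
```

Because every score lies in [0, 1], the result always falls within the preset acuity range, matching the constraint stated for step 140.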
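Back-testing checks whether the sentiment monitoring degrees still reflect the sentiment of the texts, and data correction follows where they do not. One hypothetical way to operationalize this is to compare predicted labels with expert reference labels and flag labels whose accuracy falls below a threshold; the function name and the 0.8 threshold are assumptions.

```python
from collections import Counter

def backtest(predicted, reference, min_accuracy=0.8):
    """Return per-label accuracy (keyed by reference label) and the labels needing correction."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must align")
    total, correct = Counter(), Counter()
    for p, r in zip(predicted, reference):
        total[r] += 1
        correct[r] += p == r  # adds 1 for a match, 0 otherwise
    accuracy = {lab: correct[lab] / total[lab] for lab in total}
    needs_fix = sorted(lab for lab, acc in accuracy.items() if acc < min_accuracy)
    return accuracy, needs_fix
```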

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and system for establishing a market sentiment monitoring system. The method comprises: acquiring a sample text set comprising a plurality of pieces of sample text (110); performing corpus labeling on each piece of sample text to acquire a sentiment label and a sentiment degree label of that piece of sample text (120); acquiring a sentiment monitoring degree on the basis of the sentiment label and the sentiment degree label of each piece of sample text (130); and constructing, on the basis of the sentiment monitoring degree and a preset sentiment acuity, a market sentiment monitoring system used to monitor market sentiment (140). In the described method and system, corresponding sentiment labels and sentiment degree labels are obtained by performing corpus labeling on sample text, and sentiment monitoring degrees are then selected from among the sentiment labels of a large amount of sample text to construct a market sentiment monitoring system. This helps implement objective and accurate market sentiment monitoring, helps investors reduce misjudgments, and improves investment accuracy.

Description

Method and system for establishing a market sentiment monitoring system
Technical field
The invention relates to the technical field of text analysis, and in particular to a method and system for establishing a market sentiment monitoring system.
Background art
Correctly judging the market situation is an important prerequisite for effective investment. Market sentiment, as an important factor affecting the market situation, greatly affects the accuracy of such judgments. How to accurately measure and judge market sentiment has long been a matter of concern.
In the past, investors mainly analyzed national policy and researched industries by browsing news and consulting research reports, then combined this with technical analysis and other methods to judge the market situation and make investments. Because information was scarce, it was extremely difficult to extract market sentiment from the limited information available.
With the rapid development of the Internet, market-related news and information of every kind appear constantly, cover every aspect of the market, and are growing explosively in volume. The timeliness and diversity of market information give investors rich material for monitoring market sentiment, but they also make it difficult for investors to judge market sentiment accurately from massive amounts of information in a short time. Moreover, manual statistical judgment of market sentiment is inefficient, and the resulting assessments are highly subjective, of low accuracy, and poorly timed, so they cannot meet investors' needs.
How to establish a market sentiment monitoring system that achieves efficient, accurate, and objective market sentiment monitoring has therefore become an urgent problem.
Summary of the invention
Embodiments of the present invention provide a method and system for establishing a market sentiment monitoring system, to solve the problems of poor objectivity, low accuracy, and poor timeliness in existing manual analysis of market sentiment.
In a first aspect, an embodiment of the present invention provides a method for establishing a market sentiment monitoring system, including:
obtaining a sample text set including multiple sample texts;
performing corpus labeling on each sample text to obtain a sentiment label and a sentiment degree label of that sample text;
obtaining a sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text; and
constructing a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity, the market sentiment monitoring system being used to monitor market sentiment.
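The four steps of the first aspect can be wired together as a small pipeline. This is a minimal sketch under stated assumptions: clauses are split on sentence punctuation, the labeling function is supplied by the caller (`keyword_labeler` is a hypothetical keyword rule standing in for the trained corpus labeling model), the k most frequent labels become the sentiment monitoring degrees, and the resulting system simply pairs each degree with the preset 0 to 5 acuity scale.

```python
import re
from collections import Counter

def keyword_labeler(clause):
    """Hypothetical stand-in for the corpus labeling model: returns (label, degree)."""
    return ("fear", 2) if "crash" in clause.lower() else ("hope", 1)

def build_monitoring_system(texts, label_clause, k=10, acuity_levels=range(6)):
    # Step 110: `texts` is the acquired sample text set.
    # Step 120: split each text into clauses (assumed: on . ; ! ?) and label every clause.
    labelled = [[label_clause(c.strip()) for c in re.split(r"[.;!?]", t) if c.strip()]
                for t in texts]
    # Step 130: count clauses per sentiment label; the k most frequent become monitoring degrees.
    counts = Counter(label for clauses in labelled for label, _ in clauses)
    degrees = [label for label, _ in counts.most_common(k)]
    # Step 140: pair each monitoring degree with the preset acuity scale (0..5 by default).
    return {d: list(acuity_levels) for d in degrees}

system = build_monitoring_system(
    ["Stocks may crash. Buyers hope for gains.", "Crash fears grow"], keyword_labeler, k=1)
```

Here two of the three clauses mention a crash, so with `k=1` only "fear" is kept as a sentiment monitoring degree.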
In a second aspect, an embodiment of the present invention provides a system for establishing a market sentiment monitoring system, including:
a sample acquisition unit, configured to acquire a sample text set including multiple sample texts;
a corpus labeling unit, configured to perform corpus labeling on each sample text to obtain a sentiment label and a sentiment degree label of that sample text;
a sentiment monitoring degree acquisition unit, configured to obtain the sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text; and
a system establishment unit, configured to construct a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity, the market sentiment monitoring system being used to monitor market sentiment.
In a third aspect, an embodiment of the present invention provides an electronic device including a processor, a communication interface, a memory, and a bus. The processor, the communication interface, and the memory communicate with one another through the bus, and the processor can call logic instructions in the memory to perform the steps of the method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method provided in the first aspect.
In the method and system for establishing a market sentiment monitoring system provided by the embodiments of the present invention, corpus labeling is performed on sample texts to obtain the corresponding sentiment labels and sentiment degree labels, and sentiment monitoring degrees are then selected from the sentiment labels of a large number of sample texts to construct a market sentiment monitoring system. This helps achieve objective and accurate market sentiment monitoring, helps investors reduce misjudgments, and improves investment accuracy.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for establishing a market sentiment monitoring system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a system for establishing a market sentiment monitoring system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The timeliness and diversity of market information give investors abundant material for gauging market sentiment, but they also make it hard for investors to judge market sentiment accurately from massive amounts of information in a short time. Moreover, manual statistical assessment of market sentiment is inefficient, lacks objectivity and accuracy, and is slow, so it cannot meet investors' needs. To address this, an embodiment of the present invention provides a method for establishing a market sentiment monitoring system. FIG. 1 is a schematic flowchart of this method. As shown in FIG. 1, the method includes:
Step 110: obtain a sample text set comprising multiple sample texts.
Here, a sample text may be a news report, a market report, a comment on a social platform, or the like; the embodiments of the present invention place no specific limitation on this. The sample text set contains a large number of sample texts.
Step 120: perform corpus annotation on each sample text to obtain its emotion label and emotion degree label.
Specifically, for each sample text, emotion analysis is performed on every clause in the text, producing an emotion label and an emotion degree label for each clause. For a given clause, the emotion label is the emotion type identified by the analysis, and the emotion degree label is the intensity of that emotion type. The emotion label and emotion degree label of a sample text are therefore, in fact, the emotion labels and emotion degree labels of the clauses it contains.
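The per-clause labeling described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the patent does not specify how clauses are delimited, so the punctuation-based splitter (which avoids breaking numbers such as `3.5` or `1,000`) is a heuristic chosen for this example, and `toy_labeler` is a hypothetical stand-in for the trained corpus annotation model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# ASSUMPTION: clauses end at Chinese/ASCII clause punctuation; "," and "." are
# only treated as delimiters when not followed by a digit (keeps "3.5", "1,000").
CLAUSE_DELIMITERS = re.compile(r"[，。；！？;!?]+|[,.](?!\d)")


@dataclass
class ClauseLabel:
    clause: str
    emotion: str  # emotion label (emotion type)
    degree: int   # emotion degree label (intensity of that type)


def split_clauses(text: str) -> List[str]:
    """Split a sample text into non-empty, whitespace-trimmed clauses."""
    return [c.strip() for c in CLAUSE_DELIMITERS.split(text) if c.strip()]


def annotate(text: str, labeler: Callable[[str], Tuple[str, int]]) -> List[ClauseLabel]:
    """Label every clause; a text's labels are simply the labels of its clauses."""
    return [ClauseLabel(clause, *labeler(clause)) for clause in split_clauses(text)]


def toy_labeler(clause: str) -> Tuple[str, int]:
    # HYPOTHETICAL stand-in for the trained corpus annotation model.
    return ("worry", 3) if "下跌" in clause else ("neutral", 0)
```

For example, `annotate("股市下跌，明天再看", toy_labeler)` yields one `ClauseLabel` per clause; swapping `toy_labeler` for a real model leaves the rest unchanged.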
In addition, before step 120 is performed, a corpus annotation model may be trained in advance, for example as follows. First, collect a large number of training texts together with their emotion labels and emotion degree labels; the training texts are used to train the corpus annotation model, and their labels are annotated by professionals. Then train an initial model on the training texts and their emotion labels and emotion degree labels to obtain the corpus annotation model. The initial model may be a single neural network model or a combination of several neural network models; the embodiments of the present invention do not specifically limit its type or structure.
Step 130: obtain emotion monitoring degrees based on the emotion label and emotion degree label of each sample text.
Specifically, after the emotion label and emotion degree label of each sample text are obtained, statistics can be compiled over them, and emotion monitoring degrees can then be selected from the emotion labels, thereby classifying market sentiment into types. Here, an emotion monitoring degree is a representative emotion label selected from a large number of emotion labels; in essence, it is a type of market sentiment. There are multiple emotion monitoring degrees, and different ones represent emotion types of varying positivity or negativity.
Step 140: construct a market sentiment monitoring system based on the emotion monitoring degrees and a preset emotional acuity; the market sentiment monitoring system is used to monitor market sentiment.
Specifically, the preset emotional acuity is a predefined indicator of the intensity of the emotion corresponding to an emotion monitoring degree. For example, it may be divided into six levels, 0 to 5, with emotional intensity increasing from one level to the next. The market sentiment monitoring system thus constructed comprises the emotion monitoring degrees and the preset emotional acuity. When the system is applied to market sentiment monitoring, the emotional acuity of each emotion monitoring degree can be obtained from a text set collected in real time; the acuity value for any emotion monitoring degree always falls within the preset acuity range.
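One way to picture the resulting monitoring system is as a fixed set of emotion monitoring degrees, each mapped to an acuity level in the preset 0 to 5 range. The Python sketch below uses the ten monitoring degrees named in this embodiment. Because the patent does not say how acuity is derived from real-time texts, it adopts a hypothetical rule: the rounded mean emotion degree label of the clauses carrying each label, clamped to 0..5.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# The ten emotion monitoring degrees named in this embodiment.
MONITORING_DEGREES = ("worry", "fear", "disappointment", "panic", "despair",
                      "hope", "optimism", "reassurance", "elation", "euphoria")
ACUITY_MIN, ACUITY_MAX = 0, 5  # preset emotional acuity: six levels, 0..5


def acuity_snapshot(clause_labels: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Map every monitoring degree to an acuity level within [0, 5].

    ASSUMPTION: acuity = round(mean emotion degree label) of the clauses that
    carry that emotion label (Python's round sends ties to even); degrees with
    no clauses get 0, and labels outside the ten degrees are ignored.
    """
    buckets: Dict[str, List[float]] = {d: [] for d in MONITORING_DEGREES}
    for emotion, degree in clause_labels:
        if emotion in buckets:
            buckets[emotion].append(degree)
    snapshot: Dict[str, int] = {}
    for name, values in buckets.items():
        level = round(mean(values)) if values else ACUITY_MIN
        snapshot[name] = max(ACUITY_MIN, min(ACUITY_MAX, level))  # clamp to range
    return snapshot
```

Clamping enforces the constraint stated above that every acuity value stays within the preset range, whatever scale the underlying degree labels use.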
The method provided by this embodiment of the present invention performs corpus annotation on sample texts to obtain the corresponding emotion labels and emotion degree labels, and then selects emotion monitoring degrees from the emotion labels of a large number of sample texts to construct a market sentiment monitoring system. This helps achieve objective and accurate market sentiment monitoring, helping investors reduce misjudgments and make more accurate investment decisions.
Based on the above embodiment, in this method, step 110 specifically includes: determining a data collection method, which is determined according to preset collection objects, collection frequencies, collection content, and object levels; and, based on the data collection method, collecting sample texts from the collection objects with a web crawler to construct the sample text set.
Specifically, a collection object is a source from which texts are collected, such as the website of a market-related institution, a financial news website, social media, or market reports published by research institutions; the embodiments of the present invention place no specific limitation on this. The collection frequency is how often texts are collected. The collection content includes the title, the body (including images and videos), the publication time, the publication platform, and so on. The object level characterizes the importance of a collection object and can be determined by its popularity or credibility. Each collection object is configured with its own collection frequency, collection content, and object level.
A web crawler (also called a web spider or web robot) is a program or script that automatically fetches information from the World Wide Web according to preset rules. Once the data collection method is determined, sample texts can be crawled from each collection object accordingly. Further, the object level can be used to set the weight that the emotion labels and emotion degree labels of texts crawled from the corresponding collection object carry when emotion monitoring degrees are selected.
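The data collection method and the object-level weighting described above can be represented as configuration plus a small weighting rule. In this Python sketch, the field names, the example URLs, and the linear weight formula in `source_weight` are all assumptions (the patent only states that weights may be set according to the object level); the crawler itself is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class CollectionObject:
    name: str
    url: str
    frequency_minutes: int  # collection frequency
    level: int = 1          # object level: higher = more influential/credible
    content_fields: List[str] = field(
        default_factory=lambda: ["title", "body", "publish_time", "platform"])


def is_due(obj: CollectionObject, minutes_since_last_run: float) -> bool:
    """True when the crawler should revisit this collection object."""
    return minutes_since_last_run >= obj.frequency_minutes


def source_weight(obj: CollectionObject, base: float = 1.0, step: float = 0.5) -> float:
    # ASSUMPTION: weight grows linearly with object level (level 1 -> base).
    return base + step * (obj.level - 1)


def weighted_label_counts(labelled: Iterable[Tuple[str, str]],
                          sources: Dict[str, CollectionObject]) -> Dict[str, float]:
    """Sum clause counts per emotion label, weighting each clause by its source."""
    counts: Dict[str, float] = {}
    for source_name, emotion in labelled:
        counts[emotion] = counts.get(emotion, 0.0) + source_weight(sources[source_name])
    return counts


# HYPOTHETICAL collection objects.
SOURCES = {
    "news_a": CollectionObject("news_a", "https://example.com/news", 30, level=3),
    "forum_b": CollectionObject("forum_b", "https://example.org/forum", 60, level=1),
}
```

Keeping the weight as a separate function makes it easy to replace the linear rule with, say, a lookup table per object level.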
Based on any of the above embodiments, in this method, the following is further included between step 110 and step 120: preprocessing the sample text set, where the preprocessing includes data cleaning and data governance.
Specifically, sample texts crawled from the various collection objects, especially from social media, may contain many typos, grammatical errors, and stop words, as well as invalid, outdated, or barely market-related data, all of which are likely to reduce the accuracy of emotion monitoring degree selection. Data cleaning detects and corrects identifiable errors in the sample texts, for example by checking data consistency, removing duplicates, and handling invalid and missing values. Cleaning the sample texts in this way helps ensure their accuracy and validity.
In addition, sample texts crawled from different collection objects differ widely in style and must be unified through standardized processing to facilitate emotion analysis and corpus annotation. Data governance is the process of moving from scattered data to unified master data; sample texts can be governed along six dimensions (metadata, master data, data standards, data models, data quality, and data security) so that their data are standardized.
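A few of the cleaning and standardization operations named above (normalizing character width and whitespace, rejecting records with missing values, removing duplicates, and mapping source-specific fields onto one unified schema) can be sketched in Python as follows. The field names and the per-source mapping `NEWS_FIELDS` are hypothetical, and real data governance across the six dimensions listed above is far broader than this.

```python
import hashlib
import re
import unicodedata
from typing import Dict, Iterable, List, Optional


def normalize_text(text: str) -> str:
    """NFKC-normalize (e.g. full-width letters to ASCII) and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def standardize(raw: Dict[str, Optional[str]],
                field_map: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Map a source-specific record onto the unified schema.

    field_map maps unified field name -> the source's own key. Records with a
    missing or blank value are rejected (None is returned).
    """
    record: Dict[str, str] = {}
    for unified_key, source_key in field_map.items():
        value = raw.get(source_key)
        if value is None or not value.strip():
            return None
        record[unified_key] = normalize_text(value)
    return record


def deduplicate(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop records whose normalized body has already been seen."""
    seen, kept = set(), []
    for record in records:
        digest = hashlib.sha256(record["body"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(record)
    return kept


# HYPOTHETICAL field mapping for one collection object.
NEWS_FIELDS = {"title": "headline", "body": "content",
               "publish_time": "date", "platform": "site"}
```

Deduplication runs after standardization so that texts differing only in character width or spacing hash to the same value.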
After each sample text has been preprocessed, the preprocessed texts are passed to corpus annotation, which supports the subsequent selection of emotion monitoring degrees and the construction of the market sentiment monitoring system.
Based on any of the above embodiments, in this method, the following is further included between step 110 and step 120: screening the sample text set for market relevance and deleting sample texts whose market relevance is below a preset relevance threshold.
Specifically, the sample texts in the set cover information on many subjects, and some of them may have nothing to do with the market. Before the texts are used to select emotion monitoring degrees, market-unrelated texts must be filtered out so that only market-related ones remain. Here, market relevance measures how closely a sample text relates to the market, and the preset relevance threshold is a predefined value for deciding whether a sample text is market-related.
Each sample text can be fed into a relevance model, which outputs the market relevance of that text. The relevance model can be trained on sample texts and their corresponding market relevance values.
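The screening step reduces to scoring each text and keeping those at or above the threshold. In the Python sketch below, the scorer is pluggable so that a trained relevance model could be dropped in; `keyword_relevance` and its term list are a hypothetical keyword heuristic used only to make the example runnable, not the relevance model the patent describes.

```python
from typing import Callable, Iterable, List

# HYPOTHETICAL market vocabulary for the toy scorer.
MARKET_TERMS = ("股市", "债券", "利率", "stock", "bond", "yield")


def keyword_relevance(text: str) -> float:
    """Toy relevance score in [0, 1]: 0.5 per distinct market term found, capped at 1."""
    lowered = text.lower()
    hits = sum(1 for term in MARKET_TERMS if term in lowered)
    return min(1.0, hits / 2)


def filter_relevant(texts: Iterable[str],
                    scorer: Callable[[str], float],
                    threshold: float) -> List[str]:
    """Delete texts whose market relevance is below the preset threshold."""
    return [t for t in texts if scorer(t) >= threshold]
```

Any callable mapping a text to a score, such as a wrapper around a trained classifier, can be passed as `scorer`.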
Based on any of the above embodiments, in this method, the following is further included after step 120: constructing an emotion dictionary based on the emotion labels and emotion degree labels of the sample texts.
Specifically, after step 120, an initial emotion dictionary and corpus can be built from the emotion labels and emotion degree labels of the sample texts. Treating these as seed words and a seed corpus, techniques such as word similarity, new word discovery, and text similarity are used to expand them, yielding the emotion dictionary.
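The word-similarity part of this expansion can be sketched as follows: a candidate word joins the dictionary, inheriting the labels of its most similar seed word, when their cosine similarity reaches a threshold. The word vectors, the 0.8 threshold, and the rule of inheriting the seed's labels are assumptions for illustration; new word discovery and text similarity are not shown.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_dictionary(seeds: Dict[str, Tuple[str, int]],
                      vectors: Dict[str, Vector],
                      threshold: float = 0.8) -> Dict[str, Tuple[str, int]]:
    """Add each non-seed word whose nearest seed is at least `threshold` similar.

    The new word inherits that seed's (emotion label, emotion degree label).
    """
    expanded = dict(seeds)
    for word, vec in vectors.items():
        if word in expanded:
            continue
        best_seed: Optional[str] = None
        best_sim = threshold
        for seed in seeds:
            if seed in vectors:
                sim = cosine(vec, vectors[seed])
                if sim >= best_sim:
                    best_seed, best_sim = seed, sim
        if best_seed is not None:
            expanded[word] = seeds[best_seed]
    return expanded
```

In practice the vectors would come from an embedding model trained on the collected corpus; the toy two-dimensional vectors in a test are enough to show the mechanics.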
Based on any of the above embodiments, in this method, step 130 specifically includes: counting, from the emotion labels and emotion degree labels of all sample texts, the number of clauses carrying each emotion label and emotion degree label; and obtaining the emotion monitoring degrees from these clause counts.
Specifically, once the emotion labels and emotion degree labels of the clauses in each sample text are obtained, they are tallied to give the number of clauses for each emotion label and emotion degree label. Then, based on these counts, emotion monitoring degrees are selected from the emotion labels using the Pareto principle or another statistical method, thereby classifying market sentiment into types.
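A minimal Python sketch of the counting step and one reading of the Pareto principle: pick the most frequent emotion labels until they jointly cover 80% of all labeled clauses. The 80% cut-off and the choice to aggregate over emotion degree labels before ranking are assumptions; the patent allows other statistical methods as well.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_clauses(labels: Iterable[Tuple[str, int]]) -> Counter:
    """Number of clauses per (emotion label, emotion degree label) pair."""
    return Counter(labels)


def pareto_select(label_counts: Counter, coverage: float = 0.8) -> List[str]:
    """Most frequent emotion labels that together cover `coverage` of all clauses."""
    per_label: Counter = Counter()
    for (emotion, _degree), n in label_counts.items():
        per_label[emotion] += n  # ASSUMPTION: rank labels across all degrees
    total = sum(per_label.values())
    selected: List[str] = []
    covered = 0
    for emotion, n in per_label.most_common():
        if covered >= coverage * total:
            break
        selected.append(emotion)
        covered += n
    return selected
```

The selected labels are candidate emotion monitoring degrees; the long tail of rare labels is left out, which is the intent of applying the Pareto principle here.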
Based on any of the above embodiments, in this method, the following is further included between step 130 and step 140: backtesting and correcting the emotion monitoring degrees.
Specifically, once the emotion monitoring degrees are obtained, the type and intensity of market sentiment can be backtested to determine whether the current emotion monitoring degrees accurately reflect the emotions of the sample texts.
If they do not, the emotion monitoring degrees are corrected so that they more closely match the actual emotions of the texts.
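The backtest can be framed as comparing the emotion monitoring degree the system assigns to each held-out text with a reference label, for example one assigned by professional annotators, and flagging the degrees for correction when agreement falls below a target. The `min_accuracy` value of 0.85 and this agreement-based definition of a backtest are assumptions; the patent fixes neither.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def backtest(predicted: Sequence[str], reference: Sequence[str],
             min_accuracy: float = 0.85) -> Tuple[float, bool, Dict[str, int]]:
    """Return (accuracy, needs_correction, misses per reference label)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("predicted and reference must be non-empty and equal in length")
    pairs = list(zip(predicted, reference))
    accuracy = sum(p == r for p, r in pairs) / len(pairs)
    misses = Counter(r for p, r in pairs if p != r)  # which true emotions were missed
    return accuracy, accuracy < min_accuracy, dict(misses)
```

The per-label miss counts point to which monitoring degrees need correcting, rather than only signaling that some correction is needed.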
Based on any of the above embodiments, in this method, the emotion monitoring degrees are worry, fear, disappointment, panic, despair, hope, optimism, reassurance, elation, and euphoria.
Specifically, the definitions of these ten emotion monitoring degrees and their corresponding emotional acuity are given in the following table:
Figure PCTCN2020078384-appb-000001
Based on any of the above embodiments, the method for establishing a market sentiment monitoring system specifically includes the following steps:
First, determine the collection objects and their corresponding collection frequencies, collection content, and object levels, and use these to determine the data collection method. The collection objects include, but are not limited to, websites of domestic and foreign government agencies, financial news websites, public reports of research institutions, and social media. Object levels are assigned according to the platform influence of each collection object. The collected content may include the title, the body (including images, videos, and the like referenced in the text), the publication time, the publication platform, and other information.
Next, collect the sample texts: based on the data collection method, a web crawler visits each collection object and collects sample texts. In this embodiment, the sample texts span 65 languages and cover more than 200 countries and regions worldwide, and the final sample set contains more than 50 million texts.
Then process the sample texts. The processing here includes data cleaning, data governance, text screening, and corpus annotation. In text screening, professionals (with backgrounds in finance, economics, psychology, linguistics, and other fields) check the cleaned and governed sample texts one by one and keep those related to the market. In corpus annotation, each sample text is annotated for emotion to obtain the emotion labels and emotion degree labels of its clauses, and a corpus annotation model is built through machine learning.
Based on the emotion labels and emotion degree labels of the clauses, the relevant characters, words, fixed collocations, and clauses are sorted out, and the existing emotion dictionary and corpus are expanded using techniques such as word similarity, new word discovery, and text similarity.
The corpus annotation model is then used to annotate a large number of sample texts, producing their emotion labels and emotion degree labels.
Statistics are compiled over the emotion labels and emotion degree labels obtained from annotation, and on this basis the emotion monitoring degrees are determined using the Pareto principle or another statistical method.
Finally, the emotion monitoring degrees are backtested and corrected, and the market sentiment monitoring system is constructed from the final emotion monitoring degrees and the preset emotional acuity. The system mainly comprises ten emotion monitoring degrees (worry, fear, disappointment, panic, despair, hope, optimism, reassurance, elation, and euphoria) and six levels of emotional acuity.
Based on any of the above embodiments, FIG. 2 is a schematic structural diagram of a system for establishing a market sentiment monitoring system according to an embodiment of the present invention. As shown in FIG. 2, the system includes a sample acquisition unit 210, a corpus annotation unit 220, an emotion monitoring degree acquisition unit 230, and a system establishment unit 240, wherein:
the sample acquisition unit 210 is configured to obtain a sample text set comprising multiple sample texts;
the corpus annotation unit 220 is configured to perform corpus annotation on each sample text to obtain its emotion label and emotion degree label;
the emotion monitoring degree acquisition unit 230 is configured to obtain emotion monitoring degrees based on the emotion label and emotion degree label of each sample text; and
the system establishment unit 240 is configured to construct a market sentiment monitoring system based on the emotion monitoring degrees and a preset emotional acuity; the market sentiment monitoring system is used to monitor market sentiment.
The system provided by this embodiment of the present invention performs corpus annotation on sample texts to obtain the corresponding emotion labels and emotion degree labels, and then selects emotion monitoring degrees from the emotion labels of a large number of sample texts to construct a market sentiment monitoring system. This helps achieve objective and accurate market sentiment monitoring, helping investors reduce misjudgments and make more accurate investment decisions.
Based on any of the above embodiments, in this system, the sample acquisition unit 210 is specifically configured to:
determine a data collection method, which is determined according to preset collection objects, collection frequencies, collection content, and object levels; and
based on the data collection method, collect the sample texts from the collection objects with a web crawler to construct the sample text set.
Based on any of the above embodiments, the system further includes a preprocessing unit configured to:
preprocess the sample text set, where the preprocessing includes data cleaning and data governance.
Based on any of the above embodiments, the system further includes a relevance screening unit configured to:
screen the sample text set for market relevance and delete sample texts whose market relevance is below a preset relevance threshold.
Based on any of the above embodiments, the system further includes an emotion dictionary unit specifically configured to:
construct an emotion dictionary based on the emotion labels and emotion degree labels of the sample texts.
Based on any of the above embodiments, the emotion monitoring degree acquisition unit 230 is specifically configured to:
count, from the emotion label and emotion degree label of each sample text, the number of clauses carrying each emotion label and emotion degree label; and
obtain the emotion monitoring degrees based on the clause count for each emotion label and emotion degree label.
Based on any of the above embodiments, the system further includes a correction unit configured to:
backtest and correct the emotion monitoring degrees.
Based on any of the above embodiments, in this system, the emotion monitoring degrees are worry, fear, disappointment, panic, despair, hope, optimism, reassurance, elation, and euphoria.
FIG. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in FIG. 3, the electronic device may include a processor 301, a communications interface 302, a memory 303, and a communication bus 304, where the processor 301, the communications interface 302, and the memory 303 communicate with one another over the communication bus 304. The processor 301 can invoke a computer program stored in the memory 303 and executable on the processor 301 to perform the method for establishing a market sentiment monitoring system provided by the above embodiments, which includes, for example: obtaining a sample text set comprising multiple sample texts; performing corpus annotation on each sample text to obtain its emotion label and emotion degree label; obtaining emotion monitoring degrees based on the emotion label and emotion degree label of each sample text; and constructing a market sentiment monitoring system based on the emotion monitoring degrees and a preset emotional acuity, the market sentiment monitoring system being used to monitor market sentiment.
In addition, the logic instructions in the memory 303 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. On this understanding, the technical solutions of the embodiments of the present invention, in essence, or the part contributing to the prior art, or part of the technical solutions, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the method for establishing a market sentiment monitoring system provided by the above embodiments, which includes, for example: obtaining a sample text set comprising multiple sample texts; performing corpus annotation on each sample text to obtain its emotion label and emotion degree label; obtaining emotion monitoring degrees based on the emotion label and emotion degree label of each sample text; and constructing a market sentiment monitoring system based on the emotion monitoring degrees and a preset emotional acuity, the market sentiment monitoring system being used to monitor market sentiment.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as actually needed to achieve the objectives of the solutions of the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.
From the above description of the embodiments, a person skilled in the art can clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. On this understanding, the above technical solutions, in essence, or the part contributing to the prior art, may be embodied as a software product stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, including instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in parts of them.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A method for establishing a market sentiment monitoring system, characterized in that it comprises:
    obtaining a sample text set comprising a plurality of sample texts;
    performing corpus annotation on any one of the sample texts to obtain a sentiment label and a sentiment degree label for that sample text;
    obtaining a sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text;
    constructing a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity, the market sentiment monitoring system being used to monitor market sentiment.
  2. The method for establishing a market sentiment monitoring system according to claim 1, wherein obtaining the sample text set comprising a plurality of sample texts specifically comprises:
    determining a data collection method, the data collection method being determined according to a preset collection object, collection frequency, collection content, and object level;
    collecting, based on the data collection method, the sample texts from the collection object through a web crawler to construct the sample text set.
  3. The method for establishing a market sentiment monitoring system according to claim 1, wherein, before performing corpus annotation on any one of the sample texts to obtain its sentiment label and sentiment degree label, the method further comprises:
    preprocessing the sample text set, the preprocessing comprising data cleaning and data governance.
  4. The method for establishing a market sentiment monitoring system according to claim 1, wherein, before performing corpus annotation on any one of the sample texts to obtain its sentiment label and sentiment degree label, the method further comprises:
    screening the sample text set for market relevance and deleting the sample texts whose market relevance is below a preset relevance threshold.
  5. The method for establishing a market sentiment monitoring system according to claim 1, wherein, after performing corpus annotation on any one of the sample texts to obtain its sentiment label and sentiment degree label, the method further comprises:
    constructing a sentiment dictionary based on the sentiment labels and sentiment degree labels of the sample texts.
  6. The method for establishing a market sentiment monitoring system according to claim 1, wherein obtaining the sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text specifically comprises:
    counting, based on the sentiment label and sentiment degree label of each sample text, the number of clauses for each sentiment label and each sentiment degree label;
    obtaining the sentiment monitoring degree based on the number of clauses for each sentiment label and sentiment degree label.
  7. The method for establishing a market sentiment monitoring system according to claim 1, wherein, after obtaining the sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text, the method further comprises:
    performing result back-testing and data correction on the sentiment monitoring degree.
  8. The method for establishing a market sentiment monitoring system according to any one of claims 1 to 7, wherein the sentiment monitoring degree covers worry, fear, disappointment, panic, despair, hope, optimism, peace of mind, excitement, and euphoria.
  9. A system for establishing a market sentiment monitoring system, characterized in that it comprises:
    a sample acquisition unit, configured to obtain a sample text set comprising a plurality of sample texts;
    a corpus annotation unit, configured to perform corpus annotation on any one of the sample texts to obtain the sentiment label and sentiment degree label of that sample text;
    a sentiment monitoring degree acquisition unit, configured to obtain the sentiment monitoring degree based on the sentiment label and sentiment degree label of each sample text;
    a system establishment unit, configured to construct a market sentiment monitoring system based on the sentiment monitoring degree and a preset sentiment acuity, the market sentiment monitoring system being used to monitor market sentiment.
  10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method for establishing a market sentiment monitoring system according to any one of claims 1 to 8.
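Claims 1, 4, 6 and 8 name the steps (relevance screening, clause counting per sentiment label and degree label, a monitoring degree, and a preset acuity) but leave the scoring open. The following Python sketch shows one way those steps could fit together. It is illustrative only: the `Clause` schema, the integer degree scale 1 to 3, the token-overlap relevance score, the degree-weighted clause share used as the monitoring degree, and treating the preset acuity as an alert threshold are all hypothetical choices, not the applicant's method.

```python
from collections import Counter
from dataclasses import dataclass

# The ten sentiment categories enumerated in claim 8.
EMOTIONS = ("worry", "fear", "disappointment", "panic", "despair",
            "hope", "optimism", "peace of mind", "excitement", "euphoria")


@dataclass(frozen=True)
class Clause:
    """One annotated clause (hypothetical schema).

    emotion: sentiment label, expected to be one of EMOTIONS.
    degree: sentiment degree label, ASSUMED to be an integer from 1 (mild) to 3 (strong).
    """
    text: str
    emotion: str
    degree: int


def relevance(text, market_terms):
    """ASSUMED relevance score: share of whitespace tokens that are market terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in market_terms for t in tokens) / len(tokens)


def filter_relevant(texts, market_terms, threshold):
    """Claim 4: keep only sample texts whose relevance reaches the preset threshold.

    Each sample text is a list of Clause objects.
    """
    return [t for t in texts
            if relevance(" ".join(c.text for c in t), market_terms) >= threshold]


def count_clauses(texts):
    """Claim 6, first step: clause counts per sentiment label and per (label, degree)."""
    by_label = Counter()
    by_label_degree = Counter()
    for text in texts:
        for clause in text:
            by_label[clause.emotion] += 1
            by_label_degree[(clause.emotion, clause.degree)] += 1
    return by_label, by_label_degree


def monitoring_degree(by_label_degree):
    """Claim 6, second step, with an ASSUMED formula: degree-weighted clause share per label."""
    weighted = Counter()
    for (emotion, degree), n in by_label_degree.items():
        weighted[emotion] += degree * n
    total = sum(weighted.values()) or 1  # avoid division by zero on an empty corpus
    return {e: weighted[e] / total for e in EMOTIONS}


def alerts(degrees, acuity):
    """ASSUMED use of the preset acuity: report sentiments whose degree reaches it."""
    return sorted(e for e, d in degrees.items() if d >= acuity)


if __name__ == "__main__":
    corpus = [
        [Clause("stock market falls sharply", "panic", 3),
         Clause("investors fear more losses", "fear", 2)],
        [Clause("the weather is nice", "optimism", 1)],
    ]
    terms = {"stock", "market", "investors", "losses", "falls"}
    kept = filter_relevant(corpus, terms, threshold=0.3)
    _, by_label_degree = count_clauses(kept)
    degrees = monitoring_degree(by_label_degree)
    print(alerts(degrees, acuity=0.5))  # ['panic']
```

In this toy run the off-topic weather text is screened out, panic receives a weighted share of 3/5 and fear 2/5, so only panic crosses the assumed acuity of 0.5. A real deployment would replace the keyword overlap with whatever relevance model and weighting the back-testing step of claim 7 supports.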
PCT/CN2020/078384 2019-07-23 2020-03-09 Method and system for establishing market sentiment monitoring system WO2021012684A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910666290.6A CN110400173A (en) 2019-07-23 2019-07-23 Market sentiment monitoring system method for building up and system
CN201910666290.6 2019-07-23

Publications (1)

Publication Number Publication Date
WO2021012684A1 true WO2021012684A1 (en) 2021-01-28

Family

ID=68325745

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/078384 WO2021012684A1 (en) 2019-07-23 2020-03-09 Method and system for establishing market sentiment monitoring system

Country Status (2)

Country Link
CN (1) CN110400173A (en)
WO (1) WO2021012684A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400173A (en) * 2019-07-23 2019-11-01 中译语通科技股份有限公司 Market sentiment monitoring system method for building up and system
CN112559731B (en) * 2020-12-17 2024-01-02 中译语通科技股份有限公司 Market emotion monitoring method and system

Citations (7)

Publication number Priority date Publication date Assignee Title
DE102013000611A1 (en) * 2013-01-16 2014-07-17 i-market GmbH Automatic method for recognizing brochures, catalogs or prospectus on websites of organizations, involves detecting and storing source code of to-be examined website by crawler or selecting source code completely or partially from database
CN106293074A (en) * 2016-07-29 2017-01-04 维沃移动通信有限公司 A kind of Emotion identification method and mobile terminal
CN108717406A (en) * 2018-05-10 2018-10-30 平安科技(深圳)有限公司 Text mood analysis method, device and storage medium
CN109325165A (en) * 2018-08-29 2019-02-12 中国平安保险(集团)股份有限公司 Internet public opinion analysis method, apparatus and storage medium
CN109325860A (en) * 2018-08-29 2019-02-12 中国科学院自动化研究所 Network public-opinion detection method and system for overseas investment Risk-warning
CN109697472A (en) * 2018-12-28 2019-04-30 杭州翼兔网络科技有限公司 One seed mood incorporates method into
CN110400173A (en) * 2019-07-23 2019-11-01 中译语通科技股份有限公司 Market sentiment monitoring system method for building up and system

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
KR101971087B1 (en) * 2017-09-21 2019-04-22 (주)뉴시스 Displaying method for market sentiment index information and online stock dealing service system
CN109145302A (en) * 2018-08-30 2019-01-04 南京都宁大数据科技有限公司 Large agricultural product investor fear mood Measurement Method based on semantic text

Also Published As

Publication number Publication date
CN110400173A (en) 2019-11-01

Similar Documents

Publication Publication Date Title
Tartir et al. Semantic sentiment analysis in Arabic social media
Lex et al. Measuring the quality of web content using factual information
CN113837531A (en) Product quality problem finding and risk assessment method based on network comments
CN113282955B (en) Method, system, terminal and medium for extracting privacy information in privacy policy
KR102019207B1 (en) Apparatus and method for assessing data quality for text analysis
WO2021012684A1 (en) Method and system for establishing market sentiment monitoring system
CN109918648B (en) Rumor depth detection method based on dynamic sliding window feature score
CN106776672A (en) Technology development grain figure determines method
CN112435651A (en) Quality evaluation method for automatic voice data annotation
WO2021114634A1 (en) Text annotation method, device, and storage medium
CN106951917A (en) The intelligent classification system and method for a kind of lymthoma histological type
CN110189170A (en) Market sentiment analysis method and system
CN116882494B (en) Method and device for establishing non-supervision knowledge graph oriented to professional text
CN108021595B (en) Method and device for checking knowledge base triples
CN112580350A (en) Appeal analysis method and device, electronic equipment and storage medium
CN112528028A (en) Investment and financing information mining method and device, electronic equipment and storage medium
CN116976321A (en) Text processing method, apparatus, computer device, storage medium, and program product
CN115689282A (en) Import and export food safety risk monitoring and early warning system
WO2022126718A1 (en) Method and system for monitoring market emotion
CN105786929A (en) Information monitoring method and device
CN111581533B (en) Method and device for identifying state of target object, electronic equipment and storage medium
CN110728131A (en) Method and device for analyzing text attribute
US20210216721A1 (en) System and method to quantify subject-specific sentiment
CN114138942A (en) Violation detection method based on text emotional tendency
CN112561714A (en) NLP technology-based underwriting risk prediction method and device and related equipment

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
  Ref document number: 20843988
  Country of ref document: EP
  Kind code of ref document: A1
NENP Non-entry into the national phase
  Ref country code: DE
122 Ep: PCT application non-entry in European phase
  Ref document number: 20843988
  Country of ref document: EP
  Kind code of ref document: A1