CN113222471A - Asset wind control method and device based on new media data - Google Patents

Asset wind control method and device based on new media data

Publication number
CN113222471A
Authority
CN
China
Prior art keywords
data
new media
enterprise
media
market value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110623218.2A
Other languages
Chinese (zh)
Other versions
CN113222471B (en)
Inventor
苏秦
孙佰清
房岳
鲍鑫
王璧
刘莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Xian Jiaotong University
Original Assignee
Harbin Institute of Technology
Xian Jiaotong University
Application filed by Harbin Institute of Technology, Xian Jiaotong University
Priority to CN202110623218.2A
Publication of CN113222471A
Application granted
Publication of CN113222471B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12Accounting
    • G06Q40/125Finance or payroll

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Biophysics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)

Abstract

The invention relates to enterprise market value risk monitoring, and in particular to an asset wind control (risk control) method and device based on new media data. The method comprises the following steps. Step one: acquire financial data, new media public opinion data and transaction data of an enterprise from a server. Step two: preprocess the financial data and the transaction data, classify the new media public opinion data by source and event subject, and preprocess the public opinion data. Step three: input the financial data, the transaction data and the data corresponding to the new media traffic matrix of the monitored enterprise, and use a trained model to predict whether the market value fluctuation will exceed a safety range within a fixed future time period. If the fluctuation exceeds the safety range, an early warning signal is issued.

Description

Asset wind control method and device based on new media data
Technical Field
The invention relates to enterprise market risk monitoring, in particular to an asset wind control method and equipment based on new media data.
Background
New media mainly refers to a form of communication that provides information and entertainment services to users by means of digital and network technologies, through channels such as the Internet, broadband local area networks, wireless communication networks and satellites, and via terminals such as computers, mobile phones and digital televisions. Because new media is closer to the general public and its content selection focuses on public demand, the propagation efficiency and influence of enterprise public opinion in new media are markedly higher than in the traditional media era. In recent years, monitoring of enterprise public opinion in new media has become increasingly important to corporate public relations departments, and is of great significance for enterprise marketing, brand building, crisis handling, market value management and the like. Effectively identifying public opinion content can markedly improve a listed company's awareness of its public reputation, identify risk factors in enterprise development, reveal the demands of stakeholders, and ultimately meet the enterprise's need for market value risk control.
Although existing market monitoring can perform risk monitoring using enterprise market data, the data dimensions it uses are very limited and its ability to predict unexpected events is insufficient.
In addition, existing public opinion monitoring products can perform data collection, content storage and query, and basic analysis based on natural language processing. The analysis results they provide include sentiment level, topic volume, topic lifespan and the like. However, these results lack theoretical support from finance and communication studies, so it is difficult to give enterprise managers targeted suggestions that meet the needs of market value management.
Disclosure of Invention
The invention aims to provide an asset wind control method and device based on new media data that can collect and classify an enterprise's new media public opinion data and identify potential risks in it, thereby providing the enterprise with market value fluctuation predictions based on current public opinion and helping it guard against market value risk more effectively.
The purpose of the invention is achieved by the following technical solution:
An asset wind control method based on new media data, the method comprising the following steps:
Step one: obtain financial data, new media public opinion data and transaction data of the enterprise from a server.
Step two: set a safety range for daily market value fluctuation according to the enterprise's wind control requirements, and label the enterprise's market value risk: the label is set to 1 if the market value fluctuation is within the safety range, and to 0 otherwise. Preprocess the financial data and transaction data, classify the new media public opinion data by source and event subject, and preprocess the public opinion data. Then calculate the traffic of each type of public opinion content, calculate the intensity of the fine-grained emotions (such as good, joy, sadness, anger, surprise and disgust) of the content according to the emotion vocabulary ontology library, and combine the traffic and emotion intensity of each type of content to construct the enterprise's new media traffic matrix. Integrate the labeled enterprise market value risk level, the financial data, the transaction data, and the traffic and emotion data corresponding to the new media traffic matrix into an asset wind control data set comprising a training set and a test set, and input the training set into a deep neural network for training.
Step three: input the financial data, the transaction data and the data corresponding to the new media traffic matrix of the monitored enterprise, and use the trained model to predict whether the market value fluctuation will exceed the safety range within a fixed future time period. If the fluctuation exceeds the safety range, an early warning signal is issued.
In the first step, to determine the safety threshold for enterprise market value fluctuation, one embodiment of this patent uses two indicators: short-term stock price volatility and the enterprise's short-term idiosyncratic (firm-specific) risk. Based on these two market value risk measures over the past year, the values of short-term stock price volatility and short-term idiosyncratic risk at the upper 25% quantile of the past year's distribution are taken as the upper limit of the safety range for the enterprise's market value fluctuation.
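The threshold rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `safety_upper_limit` and `label_day`, the sample values, and the choice of the "inclusive" quantile interpolation are all assumptions, since the source does not specify how the quantile is computed. In practice each of the two indicators would get its own limit.

```python
from statistics import quantiles

def safety_upper_limit(past_year_values):
    """Return the value at the upper 25% quantile (75th percentile)."""
    # quantiles(..., n=4) returns the three quartile cut points;
    # the last one separates the top 25% of observations.
    return quantiles(past_year_values, n=4, method="inclusive")[2]

def label_day(value, upper_limit):
    """Label 1 if the fluctuation is within the safety range, else 0."""
    return 1 if value <= upper_limit else 0

# Hypothetical short-term volatility observations (assumed, for illustration).
volatility = [0.01, 0.02, 0.03, 0.04, 0.05]
limit = safety_upper_limit(volatility)
labels = [label_day(v, limit) for v in volatility]
```

With these sample values only the largest observation falls outside the safety range and receives label 0.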
In step two, the preprocessing of the financial data and transaction data comprises:
Updating and storing the enterprise's financial data every fiscal quarter, the financial data comprising key indicators such as discretionary accruals, return on total assets, company asset size, number of employees, ownership type and current liabilities.
Updating and storing the enterprise's transaction data daily, the transaction data comprising key indicators such as market value, turnover rate and daily return.
Aligning the data by day, filling missing values using the moving average method, and constructing a multi-dimensional feature vector from the financial data features and the transaction data features:
$\langle f_1, \dots, f_{N_1}, t_1, \dots, t_{N_2} \rangle$
where f corresponds to the financial data indicators, with $N_1$ dimensions, and t corresponds to the transaction data indicators, with $N_2$ dimensions.
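The daily alignment and gap filling can be illustrated with a short sketch. It assumes a trailing moving average over the most recent known values, with missing entries represented as `None`; the window size and the helper names `fill_missing_moving_average` and `build_feature_vector` are hypothetical, as the source does not give these details.

```python
def fill_missing_moving_average(series, window=3):
    """Fill None entries with the mean of up to `window` preceding values.

    Previously filled values count as history. A gap at the very start
    has no history and is left as None (an assumption of this sketch).
    """
    filled = []
    for value in series:
        if value is None:
            history = [v for v in filled[-window:] if v is not None]
            value = sum(history) / len(history) if history else None
        filled.append(value)
    return filled

def build_feature_vector(financial, transaction):
    """Concatenate N1 financial and N2 transaction indicators for one day."""
    return list(financial) + list(transaction)

# Hypothetical daily turnover rates with one missing day.
daily_turnover = [1.0, 2.0, None, 3.0]
filled = fill_missing_moving_average(daily_turnover, window=2)
vector = build_feature_vector([0.12, 500.0], [filled[2], 0.01])
```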
In step two, the sources of the public opinion data are classified into the following five categories: official media, mainstream commercial media, high-influence financial self-media, high-influence non-financial self-media, and ordinary self-media.
The self-media influence score is obtained from indicators such as the number of fans, the content update frequency and the average number of reads per content item.
Official media are the officially controlled outlets represented by China Securities Journal, Securities Daily, Securities Times and Shanghai Securities News; the corresponding new media data come from the media accounts operated by these organizations and their subsidiaries.
Mainstream commercial media are market-oriented outlets such as China Business Journal, First Financial Daily, The Economic Observer and 21st Century Business Herald; the corresponding new media data come from the media accounts operated by these organizations and their subsidiaries.
All other media accounts are treated as self-media accounts, including media accounts operated by the enterprise itself.
The self-media influence is calculated as follows:
(1) Update the number of fans of each self-media on each new media platform daily, keyed by self-media name.
(2) Group the accounts that the same-named self-media holds on the various new media platforms, and calculate, for each platform and each day, the average update frequency and the average number of reads of the content updated in the last week.
(3) Group the updated content by format into three types (text, audio and video), calculate the number of daily updates of each type, and construct the influence feature vector as follows:
$media_{i,t} = \langle fans_{i,t}, frequency_{i,t}, ave\_volume_{i,t}, text\_frequency_{i,t}, audio\_frequency_{i,t}, video\_frequency_{i,t} \rangle$, where i corresponds to the enterprise and t corresponds to time.
(4) Based on these indicators, use the DBSCAN clustering algorithm to divide the self-media into 2 classes according to the density of the feature distribution. Accounts with more fans, a higher update frequency and higher read counts form the high-influence class; the rest form the low-influence class.
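The density-based split in item (4) can be sketched with a small, self-contained DBSCAN. This is only an illustration under stated assumptions: the feature vectors are reduced to three already normalized components, the `eps` and `min_pts` values are hypothetical, and the rule that the cluster with the larger average feature level is the high-influence class is an interpretation of the text. DBSCAN does not guarantee exactly two clusters, so real data would need parameter tuning.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one cluster id per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(idx):
        # Indices of all points within eps of points[idx] (including itself).
        return [j for j, q in enumerate(points) if dist(points[idx], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical normalized vectors: (fans, frequency, ave_volume).
accounts = [
    (0.10, 0.10, 0.10), (0.12, 0.10, 0.11), (0.11, 0.12, 0.10),
    (0.90, 0.80, 0.90), (0.88, 0.82, 0.91), (0.91, 0.79, 0.90),
]
cluster_ids = dbscan(accounts, eps=0.1, min_pts=2)

def mean_level(cluster_id):
    members = [p for p, c in zip(accounts, cluster_ids) if c == cluster_id]
    return sum(sum(p) for p in members) / len(members)

# Assumption: the cluster with the larger average level is "high influence".
high_cluster = max(set(cluster_ids) - {-1}, key=mean_level)
```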
Whether a self-media belongs to the high-influence financial self-media is determined as follows:
(1) Collect the content published in the last month by the self-media in the high-influence class, remove stop words and perform word segmentation. This embodiment uses the output of the jieba word segmentation package.
(2) Build a keyword lexicon from the enterprise performance and financial indicators that appear in the annual report disclosures of each enterprise, and compare the word segmentation results of the self-media content with the lexicon. If the number of overlapping keywords exceeds 20% of the number of keywords in the segmentation result, and the total frequency of the overlapping keywords exceeds 10% of the total word frequency of the segmentation result, the self-media is classified as financial self-media.
(3) Re-evaluate month by month, following the above procedure, whether each self-media belongs to the financial category.
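The two-threshold test in item (2) can be sketched as below. The sketch assumes the content has already been segmented and stripped of stop words (for example by jieba, which is not called here), and it reads the 20% condition as a share of distinct words and the 10% condition as a share of total word occurrences; both readings, the function name `is_financial_self_media` and the sample tokens are assumptions.

```python
from collections import Counter

def is_financial_self_media(tokens, keyword_lexicon,
                            distinct_ratio=0.20, frequency_ratio=0.10):
    """Apply the 20% distinct-keyword and 10% word-frequency thresholds."""
    counts = Counter(tokens)
    if not counts:
        return False
    overlap = set(counts) & set(keyword_lexicon)
    # Share of distinct words that are lexicon keywords.
    distinct_share = len(overlap) / len(counts)
    # Share of all word occurrences that are lexicon keywords.
    frequency_share = sum(counts[w] for w in overlap) / sum(counts.values())
    return distinct_share > distinct_ratio and frequency_share > frequency_ratio

# Hypothetical lexicon and segmented content.
lexicon = {"revenue", "profit", "assets"}
finance_post = ["revenue", "profit", "revenue", "game", "launch"]
game_post = ["game"] * 9 + ["revenue"]

finance_result = is_financial_self_media(finance_post, lexicon)
game_result = is_financial_self_media(game_post, lexicon)
```

The second sample meets the distinct-keyword condition but sits exactly at 10% of total frequency, so it is not classified as financial because both thresholds must be exceeded.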
Sources are updated as follows:
For a new self-media account that appears during data collection, first calculate its self-media influence. If it does not fall into the high-influence class, classify it as ordinary self-media. If it does, further determine whether it is financial self-media: if the condition is met, it is defined as high-influence financial self-media; otherwise it is defined as high-influence non-financial self-media.
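The five-way source assignment, including the handling of new self-media accounts, can be written as a small decision function. This is a sketch under stated assumptions: the category strings, the function name `classify_source` and the example account names are hypothetical, and the influence and financial flags are assumed to have been computed beforehand.

```python
OFFICIAL = "official media"
COMMERCIAL = "mainstream commercial media"
HIGH_FINANCIAL = "high-influence financial self-media"
HIGH_NON_FINANCIAL = "high-influence non-financial self-media"
ORDINARY = "ordinary self-media"

def classify_source(account, official_accounts, commercial_accounts,
                    high_influence, financial):
    """Assign an account to one of the five source categories."""
    if account in official_accounts:
        return OFFICIAL
    if account in commercial_accounts:
        return COMMERCIAL
    # Every remaining account is self-media.
    if not high_influence:
        return ORDINARY
    return HIGH_FINANCIAL if financial else HIGH_NON_FINANCIAL

# Hypothetical account names, for illustration only.
official = {"official_account_a"}
commercial = {"commercial_account_b"}
new_account_category = classify_source(
    "new_blogger", official, commercial, high_influence=True, financial=False
)
```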
In step two, the preprocessing of the new media data comprises the following calculations:
(1) Remove stop words from the new media content and perform word segmentation, then sort the segmentation results by word frequency.
(2) From the segmentation results, count the total number of keywords belonging to the enterprise performance and financial analysis category, and define this number as L1.
(3) From the segmentation results, count the total number of keywords belonging to the senior management and key personnel actions category, and define this number as L2.
(4) From the segmentation results, count the total number of keywords belonging to the enterprise marketing business category, and define this number as L3.
If L1, L2 and L3 are all equal to 0, the new media content does not belong to any category; it is removed and not included in the new media traffic matrix for further processing.
If L1 is greater than the sum of L2 and L3, the content is assigned to the enterprise performance and financial analysis category; similarly, if L2 is greater than the sum of L1 and L3, the content is assigned to the senior management and key personnel actions category; and if L3 is greater than the sum of L1 and L2, the content is assigned to the enterprise marketing business category.
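The L1/L2/L3 rule can be sketched as follows. The category labels, function names and sample lexicons are hypothetical. The source does not say what happens when the counts are nonzero but none exceeds the sum of the other two; this sketch returns "unclassified" in that case, which is an assumption.

```python
def count_keywords(tokens, lexicon):
    """Total number of token occurrences that belong to the lexicon."""
    return sum(1 for token in tokens if token in lexicon)

def classify_content(l1, l2, l3):
    """Return the content category, or None if the content is discarded."""
    if l1 == 0 and l2 == 0 and l3 == 0:
        return None  # belongs to no category; excluded from the matrix
    if l1 > l2 + l3:
        return "performance_finance"
    if l2 > l1 + l3:
        return "management_personnel"
    if l3 > l1 + l2:
        return "marketing_business"
    return "unclassified"  # assumed handling; not specified in the source

# Hypothetical lexicons and segmented content.
finance_words = {"revenue", "profit"}
people_words = {"chairman", "ceo"}
product_words = {"game", "launch"}
tokens = ["revenue", "profit", "revenue", "profit", "ceo", "game"]

l1 = count_keywords(tokens, finance_words)
l2 = count_keywords(tokens, people_words)
l3 = count_keywords(tokens, product_words)
category = classify_content(l1, l2, l3)
```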
The preprocessing requires a lexicon for categorizing new media content. The lexicon is constructed as follows:
(1) Keywords for the enterprise performance and financial analysis category are drawn from documents such as the policies and regulatory requirements on enterprise information disclosure issued by the China Securities Regulatory Commission (CSRC) and other regulators, and consist of the keywords for financial information in the quarterly and annual reports of listed companies;
(2) Keywords for the senior management and key personnel actions category are drawn from the disclosure documents of listed companies, in particular the key personnel published in annual reports, such as senior executives and board members;
(3) Keywords for the enterprise marketing business category consist of the key products and businesses that generate revenue for the enterprise, as stated in the quarterly and annual reports of listed companies.
The lexicon used for categorizing new media content also needs to be updated during preprocessing. The update process is as follows:
(1) A new term appearing in new media content is added to the enterprise performance and financial analysis category if it comes from documents such as the policies and regulatory requirements on enterprise information disclosure issued by the China Securities Regulatory Commission (CSRC) and other regulators.
(2) When the key personnel published by the enterprise change, the new person's name is added to the senior management and key personnel category and replaces the person who previously held the post.
(3) For keywords of the enterprise marketing business category, if a new term is a nickname or alias that stakeholders use for the enterprise or its main businesses, products and the like (for example, "Mizuoyu" for miHoYo), the new term is added directly to the lexicon of the enterprise marketing business category.
In summary, the framework of the new media traffic matrix is defined by the 5 sources and 3 content categories of new media content; the traffic and emotion intensity of each matrix element are then computed.
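The 5 x 3 framework can be illustrated with a simple counting structure. In this sketch "traffic" is taken to be the number of content items per cell, which is an assumption because the source does not define the traffic measure precisely; the label strings and the function name `build_traffic_matrix` are likewise hypothetical.

```python
SOURCES = [
    "official", "mainstream_commercial", "high_influence_financial",
    "high_influence_non_financial", "ordinary",
]
CATEGORIES = ["performance_finance", "management_personnel", "marketing_business"]

def build_traffic_matrix(items):
    """Count content items per (source, category) cell of the 5 x 3 matrix."""
    matrix = [[0] * len(CATEGORIES) for _ in SOURCES]
    for source, category in items:
        matrix[SOURCES.index(source)][CATEGORIES.index(category)] += 1
    return matrix

# Hypothetical classified content items for one day.
items = [
    ("official", "performance_finance"),
    ("ordinary", "marketing_business"),
    ("ordinary", "marketing_business"),
]
matrix = build_traffic_matrix(items)
```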
In step two, the emotion intensity of the new media data is calculated as follows:
(1) For the content in the new media traffic matrix, and in line with localization requirements, use natural language processing together with the emotion vocabulary ontology developed by Dalian University of Technology, which builds on Ekman's emotion categories, to match the emotional words in the segmented text, and calculate the emotional word frequency $e_{i,j,k}$ for each emotion category of each matrix element, where $e_{i,j,k} \in E$, i corresponds to the media content source, j corresponds to the media content category, and k corresponds to the emotion type (seven types: good, joy, sadness, anger, surprise, fear and disgust).
(2) Divide the emotional word frequency of each emotion category of each matrix element by the total number of emotional words in the corresponding content to obtain the emotion intensity $ed_{i,j,k}$ of each matrix element, where:
$ed_{i,j,k} = \dfrac{e_{i,j,k}}{\sum_{k'} e_{i,j,k'}}$
This completes the construction of the new media traffic matrix.
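The emotion intensity of one matrix element can be computed as the share of each emotion category in that element's total emotional word count, as the text describes. The sketch below assumes the per-category word frequencies have already been obtained by matching against the emotion lexicon; the function name `emotion_intensity`, the sample counts and the handling of content with no emotional words (all intensities zero) are assumptions.

```python
def emotion_intensity(frequencies):
    """Normalize per-emotion word counts into shares that sum to 1."""
    total = sum(frequencies.values())
    if total == 0:
        # No emotional words matched: report zero intensity everywhere.
        return {emotion: 0.0 for emotion in frequencies}
    return {emotion: count / total for emotion, count in frequencies.items()}

# Hypothetical counts for one (source, category) cell.
cell_counts = {"good": 2, "joy": 1, "anger": 1, "fear": 0}
intensity = emotion_intensity(cell_counts)
```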
Further, step two also includes training the neural network model. The enterprise financial data, the transaction data, and the traffic and emotion data corresponding to the new media traffic matrix are represented as the multidimensional feature vector
Figure BDA0003100889220000062
which is modeled using a long short-term memory (LSTM) model, a type of neural network. The model comprises an input gate, an output gate, a forget gate and an internal memory cell. The loss function is the logarithmic loss, expressed as $L(Y, P(Y \mid M)) = -\log_2 P(Y \mid M)$, where Y is the label indicating whether the market value fluctuation exceeds the safety range and $P(Y \mid M)$ is the result predicted by the model.
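As a small worked example of the logarithmic loss, the sketch below evaluates it for a binary label. It assumes base-2 logarithms (following the "log2" in the text) and that the model outputs the probability of label 1; the function name `log_loss` and the clamping constant are hypothetical.

```python
from math import log2

def log_loss(label, prob_label_one, eps=1e-12):
    """Return -log2 of the probability the model assigned to the true label."""
    p = prob_label_one if label == 1 else 1.0 - prob_label_one
    return -log2(max(p, eps))  # clamp to avoid log(0)

# A confident correct prediction costs little; a confident wrong one costs more.
losses = [log_loss(1, 0.5), log_loss(0, 0.75)]
```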
Using the current input $M_t$ and the state $H_{t-1}$ passed from the previous step, concatenation combined with activation functions and training yields the following four states:
Input gate: Figure BDA0003100889220000063
Information gate: Figure BDA0003100889220000064
Forget gate: Figure BDA0003100889220000065
Output gate: Figure BDA0003100889220000066
When the loss function error between $z_o$ and the true result is at its minimum, the model at that point is selected as the trained LSTM network, and the weight matrices of each type of unit are saved.
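For orientation, the four states can be written in the standard LSTM form. This is the common textbook formulation, not a reconstruction of the patent's equation images: only $M_t$, $H_{t-1}$ and $z_o$ appear in the text, while the symbols $z_i$, $z$, $z_f$, the weights $W$, the biases $b$ and the cell state $C_t$ are assumptions.

```latex
\begin{aligned}
z_i &= \sigma\left(W_i\,[M_t;\,H_{t-1}] + b_i\right) && \text{(input gate)} \\
z   &= \tanh\left(W\,[M_t;\,H_{t-1}] + b\right)     && \text{(information gate)} \\
z_f &= \sigma\left(W_f\,[M_t;\,H_{t-1}] + b_f\right) && \text{(forget gate)} \\
z_o &= \sigma\left(W_o\,[M_t;\,H_{t-1}] + b_o\right) && \text{(output gate)} \\
C_t &= z_f \odot C_{t-1} + z_i \odot z \\
H_t &= z_o \odot \tanh(C_t)
\end{aligned}
```

Here $[M_t;\,H_{t-1}]$ denotes concatenation, $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication.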
In one implementation, the step of classifying the public opinion data by source and event subject, constructing the enterprise's new media traffic matrix, and calculating the traffic data and emotional tendency in the matrix comprises:
Verifying the source of each day's new media public opinion data against the source library built by the system, and updating the information on the media in the source library according to the five categories: official media, mainstream commercial media, high-influence financial self-media, high-influence non-financial self-media and ordinary self-media.
Removing stop words from the new media data and performing word segmentation, then assigning the event subject of each public opinion item to a content category, such as enterprise performance and financial analysis, according to the segmentation results; the results are combined with the source classification to complete the construction of the new media traffic matrix. At the same time, new words appearing in the segmentation results are collated, and the lexicon for the corresponding event subject is updated.
Calculating the traffic of the public opinion data in each subset of the matrix, determining the proportion of each emotional tendency using natural language processing, and storing the data in the order: source, event subject, traffic, and the emotion intensities of that traffic.
Calculating the model parameters for enterprise market value fluctuation prediction: the long short-term memory (LSTM) neural network model is trained using, for each group of enterprise sample data, the features corresponding to the financial data, transaction data and new media traffic matrix, together with the data label indicating whether the market value fluctuation is within the safety range. The LSTM model learns the relationship between the data features and the data label, yielding the trained LSTM model.
An asset wind control device based on new media data comprises a processing module, a storage module, and a market value fluctuation early warning device embedded in the storage module.
The market value fluctuation early warning device comprises an acquisition unit for acquiring a training data set and a training unit for training the neural network model with the training data set.
The market value fluctuation early warning device further comprises a testing unit for testing test samples with the trained neural network model, and an optimization unit for optimizing the model, according to the test results, based on the difference between the predicted label and the true label of whether the market value fluctuation is within the safety range.
A readable storage medium for asset wind control based on new media data, the readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the enterprise market value risk monitoring method based on new media data.
The asset wind control method and device based on new media data have the following beneficial effects:
The convenience of information transmission in new media, together with its broad audience and wide range of topics, brings both opportunities and challenges to enterprise strategies such as image management and marketing. Moreover, new media content reaches stakeholders and thereby puts market pressure on the enterprise, causing market value fluctuations. Integrating new media content with enterprise financial data and market value data therefore covers the main external risk sources that enterprises currently face more broadly. Using a neural network model to predict whether market value fluctuation will exceed a risk threshold effectively supports the enterprise's response. In addition, the new media traffic matrix designed in this patent reveals, at the source, the enterprise's new media influence across topics and content creators, and thus provides targeted analysis for the enterprise's subsequent risk mitigation work.
The market value fluctuation early warning device can be applied to the asset wind control device for new media data and is used to execute all steps of the enterprise market value risk monitoring method based on new media data.
The asset wind control device for new media data can acquire the financial data, market value data and new media data of an enterprise at the current moment, and then predict whether the market value fluctuation on the day corresponding to the current moment will exceed the safety range, which can improve the accuracy and reliability of the enterprise's market value risk prediction.
Drawings
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a schematic diagram of the market value fluctuation early warning device of the present invention;
FIG. 2 is a first schematic diagram of an asset wind control method of the new media data of the present invention;
FIG. 3 is a second schematic view of the asset wind control method of the new media data of the present invention;
FIG. 4 is a schematic view of an asset wind control device for new media data of the present invention.
In the figure:
an electronic device 10;
a processing module 11;
a storage module 12;
a market value fluctuation early warning device 100;
an acquisition unit 110;
a training unit 120.
Detailed Description
The invention is described in further detail below with reference to figures 1 to 4.
In the asset wind control device for new media data, the enterprise market value risk monitoring method based on new media data can be applied to the electronic device 10, which executes or implements the steps of the method.
With reference to FIG. 2 and FIG. 3, the method may include the following steps:
Step S210: obtain a training data set. The training data set comprises multiple groups of sample data, and each group comprises enterprise financial data, market value data and new media data corresponding to multiple time sequences. Some of the groups comprise the enterprise data features together with a data label obtained by judging whether the enterprise's market value fluctuation in the historical data is within the safety range of the market value fluctuation threshold: the label is 1 if it is within the safety range, and 0 otherwise;
Step S220: train the neural network model with the training data set to obtain the trained neural network model, which is used to predict whether the market value fluctuation in the target time range after the current time is within the safety range.
In the above embodiment, the sample data of the training data set includes enterprise public opinion data acquired through new media channels. This enriches the diversity and timeliness of the sample data, which helps improve the accuracy and reliability with which the trained neural network model predicts whether the market value fluctuation range is within the market value fluctuation threshold safety range, and addresses the low prediction accuracy and reliability that result when the sample data is based on market value data alone.
The individual steps of the process are explained in detail below, as follows:
in step S210, the training data set is a data set prepared before training the neural network model. The training data set may be stored in the electronic device 10, or it may be stored in another device from which the electronic device 10 can retrieve it. The number of sample groups in the training data set is usually large and can be set according to actual conditions.
In each group of sample data, the data features are the enterprise market value data and new media content data collected at different time points. The data label indicates whether the market value fluctuation range at a time point different from those of the data features is within the market value fluctuation threshold safety range: if so, the data label is 1; otherwise, it is 0.
A new media traffic matrix is constructed from the new media data corresponding to the enterprise. Stop words are removed from the new media content, the text is segmented into words, and the word segmentation results are sorted by word frequency. From the segmentation results, three totals are calculated: L1, the total number of keywords belonging to the enterprise performance and financial analysis category; L2, the total number of keywords belonging to the senior management and key personnel action category; and L3, the total number of keywords belonging to the enterprise marketing business category. If L1, L2 and L3 are all equal to 0, the new media content does not belong to any category and is not entered into the new media traffic matrix for further processing. If L1 is higher than the sum of L2 and L3, the content is determined to belong to the enterprise performance and financial analysis category; similarly, if L2 is higher than the sum of L1 and L3, the content is determined to belong to the senior management and key personnel action category; if L3 is higher than the sum of L1 and L2, the content is determined to belong to the enterprise marketing business category.
For a new self-media account that appears during data acquisition, its influence is calculated first. If the account does not meet the high-influence criterion, it is simply classified as ordinary self-media. If it does, it is further judged whether the account is financial self-media: if so, it is defined as high-influence financial self-media; otherwise, it is defined as high-influence non-financial self-media.
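The category rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the keyword lexicons, the stop-word list, the English tokens, the influence score and its threshold are all hypothetical placeholders, since the description does not list the actual vocabularies or the influence formula, and the input is assumed to be already segmented into words.

```python
from collections import Counter

# Hypothetical keyword lexicons; the real vocabularies are not given in the text.
CATEGORY_KEYWORDS = {
    "performance_and_financial_analysis": {"revenue", "profit", "earnings"},  # L1
    "management_and_key_personnel": {"ceo", "chairman", "resigns"},           # L2
    "marketing_business": {"campaign", "launch", "promotion"},                # L3
}
STOP_WORDS = {"the", "a", "of", "and", "its", "after"}  # hypothetical


def classify_event(tokens):
    """Return the event category of a segmented text, or None if unclassified."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    totals = {
        name: sum(counts[word] for word in words)
        for name, words in CATEGORY_KEYWORDS.items()
    }
    if all(value == 0 for value in totals.values()):
        return None  # L1 = L2 = L3 = 0: the text is left out of the matrix
    for name, value in totals.items():
        others = sum(v for n, v in totals.items() if n != name)
        if value > others:  # e.g. L1 > L2 + L3
            return name
    return None  # no category dominates; this case is not specified in the text


def classify_self_media(influence, is_financial, threshold):
    """Sort a new self-media account into one of the three self-media source types.

    `influence` and `threshold` are assumed to be numeric scores on the same scale.
    """
    if influence < threshold:
        return "ordinary_self_media"
    if is_financial:
        return "high_influence_financial_self_media"
    return "high_influence_non_financial_self_media"
```

For instance, `classify_event(["revenue", "profit", "ceo"])` gives L1 = 2, L2 = 1, L3 = 0, so the text falls into the performance and financial analysis category. Returning `None` when no total exceeds the sum of the other two is a choice made for this sketch.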
The number of texts corresponding to each element of the new media traffic matrix and the emotional tendency of those texts are then calculated. The emotional tendency is calculated as follows: for the content in the new media traffic matrix, the segmented words are matched in turn against the emotion words of each category in the emotion vocabulary ontology library, which was developed by Dalian University of Technology on the basis of the Ekman emotion model and adapted to local requirements, and the emotion word frequency for each emotion category is counted for each matrix element. The proportion of each category's emotion word frequency to the total number of emotion words in the corresponding content is then obtained, which gives the emotional tendency of each matrix element.
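A minimal sketch of this calculation for one matrix element is shown below. The tiny `EMOTION_LEXICON` and its category names are hypothetical stand-ins for the emotion vocabulary ontology library, and the texts are assumed to be already segmented into tokens.

```python
from collections import Counter

# Hypothetical miniature lexicon mapping a word to its emotion category.
EMOTION_LEXICON = {
    "surge": "joy",
    "win": "joy",
    "scandal": "anger",
    "lawsuit": "fear",
    "loss": "sadness",
}


def emotion_tendency(tokens, lexicon=EMOTION_LEXICON):
    """Return each emotion category's share of all emotion words found."""
    hits = Counter(lexicon[t] for t in tokens if t in lexicon)
    total = sum(hits.values())
    if total == 0:
        return {}  # no emotion words matched
    return {emotion: count / total for emotion, count in hits.items()}


def matrix_cell(texts, lexicon=EMOTION_LEXICON):
    """Summarize one (source, event) element: text count plus emotion shares."""
    all_tokens = [token for text in texts for token in text]
    return {"traffic": len(texts), "emotion": emotion_tendency(all_tokens, lexicon)}
```

Here the traffic of a cell is simply the number of texts assigned to it, and the emotion shares of a cell sum to 1 whenever at least one emotion word was matched.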
In the present embodiment, step S210 may include sub-steps S211 to S212 as follows:
substep S211, acquiring multiple groups of data through a sliding window from the enterprise finance, market value and new media content data sets collected at the specified acquisition frequency, wherein each group of data comprises multiple values whose acquisition times are consecutive;
substep S212, for each group of enterprise financial data, market value data and new media content data, oversampling the sample when its market value fluctuation range exceeds the set market value fluctuation threshold safety range.
In this embodiment, the electronic device 10 may periodically collect the market value data and the new media content data of the enterprise at a designated collection frequency to form an enterprise financial, market value and new media content data set. The number of the collected enterprises can be determined according to actual conditions, and can be one or more associated enterprises.
In the enterprise finance, market value and new media content data set, the enterprise market value and new media content data each correspond to a time sequence, which may be understood as the timestamp at which the data was collected. The historical data corresponding to each enterprise data label is then obtained from the enterprise financial data, market value data and new media content data through a sliding window.
An example of the training data is as follows. Taking 30 days as the sliding window and one day as the acquisition frequency, the prediction time range is also set to one day. That is, the enterprise finance, market value and new media content data of the past 30 days serve as the data features, and whether the market value fluctuation range of the 30th day is within the market value fluctuation threshold safety range serves as the data label. Generally, one year is set as the time range of the training data, and the time window is slid over it to obtain the training data set.
Of course, in other embodiments, the length of the sliding window, the acquisition frequency, the prediction time range, and the time range corresponding to the training data may be set according to the actual situation, and are not specifically limited herein.
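The sliding-window construction can be sketched as follows. The function name, the `horizon` parameter and the 5% threshold are assumptions made for illustration. In particular, this sketch labels each window with the day that follows it (`horizon=1`), which is one possible reading of the description; set the parameters to match the labeling actually intended.

```python
def make_samples(features, fluctuation, window=30, horizon=1, threshold=0.05):
    """Build (window_features, label) pairs from daily series.

    features:    list of per-day feature vectors (finance, market value, new media)
    fluctuation: list of per-day market value fluctuation magnitudes
    label:       1 if the target day's fluctuation is within the safety range, else 0
    """
    samples = []
    last_start = len(features) - window - horizon + 1
    for start in range(max(last_start, 0)):
        end = start + window              # window covers days [start, end)
        target = end + horizon - 1        # day whose fluctuation is labeled
        label = 1 if abs(fluctuation[target]) <= threshold else 0
        samples.append((features[start:end], label))
    return samples
```

With 35 days of data and the defaults, this yields five windows of 30 days each, labeled by days 30 through 34 (zero-based).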
When the market value fluctuation range exceeds the set market value fluctuation threshold safety range, the influence of the sample is amplified by oversampling it. For example, with an oversampling ratio of 1:2, the number of samples requiring an early warning is doubled. This improves the model's ability to describe such samples, mitigates the data imbalance, and improves the effectiveness and reliability of the training result.
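A minimal sketch of the 1:2 oversampling is given below, assuming samples are `(features, label)` pairs in which label 0 marks a fluctuation outside the safety range (the early-warning case). The function name and the `factor` parameter are illustrative.

```python
def oversample_warnings(samples, factor=2):
    """Repeat each early-warning sample (label 0) `factor` times."""
    result = []
    for features, label in samples:
        copies = factor if label == 0 else 1
        result.extend([(features, label)] * copies)
    return result
```

Simple duplication is used here; shuffling the result before training is usually advisable so the repeated samples are not adjacent.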
In step S220, after the training data set is obtained, each group of sample data in the training data set may be used directly to train the neural network model. The neural network model may be, but is not limited to, a deep neural network model or an artificial neural network model. The neural network model may comprise an input layer, a recurrent layer and a fully connected layer, which learn from each group of sample data so that the trained neural network model is obtained.
In particular, in step S220, each group of sample data in the training data set may be used directly to train a long short-term memory (LSTM) neural network model. As a variant of the recurrent neural network model, the LSTM neural network model may comprise an input layer, a recurrent layer and a fully connected layer, which learn from each group of sample data so that the trained LSTM neural network model is obtained.
In this embodiment, step S220 may include: training the long short-term memory (LSTM) neural network model with the plurality of data features and the data label in each group of sample data, so that the model learns the feature relationship between the data features and the data labels, thereby obtaining the trained LSTM neural network model. The multidimensional feature vector formed from the enterprise financial data, the transaction data, and the traffic and emotion data corresponding to the new media traffic matrix
(formula image BDA0003100889220000121)
is modeled with the long short-term memory model among the neural network models. The model comprises an input gate (input), an output gate (output), a forget gate (forget) and an internal memory cell (memory). Further, the loss function is set to the logarithmic loss function $L(Y, P(Y \mid M)) = -\log_2 P(Y \mid M)$, where $Y$ is the label indicating whether the market value fluctuation range exceeds the safety range and $P(Y \mid M)$ is the result predicted by the model. Here LSTM denotes the long short-term memory neural network model and RNN denotes the recurrent neural network model;
the current input $M_t$ and the hidden state $H_{t-1}$ passed from the previous step are concatenated, combined with activation functions, and trained to obtain the following four states:
Input gate: (formula image BDA0003100889220000122)
Information gating: (formula image BDA0003100889220000123)
Forget gate: (formula image BDA0003100889220000124)
Output: (formula image BDA0003100889220000125)
When the loss function error between $z_o$ and the true result is at its minimum, the model at that point is selected as the trained LSTM network, and the weight matrix corresponding to each type of unit is saved.
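The formula images for the four states are not available in this text, so the following is only a sketch of one step of the textbook LSTM cell, which may differ in detail from the formulas in the original figures. The parameter names (`W`, `Wi`, `Wf`, `Wo` and the matching biases), the helper functions and the cell-state variable `c_prev` are assumptions introduced for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vector):
    """Compute weights @ vector + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(m_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    m_t:    current input vector M_t
    h_prev: hidden state H_{t-1} from the previous step
    c_prev: memory cell state from the previous step
    """
    x = list(m_t) + list(h_prev)  # concatenation [M_t ; H_{t-1}]
    z = [math.tanh(v) for v in affine(params["W"], params["b"], x)]    # candidate
    z_i = [sigmoid(v) for v in affine(params["Wi"], params["bi"], x)]  # input gate
    z_f = [sigmoid(v) for v in affine(params["Wf"], params["bf"], x)]  # forget gate
    z_o = [sigmoid(v) for v in affine(params["Wo"], params["bo"], x)]  # output gate
    # New memory: keep part of the old cell state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(z_f, c_prev, z_i, z)]
    # New hidden state: gated view of the squashed memory.
    h_t = [o * math.tanh(c) for o, c in zip(z_o, c_t)]
    return h_t, c_t
```

In a full model, `lstm_step` would be applied to each day of a window in turn, and the final hidden state would be passed through a fully connected layer to produce the probability used in the loss function.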
Understandably, when the neural network model is trained, the data features and data label in each group of sample data are input into the neural network model, and its input layer, recurrent layer and fully connected layer learn from them. The model thereby obtains the feature relationship between the data features and the data label in each group, and acquires the ability to predict from the data features whether the market value fluctuation range at the next time sequence or at other time points is within the safety range. The trained neural network model is thus obtained.
As an optional implementation, after step S220, the method may further include a step of testing and optimizing the neural network model. For example, after step S220, the method may further include:
testing the trained neural network model according to a test sample to obtain a test result, wherein the test sample comprises a plurality of test data characteristics with continuous time sequences and a test data label, and the test result comprises a market value fluctuation range corresponding to the time sequence of the test data label;
and optimizing the neural network model through a preset loss function of the neural network model, according to whether the market value fluctuation range in the test result is within the safety range and the difference between that result and the true value given by the test data label, so as to obtain the neural network model for predicting the market value fluctuation range.
The trained neural network model can be tested and optimized by adjusting the loss function and by trying different neural network implementations, which can improve the accuracy and reliability of the model's market value fluctuation prediction.
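As a rough illustration of the test step, the sketch below scores predictions on test samples with the base-2 logarithmic loss mentioned earlier in the description, together with plain accuracy. It assumes the model outputs `p_safe`, the predicted probability that the fluctuation stays within the safety range (label 1); the function names, the 0.5 cutoff and the clipping constant `eps` are illustrative choices.

```python
import math


def log2_loss(label, p_safe, eps=1e-12):
    """Base-2 log loss: -log2 of the probability assigned to the true label."""
    p_true = p_safe if label == 1 else 1.0 - p_safe
    return -math.log2(max(p_true, eps))  # clip to avoid log2(0)


def evaluate(labels, probabilities, cutoff=0.5):
    """Return (mean log loss, accuracy) over a test set."""
    losses = [log2_loss(y, p) for y, p in zip(labels, probabilities)]
    predictions = [1 if p >= cutoff else 0 for p in probabilities]
    correct = sum(1 for y, y_hat in zip(labels, predictions) if y == y_hat)
    return sum(losses) / len(losses), correct / len(labels)
```

A lower mean loss on the test samples indicates that the predicted probabilities agree better with the true labels, which is the quantity the optimization step tries to reduce.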
Referring to fig. 3, after obtaining the trained deep neural network model, the method may further include a step of predicting the market value fluctuation range of the enterprise by using the neural network model. For example, after step S220, the method may further include steps S230 and S240, as follows:
step S230, acquiring enterprise finance, market value and new media content data which are correspondingly required in a preset time period before the current moment;
step S240, inputting the enterprise finance, the market value and the new media content data into the trained neural network model, and predicting, by the neural network model, according to the enterprise finance, the market value and the new media content data corresponding to the plurality of time sequences, whether the market value fluctuation range of the target time after the current time is in a safe range.
In the present embodiment, the current time can be understood as a time at which market value fluctuation range prediction needs to be performed for a future target time. The target time is one time or a plurality of different times after the current time, and can be set according to the actual situation. The preset time period may be determined according to actual conditions, and may be a time period of 1 hour, 3 hours, 5 hours, and the like, where the preset time period is not particularly limited. The target time may be the next time sequence after the current time, or the time corresponding to the specified duration after the current time, and may be determined according to the actual situation. The specified time length can be set according to actual conditions, and is not particularly limited herein. Therefore, the method is beneficial to enterprises to flexibly set the target time according to the actual situation so as to predict the target time interval.
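Steps S230 and S240 can be sketched as a small helper that takes the most recent window of data and turns the model output into an early-warning flag. Here `model` is a hypothetical callable that returns the probability that the market value fluctuation at the target time stays within the safety range; the `window` and `cutoff` values are likewise assumptions.

```python
def predict_warning(recent_features, model, window=30, cutoff=0.5):
    """Predict from the latest window and decide whether to raise a warning."""
    if len(recent_features) < window:
        raise ValueError("not enough history for one full window")
    p_safe = model(recent_features[-window:])  # step S240: run the trained model
    return {"p_safe": p_safe, "warning": p_safe < cutoff}
```

For example, `predict_warning(history, trained_model)` would return a dictionary whose `warning` entry is true when the predicted probability of staying within the safety range falls below the cutoff.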
As shown in fig. 4, the market value fluctuation early warning apparatus 100 may be applied to the electronic device 10 to execute the steps of the method. The market value fluctuation early warning apparatus 100 comprises at least one software functional module, which can be stored in the storage module 12 in the form of software or firmware, or built into the operating system (OS) of the electronic device 10. The processing module 11 is used for executing the executable modules stored in the storage module 12, such as the software functional modules and computer programs included in the market value fluctuation early warning apparatus 100.
The market value fluctuation early warning apparatus 100 may include an obtaining unit 110 and a training unit 120, and the executed operation content may be as follows:
an obtaining unit 110, configured to obtain a training data set, where the training data set includes multiple groups of sample data, each group of sample data includes enterprise financial, market value, and new media data corresponding to multiple time sequences, and the data of some groups of sample data in the multiple groups of sample data includes data characteristics and data tags obtained after comparing a market value fluctuation range with a safety range;
a training unit 120, configured to train the neural network model by using the training data set, to obtain a trained neural network model, and configured to predict market value fluctuation range data of a target time after the current time.
Optionally, the obtaining unit 110 may further be configured to:
acquiring a plurality of groups of enterprise finance, market value and new media data through a sliding window from the enterprise finance, market value and new media data sets collected at a specified acquisition frequency, wherein each group of data comprises a plurality of values whose acquisition times are consecutive;
oversampling a sample, so as to amplify its influence, when its market value fluctuation range exceeds the set market value fluctuation threshold safety range.
Optionally, the training unit 120 may further be configured to: train the neural network model by using the plurality of data features and the data labels in each group of sample data, so that the neural network model learns the feature relationship between the plurality of data features and the data labels to obtain the trained neural network model.
Optionally, the market value fluctuation early warning apparatus 100 may further include a testing unit and an optimizing unit. The testing unit is used for testing the trained neural network model according to a test sample to obtain a test result. The test sample is similar to the training samples and comprises a plurality of test data features with consecutive time sequences and a test data label, and the test result comprises the market value fluctuation range corresponding to the time sequence of the test data label. The optimizing unit is used for adjusting the loss function and the like according to the characteristics of the enterprises or enterprise groups used for training.
Optionally, the market value fluctuation warning apparatus 100 may further include a prediction unit. The obtaining unit 110 may be further configured to obtain the corresponding required corporate finance, market value and new media content data within a preset time period before the current time. And the prediction unit is used for inputting the enterprise finance, the market value and the new media content data into the trained neural network model, and the neural network model predicts whether the market value fluctuation range of the target time after the current time is in a safety range according to the enterprise finance, the market value and the new media content data corresponding to the plurality of time sequences.
To sum up, the asset wind control method for new media data provided by the embodiments of the present application includes: acquiring a training data set, wherein the training data set comprises a plurality of groups of sample data, each group of sample data comprises enterprise financial, market value and new media data corresponding to a plurality of time sequences, and some of the groups comprise data features and data labels obtained by comparing the market value fluctuation range with the safety range; and training the neural network model with the training data set to obtain the trained neural network model, which is used for predicting data of a target time after the current time. In this scheme, the sample data of the training data set also comprises enterprise finance, market value and new media data corresponding to a plurality of time sequences, together with data features and data labels obtained by comparing the market value fluctuation range with the safety range. The dimensionality of the sample data is thereby enriched, which improves the accuracy and reliability of the market value fluctuation range predicted by the trained neural network model and addresses the low prediction accuracy and reliability caused by sample data drawn from a single source. It should be noted that the above embodiments may be combined with one another, in part or in whole.

Claims (10)

1. An asset wind control method based on new media data is characterized in that: the method comprises the following steps:
step one: acquiring financial data, new media public opinion data and transaction data of an enterprise from a server;
step two: preprocessing financial data and transaction data, inducing new media public opinion data according to sources and event main bodies, and preprocessing the public opinion data;
step three: inputting financial data, transaction data and data corresponding to the new media flow matrix corresponding to the monitored enterprise, predicting whether the fluctuation of the market value exceeds a safety range in a future fixed time period through a trained model, and if the fluctuation is higher than the safety range, sending out an early warning signal.
2. The asset wind control method based on new media data according to claim 1, characterized in that: step two further comprises preprocessing the public opinion data, calculating the traffic of each type of public opinion content, calculating the fine-grained emotion intensity corresponding to the content according to the emotion vocabulary ontology library, and integrating the traffic and emotion intensity of each type of content to construct the new media traffic matrix of the enterprise; the labeled market risk level of the enterprise, the financial data, the transaction data, and the traffic and emotion data corresponding to the new media traffic matrix are integrated into an asset wind control data set, the asset wind control data set comprises a training set and a test set, and the training set is input into a deep neural network for training.
3. The asset wind control method based on new media data according to claim 1, characterized in that: the event body corresponds to three types: enterprise performance and financial analysis; senior management and key personnel action; and enterprise marketing business.
4. The asset wind control method based on new media data according to claim 1, characterized in that: the sources of the new media public opinion data correspond to five types: official media, mainstream commercial media, high-influence financial self-media, high-influence non-financial self-media, and ordinary self-media.
5. The asset wind control method based on new media data according to claim 1, characterized in that: the traffic data and emotional tendencies involved in the new media traffic matrix are computed using the Ekman emotion model.
6. The asset wind control method based on new media data according to claim 5, characterized in that: the step of calculating the traffic data and emotional tendency in the matrix comprises the following steps:
step one: verifying the source of the daily new media public opinion data against a subject source library built by the system, dividing the data into five types, namely official media, mainstream commercial media, high-influence financial self-media, high-influence non-financial self-media and ordinary self-media, and updating the self-media subject information in the subject source library;
step two: removing stop words from the new media data, segmenting words, and, according to the segmentation results, assigning the event body of each piece of public opinion to enterprise performance and financial analysis, senior management and key personnel action, or enterprise marketing business; combining this with the source division results to complete the construction of the new media traffic matrix; meanwhile, collating new words that appear in the segmentation results and updating the word bank corresponding to each event body;
step three: calculating the traffic of the new media public opinion data of each subset in the matrix, determining the proportion of each emotional tendency of each subset by a natural language processing method, and storing the data in the order of subject source, event body, traffic, and the emotion proportions corresponding to the traffic.
7. The asset wind control method based on new media data according to claim 6, characterized in that: the model parameters for enterprise market value fluctuation prediction are calculated as follows: the long short-term memory neural network model is trained with the financial data, the transaction data and the features corresponding to the new media traffic matrix in each group of enterprise sample data, together with the data label indicating whether the market value fluctuation is within the safety range, so that the long short-term memory neural network model learns the feature relationship between the plurality of data features and the data labels, and the trained long short-term memory neural network model is obtained.
8. An asset wind control device based on new media data, comprising a processing module (11) and a storage module (12), characterized in that: the system also comprises a market value fluctuation early warning device (100) solidified in the storage module (12).
9. An asset wind control device based on new media data according to claim 8, characterized in that: the market value fluctuation warning device (100) comprises an acquisition unit (110) for acquiring a training data set and a training unit (120) for training a neural network model by using the training data set.
10. An asset wind control device based on new media data according to claim 9, characterized in that: the device further comprises a testing unit for testing test samples with the trained neural network model, and an optimizing unit for optimizing the model according to the difference between the test result, which indicates whether the market value fluctuation is predicted to be within the safety range, and the label indicating whether the market value fluctuation is actually within the safety range.
CN202110623218.2A 2021-06-04 2021-06-04 Asset wind control method and device based on new media data Active CN113222471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110623218.2A CN113222471B (en) 2021-06-04 2021-06-04 Asset wind control method and device based on new media data

Publications (2)

Publication Number Publication Date
CN113222471A true CN113222471A (en) 2021-08-06
CN113222471B CN113222471B (en) 2023-06-06

Family

ID=77082726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110623218.2A Active CN113222471B (en) 2021-06-04 2021-06-04 Asset wind control method and device based on new media data

Country Status (1)

Country Link
CN (1) CN113222471B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511190A (en) * 2021-12-31 2022-05-17 上海华鑫股份有限公司 Visual analysis system and analysis method for second-level market reevaluation

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001080143A1 (en) * 2000-04-17 2001-10-25 Marketocracy Inc. Internet-based system for identification, measurement and ranking of investment portfolio management, and operation of a fund supermarket, including 'best investor' managed funds
US20090018891A1 (en) * 2003-12-30 2009-01-15 Jeff Scott Eder Market value matrix
CN109583738A (en) * 2018-11-22 2019-04-05 第创业证券股份有限公司 A kind of device and method for bond risk control
CN109598623A (en) * 2018-12-11 2019-04-09 国家电网有限公司 A kind of financial product future profits data predication method, apparatus and system
CN109992704A (en) * 2019-03-12 2019-07-09 青岛格兰德信用管理咨询有限公司 A kind of enterprise's public sentiment monitoring system and method based on shot and long term Memory Neural Networks
WO2020000847A1 (en) * 2018-06-25 2020-01-02 中译语通科技股份有限公司 News big data-based method and system for monitoring and analyzing risk perception index
CN111738856A (en) * 2020-06-24 2020-10-02 四川长虹电器股份有限公司 Stock public opinion investment decision analysis method and device
CN112115331A (en) * 2020-09-21 2020-12-22 朱彤 Capital market public opinion monitoring method based on distributed web crawler and NLP
WO2021103492A1 (en) * 2019-11-28 2021-06-03 福建亿榕信息技术有限公司 Risk prediction method and system for business operations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QUN ZHUGE等: "LSTM Neural Network with Emotional Analysis for Prediction of Stock Price", 《ENGINEERING LETTERS》, pages 1 - 9 *
王曰芬;王怡;: "网络舆情演化与上市公司股价变动的关系研究", 文献与数据学报, no. 01 *

Also Published As

Publication number Publication date
CN113222471B (en) 2023-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant