CN103559207A - Financial behavior analyzing system based on social media calculation - Google Patents


Info

Publication number
CN103559207A
Authority
CN
China
Prior art keywords
user
analysis
data
microblogging
topic
Legal status
Pending
Application number
CN201310469922.2A
Other languages
Chinese (zh)
Inventor
秦谦
宋阳秋
常凯斯
Current Assignee
Jiangsu Mingtong Tech Co Ltd
Original Assignee
Jiangsu Mingtong Tech Co Ltd
Application filed by Jiangsu Mingtong Tech Co Ltd
Priority to CN201310469922.2A
Publication of CN103559207A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a financial behavior analysis system based on social media computing. The system comprises three modules: a crawler, a database and index, and an analyzer. The crawler is responsible for acquiring data. The database is divided into two parts: structured data and unstructured data. When the index is built, a global ID is assigned to each user and each microblog according to the acquired data, so that information in the different databases can be aligned and retrieved. The analyzer is the core of the system and comprises six submodules: topic analysis, entity recognition, action recognition, message tracking, sentiment analysis, and community clustering analysis. The economic and financial behavior analysis system based on social media computing can acquire user information effectively and accurately, so that user data can be fully archived and organized, a user database can be built, and information that users care about can be pushed to them based on that database.

Description

A financial behavior analysis system based on social media computing
Technical field
The present invention relates to an economic and financial behavior analysis system based on social media, and belongs to the field of computer software applications.
Background art
With the development of Web 2.0, more and more people want to express their views freely on the Internet. These views may take the form of publishing or reposting a news item, commenting on a news story, or venting a personal mood. Traditional discussion boards, BBS forums, and blogs have gradually become unable to keep up with the high-speed flow of information. Against this background, microblogs, a new kind of social media, have attracted growing attention from Internet users and achieved strong user stickiness and broad coverage. The resulting massive volume of time-sensitive data brings both great opportunities and great challenges.
First, the data contain great opportunities. Paul Hawtin of the Wall Street firm Derwent Capital Markets used a computer program to analyze messages from 340 million Twitter accounts worldwide, gauge public mood, and decide from the results how to handle the millions of dollars of stock the firm held. In addition, hedge funds analyze sales of companies' products from customer reviews on shopping websites; banks infer employment rates from the number of postings on job-search websites; and investment institutions collect and analyze listed companies' statements to find the causes of bankruptcies. US President Obama's campaign team analyzed, in real time, voters' preferences for presidential candidates from the Twitter messages of voters in the states most crucial to the election; researchers have also tried to predict with machine learning whether a given Twitter user is a Democrat or a Republican. Google and the US CDC, among other institutions, have worked together to track the global spread of diseases such as influenza from the content of Internet users' searches. The United Nations has judged inflation trends from the promotional advertisements that Latin American supermarkets post online. Internet social media contain a large amount of valuable information and resources, and the ability to identify and discover these resources automatically will create many new industries and opportunities.
Second, the sheer volume of data, together with the character limit and time sensitivity of microblog posts, poses a major challenge to data analysis and processing. Twitter, Facebook, Google, and Bing generate hundreds to thousands of terabytes of data every day, and processing these data effectively is a huge challenge for data analysis. Much information is recorded as text, images, and sound, so analyzing it effectively and translating it into content that machines can understand has become one of the problems computer scientists care about most. In particular, 80% of the information on the Internet comes from text, so machine reading and understanding is attracting more and more attention. For example, in January 2010 Professor Tom Mitchell, founder of the Machine Learning Department at CMU, launched a machine-reading project, Never-Ending Language Learning (NELL), whose goal is to automatically extract useful knowledge from large amounts of text on the Internet.
Short texts posted on microblogs are harder to analyze than traditional long articles. Short texts are written more casually and ambiguously, so extracting useful information and knowledge from them, let alone users' emotions and opinions, is more difficult. At the same time, because the data are so time-sensitive, we cannot keep all of the information; extracting and integrating only the necessary information therefore yields more efficient storage and retrieval.
Internet data mining
Depending on the level and pattern of what is discussed, the content of a text can be grouped into several classes: topics, entities, actions, and messages.
The topic is the highest-level form of expression. News, blogs, and microblogs discuss specific events: for example, the iPhone's blockbuster sales, the US presidential election, the heated debate between Fang Zhouzi and Han Han, and the Sanlu melamine milk powder scandal all triggered discussion, comments, and reposts on microblogs. For such issues, promptly finding, from large volumes of text, the issues the public cares about and measuring how hot they are can help us identify where users' attention is focused.
The entity is the basic element of linguistic expression; common examples are person names, place names, and company names. In finance we also care about key times, places, stock names, percentage gains, bond rates, amounts of capital invested and returned, and so on. For a specific financial product, we care about its related people (such as the CEO, directors, and holders of key technology), its products, and the companies in its upstream and downstream industries. Only by mining entities quickly and accurately can we carry out further analysis, such as the correlation between the popularity of a topic and a stock, or the degree to which people's emotions relate to a particular stock.
Here, an action is defined as a relation between entities, for example "Apple changes CEO", "explosion at a chemical plant", or "Japanese tsunami". An action is a triple: it has a subject, such as "Apple"; a target, such as "CEO"; and a verb that links the two entities. An action can describe a key event, and it can also describe a collective behavior popular on the Internet. For example, many people on the Internet express their wishes, such as "I want to buy an iPhone", "I want to buy an iPad", or "I think a friend's phone looks nice". If people's wishes in a given field can be aggregated, studying how these wishes change over time on the Internet lets us study popular consumption and economic behavior more clearly.
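The aggregation of purchase wishes described above can be sketched as follows. This is a minimal illustration, assuming English text and a single hand-written pattern for "I want to buy X"; the pattern, the function names, and the placeholder subject "user" are hypothetical, and a real system would need a trained relation extractor (plus Chinese word segmentation for Weibo text).

```python
import re
from collections import Counter

# Hypothetical pattern for first-person purchase intent, e.g. "I want to buy an iphone".
INTENT = re.compile(r"\bI want to buy (?:an? |the )?(\w+)", re.IGNORECASE)


def extract_intents(posts):
    """Return one (subject, action, target) triple per matched purchase intent."""
    triples = []
    for post in posts:
        for match in INTENT.finditer(post):
            # "user" is a placeholder subject; the target is normalized to lower case.
            triples.append(("user", "want_to_buy", match.group(1).lower()))
    return triples


def aggregate_intents(posts):
    """Count how often each product appears as the target of a purchase intent."""
    return Counter(target for _, _, target in extract_intents(posts))
```

Counting targets per day with the same function would turn these wishes into the kind of time series the later sections build on.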
Here, a message (meme) is defined as "a short piece of text that is reposted over and over". On microblogs, large numbers of time-sensitive news items, comments, famous quotations, philosophical sayings, and even amusing pictures or passages are reposted continually. In this project we are mainly concerned with how time-sensitive news and comments react to, and feed back into, economic data. Quickly counting and identifying these continually reposted news items and comments therefore greatly improves the efficiency and effectiveness of the analysis.
Classification and identification of sentiment and opinions
In social media, people often express their own opinions about an event. Correctly identifying the orientation and sentiment of these opinions makes it possible to better analyze the emotional leanings of people in the social network and the collective response to a particular event. For example, people often describe the Sanlu melamine milk powder scandal with words such as "indignation" and "moral decay", and news of the Japanese tsunami with words such as "terror" and "sympathy". A company may be described with words such as "creative", "winner", "optimistic about", and "risky", and a company's well-known figures with words such as "strong leadership", "envy", "learn from", "swindle", and "quagmire". These words are not all adjectives, but at different levels they express people's moods and attitudes toward specific people or events, so more discriminative techniques are needed to judge what each word means. When the emotions and opinions of many different people are pooled, they represent the overall leaning of the market or of public opinion. In social media, we therefore need to analyze, judge, and even predict these emotions at the collective (population) level.
Community clustering analysis in microblogs
Analyzing social media requires not only mining topics, entities, messages, and behaviors, along with their associated sentiment and opinions, at the text level, but also summarizing and generalizing them at different levels. One important level is the analysis of communities in social media. A community may be a group of people in a specific geographic location, people with the same type of job, or people who share a topic of interest. For an emerging form of social media such as microblogs, groups defined by geographic location and by shared interests are especially important. People in these groups influence one another, and their opinions tend to cluster. Analyzing people separately by region or by shared interest yields more finely segmented behavioral analysis, and analyzing groups with shared interests also yields more accurate results. For example, during the melamine incident, analyzing only the group that followed the incident filters out much of the noise from other events that mention melamine. In addition, profiling the topics that people following the incident usually care about reveals in more detail which kinds of people tend to discuss this kind of event.
Feasibility of analyzing the behavior of microblog users
Below, we first briefly introduce the participants in China's domestic stock market and the participants in microblogs, and then briefly analyze the feasibility of analyzing the financial behavior of microblog users.
● Market participants
Participants in China's stock market are mostly small and medium investors. According to 2002 statistics, institutional investors accounted for less than 20% of investors in the A-share market, and the total assets of overseas investors in B shares were less than 2.5% of those in A shares. According to the latest statistics from China Securities Depository and Clearing Corporation in April 2012, accounts with a tradable market value below 100,000 yuan made up as much as 85% of all accounts, and accounts below 500,000 yuan made up more than 97%. By contrast, data for 1996 to 2002 show that institutional investors in the Japanese and US markets held steady at 40%-50% of all accounts opened. Precisely because retail and small and medium investors make up such a large share of the market, the A-share market fluctuates more violently, the behavior of one or a few individuals has relatively little effect on the market as a whole, the market behaves as a cluster, and popular sentiment leans more toward the irrational.
● Microblog participants
As the Internet develops and the habit of being online takes root, more and more people choose Internet communities to communicate quickly and conveniently and to share information and moods, and microblogs emerged in this environment. By the end of March 2011, Sina Weibo and Tencent Weibo alone each had more than 100 million users, and many of these users want to express their own views. As shown in Figure 1, 46.4% of users are very willing to express their own views and emotions and to read other people's microblogs; 16.2% follow other people's microblogs and join the discussion; 16.4% rarely follow other people's microblogs but like to discuss hot topics; and the remaining 21% do not post but do read other people's microblogs. User activity on microblogs is thus very high: about 80% of users take part in discussions to some degree and interact with others. Statistics also show that 89.4% of users are willing to recommend friends to their own friends, and 47% repost (re-tweet) microblogs. With such a strong user base, user behavior on microblogs can be said to reflect every aspect of economic activity in Chinese society at every moment.
Other data show that, of the posts users publish on microblogs, 12.1% are timely news, 15.3% are quality content, 26.8% are jokes and humor, and 27.4% are famous quotations. Among all users, 38% of those born in the 1970s follow finance-related news, 33.9% of those born in the 1980s follow financial articles, and 22.8% of those born in the 1990s follow financial articles. Effectively mining finance-related news and messages from microblogs will therefore create great economic and social value.
● Efficient markets and behavioral finance
Neoclassical finance follows the efficient market hypothesis, which holds, for example, that stock prices reflect their intrinsic value and that price fluctuations are completely random. Behavioral finance, which emerged later, holds that the market price of a security is determined not only by its intrinsic value but also, to a large extent, by the behavior of investors; that is, investor sentiment and behavior significantly affect how prices are set and how they change in securities markets. Behavioral finance not only questions and exposes the limitations of the efficient market hypothesis but also emphasizes the effect of market mood on market behavior. A growing number of hedge funds now use computers to read news data and trade on it. Given the huge industry and demand, news organizations such as Bloomberg News, Dow Jones, and Thomson Reuters have all accepted the idea of acquiring data with computer software and have begun offering services that help Wall Street clients screen news automatically.
In recent years, with the growth of social media, people's behavior, moods, and views on timely news are reflected more and more, and ever faster, in social media such as Facebook, Twitter, and Weibo. Wall Street's keen sense of smell has led trading firms to turn their attention to social media. According to statistics from the financial services consultancy Aite Group, in 2009, 35% of professional trading firms used social media as one of the means supporting their decisions; as the market developed and evolved, this share rose to 46% in 2011. Of these, 19% of trading firms (36% in 2009) believed that social media could effectively track market sentiment; 9% (21% in 2009) said social media helped them differentiate themselves from other firms; and 6% (16% in 2009) said social media helped them improve performance. Although the 2011 data show that, among trading firms using social media, the share that believed social media could yield new insights or improve company performance fell compared with 2009, information in social media is being understood and used by more and more firms, and so it no longer marks out a particular professional trading firm or class of firms. Against this background, correctly extracting and using social media information becomes all the more important, and it also holds many opportunities to create economic benefit.
Summary of the invention
Purpose of the invention: to address the shortcomings of current data analysis systems, the invention provides an economic and financial behavior analysis system based on social media.
Technical solution: the economic and financial behavior analysis system based on social media of the present invention is implemented as follows.
The economic and financial behavior analysis system based on social media consists mainly of three types of modules: a crawler (Crawler), a database and index (Database/Indexer), and an analyzer (Analyzer).
Data acquisition and processing
● Crawler
The crawler is mainly responsible for data acquisition. The data sources fall into two parts. The first is economic indicators and time series. Economic indicators include financial data on countries, regions, and companies; the state publishes key economic data every month and every quarter, and these data can be combined with people's comments to analyze socioeconomic behavior. The related time series include financial indicators such as the market's major stocks, commodities, bonds, and exchange rates, as well as the share prices of specific companies. The main foreign data sources are companies such as Bloomberg, Dow Jones, and Thomson Reuters; domestic sources include Sina Finance, Great Wisdom (DZH), and Tonghuashun.
The second part is microblog data. Microblog platforms provide APIs that let users perform targeted crawling. We therefore need to maintain a targeted crawl list that includes key users (and their friends), major listed companies and their related products, and keywords related to economic activity. Microblogs also carry another important class of information: the links between users, hashtags, and reposts. For the crawled data, the related links and reposts are therefore collected as well.
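A minimal sketch of the targeted crawl list follows. It does not call any real microblog API; it assumes posts have already been fetched as dictionaries with hypothetical `id`, `user`, `text`, and optional `repost_of` fields, and only shows how a watch list of users, companies, and keywords could decide which posts to keep, together with the posts they repost.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlList:
    """Hypothetical targeted crawl list: key users, listed companies, and keywords."""
    users: set = field(default_factory=set)
    companies: set = field(default_factory=set)
    keywords: set = field(default_factory=set)

    def matches(self, post):
        """True if the post is by a watched user or mentions a watched company or keyword."""
        text = post["text"].lower()
        return (post["user"] in self.users
                or any(c.lower() in text for c in self.companies)
                or any(k.lower() in text for k in self.keywords))


def select_posts(posts, crawl_list):
    """Keep matching posts plus the posts they repost, so repost links are preserved."""
    by_id = {p["id"]: p for p in posts}
    kept = {}
    for p in posts:
        if crawl_list.matches(p):
            kept[p["id"]] = p
            source = p.get("repost_of")
            if source in by_id:
                kept[source] = by_id[source]
    return list(kept.values())
```

Keeping the reposted source even when it does not itself match the list is one way to honor the requirement that links and reposts be collected with the crawled data.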
● Database and index
The database is divided into two parts: structured data and unstructured data. Structured data include principal economic indicators, time series, financial statements, and so on, and are stored in MySQL. Unstructured data include the microblog text and its annotated topics, entities, and so on; this information is indexed with Lucene in combination with MySQL. Lucene excels at building inverted indexes over text, which lets us easily retrieve the microblogs that mention a given keyword, along with their comments. MySQL is used to retrieve the annotated topics, entities, actions, and messages. For microblogs that share the same ID, we can therefore look up the information in every field.
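To illustrate the kind of inverted index Lucene maintains, here is a toy in-memory version. It is not Lucene's API; whitespace tokenization and boolean AND retrieval are simplifying assumptions (Weibo text would first need Chinese word segmentation).

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index mapping each term to the set of microblog IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, post_id, text):
        # Assumption: lower-cased whitespace tokens stand in for a real analyzer.
        for term in text.lower().split():
            self.postings[term].add(post_id)

    def search(self, *terms):
        """Return the IDs of microblogs that contain every query term (boolean AND)."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()
```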
Topic: topics are indexed by label. For the whole body of microblog data we provide a fixed set of broad categories, and we annotate each microblog with its category information. Because a microblog can belong to several categories, the topic field requires a one-to-many mapping.
Entity: entities include person names, place names, organization names, and so on, as well as fixed noun phrases among common words. For each entity we annotate its category and name, and record the user ID and microblog ID.
Action: for each action we annotate the triple <subject, action, target> and record the user ID and microblog ID.
Message: for a reposted message, we store the user ID, microblog ID, and so on of the post being reposted.
Based on the information above, when building the index we assign a global ID to each user and each microblog, and use it to align and retrieve the information held in the different databases.
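The global-ID alignment can be sketched as below, assuming hypothetical in-memory stores in place of the MySQL tables: one registry hands out global IDs, the topic store is a one-to-many mapping from microblog ID to labels, and entity and action records carry both the user and microblog global IDs so that a single lookup joins them. All names here are illustrative.

```python
from collections import defaultdict
from itertools import count


class GlobalIds:
    """Assign one global ID per (kind, source key), e.g. ("weibo", "sina:1001")."""

    def __init__(self):
        self._ids = {}
        self._next = count(1)

    def get(self, kind, key):
        if (kind, key) not in self._ids:
            self._ids[(kind, key)] = next(self._next)
        return self._ids[(kind, key)]


ids = GlobalIds()
topics = defaultdict(set)  # microblog global ID -> topic labels (one-to-many)
entities = []              # (entity_type, entity_name, user_gid, weibo_gid)
actions = []               # (subject, action, target, user_gid, weibo_gid)


def annotate(user_key, weibo_key, topic_labels, ents, acts):
    """Store a microblog's annotations under global IDs; return its microblog global ID."""
    u, w = ids.get("user", user_key), ids.get("weibo", weibo_key)
    topics[w].update(topic_labels)
    entities.extend((etype, name, u, w) for etype, name in ents)
    actions.extend((s, a, t, u, w) for s, a, t in acts)
    return w


def lookup(weibo_gid):
    """Align every annotation store on one microblog global ID."""
    return {
        "topics": sorted(topics.get(weibo_gid, ())),
        "entities": [e[:2] for e in entities if e[3] == weibo_gid],
        "actions": [a[:3] for a in actions if a[4] == weibo_gid],
    }
```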
● Analyzer
The analyzer is the core of the system and comprises six submodules: topic analysis, entity recognition, action recognition, message tracking, sentiment analysis, and community clustering analysis.
Topic analysis is a relatively coarse, upper-level semantic analysis. Assigning topics is a multi-level, multi-label, multi-perspective classification problem. We can classify microblog data into economy, politics, sports, entertainment, education, and so on, and we can also divide news messages into domestic and international news. Microblogs related to socioeconomic activity can be screened out on this basis, and economic microblog data can be further classified into macroeconomic commentary, stock analysis, company commentary, and so on. We can also define specific topics, for example to find the microblogs related to the melamine incident or the Japanese tsunami.
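A minimal sketch of the two-level, multi-label topic classification follows. The keyword rules, label names, and the fallback label "other" are hypothetical placeholders; the patent does not specify the classifier, and in practice a trained multi-label model would replace the keyword sets.

```python
# Hypothetical keyword rules for the top level and for the economy subcategories.
TOP_LEVEL = {
    "economy": {"stock", "gdp", "inflation", "earnings", "bond"},
    "politics": {"election", "president", "parliament"},
    "sports": {"match", "league", "goal"},
}
ECONOMY_SUB = {
    "macro_commentary": {"gdp", "inflation"},
    "stock_analysis": {"stock", "index"},
    "company_commentary": {"earnings", "ceo"},
}


def classify(text):
    """Return every matching (level1, level2) label; a post may receive several."""
    words = set(text.lower().split())
    labels = set()
    for topic, keys in TOP_LEVEL.items():
        if words & keys:
            if topic == "economy":
                # Economic posts are refined further; unmatched ones fall back to "other".
                subs = {s for s, k in ECONOMY_SUB.items() if words & k}
                labels.update(("economy", s) for s in subs or {"other"})
            else:
                labels.add((topic, None))
    return labels
```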
Entity analysis and action analysis are finer-grained semantic analyses. We run entity and semantic analysis on every microblog, detecting synonymous entities and clustering actions. On this basis we can build time series from the frequencies of the corresponding entities and actions; these time series form the basis of our future data services and expert system.
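The entity frequency time series could be built roughly as follows, assuming entity mentions have already been extracted and dated; the alias table used to merge synonyms is a hypothetical stand-in for the synonym detection described above.

```python
from collections import Counter, defaultdict

# Hypothetical synonym table: lower-cased surface form -> canonical entity name.
ALIASES = {"apple": "Apple Inc.", "aapl": "Apple Inc.", "apple inc": "Apple Inc."}


def entity_series(mentions):
    """mentions: iterable of (day, surface_form). Returns entity -> Counter{day: count}."""
    series = defaultdict(Counter)
    for day, surface in mentions:
        canonical = ALIASES.get(surface.lower(), surface)
        series[canonical][day] += 1
    return series
```

The same function works for actions if each action triple is passed in place of the surface form.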
For messages that are reposted continually, we first organize the number of times each message is reposted into a time series; second, we store the time-ordered subgraph formed by the users who repost the message, to support later analysis of how interest migrates and evolves on the Internet.
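For a single message, the repost-count time series and the time-ordered repost subgraph might be produced like this. The tuple layout (timestamp in seconds, source user, reposting user) and the one-hour bucket size are assumptions for illustration.

```python
from collections import Counter


def repost_series(reposts, bucket_seconds=3600):
    """reposts: (timestamp, source_user, reposting_user) tuples for one message.

    Returns (repost counts per time bucket, repost edges ordered by time)."""
    ordered = sorted(reposts)
    counts = Counter(ts // bucket_seconds for ts, _, _ in ordered)
    edges = [(src, dst, ts) for ts, src, dst in ordered]
    return dict(sorted(counts.items())), edges
```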
Sentiment analysis identifies emotionally charged vocabulary in language; by combining this module's output with the results of the other modules, we can perform sentiment analysis that is meaningful at the aggregate level.
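The patent does not say how emotional vocabulary is scored. One common baseline is a sentiment lexicon with simple negation handling, sketched here with a hypothetical English lexicon whose words echo the examples given earlier and whose weights are arbitrary.

```python
# Hypothetical lexicon; weights are illustrative, not calibrated.
LEXICON = {"optimistic": 1.0, "creative": 0.5, "winner": 1.0,
           "risky": -0.5, "swindle": -1.0, "indignation": -1.0, "quagmire": -1.0}
NEGATORS = {"not", "no", "never"}


def sentiment(text):
    """Sum lexicon weights, flipping the sign of a word directly preceded by a negator."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            weight = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:
                weight = -weight
            score += weight
    return score
```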
Community clustering analysis provides user clusters. Clustering can be based on different semantics and contexts, or on the friend, repost, and other connection links between users. Different clusterings offer different perspectives on the data, and our clustering module is designed to be easy to plug in and remove.
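As one simple way to cluster users by friend and repost links, the sketch below groups users into connected components with union-find. This is a stand-in chosen for brevity; the patent does not name a clustering algorithm, and community detection methods such as modularity optimization would split large components further.

```python
def communities(users, edges):
    """Group users into connected components of the friend/repost graph (union-find)."""
    parent = {u: u for u in users}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for u in users:
        groups.setdefault(find(u), set()).add(u)
    # Largest communities first; ties broken by member names for stable output.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))
```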
Data services and expert system
We not only provide the data crawling and analysis techniques described above but also offer services based on the data they produce. The data services and expert system in our system provide users with more specialized knowledge and information push. The specific functions of this part are described in detail here.
● Data services
The data services cover the following aspects.
Market sentiment index: every day, we run sentiment analysis on all microblogs related to socioeconomic activity to obtain a market sentiment index, which is published daily to build influence.
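A daily index could be aggregated from per-post sentiment scores as below. The bullishness-style ratio (positive minus negative posts over all polar posts) is an assumption for illustration, not the patent's formula.

```python
from collections import defaultdict


def daily_index(scored_posts):
    """scored_posts: iterable of (day, score).

    Returns day -> (pos - neg) / (pos + neg), a ratio in [-1, 1];
    days with no positive or negative posts map to 0.0."""
    pos, neg = defaultdict(int), defaultdict(int)
    days = set()
    for day, score in scored_posts:
        days.add(day)
        if score > 0:
            pos[day] += 1
        elif score < 0:
            neg[day] += 1
    index = {}
    for day in sorted(days):
        total = pos[day] + neg[day]
        index[day] = (pos[day] - neg[day]) / total if total else 0.0
    return index
```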
Critical event detection: we detect critical events in microblogs, especially emergencies, and give users early warning and alerts as soon as possible.
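Burst detection is one plausible way to flag critical events. The sketch below assumes hourly mention counts for a topic or entity and flags any bucket whose count exceeds the trailing mean by a chosen number of standard deviations; the window, the threshold, and the floor of 1.0 on the standard deviation are hypothetical tuning choices.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=24, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by `threshold` std devs."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Floor sigma at 1.0 so a perfectly flat history does not flag tiny changes.
        if counts[i] > mu + threshold * max(sigma, 1.0):
            bursts.append(i)
    return bursts
```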
Person activity and key-person mining: based on topic and event mining, we identify the most active people in the discussion, and rank key figures using frequency statistics such as post counts and the number of reposts and replies their posts receive.
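The frequency-based ranking of key figures can be sketched as a weighted sum of activity events. The event kinds and default weights are hypothetical; the text only says that posting, repost, and reply statistics are used.

```python
from collections import Counter


def rank_people(events, weights=None, top=3):
    """events: (user, kind) pairs, kind in {"post", "reposted", "replied"}.

    Scores each user by a weighted sum of activity and attention received."""
    weights = weights or {"post": 1.0, "reposted": 2.0, "replied": 1.5}
    scores = Counter()
    for user, kind in events:
        scores[user] += weights.get(kind, 0.0)
    return [user for user, _ in scores.most_common(top)]
```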
User profile statistics and prediction (age, gender, interests, location): for everyone who appears in a topic, we compile statistics on these different attributes. Some of this information can be obtained through the open platform's interfaces, and other attributes can be mined and predicted from each user's posts.
Time series correlation analysis: for topics, entities, actions, and messages, and their corresponding sentiment indexes, we can build time series and mine correlations between them and important economic indicators, stocks, and stock indexes. We provide users, for analysis, with the text time series most correlated with a given index or stock.
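Finding the text time series most correlated with a stock or index could look like the sketch below, which uses the Pearson correlation coefficient with an optional lag so that text signals on day t are compared with market values on day t + lag. Equal-length daily series and the choice of Pearson correlation are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def most_correlated(target, candidates, lag=0):
    """Return (name, r) of the candidate series whose value `lag` days earlier best tracks target."""
    best = None
    for name, series in candidates.items():
        x = series[:len(series) - lag] if lag else series
        y = target[lag:]
        r = pearson(x, y)
        if best is None or abs(r) > abs(best[1]):
            best = (name, r)
    return best
```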
Network evolution analysis: for different topics we provide different network evolution analyses, such as network size and structural statistics. These results also help users extract useful information when analyzing socioeconomic behavior on the Internet.
● Expert system
The expert system brings together a series of suggestions and solutions produced by all of our analysis techniques. We give three concrete examples here.
Stock market bull/bear judgment: from statistics on historical data, we can find how key entities, actions, messages, and their associated sentiment correlate with stock market movements. For example, a stock index itself reflects the market's mood, and the mood of people's posts on microblogs also reflects, in some sense, the public's attitude toward the market. If many people on microblogs are bullish, historical data can be used to estimate how likely the market is to rise, and investment suggestions can then be offered to users.
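The historical estimate behind the bull/bear judgment can be illustrated as an empirical conditional probability. The sketch assumes a hypothetical history of (share of bullish posts on a day, market return on the next day) pairs and an arbitrary bullishness threshold of 0.6.

```python
def up_probability(history, threshold=0.6):
    """history: (bullish_share_on_day_t, market_return_on_day_t_plus_1) pairs.

    Estimates P(next-day return > 0 | bullish share >= threshold);
    returns None when no historical day meets the threshold."""
    outcomes = [ret > 0 for share, ret in history if share >= threshold]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)
```

With few qualifying days this estimate is noisy, so any suggestion built on it would need to report the sample size alongside the probability.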
Automatic after-close event analysis: analyze the market's results after the close, and find, by mining historical data, the critical events likely to have affected that day's trend. For example, "Apple changes CEO" together with people's assessments of the new CEO, or "a tsunami hits Japan" together with the corresponding sentiment index, can be offered to users for analysis as events that summarize the day's trend.
Analysis of hot online debates: analyze and predict topics that are hotly debated online. For example, when two sides argue over a hot issue, we judge each side's sentiment index and predict which side will prevail. For instance, analyzing the "Xiaomi phone launch" event can predict the outcome of the two sides' argument and whether the Xiaomi phone will succeed.
Beneficial effects: the economic and financial behavior analysis system based on social media of the present invention can collect user information efficiently and accurately, so that user data can be fully archived and organized and a user information database built, from which users are pushed the messages they care about.
Embodiment
In order to deepen the understanding of the present invention, below in conjunction with embodiment, the invention will be further described, and this embodiment only, for explaining the present invention, does not form limiting the scope of the present invention.
Economy and finance behavioural analysis system based on social media of the present invention, system is mainly comprised of three major types module: reptile (Crawler), database and index (Database/Indexer), analyzer (Analyzer).
data acquisition and processing (DAP)
● reptile
Reptile is mainly responsible for data acquisition.Data source is divided into two parts.First is economic target and time series.Economic target comprises the financial data of country, place and company.Country monthly all can announce crucial economic data in per season, and these economic datas can be used for coordinating people's comment analyzing social economy's behavior.Time related sequence comprises the banking indexs such as the main stock in market, commodity, bond, the exchange rate, the share price of concrete company etc.External general data source is the companies such as Bloomberg News (Bloomberg), Dow Jones (Dow Jones) and Thomson Reuters (Thomson Reuters); Domestic Sina's finance and economics, large wisdom and the sequence etc. of comprising.
Second portion is microblogging data.Microblogging provides API to facilitate user to carry out orientation and captures.For this reason, we need to keep one directed to capture list, comprise crucial user (and good friend), main listed company, and Related product, and the relevant keyword of economic activity etc.For microblogging, also have the important information of a class, be exactly the link information between user, label (hashtag) and reprinting.Therefore,, for the data that capture, relevant link and reprinting also will be included.
● Database and index
The database has two parts: structured data and unstructured data. Structured data include key economic indicators, time series, financial statements, and so on, and are stored in MySQL. Unstructured data include microblog text and its annotated topics, entities, and so on. This information can be indexed with Lucene in combination with MySQL. Lucene excels at building inverted indexes over text, which lets us easily retrieve the microblogs that mention a given keyword, together with the related comments. MySQL is used to retrieve the annotated topics, entities, actions, and messages. For microblogs sharing the same ID, we can therefore look up information in the following fields:
Topic: topics are indexed by label. For the whole microblog corpus we define a fixed set of broad categories, and each microblog is labeled with its category. Because a microblog may belong to several categories, the topic field requires a one-to-many mapping.
Entity: entities include person names, place names, organization names, and fixed noun phrases from common vocabulary. For each entity we annotate its category and name, and record the user ID and microblog ID.
Action: for each action we annotate a triple <subject, action, object>, and record the user ID and microblog ID.
Message: for a reposted message, we store the ID of the user whose post was reposted, the original microblog ID, and so on.
Based on this information, when building the index we assign a global ID to each user and each microblog. These IDs are used to align and retrieve information across the different databases.
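The global-ID alignment and the per-field lookups can be sketched in memory. This is a toy stand-in for the Lucene inverted index and the MySQL annotation tables, not the system's actual schema. `GlobalIdRegistry` and `FieldIndex` are hypothetical names, and tokens are assumed to come from a Chinese word segmenter upstream.

```python
from collections import defaultdict
from itertools import count


class GlobalIdRegistry:
    """Assigns one global integer ID per (kind, source-local key)."""

    def __init__(self):
        self._ids = {}
        self._next = count(1)

    def gid(self, kind, local_key):
        key = (kind, local_key)
        if key not in self._ids:
            self._ids[key] = next(self._next)
        return self._ids[key]


class FieldIndex:
    """Toy in-memory stand-in for the Lucene text index and the MySQL annotation tables."""

    def __init__(self):
        self.postings = defaultdict(set)   # token -> post gids (inverted list)
        self.topics = defaultdict(set)     # post gid -> topic labels (one-to-many)
        self.entities = defaultdict(list)  # post gid -> [(category, name, user gid)]
        self.actions = defaultdict(list)   # post gid -> [(subject, action, object, user gid)]
        self.repost_of = {}                # post gid -> gid of the reposted post

    def add_post(self, post_gid, user_gid, tokens, topics=(), entities=(), actions=(), repost_of=None):
        for tok in tokens:
            self.postings[tok].add(post_gid)
        self.topics[post_gid].update(topics)
        for category, name in entities:
            self.entities[post_gid].append((category, name, user_gid))
        for subj, act, obj in actions:
            self.actions[post_gid].append((subj, act, obj, user_gid))
        if repost_of is not None:
            self.repost_of[post_gid] = repost_of

    def search(self, *tokens):
        """AND query: post gids whose token list contains every query token."""
        if not tokens:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in tokens))
```

Because every field is keyed by the same global post ID, one ID from a text search can be joined directly with its topics, entities, actions, and repost source.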
● Analyzer
The analyzer is the core of the system. It comprises six submodules: topic analysis, entity recognition, action recognition, message tracking, sentiment analysis, and community clustering analysis.
Topic analysis is a relatively coarse, high-level semantic analysis. Topic assignment is a multi-level, multi-label, multi-view classification problem. We can classify microblog data into economics, politics, sports, entertainment, education, and so on, and news messages can also be divided into domestic and international news. Microblogs related to socio-economic activity can be filtered on this basis. Economic microblogs can then be further classified into macroeconomic commentary, stock analysis, company commentary, and so on. We can also define specific topics, for example to find microblogs related to the melamine incident or the Japanese tsunami.
Entity analysis and action analysis are finer-grained semantic analyses. We perform entity and action analysis on every microblog, detecting entity synonyms and clustering actions. On this basis we can produce time series of the frequency of each entity and action. These time series form the basis of our future data services and expert system.
For messages that are reposted repeatedly, we first organize the number of reposts of each message into a time series. Second, we store the time-ordered subgraph formed by the users who reposted the message, which makes it easier to analyze later how interest migrates and evolves on the Internet.
Sentiment analysis identifies sentiment-bearing words in text. Combining the output of this module with that of the other modules enables aggregate-level sentiment analysis.
Community clustering analysis provides user clustering. Clusters can be formed according to semantics and context, or according to the friend, repost, and link connections between users. Different clusterings offer different perspectives on the data. Our clustering module is designed to be modular, so components can easily be added or removed.
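The two artifacts of message tracking can be sketched as follows: a repost-count time series and a time-ordered repost subgraph. The input format is an assumption: one `(timestamp, reposter_id, source_user_id)` tuple per repost of a single message. Daily buckets are also an assumption. `repost_series` and `subgraph_until` are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime


def repost_series(reposts, bucket="%Y-%m-%d"):
    """reposts: iterable of (timestamp, reposter_id, source_user_id) for one message.

    Returns (per-bucket repost counts sorted by bucket, time-ordered edge list),
    where each edge (source, reposter, timestamp) means reposter copied from source.
    """
    ordered = sorted(reposts, key=lambda r: r[0])
    counts = Counter(ts.strftime(bucket) for ts, _, _ in ordered)
    edges = [(source, reposter, ts) for ts, reposter, source in ordered]
    return sorted(counts.items()), edges


def subgraph_until(edges, t):
    """Snapshot of the repost graph as it stood at time t (edges with timestamp <= t)."""
    return {(src, dst) for src, dst, ts in edges if ts <= t}
```

Taking snapshots at successive times is one simple way to study the migration and evolution mentioned above. Storing timestamps on edges keeps every snapshot recoverable from a single edge list.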
Data services and expert system
Beyond the data crawling and analysis techniques described above, we also provide services based on the analyzed data. The data services and the expert system give users more specialized knowledge and push relevant information to them. Their functions are described in detail below.
● Data services
The data services cover the following aspects.
Market sentiment index: every day we run sentiment analysis on all microblogs related to socio-economic activity to obtain a market sentiment index. The index is published daily to build influence.
Key event detection: detect key events in microblogs, especially breaking incidents, and give users early warnings and alerts as soon as possible.
Person activity and key-person mining: based on topic and event mining, identify the most active people in each discussion. Rankings of prominent people are derived from frequency statistics such as post counts, repost counts, and reply popularity.
User profile statistics and prediction (age, gender, interests, location): for everyone who appears in a topic, compile statistics on these attributes. Some information can be obtained through the open platform's interface. Other attributes can be mined and predicted from each user's post content.
Time series correlation analysis: we can build a time series for each topic, entity, action, and message, and for its corresponding sentiment index. Correlations can be mined between these time series and key economic indicators, stocks, and stock indices. We give users the text time series most correlated with a given index or stock for analysis.
Network evolution analysis: for different topics we provide different network evolution analyses, such as network size and structural statistics. These results also help users extract useful information when analyzing socio-economic behavior on the Internet.
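The daily market sentiment index can be sketched as a weighted mean of per-post sentiment scores for each day. The score range [-1, 1] and the per-post weight are assumptions; a weight could be 1 per post, or some function of reposts. `daily_sentiment_index` is a hypothetical helper and not the system's actual formula.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def daily_sentiment_index(scored_posts: Iterable[Tuple[date, float, float]]) -> Dict[date, float]:
    """Weighted mean sentiment per day.

    scored_posts: (day, sentiment score in [-1, 1], non-negative weight) per economic microblog.
    Days whose total weight is zero are omitted.
    """
    num: Dict[date, float] = defaultdict(float)
    den: Dict[date, float] = defaultdict(float)
    for day, score, weight in scored_posts:
        num[day] += weight * score
        den[day] += weight
    return {d: num[d] / den[d] for d in sorted(den) if den[d] > 0}
```

The same function can produce the per-community or per-group indices described under sentiment analysis, if the input is first restricted to that group's posts.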
● Expert system
The expert system aggregates suggestions and solutions derived from all of our analysis techniques. Three concrete examples follow.
Bull/bear judgment of the stock market: from statistics on historical data, we can find correlations between stock market movements and key entities, actions, messages, and their associated sentiment. For example, a stock index itself reflects the market's mood. In a sense, the mood of people's microblog posts also reflects the public's attitude toward the market. So if many people on microblogs are bullish, we can use historical data to calculate how likely the market is to rise, and then give users investment suggestions.
Automatic post-close review: analyze the day's closing results and, by mining historical data, find key events that may have influenced the day's trend. Examples are "Apple changes its CEO" together with people's evaluations of the new CEO, or "a tsunami strikes Japan" together with the corresponding sentiment index. Such events can be offered to users as a summary of the day's trend.
Analysis of hot online debates: analyze and predict topics that are hotly debated online. For example, when two sides argue over a hot issue, judge each side's sentiment index and predict which side will prevail. Another example is the "Xiaomi phone launch" event: analyze the arguments on both sides and predict whether the Xiaomi phone will succeed.
Two main frameworks for topic analysis are introduced below: classification and topic models. A topic is the set of documents that share a given kind of subject matter. For example, news articles about politics, military affairs, economics, and entertainment differ greatly in content. If the required topic classes are known, supervised classification methods can be used. If the given text collection does not specify topic classes, unsupervised clustering or topic models are needed.
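The bull/bear judgment can be sketched as a frequency estimate from history: among past days on which the share of bullish posts reached a threshold, how often did the index rise the next day? The bullish-share definition, the thresholds, and the helpers `rise_probability` and `suggest` are all hypothetical. This is a minimal illustration, not a trading rule.

```python
def rise_probability(history, threshold):
    """Frequency estimate of P(next-day index return > 0 | bullish share >= threshold).

    history: list of (bullish_share, next_day_return) pairs from past trading days,
    where bullish_share = bullish posts / (bullish + bearish posts) on that day.
    Returns None if no past day reached the threshold.
    """
    outcomes = [ret > 0 for share, ret in history if share >= threshold]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


def suggest(today_share, history, threshold=0.6, min_prob=0.6):
    """Map today's bullish share to a coarse signal using the historical estimate."""
    if today_share < threshold:
        return "no signal"
    p = rise_probability(history, threshold)
    if p is None:
        return "no signal"
    return "bullish" if p >= min_prob else "neutral"
```

With few qualifying days the estimate is noisy. A practical version would at least report the sample size alongside the probability.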
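For the supervised case, a minimal sketch is a multinomial naive Bayes classifier over pre-segmented tokens. Naive Bayes is one common choice swapped in here for illustration; the document does not specify which classifier is used. `NaiveBayesTopics` is a hypothetical name. The sketch assigns one label per document. Multi-label topics would need, for example, one binary classifier per topic.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopics:
    """Multinomial naive Bayes over pre-segmented tokens, with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for c in self.word_counts.values() for t in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def posteriors(self, tokens):
        """P(topic | tokens), normalized after subtracting the max log score; unseen words are ignored."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        logp = {}
        for lab, n in self.doc_counts.items():
            lp = math.log(n / n_docs)
            for t in tokens:
                if t in self.vocab:
                    lp += math.log((self.word_counts[lab][t] + self.alpha) / (self.totals[lab] + self.alpha * v))
            logp[lab] = lp
        m = max(logp.values())
        z = sum(math.exp(x - m) for x in logp.values())
        return {lab: math.exp(x - m) / z for lab, x in logp.items()}

    def predict(self, tokens):
        post = self.posteriors(tokens)
        return max(post, key=post.get)
```

The posteriors can also serve as the "typicality" signal that the document ranking module consumes.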
● Topic analysis
Topic classification consists of six modules: topic training (Training), model (Model), document ranking (Document Ranking), keyword ranking (Keyword Ranking), topic ranking (Topic Ranking), and user ranking (Author Ranking).
Topic training module: analyzes topics based on historical or labeled data. If the required topics (politics, economics, military affairs, entertainment, and so on) are known, we train a multi-class classifier. If the topic types are unknown, we train a topic model.
Model module: uses the output of the training module to assign topics to new data. Whether a classifier or a topic model is used, we obtain a function that maps newly arriving text onto the topics we can recognize. Using this function we score each text, pass the scores to the ranking modules below, and finally store them in the database.
Document ranking module: scores each document according to three things. The first is its popularity (for example, repost rate). The second is its importance within a topic (for example, whether a key person posted it, or whether it is an original post). The third is its typicality (whether it represents a given topic).
Keyword ranking module: scores the important keywords in a document. This module works with other modules such as the topic model, entity recognition, action recognition, and sentiment analysis to find useful words for retrieval, comparison, and analysis. For example, entity recognition can identify place and organization names such as "Fukushima" and "TEPCO" (Tokyo Electric Power Company), but it does not tag "tsunami". In the keyword ranking module we also want to score the words that are most helpful for identifying events, messages, and topics.
Topic ranking module: topic scoring has two parts. The first is global topic scoring, which tells us which topics attract the most attention. The second is topic scoring relative to particular texts, which tells us which topics are most important within a text.
User ranking module: for each topic, we want to know who the most active users are; for each document, we want to know who the most active commenters are. This module works with the other modules to score users dynamically for each topic.
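The document ranking module can be sketched as a weighted sum of its three signals. All weights are placeholders. The `views` denominator for the repost rate is a hypothetical field, since the text does not say what the rate is computed against. Importance is reduced here to two boolean flags taken from the examples given.

```python
from dataclasses import dataclass


@dataclass
class DocFeatures:
    doc_id: str
    reposts: int            # popularity numerator
    views: int              # hypothetical denominator for the repost rate
    by_key_person: bool     # importance: posted by a key person
    is_original: bool       # importance: original post rather than a repost
    topic_posterior: float  # typicality: P(topic | doc) from the model module, in [0, 1]


def document_score(f, w_pop=0.4, w_imp=0.3, w_typ=0.3):
    """Weighted sum of popularity, importance and typicality; the weights are placeholders."""
    popularity = min(f.reposts / f.views, 1.0) if f.views else 0.0
    importance = 0.5 * f.by_key_person + 0.5 * f.is_original
    return w_pop * popularity + w_imp * importance + w_typ * f.topic_posterior


def rank_documents(docs):
    """Document IDs ordered from highest to lowest score."""
    return [d.doc_id for d in sorted(docs, key=document_score, reverse=True)]
```

A linear combination keeps each signal's contribution easy to inspect. The weights could later be learned from labeled rankings instead of being fixed by hand.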
● Entity recognition
Entity analysis is the most important component of economic and financial activity analysis. Recognition of important person names, place names, and company names is performed in the entity analysis module.
Entity recognition mainly comprises the following modules: grammar processing (Chunking/POS Tagging), entity recognizer training (Training Named Entity Recognizer), knowledge base (Knowledge Base), model (Model), knowledge base helper (Knowledge Base helper), entity disambiguation (Entity Disambiguation), entity clustering (Entity Clustering), and relation extraction helper (Entity Relation Extraction Helper).
Grammar processing module: includes Chinese word segmentation and part-of-speech tagging. It is mainly used to generate features for the entity recognizer training module and the model module. Entities are essentially noun phrases, so extracting syntactic information effectively helps the entity classifier recognize them.
Entity recognizer training module: trains the recognizer to identify entities from annotated training data. Entity classes can be person names, place names, and company names, as well as amounts, percentages, dates, stock names, and so on.
Knowledge base module: the knowledge base is a very important part of entity analysis, because some of the information we need must be highly accurate. Examples include company names and each company's shareholders, general manager, CEO, and products. This information must either be parsed from specific websites using knowledge-base construction methods or be annotated manually. Only then can we properly analyze the corresponding events in financial news.
Model module: combines the knowledge base with the trained entity recognition classifier to annotate new microblog data in real time.
Knowledge base helper module: summarizes the output of the model and of the disambiguation and clustering modules, and adds highly certain entity information to the knowledge base.
Entity disambiguation module: entities can be ambiguous. For example, "apple" can refer to a company or to a fruit. Specific entities must be disambiguated according to their context. The knowledge used for this includes the knowledge base and external data sources (such as word co-occurrence frequencies on the Internet).
Entity clustering module: helps us find synonyms, such as the Chinese and English names of Microsoft. After scoring, the most certain synonyms are fed into the knowledge base to enrich its content. We can also search by synonym to find microblogs that discuss the same entity.
Relation extraction helper module: the output of entity recognition can also serve as features for relation extraction. This helps relation extraction find the events and behaviors associated with key entities more accurately.
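Entity disambiguation and synonym normalization can be sketched with a context-overlap rule (a simplified Lesk-style heuristic) and an alias table. Both the sense cue lists and the alias table below are hypothetical knowledge-base entries. The text says only that the knowledge base and external co-occurrence statistics are used, not how.

```python
# Hypothetical knowledge-base entries: each sense of an ambiguous surface form
# comes with context words that typically co-occur with it.
SENSES = {
    "苹果": {
        "Apple Inc. (company)": {"手机", "iPhone", "股价", "发布", "CEO"},
        "apple (fruit)": {"吃", "水果", "新鲜", "价格", "果园"},
    }
}

# Hypothetical alias table, as the entity clustering module might produce it.
ALIASES = {"微软": "Microsoft", "Microsoft": "Microsoft", "MSFT": "Microsoft"}


def canonical(name):
    """Map a synonym to its canonical entity name (unknown names pass through)."""
    return ALIASES.get(name, name)


def disambiguate(mention, context_tokens, senses=SENSES):
    """Pick the sense whose cue words overlap most with the post's tokens.

    Returns None when the mention is unknown or no sense has any overlap.
    """
    candidates = senses.get(mention)
    if not candidates:
        return None
    ctx = set(context_tokens)
    best, best_overlap = None, 0
    for sense, cues in candidates.items():
        overlap = len(cues & ctx)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best
```

Returning `None` when there is no evidence lets low-confidence mentions stay out of the knowledge base. That fits the rule that only highly certain entities are added.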
● Relation and action extraction
Relation extraction mainly comprises the following modules: grammar processing (Chunking/POS Tagging/Parsing), relation extractor training (Training Relation Extractor), model (Model), relation disambiguation (Relation Disambiguation), relation clustering (Relation Clustering), and the relation extraction helper from entity recognition (Entity Relation Extraction Helper).
Grammar processing module: includes word segmentation, part-of-speech tagging, and syntactic parsing, and provides features for relation recognition training and the model module. Relation extraction needs verbs together with their subjects and objects, so syntactic parsing is required. We may also try methods that do not use parse trees; such a method is described in detail in Section 3.3.2.
Relation extractor training module: extracts relations by determining part of speech and using the noun phrases on either side of a verb, together with a corresponding knowledge base (such as an encyclopedia). Relation extraction is an open information extraction technique and cannot comprehensively cover every domain. It must therefore be specially tailored to the financial and economic domain, and continually updated and enriched using our financial knowledge base and the library of recognized entities.
Model module: recognizes relations in new microblogs.
Relation disambiguation module: similar to the entity disambiguation module, it disambiguates the noun phrases and verb phrases in a relation.
Relation clustering module: clusters similar relations, grouping synonymous or near-synonymous relations together. For example, <Japan, experiences, tsunami>, <Japan, is struck by, tsunami>, and <tsunami, strikes, Japan> should be grouped together.
Relation extraction helper from entity recognition: this module carries the output of entity analysis, which serves here as input features for relation extraction.
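The relation clustering example can be sketched by normalizing each triple to a canonical predicate and argument order, then grouping identical normal forms. The verb table is hypothetical and hand-written. A real system would learn or mine these equivalences, and the document does not say how its clustering works.

```python
from collections import defaultdict

# Hypothetical normalization table: verb -> (canonical predicate, swap subject/object?).
# Active "strikes" is swapped so that the affected party always comes first.
VERB_MAP = {
    "experiences": ("hit_by", False),
    "is struck by": ("hit_by", False),
    "strikes": ("hit_by", True),
}


def normalize(triple):
    """Map <subject, verb, object> to a canonical (affected, predicate, cause) form."""
    subj, verb, obj = triple
    pred, swap = VERB_MAP.get(verb, (verb, False))
    return (obj, pred, subj) if swap else (subj, pred, obj)


def cluster_relations(triples):
    """Group triples whose normalized forms are identical."""
    clusters = defaultdict(list)
    for t in triples:
        clusters[normalize(t)].append(t)
    return dict(clusters)
```

Verbs missing from the table pass through unchanged, so unknown relations still form their own singleton clusters.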
● Sentiment analysis
Sentiment analysis mainly comprises the following modules: grammar processing (Chunking/POS Tagging), semi-supervised keyword annotation (Training Semi-supervised Word Annotation), sentiment knowledge base (Sentiment Knowledge Base), model (Model), overall time-stamped sentiment annotation (Overall Time-stamped Sentiment), entity-associated sentiment annotation (Entity Associated Sentiment), sentence-level microblog sentiment annotation (Sentence Level Sentiment), and user-level sentiment annotation (User Level Sentiment).
Grammar processing module: includes Chinese word segmentation and part-of-speech tagging. It is mainly used to provide features for word-level sentiment annotation.
Semi-supervised keyword annotation module: performs semi-supervised sentiment annotation based on existing sentiment knowledge bases (such as HowNet) and large numbers of observed samples (such as co-occurrence frequencies between words). The annotation results can be stored in the knowledge base.
Sentiment knowledge base module: the sentiment knowledge base consists of two parts. The first contains manually annotated sentiment words, degree words, opinion words, and so on. The second contains words that a machine annotated automatically according to their meaning. We organize these two parts into the knowledge base for use by the model module.
Model module: scores text using the two classes of vocabulary in the knowledge base. There are two scoring mechanisms: the first assigns scores based on human experience, and the second learns them from data. For example, from the rises and falls of a stock index we can learn which words tend to describe a bull market and which describe a bear market.
Overall time-stamped sentiment annotation module: for microblogs as a whole, and for different communities and groups, we provide a sentiment index that changes over time. The index is a weighted average over the posts people publish in each period.
Entity-associated sentiment annotation module: for each recognized entity we also provide a sentiment annotation, so that the sentiment and comments associated with an entity can be queried later. For example, the discussion intensity and comment quality for the Xiaomi phone can be compared.
Sentence-level microblog sentiment annotation module: annotates the sentiment of each microblog.
User-level sentiment annotation module: annotates each user's sentiment at different times.
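The lexicon-based scoring path (sentiment words scaled by degree words) and the sentence-level and user-level outputs can be sketched as follows. The miniature lexicons are hypothetical stand-ins for a HowNet-derived knowledge base. The rule that a degree word scales only the immediately following sentiment word is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical miniature lexicons standing in for the sentiment knowledge base.
SENTIMENT = {"上涨": 1.0, "利好": 1.0, "看多": 1.0, "下跌": -1.0, "利空": -1.0, "恐慌": -1.0}
DEGREE = {"非常": 2.0, "大幅": 1.5, "略微": 0.5}


def sentence_score(tokens):
    """Sum sentiment word scores, scaling each by a directly preceding degree word."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in SENTIMENT:
            weight = DEGREE.get(tokens[i - 1], 1.0) if i > 0 else 1.0
            score += weight * SENTIMENT[tok]
    return score


def user_scores(posts):
    """posts: iterable of (user_id, day, tokens) -> {(user_id, day): mean sentence score}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for user, day, tokens in posts:
        sums[(user, day)] += sentence_score(tokens)
        counts[(user, day)] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

Negation, opinion words, and weights learned from index movements are left out. Each would replace or extend the fixed lexicon lookups above.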
● Community analysis
Community analysis comprises the following modules: feature extraction (Feature Extractor), community mining parameter training (Training Community Mining Parameters), user-based community mining helper (User Based Community Mining Helper), model (Model), community evolution analysis (Community Evolving Analyzer), community statistics (Community Statistics), and community-based user prediction helper (Community based User Prediction Helper).
Feature extraction module: extracts each user's features for community mining, including the user's post text, followers, followees, groups, interest tags, and so on.
Community mining parameter training module: proposes clustering models for different community mining needs and tunes their parameters on historical or manually labeled data. For example, different communities may share overlapping users, and a parameter controls how much overlap between communities the system allows.
User-based community mining helper module: user analysis and user attribute prediction provide extra features for community mining. Examples include predicting a user's age, interests, and likely tags, or whether the user is a bot. These predictions help community mining find plausible clusters more effectively.
Model module: the model can automatically classify new users (for example, to filter out bots) and recommend the most likely communities and groups to users.
Community statistics module: helps us observe community characteristics from multiple angles. These include community size, connectivity, common interests, and the community's sentiment, attitudes, and opinions toward particular events.
Community evolution analysis module: works with the community statistics module to detect changes within a particular community. It tracks changes in the number of users, in users' shared interests, in the connections between users (followees and followers), and in sentiment and opinion within the community.
Community-based user prediction helper module: using collaborative recommendation, the results of community analysis help predict specific user attributes, a user's reaction to a particular event, and so on.
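Community mining over the follow/repost graph can be sketched with label propagation. This is a well-known community detection technique substituted here for illustration; the document does not name its clustering model. It produces disjoint communities, whereas the text allows overlapping ones. To make results reproducible, the sketch visits nodes in a fixed order and breaks ties deterministically: a node keeps its current label if that label is tied for most frequent, and otherwise takes the largest tied label. Standard label propagation uses a random order instead.

```python
from collections import Counter


def label_propagation(adj, max_iter=20):
    """Asynchronous label propagation on an undirected graph.

    adj: {user: set of neighbours} built from follow/repost links (assumed symmetric).
    Returns the communities as a list of sets, ordered by their sorted members.
    """
    labels = {u: u for u in adj}
    for _ in range(max_iter):
        changed = False
        for u in sorted(adj):
            if not adj[u]:
                continue
            counts = Counter(labels[v] for v in adj[u])
            top = max(counts.values())
            tied = [lab for lab, c in counts.items() if c == top]
            new = labels[u] if labels[u] in tied else max(tied)
            if new != labels[u]:
                labels[u] = new
                changed = True
        if not changed:
            break
    communities = {}
    for u, lab in labels.items():
        communities.setdefault(lab, set()).add(u)
    return sorted(communities.values(), key=lambda s: sorted(s))
```

On two tightly knit groups joined by a single edge, the sketch recovers the two groups. Overlapping communities would need a different method, such as one that lets a node keep several labels.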
● User attribute prediction
User attribute prediction comprises the following modules: feature extraction (Feature Extractor), user prediction model training (Training User Prediction), community-based user prediction helper fed by community mining output (Community based User Prediction Helper), model (Model), user statistics (User Statistics), advertising and recommendation system helper (Advertising/Recommendation Helper), and user-based community mining helper (User based Community Mining Helper).
Feature extraction module: features include the user's followers, followees, tags, post content, and so on.
User prediction model training module: uses the results of feature extraction and the output of community mining to predict user attributes. Examples include age bracket, occupation, mood on a given day, and whether the user is buying or selling stock.
Community-based user prediction helper module: carries the output of community analysis and, through community-based collaborative recommendation, provides additional features to the training model.
Model module: makes an overall judgment of user attributes based on the training results.
User statistics module: produces statistics of predicted user information across all microblogs or within a particular community. Predictions for individual users may be inaccurate, but more meaningful information emerges at the aggregate level.
Advertising and recommendation system helper module: to promote this project, we can recommend its applications to specific user groups and communities. This module helps select the users who are likely to use our project.
User-based community mining helper module: the output of the user prediction module helps community mining find communities more effectively. This module supplies additional candidate features for community mining.
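The collaborative, community-based prediction can be sketched as a majority vote over a user's neighbours whose attribute is already known. The sketch relies on the homophily assumption that connected users tend to share attributes. A companion helper aggregates known or predicted values over a community, as the user statistics module does. Both helpers and the `min_votes` cutoff are hypothetical.

```python
from collections import Counter


def predict_attribute(user, adj, known, min_votes=2):
    """Guess a missing attribute (e.g. age bracket) by majority vote of the user's neighbours.

    adj: {user: set of neighbours (followees, followers, or community members)}
    known: {user: attribute value} for users whose value is already known.
    Returns None when fewer than min_votes neighbours have a known value.
    """
    votes = Counter(known[v] for v in adj.get(user, ()) if v in known)
    if sum(votes.values()) < min_votes:
        return None
    # Highest vote count wins; ties go to the lexicographically larger value for determinism.
    return max(votes.items(), key=lambda kv: (kv[1], kv[0]))[0]


def attribute_distribution(members, known, adj):
    """Aggregate known or predicted values over a community (user statistics module)."""
    values = (known.get(u) or predict_attribute(u, adj, known) for u in members)
    return Counter(v for v in values if v is not None)
```

Aggregating before reporting matches the point above that individual predictions are noisy while community-level distributions are more informative.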
● Time series analysis
Time series analysis mainly comprises the following modules: time series segmentation (Segmentation), peak/valley detection (Peak/Valley Detection), correlation analysis (Correlation), and cointegration and lead-lag analysis (Co-integration/Lead-lag analysis).
Time series segmentation: segments the time series formed by topics, entities, messages, and so on, or by financial data, to find periodic or time-sensitive segments for user analysis.
Peak/valley detection: finds the peaks and troughs of time series and detects and analyzes key events.
Correlation analysis: finds strongly correlated time series to make user search and analysis easier.
Cointegration and lead-lag analysis: cointegration is a common method in financial analysis for testing whether two time series are related in the long run. Unlike correlation analysis, it allows the two series to diverge at some points. In addition, cointegration analysis is the basis for judging causality between time series. Causality analysis is commonly used to study the lead-lag relationship between time series, and thus to find which series leads the other.
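Peak/valley detection and a lead-lag check can be sketched in plain Python. The peak test is a simple three-point local-extremum rule. The lead-lag check picks the lag with the highest Pearson correlation between one series and a shifted copy of the other. That is a heuristic, not a cointegration or Granger-causality test. For formal tests, statsmodels provides `statsmodels.tsa.stattools.coint` and `statsmodels.tsa.stattools.grangercausalitytests`. The helper names below are hypothetical.

```python
import math


def peaks_and_valleys(xs):
    """Indices of strict local maxima and minima (simple three-point test)."""
    peaks = [i for i in range(1, len(xs) - 1) if xs[i - 1] < xs[i] > xs[i + 1]]
    valleys = [i for i in range(1, len(xs) - 1) if xs[i - 1] > xs[i] < xs[i + 1]]
    return peaks, valleys


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0


def best_lag(x, y, max_lag):
    """Lag k in [0, max_lag] maximizing corr(x[t], y[t + k]); k > 0 suggests x leads y.

    x and y must have equal length, and len(x) - max_lag should be at least 2.
    """
    scores = {k: pearson(x[: len(x) - k], y[k:]) for k in range(max_lag + 1)}
    return max(scores, key=scores.get), scores
```

For example, if a topic's mention count reliably peaks two days before an index moves, `best_lag` would report a lag of 2. A formal test should then confirm such a finding before it is used.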
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. An economic and financial behavior analysis system based on social media, characterized in that the system comprises three major classes of modules: a crawler, a database and index, and an analyzer; the crawler is mainly responsible for data acquisition; the database is divided into two parts, structured data and unstructured data; based on the collected data, when the index is built, a global ID is assigned to each user and each microblog, by which information in the different databases is aligned and retrieved; and the analyzer is the core of the system and comprises six submodules: topic analysis, entity recognition, action recognition, message tracking, sentiment analysis, and community clustering analysis.
2. The economic and financial behavior analysis system based on social media according to claim 1, characterized in that information in the following fields is looked up for microblogs having the same ID:
Topic: topics are indexed by label; a fixed set of broad categories is defined for the whole microblog corpus and each microblog is labeled with its category; because a microblog may belong to several categories, a one-to-many mapping is established for the topic field;
Entity: entities include person names, place names, organization names, and fixed noun phrases from common vocabulary; for each entity, its category and name are annotated, and the user ID and microblog ID are recorded;
Action: for each action, a triple <subject, action, object> is annotated, and the user ID and microblog ID are recorded;
Message: for a reposted message, the ID of the user whose post was reposted, the original microblog ID, and so on are stored;
Based on the above information, when the index is built, a global ID is assigned to each user and each microblog, by which information in the different databases is aligned and retrieved.
3. The economic and financial behavior analysis system based on social media according to claim 1, characterized in that the system further provides data services and an expert system based on the analyzed data, which are used to give users more specialized knowledge and to push relevant information to them.
4. The economic and financial behavior analysis system based on social media according to claim 3, characterized in that the data services cover the following aspects:
Market sentiment index: sentiment analysis is performed every day on all microblogs related to socio-economic activity to obtain a market sentiment index, which is published daily to build influence;
Key event detection: key events in microblogs, especially breaking incidents, are detected to give users early warnings and alerts as soon as possible;
Person activity and key-person mining: based on topic and event mining, the most active people in each discussion are identified, and rankings of prominent people are derived from frequency statistics such as post counts, repost counts, and reply popularity;
User profile statistics and prediction (age, gender, interests, location): for everyone who appears in a topic, statistics on these attributes are compiled; some information can be obtained through the open platform's interface, and some attributes can be mined and predicted from each user's post content;
Time series correlation analysis: a time series is built for each topic, entity, action, and message and for its corresponding sentiment index; correlations between these time series and key economic indicators, stocks, and stock indices can be mined, and the text time series most correlated with a given index or stock is provided to users for analysis;
Network evolution analysis: different network evolution analyses are provided for different topics, and these results also help users extract useful information when analyzing socio-economic behavior on the Internet.
5. The economic and financial behavior analysis system based on social media according to claim 3, characterized in that the expert system aggregates a series of suggestions and solutions derived from all the analysis techniques, including:
Bull/bear judgment of the stock market: from statistics on historical data, correlations are obtained between stock market movements and key entities, actions, messages, and their associated sentiment;
Automatic post-close review: the day's closing results are analyzed and, by mining historical data, key events that may have influenced the day's trend are identified;
Analysis of hot online debates: topics that are hotly debated online are analyzed and predicted.
CN201310469922.2A 2013-10-10 2013-10-10 Financial behavior analyzing system based on social media calculation Pending CN103559207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310469922.2A CN103559207A (en) 2013-10-10 2013-10-10 Financial behavior analyzing system based on social media calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310469922.2A CN103559207A (en) 2013-10-10 2013-10-10 Financial behavior analyzing system based on social media calculation

Publications (1)

Publication Number Publication Date
CN103559207A true CN103559207A (en) 2014-02-05

Family

ID=50013454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310469922.2A Pending CN103559207A (en) 2013-10-10 2013-10-10 Financial behavior analyzing system based on social media calculation

Country Status (1)

Country Link
CN (1) CN103559207A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902716A (en) * 2014-04-08 2014-07-02 Shanghai Jiao Tong University Method for analyzing and publishing community-based socialized media topics
CN104063450A (en) * 2014-06-23 2014-09-24 Baidu Online Network Technology (Beijing) Co., Ltd. Hot spot information analyzing method and equipment
CN105405051A (en) * 2015-12-18 2016-03-16 Baidu Online Network Technology (Beijing) Co., Ltd. Financial event prediction method and apparatus
CN105653833A (en) * 2014-11-12 2016-06-08 Tencent Technology (Shenzhen) Co., Ltd. Method and device for recommending game community
CN105938481A (en) * 2016-04-07 2016-09-14 Beihang University Anomaly detection method of multi-mode text data in cities
CN106296312A (en) * 2016-08-30 2017-01-04 Jiangsu Mingtong Tech Co Ltd Online education resource recommendation system based on social media
CN106991488A (en) * 2015-11-16 2017-07-28 Uberple Co., Ltd. The relevance appraisal procedure and its device of keyword and assets value
CN107729455A (en) * 2017-09-25 2018-02-23 Shandong University of Science and Technology A kind of social network opinion leader sort algorithm based on multidimensional characteristic analysis
CN107945034A (en) * 2017-11-17 2018-04-20 Ping An Technology (Shenzhen) Co., Ltd. Financial analysis method, application server and computer-readable recording medium based on microblogging finance and economics event
CN108416644A (en) * 2017-02-09 2018-08-17 Fujitsu Limited Information output method and information output apparatus
CN109214454A (en) * 2018-08-31 2019-01-15 Northeastern University A kind of emotion community classification method towards microblogging
CN109598380A (en) * 2018-12-03 2019-04-09 Zhengzhou Yunhai Information Technology Co., Ltd. A kind of method and system of polynary real-time time series data prediction
CN110245984A (en) * 2019-06-09 2019-09-17 Guangdong University of Technology A kind of shopping at network behavior analysis method and system based on causal inference
CN110991852A (en) * 2019-11-27 2020-04-10 State Grid Energy Research Institute Co., Ltd. Corporate image promotion system architecture based on social media big data
JP2020129232A (en) * 2019-02-07 2020-08-27 The Japan Research Institute, Limited Machine learning device, program, and machine learning method
TWI742328B (en) * 2018-12-12 2021-10-11 Chunghwa Telecom Co., Ltd. Financial risk management device and financial risk management method
US11176619B1 (en) * 2015-08-27 2021-11-16 Hrb Innovations, Inc. Tax interview with third-party data source integration
CN114090771A (en) * 2021-10-19 2022-02-25 Guangzhou Shushuo Gushi Information Technology Co., Ltd. Big data based propagation proposition and consumer story analysis method and system
CN114866264A (en) * 2021-01-19 2022-08-05 Shanghai Guan'an Information Technology Co., Ltd. DGA domain name detection and family clustering method based on semi-supervised learning algorithm
US11494792B2 (en) 2020-03-19 2022-11-08 Kyndryl, Inc. Predictive decision making based on influence identifiers and learned associations

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN103258034A (en) * 2013-05-14 2013-08-21 Jiangsu Mingtong Tech Co., Ltd. Economic and financial behavior analysis system model based on social media

Non-Patent Citations (2)

Title
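Zhou Shengchen et al.: "A survey of research on sentiment analysis of Chinese microblogs", Computer Applications and Software, vol. 30, no. 3, 31 March 2013 (2013-03-31) *
Liang Changyong et al.: "Research on the group-user recommendation problem in e-commerce recommender systems", Chinese Journal of Management Science, vol. 21, no. 3, 30 June 2013 (2013-06-30) *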

Cited By (26)

Publication number Priority date Publication date Assignee Title
CN103902716A (en) * 2014-04-08 2014-07-02 Shanghai Jiao Tong University Method for analyzing and publishing community-based socialized media topics
CN104063450B (en) * 2014-06-23 2018-04-03 Baidu Online Network Technology (Beijing) Co., Ltd. Hot information analysis method and equipment
CN104063450A (en) * 2014-06-23 2014-09-24 Baidu Online Network Technology (Beijing) Co., Ltd. Hot spot information analyzing method and equipment
CN105653833B (en) * 2014-11-12 2019-04-26 Tencent Technology (Shenzhen) Co., Ltd. Method and device for recommending game communities
CN105653833A (en) * 2014-11-12 2016-06-08 Tencent Technology (Shenzhen) Co., Ltd. Method and device for recommending game community
US11176619B1 (en) * 2015-08-27 2021-11-16 Hrb Innovations, Inc. Tax interview with third-party data source integration
CN106991488A (en) * 2015-11-16 2017-07-28 Uberple Co., Ltd. Method and device for evaluating the relevance between keywords and asset value
CN105405051B (en) * 2015-12-18 2020-09-25 Baidu Online Network Technology (Beijing) Co., Ltd. Financial event prediction method and device
CN105405051A (en) * 2015-12-18 2016-03-16 Baidu Online Network Technology (Beijing) Co., Ltd. Financial event prediction method and apparatus
CN105938481A (en) * 2016-04-07 2016-09-14 Beihang University Anomaly detection method of multi-mode text data in cities
CN106296312A (en) * 2016-08-30 2017-01-04 Jiangsu Mingtong Tech Co., Ltd. Online education resource recommendation system based on social media
CN108416644A (en) * 2017-02-09 2018-08-17 Fujitsu Limited Information output method and information output apparatus
CN107729455A (en) * 2017-09-25 2018-02-23 Shandong University of Science and Technology Social network opinion leader ranking algorithm based on multidimensional feature analysis
CN107945034A (en) * 2017-11-17 2018-04-20 Ping An Technology (Shenzhen) Co., Ltd. Financial analysis method, application server and computer-readable recording medium based on microblog financial and economic events
CN109214454B (en) * 2018-08-31 2021-07-06 Northeastern University (China) Microblog-oriented emotion community classification method
CN109214454A (en) * 2018-08-31 2019-01-15 Northeastern University (China) Microblog-oriented emotion community classification method
CN109598380A (en) * 2018-12-03 2019-04-09 Zhengzhou Yunhai Information Technology Co., Ltd. Method and system for multivariate real-time time-series data prediction
TWI742328B (en) * 2018-12-12 2021-10-11 Chunghwa Telecom Co., Ltd. Financial risk management device and financial risk management method
JP2020129232A (en) * 2019-02-07 2020-08-27 The Japan Research Institute, Limited Machine learning device, program, and machine learning method
JP7280705B2 (en) 2019-02-07 2023-05-24 The Japan Research Institute, Limited Machine learning device, program and machine learning method
CN110245984A (en) * 2019-06-09 2019-09-17 Guangdong University of Technology Online shopping behavior analysis method and system based on causal inference
CN110991852A (en) * 2019-11-27 2020-04-10 State Grid Energy Research Institute Co., Ltd. Corporate image promotion system architecture based on social media big data
US11494792B2 (en) 2020-03-19 2022-11-08 Kyndryl, Inc. Predictive decision making based on influence identifiers and learned associations
CN114866264A (en) * 2021-01-19 2022-08-05 Shanghai Guan'an Information Technology Co., Ltd. DGA domain name detection and family clustering method based on semi-supervised learning algorithm
CN114090771A (en) * 2021-10-19 2022-02-25 Guangzhou Datastory Information Technology Co., Ltd. Big data based propagation proposition and consumer story analysis method and system
CN114090771B (en) * 2021-10-19 2024-07-23 Guangzhou Datastory Information Technology Co., Ltd. Big data-based propagation claim and consumer story analysis method and system

Similar Documents

Publication Publication Date Title
CN103559207A (en) Financial behavior analyzing system based on social media calculation
Karami et al. Twitter and research: A systematic literature review through text mining
Yang et al. Twitter financial community sentiment and its predictive relationship to stock market movement
Sun et al. A novel stock recommendation system using Guba sentiment analysis
US9317594B2 (en) Social community identification for automatic document classification
Sun et al. Predicting stock price returns using microblog sentiment for chinese stock market
CN103793503A (en) Opinion mining and classification method based on web texts
CN103258034A (en) Economic and financial behavior analysis system model based on social media
Chen et al. From opinion mining to financial argument mining
Smailović Sentiment analysis in streams of microblogging posts
Salari et al. Estimation of 2017 Iran’s presidential election using sentiment analysis on social media
Ennaji et al. Social intelligence framework: Extracting and analyzing opinions for social CRM
Seilsepour et al. 2016 olympic games on twitter: Sentiment analysis of sports fans tweets using big data framework
Ali et al. Big social data as a service (BSDaaS): a service composition framework for social media analysis
Zhou et al. Security topics related microblogs search based on deep convolutional neural networks
Saputra et al. C4.5 and naive bayes for sentiment analysis Indonesian Tweet on E-Money user during pandemic
Walker et al. Big data and big business: Should statisticians join in?
Bansal et al. Cryptocurrency price prediction using Twitter and news articles analysis
Zhuo et al. How are texts analyzed in blockchain research? A systematic literature review
Jadhav et al. Twitter Intention Classification Using Bayes Approach for Cricket Test Match Played Between India and South Africa 2015
Sayin et al. Identifying specific interest areas of Twitter users tweeting about cryptocurrencies
Brown An aggregated sparse matrix factorisation model for market trading
Derouiche et al. Impact of Tweets’ Sentiment Upon Stock Prices of Sport Companies: Can Fans Influence the Share Price of Their Preferred Sport Brand?
Das et al. Optimizing Social Media Data Using Genetic Algorithm
Evans A Smart Data Ecosystem for the Monitoring of Financial Market Irregularities

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140205