CN109165349A - Securities data monitoring method, apparatus and system - Google Patents

Publication number
CN109165349A
Authority
CN
China
Prior art keywords
data
network data
network
classification
securities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810957428.3A
Other languages
Chinese (zh)
Inventor
黄毅
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Yongyisi Information Technology Co Ltd
Original Assignee
Nanjing Yongyisi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Yongyisi Information Technology Co Ltd filed Critical Nanjing Yongyisi Information Technology Co Ltd
Priority to CN201810957428.3A priority Critical patent/CN109165349A/en
Publication of CN109165349A publication Critical patent/CN109165349A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a securities data monitoring method, apparatus and system. The method comprises: acquiring network data related to securities information; preprocessing the network data; classifying the network data according to a preset data model; and generating data classification reports of different dimensions based on the classification. With this scheme, the whole network can be monitored in real time and target data captured by deep vertical crawling, and the data can be processed by computations such as sentiment analysis, association-relation extraction and positive/negative judgment. Combined with machine learning techniques, the system keeps improving as it analyzes data continuously, and finally generates accurate securities data reports according to industry rules.

Description

Securities data monitoring method, apparatus and system
Technical field
The present invention relates to the field of securities management, and in particular to a securities data monitoring method, apparatus and system.
Background art
Stock market public opinion is massive, diverse, changeable and complex. Under these circumstances it is impractical to screen each piece of public opinion manually and then classify, count and analyze it. A network public opinion monitoring system can instead monitor the stock market across the whole network and around the clock, so as to understand the trends of stock market public opinion, analyze them comprehensively and in depth, obtain the latest stock market developments, and generate stock market analysis reports.
At present, monitoring technology for stock market public opinion is mainly concerned with data collection. Existing techniques process the crawled information inaccurately and cannot effectively present efficient analysis data to users such as industry experts.
Summary of the invention
To solve the above problems, the present invention provides a securities data monitoring method, comprising: acquiring network data related to securities information; preprocessing the network data; classifying the network data according to a preset data model; and generating data classification reports of different dimensions based on the classification.
In a possible embodiment, preprocessing the network data includes deduplicating the network data and masking sensitive data by means of word segmentation.
In a possible embodiment, the deduplication includes using Google BigTable big data framework technology to judge whether the network data is repeated, while a timer is used to run a similarity comparison on each day's newly added network data and remove network data with high similarity.
In a possible embodiment, classifying the network data according to the preset data model includes performing topic training on the different categories using the LDA and SVM algorithms to obtain the data model, and then classifying the network data with the data model.
In a possible embodiment, the method further includes receiving feedback information, and training and calibrating the data model according to the feedback information.
An embodiment of the invention also discloses a securities data monitoring device, comprising: an acquisition module for acquiring network data related to securities information; a first processing module for preprocessing the network data; a second processing module for classifying the network data according to a preset data model; and a generation module for generating data classification reports of different dimensions based on the classification.
In a possible embodiment, the first processing module further includes a deduplication unit for deduplicating the network data and a screening unit for masking sensitive data by means of word segmentation.
In a possible embodiment, the deduplication unit is further configured to use Google BigTable big data framework technology to judge whether the network data is repeated, while using a timer to run a similarity comparison on each day's newly added network data and remove network data with high similarity.
In a possible embodiment, the second processing module is further configured to perform topic training on the different categories using the LDA and SVM algorithms to obtain the data model, and then to classify the network data with the data model.
An embodiment of the invention also discloses a system comprising any of the foregoing devices and an interaction terminal.
Compared with the prior art, the technical solution of the present invention has the following advantages:
With the above scheme, technologies such as distributed crawler data capture, natural language processing (NLP) and machine learning make it possible to monitor the whole network in real time, capture target data by deep vertical crawling, and process the data by computations such as sentiment analysis, association-relation extraction and positive/negative judgment. Combined with machine learning techniques, the system keeps improving as it analyzes data continuously, and finally generates accurate securities data reports according to industry rules.
The embodiments of the present invention use an intelligent system to monitor target websites and whole-network data and to analyze and extract data. The latest processed data is presented to the user, who only needs to deal with the extracted data, saving the time and effort previously spent visiting each website to read pages and collect data. The system also generates data summaries, high-frequency word analysis and the like, freeing experts so that they have more energy for professional work.
Brief description of the drawings
Fig. 1 is a schematic diagram of a securities data monitoring network architecture in an embodiment of the present invention;
Fig. 2 is a flowchart of a securities data monitoring method in an embodiment of the present invention;
Fig. 3 is a block diagram of a securities data monitoring device in an embodiment of the present invention;
Fig. 4 is a block diagram of the first processing module in an embodiment of the present invention.
Detailed description of the embodiments
As shown in Fig. 1, a securities data monitoring network architecture is disclosed. From bottom to top, the architecture comprises a data storage layer, a data service layer, a presentation layer and a client layer. The data storage layer stores data crawled from the network or obtained from other external databases. The data service layer comprises a data mining platform and a data computing platform: the data mining platform can perform data extraction, data cleaning and data distribution on the data in the data storage layer, and the data computing platform analyzes and computes on the data, for example by processing it with artificial intelligence algorithms.
The presentation layer presents the data analyzed and processed by the data computing platform. In one embodiment it displays the analyzed data along different classification dimensions, which may include industry market conditions, stock-market-related knowledge, high-frequency words, and so on.
The client layer receives user feedback and sends it to the presentation layer.
As shown in Fig. 2, based on the foregoing architecture, an embodiment of the invention discloses a securities data monitoring method, comprising:
S100: acquire network data related to securities information.
In one embodiment, network data can be captured by deep vertical crawling and stored in the data storage layer. Specifically, whole-network data is crawled deeply in real time, and the large volume of data captured each day is analyzed and stored.
In one embodiment, the network data includes web page information related to finance and economics, banking, securities, the stock market, macro policy, international financial market trends and the like, for example from websites such as Eastmoney, China Finance Online, Phoenix Finance, Homeway.com (Hexun), Sina Finance, Sohu Finance, China Economic Net, Xueqiu (Snowball) Finance and China Securities Net.
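One way to read "vertical" crawling is that the crawler only follows links that stay within a fixed list of finance sites. The sketch below illustrates that reading only; the domain names, `in_scope` and `filter_frontier` are hypothetical and do not come from the patent.

```python
from urllib.parse import urlparse

# Hypothetical seed domains; the patent lists finance sites but no URLs.
ALLOWED_DOMAINS = {"finance.example.com", "stocks.example.org"}


def in_scope(url, allowed=ALLOWED_DOMAINS):
    """True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def filter_frontier(urls, allowed=ALLOWED_DOMAINS):
    """Keep each in-scope URL once, preserving the order of discovery."""
    seen, kept = set(), []
    for url in urls:
        if url not in seen and in_scope(url, allowed):
            seen.add(url)
            kept.append(url)
    return kept
```

A crawler would call `filter_frontier` on the links extracted from each fetched page before queueing them, so that off-topic sites never enter the queue.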
A central data warehouse is built from the collected data. The system can store both incremental data and daily full data in this warehouse, which supports data marts and data cubes as well as rotation (pivoting), slicing, drill-down and online queries.
S102: preprocess the network data.
In one embodiment, the preprocessing may include deduplication and screening, including masking of sensitive data. Specifically, deduplication can be implemented on the principles of Google BigTable big data framework technology: the web page title serves as the unique identifier, and the title is used to judge whether content is repeated. In addition, a background timer compares each day's newly added content against the full data of the past seven days for similarity; if a title is found to be highly similar, the duplicated web page content is removed as well.
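The two-stage deduplication described here (exact match on the title, then a similarity comparison against recent content) can be sketched as follows. This is a minimal in-memory illustration, not the patent's BigTable-based implementation: the class name, the 0.9 threshold and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions.

```python
from datetime import timedelta
from difflib import SequenceMatcher


def normalize(title):
    """Lower-case the title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


class TitleDeduplicator:
    """Title as unique key, plus a similarity check over a sliding window."""

    def __init__(self, window_days=7, threshold=0.9):
        self.window = timedelta(days=window_days)
        self.threshold = threshold   # assumed cut-off for "highly similar"
        self.seen = {}               # normalized title -> date first stored

    def is_duplicate(self, title, day):
        key = normalize(title)
        if key in self.seen:         # exact match on the unique key
            return True
        for old, old_day in self.seen.items():
            recent = day - old_day <= self.window
            if recent and SequenceMatcher(None, key, old).ratio() >= self.threshold:
                return True
        return False

    def add(self, title, day):
        """Store the title unless it is a duplicate; return True if stored."""
        if self.is_duplicate(title, day):
            return False
        self.seen[normalize(title)] = day
        return True
```

In a production system the exact-match lookup would typically be a keyed read against the storage layer, and the similarity pass would run as the scheduled background job the text mentions.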
For screening, supervised machine learning can be used: multiple categories are established for relevant articles, and network data is then classified automatically.
In one embodiment, word segmentation can be used to mask sensitive data.
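Masking by word segmentation amounts to splitting the text into tokens and replacing any token found in a sensitive-word list. The sketch below uses a regular expression as a stand-in tokenizer for space-delimited text; for Chinese text a dedicated word-segmentation library would take its place. The word list and function names are hypothetical.

```python
import re

# Hypothetical sensitive-word list; the patent does not specify one.
SENSITIVE_WORDS = {"insider", "manipulation"}


def tokenize(text):
    """Split text into alternating word and non-word tokens, losing nothing."""
    return re.findall(r"\w+|\W+", text)


def mask_sensitive(text, sensitive=SENSITIVE_WORDS, mask_char="*"):
    """Replace every token that is a sensitive word with mask characters."""
    masked = []
    for token in tokenize(text):
        if token.lower() in sensitive:
            masked.append(mask_char * len(token))
        else:
            masked.append(token)
    return "".join(masked)
```

Because matching happens per token rather than per substring, a word such as "insiders" is left untouched unless it is itself on the list.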
S104: classify the network data according to a preset data model.
In one embodiment, techniques such as machine learning and data mining are used to compare and process the data.
For the different securities analysis sectors, articles need to be classified automatically according to their topic and title. Machine learning algorithms and data mining techniques are therefore used to compare and process the data.
In one embodiment, topic training is performed on the different categories using the LDA and SVM algorithms to obtain the data model, and the network data is then classified with the data model. Specifically, about 3,000 records are first curated manually, with the specific categories preset and the specific features labeled. Topic training is then performed on the different categories using the LDA and SVM algorithms, and the training result yields a data model. The learned model is then used to predict which category a piece of network data belongs to, or whether it is relevant at all. In one embodiment, the data samples can be learned from and trained on continuously: as the sample size grows, the probability deviation becomes smaller and smaller and the accuracy of the results higher and higher.
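To make the train-then-predict flow concrete, the following sketch trains a linear SVM by sub-gradient descent on the regularised hinge loss and uses it to assign one of two categories. It is a simplified stand-in: the LDA topic-modelling stage is omitted and raw bag-of-words counts are used as features instead, and the labels, toy corpus and hyperparameters are hypothetical rather than taken from the patent.

```python
from collections import Counter

# Hypothetical category labels and a toy labelled corpus (not from the source).
POSITIVE, NEGATIVE = "macro_policy", "stock_market"
TRAIN = [
    ("central bank rate", POSITIVE),
    ("fiscal policy reform", POSITIVE),
    ("stock index rally", NEGATIVE),
    ("shares earnings report", NEGATIVE),
]


def build_vocab(texts):
    """Sorted list of every distinct lower-cased word in the corpus."""
    return sorted({word for text in texts for word in text.lower().split()})


def featurize(text, vocab):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss, y in {-1, +1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]   # regularisation step
            if margin < 1.0:                          # hinge loss is active
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def fit(train=TRAIN):
    """Build the vocabulary, train the SVM and return the data model."""
    vocab = build_vocab(text for text, _ in train)
    X = [featurize(text, vocab) for text, _ in train]
    y = [1 if label == POSITIVE else -1 for _, label in train]
    w, b = train_linear_svm(X, y)
    return vocab, w, b


def predict(text, model):
    """Assign a category from the sign of the SVM decision function."""
    vocab, w, b = model
    score = sum(wj * xj for wj, xj in zip(w, featurize(text, vocab))) + b
    return POSITIVE if score >= 0 else NEGATIVE
```

With more than two categories, one such classifier per category (one-vs-rest) is a common arrangement; in the pipeline the text describes, the per-document topic distribution produced by LDA would replace the raw word counts as the feature vector.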
In one embodiment, the Word2Vec algorithm can be used, which includes CBOW (continuous bag-of-words) and Skip-gram. For CBOW, the goal is to predict a single word given its neighboring words. Skip-gram is the opposite: given a single word, it predicts the words within a certain range around it. Both methods use artificial neural networks as their classification algorithm.
Initially, each word in the vocabulary is a random N-dimensional vector. During training, the algorithm uses CBOW or Skip-gram to learn the optimal vector for each word. These word vectors now take the surrounding context into account, which can be seen as using basic algebraic expressions to mine the relationships between words. The word vectors can serve as input to a classification algorithm for predicting sentiment, in contrast to the bag-of-words approach.
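The difference between the two Word2Vec training schemes is easiest to see in the training examples each one extracts from a sentence. The sketch below only generates those examples; it does not train the neural network, and the function names and window size are illustrative choices.

```python
def context_indices(i, length, window):
    """Indices within `window` positions of i, excluding i itself."""
    return [j for j in range(max(0, i - window), min(length, i + window + 1))
            if j != i]


def skipgram_pairs(tokens, window=2):
    """Skip-gram: each (center, context) pair asks the model to predict
    one surrounding word from the center word."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in context_indices(i, len(tokens), window)]


def cbow_examples(tokens, window=2):
    """CBOW: each (context_words, target) example asks the model to predict
    the center word from all of its surrounding words."""
    examples = []
    for i, target in enumerate(tokens):
        context = [tokens[j] for j in context_indices(i, len(tokens), window)]
        if context:
            examples.append((context, target))
    return examples
```

For the sentence `["bank", "cuts", "rates"]` with a window of 1, Skip-gram yields four single-word predictions while CBOW yields three examples, one per target word.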
S106: generate data classification reports of different dimensions based on the classification.
In one embodiment, Google's Word2Vec algorithm can be used to analyze massive network news data on a schedule, applying NLP part-of-speech analysis and word-frequency analysis to news articles, analysis of the spread of positive and negative words, and so on, to generate analysis reports of different dimensions for the user.
In one embodiment, the classification dimensions may include hot topics, trends, events, high-frequency words, and so on.
Hot topic detection identifies the hot topics within a given time period according to parameters such as the weight of the web data source and the density of posts over time. An overall semantic analysis can also be performed according to topic keywords and the number of replies, so as to identify all sensitive topics.
Topic classification clusters web articles and determines which industry an article belongs to.
Trend analysis examines how much attention people pay to a given topic over different time periods.
Event analysis performs a comprehensive analysis of an emergency across time and space, obtains a full picture of how the event unfolded, and predicts how the event will develop next.
High-frequency word analysis accurately extracts the high-frequency words of the current period from multiple web news pages and boards.
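Of the report dimensions listed above, high-frequency word extraction is the simplest to illustrate: count word occurrences across the articles of the current period and keep the most common ones. The sketch below assumes space-delimited English text and a hypothetical stop-word list; Chinese text would first need word segmentation.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "on", "in", "and", "to", "as"}


def high_frequency_words(articles, top_n=3, stopwords=STOPWORDS):
    """Return the top_n (word, count) pairs across all articles."""
    counts = Counter()
    for text in articles:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(top_n)
```

Running the same function over articles bucketed by day or week gives a crude form of the trend analysis dimension as well, since the count of a topic keyword per bucket tracks how attention changes over time.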
In one embodiment, the foregoing method further includes S208: receive feedback information, and train and calibrate the data model according to the feedback information. Specifically, experts review the data classification reports and give feedback, which drives the machine learning. If a judgment is inaccurate, the expert's review and feedback are used to calibrate the system; by continually learning from expert feedback, the prediction accuracy of the system improves and more accurate reports are generated.
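The feedback step can be pictured as a loop in which each expert correction becomes a new labelled example and the model is retrained. The sketch below shows only that loop; the word-count voting classifier is a deliberately trivial stand-in for the patent's data model, and all names are hypothetical.

```python
from collections import Counter, defaultdict


class FeedbackClassifier:
    """Toy classifier that is retrained whenever an expert corrects it."""

    def __init__(self, examples=()):
        self.examples = list(examples)           # (text, label) pairs
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.fit()

    def fit(self):
        """Rebuild the per-label word counts from all stored examples."""
        self.word_counts = defaultdict(Counter)
        for text, label in self.examples:
            self.word_counts[label].update(text.lower().split())

    def predict(self, text):
        """Pick the label whose training words overlap the text the most."""
        words = text.lower().split()
        scores = {label: sum(counts[w] for w in words)
                  for label, counts in self.word_counts.items()}
        if not scores:
            return None
        return max(sorted(scores), key=scores.get)  # ties broken by name

    def add_feedback(self, text, correct_label):
        """Store an expert-corrected example and retrain immediately."""
        self.examples.append((text, correct_label))
        self.fit()
```

Retraining on every correction is only practical for a toy model; a larger system would more plausibly batch the corrections and retrain on a schedule.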
By using technical means, rather than manual effort, to collect large amounts of Internet financial information and data from major industry websites, the embodiments of the invention establish a systematic knowledge graph system that judges the expected industry facts and changes affecting stocks, predicts the near-future stock market from latent information, and uses current and past events, people, times and data to find correlations and the probability of what may happen. Artificial intelligence technology helps securities industry analysts save a great deal of time and manpower and raises the productivity of information acquisition.
Corresponding to the above method, an embodiment of the invention also discloses a securities data monitoring device 30. As shown in Fig. 3, the device comprises: an acquisition module 301 for acquiring network data related to securities information; a first processing module 302 for preprocessing the network data; a second processing module 303 for classifying the network data according to a preset data model; and a generation module 304 for generating data classification reports of different dimensions based on the classification.
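The four modules of the device map naturally onto a small pipeline object whose parts can be swapped independently. The sketch below is one possible arrangement, with the generation step reduced to grouping documents by category; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class MonitoringDevice:
    """Wires the acquisition, preprocessing and classification modules."""
    acquire: Callable[[], Iterable[str]]              # acquisition module
    preprocess: Callable[[Iterable[str]], List[str]]  # first processing module
    classify: Callable[[str], str]                    # second processing module

    def generate_report(self) -> Dict[str, List[str]]:
        """Generation module: group preprocessed documents by category."""
        report: Dict[str, List[str]] = {}
        for doc in self.preprocess(self.acquire()):
            report.setdefault(self.classify(doc), []).append(doc)
        return report
```

Passing the modules in as callables keeps each one testable in isolation and mirrors the point made later in the description that the division into units is a logical one.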
In one embodiment, as shown in Fig. 4, the first processing module 302 further includes a deduplication unit 3021 for deduplicating the network data and a screening unit 3022 for masking sensitive data by means of word segmentation.
In one embodiment, the deduplication unit 3021 is further configured to use Google BigTable big data framework technology to judge whether the network data is repeated, while using a timer to run a similarity comparison on each day's newly added network data and remove network data with high similarity.
In one embodiment, the second processing module 303 is further configured to perform topic training on the different categories using the LDA and SVM algorithms to obtain the data model, and then to classify the network data with the data model.
The device corresponds to the foregoing method embodiment, and the foregoing detailed description also applies to the device, so it is not repeated here.
An embodiment of the invention also discloses a securities monitoring system comprising the foregoing securities data monitoring device 30 and an interaction terminal. The interaction terminal may be a portable mobile device such as a mobile phone, tablet or laptop, or it may be a desktop computer, a display device, or the like. The interaction terminal can present the analysis reports generated by the securities data monitoring device 30 and can also receive feedback from users such as industry experts.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include ROM, RAM, a magnetic disk, an optical disc, and so on.
Although the present invention is disclosed as above, it is not limited thereto. Anyone skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention, so the scope of protection of the invention shall be that defined by the claims.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiments of the present invention.

Claims (10)

1. A securities data monitoring method, characterized in that the method comprises:
acquiring network data related to securities information;
preprocessing the network data;
classifying the network data according to a preset data model; and
generating data classification reports of different dimensions based on the classification.
2. The method according to claim 1, characterized in that preprocessing the network data includes deduplicating the network data and masking sensitive data by means of word segmentation.
3. The method according to claim 2, characterized in that the deduplication includes using Google BigTable big data framework technology to judge whether the network data is repeated, while using a timer to run a similarity comparison on each day's newly added network data and remove network data with high similarity.
4. The method according to claim 1, characterized in that classifying the network data according to the preset data model includes performing topic training on the different categories using the LDA and SVM algorithms to obtain the data model, and then classifying the network data with the data model.
5. The method according to claim 1, characterized in that the method further includes receiving feedback information, and training and calibrating the data model according to the feedback information.
6. A securities data monitoring device, characterized by comprising:
an acquisition module for acquiring network data related to securities information;
a first processing module for preprocessing the network data;
a second processing module for classifying the network data according to a preset data model; and
a generation module for generating data classification reports of different dimensions based on the classification.
7. The device according to claim 6, characterized in that the first processing module further includes:
a deduplication unit for deduplicating the network data; and
a screening unit for masking sensitive data by means of word segmentation.
8. The device according to claim 7, characterized in that the deduplication unit is further configured to use Google BigTable big data framework technology to judge whether the network data is repeated, while using a timer to run a similarity comparison on each day's newly added network data and remove network data with high similarity.
9. The device according to claim 6, characterized in that the second processing module is further configured to perform topic training on the different categories using the LDA and SVM algorithms to obtain the data model, and then to classify the network data with the data model.
10. A system, characterized by comprising the device according to any one of claims 6-10 and an interaction terminal.
CN201810957428.3A 2018-08-22 2018-08-22 Securities data monitoring method, apparatus and system Pending CN109165349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810957428.3A CN109165349A (en) 2018-08-22 2018-08-22 Securities data monitoring method, apparatus and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810957428.3A CN109165349A (en) 2018-08-22 2018-08-22 Securities data monitoring method, apparatus and system

Publications (1)

Publication Number Publication Date
CN109165349A true CN109165349A (en) 2019-01-08

Family

ID=64896498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810957428.3A Pending CN109165349A (en) 2018-08-22 2018-08-22 Securities data monitoring method, apparatus and system

Country Status (1)

Country Link
CN (1) CN109165349A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112905581A (en) * 2021-03-22 2021-06-04 杭州联众医疗科技股份有限公司 Machine learning data storage system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751458A (en) * 2009-12-31 2010-06-23 暨南大学 Network public sentiment monitoring system and method
CN104573016A (en) * 2015-01-12 2015-04-29 武汉泰迪智慧科技有限公司 System and method for analyzing vertical public opinions based on industry
CN108334591A (en) * 2018-01-30 2018-07-27 天津中科智能识别产业技术研究院有限公司 Industry analysis method and system based on focused crawler technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190108