CN104809253B - Internet data analysis system - Google Patents
Internet data analysis system
- Publication number
- CN104809253B (application number CN201510257964.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- identified
- value
- vocabulary
- correlation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (1)
- 1. An internet data analysis system, characterized by comprising:
  - a correlation calculation module, configured to take a randomly selected text to be identified as the observation sequence and the remaining texts to be identified as the state sequence, and to calculate the correlation probability values between the selected text and the remaining texts;
  - a classification and identification module, configured to merge the text in the state sequence that has the highest correlation with the selected text to be identified, characterizing the result as a first type, and to take the text with the lowest correlation as a second type; the first and second types are then used as the new state sequence and the remaining texts to be identified as the new observation sequence, and the process is iterated to identify sensitive vocabulary.
  - The correlation calculation module is further configured as follows. y1, y2, ..., yn are the features of a sensitive-vocabulary type, and y = {y1, y2, ..., yi} is one sensitive-vocabulary type represented with the vector space model. x1, x2, ..., xn are the features of a text to be identified, and x = {x1, x2, ..., xi} is the text to be identified represented with the vector space model. For the parameter set Λ = {λ1, ..., λj}, the conditional probability of the specified state y given the observation sequence x is: [equation not reproduced in the source text], where fj is a feature function; λj is the weight of that feature function, obtained by training; Z(x) is the normalization factor; and n is the dimension of the sensitive-vocabulary type features and of the features of the text to be identified, with: [equation not reproduced in the source text].
  - The classification and identification module is further configured to randomly select one of the K texts to be identified as the observation input sequence s, to take the remaining K-1 texts to be identified as K-1 output class states, and to calculate the probability value between the document in the input sequence and each document in the output sequence, until all types of sensitive vocabulary have been identified:
    - a) Sort the K-1 probability values obtained. Merge the text corresponding to the maximum probability value with the text in the input observation sequence into one class, denoted C1, and denote the text corresponding to the minimum probability value as class C2.
    - b) Take the remaining K-3 texts to be identified as the input observation sequence and C1 and C2 as the output class states, so that each text to be identified obtains two probability values: its membership in C1 and in C2.
    - c) For each text to be identified, compute the variance of its probability values over the output class states, and sort the variances.
    - d) Check all probability values of the text with the minimum variance. If its smallest probability value is below a threshold θ, make that text a new class C3. Otherwise, check the text with the second-ranked variance, and so on, until a text with a probability value below θ is found. At the same time, merge the text with the maximum variance into the type for which it has the maximum probability.
    - e) Repeat steps b) to d) until all texts have been classified.
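The steps of the claim can be sketched in Python as follows. This is an illustrative reading under several assumptions, not the patented implementation: the two equations are missing from the source text, so the probability below uses a generic log-linear form, p(y | x) = exp(sum_j λj·fj(x, y)) / Z(x), with a hypothetical feature function fj(x, y) = xj·yj; a merged class is represented by the mean of its member vectors, which the claim does not specify; and the names `cond_probs`, `centroid`, `classify`, `lam`, `theta`, and `seed` are invented for the example.

```python
import math
import random
from statistics import pvariance


def cond_probs(x, states, lam):
    """Return p(y | x) for every candidate state y in `states`.

    Assumed log-linear form: exp(sum_j lam_j * f_j(x, y)) / Z(x),
    with the hypothetical feature function f_j(x, y) = x_j * y_j.
    """
    scores = [sum(w * a * b for w, a, b in zip(lam, x, y)) for y in states]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)                          # Z(x): normalises over the candidate states
    return [e / z for e in exps]


def centroid(vectors):
    """Represent a class by the mean of its member vectors (an assumption)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(texts, lam, theta, seed=0):
    """Group text vectors by following steps a) to e) of the claim literally."""
    if len(texts) < 3:
        raise ValueError("need at least three texts")
    rng = random.Random(seed)
    remaining = list(range(len(texts)))
    s = rng.choice(remaining)              # randomly chosen observation input
    remaining.remove(s)

    # a) rank the K-1 probabilities; the best match joins s in C1, the worst is C2
    probs = cond_probs(texts[s], [texts[i] for i in remaining], lam)
    ranked = [i for _, i in sorted(zip(probs, remaining))]
    classes = [[s, ranked[-1]], [ranked[0]]]
    remaining = ranked[1:-1]               # the K-3 texts still unclassified

    while remaining:                       # e) repeat b) to d)
        # b) probability of each remaining text under every current class
        states = [centroid([texts[i] for i in c]) for c in classes]
        table = {i: cond_probs(texts[i], states, lam) for i in remaining}
        # c) variance of each text's probabilities, in ascending order
        by_var = sorted(remaining, key=lambda i: pvariance(table[i]))
        # d) lowest-variance text whose smallest probability is below theta
        new = next((i for i in by_var if min(table[i]) < theta), None)
        last = by_var[-1]                  # text with the largest variance
        if new is not None:
            classes.append([new])          # it starts a new class
            remaining.remove(new)
        if last != new:
            best = max(range(len(states)), key=lambda k: table[last][k])
            classes[best].append(last)     # merge into its most probable class
            remaining.remove(last)
    return classes


# Toy two-dimensional vectors standing in for vector-space text features.
texts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
groups = classify(texts, lam=[1.0, 1.0], theta=0.2)
print(groups)
```

Each pass of the loop removes at least one text from `remaining`, so the sketch always terminates, and every text index ends up in exactly one class. How the threshold test in step d) behaves depends strongly on how the probabilities are normalised, which the source text does not settle.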
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510257964.9A CN104809253B (en) | 2015-05-20 | 2015-05-20 | Internet data analysis system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510257964.9A CN104809253B (en) | 2015-05-20 | 2015-05-20 | Internet data analysis system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104809253A CN104809253A (en) | 2015-07-29 |
CN104809253B (en) | 2017-12-08 |
Family
ID=53694075
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510257964.9A Active CN104809253B (en) | 2015-05-20 | 2015-05-20 | Internet data analysis system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104809253B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105610640B (en) * | 2015-12-21 | 2019-09-24 | 中国电子科技集团公司第十五研究所 | Method and apparatus for internet information spreading path reduction |
CN105893582B (en) * | 2016-04-01 | 2019-06-28 | 深圳市未来媒体技术研究院 | Method for discriminating the mood of social network users |
CN109034389A (en) * | 2018-08-02 | 2018-12-18 | 黄晓鸣 | Man-machine interactive modification method, device, equipment and the medium of information recommendation system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1158460A (en) * | 1996-12-31 | 1997-09-03 | 复旦大学 | Multiple languages automatic classifying and searching method |
CN101727500A (en) * | 2010-01-15 | 2010-06-09 | 清华大学 | Text classification method of Chinese web page based on stream clustering |
CN104216954B (en) * | 2014-08-20 | 2017-07-14 | 北京邮电大学 | Prediction device and prediction method for accident topic state |
2015
- 2015-05-20: application CN201510257964.9A, patent CN104809253B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN104809253A (en) | 2015-07-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104809108B (en) | Information monitoring analysis system | |
US11580104B2 (en) | Method, apparatus, device, and storage medium for intention recommendation | |
CN109271512B (en) | Emotion analysis method, device and storage medium for public opinion comment information | |
Nagy et al. | Crowd sentiment detection during disasters and crises. | |
Purohit et al. | Emergency-relief coordination on social media: Automatically matching resource requests and offers | |
Keneshloo et al. | Predicting the popularity of news articles | |
Li et al. | Using text mining and sentiment analysis for online forums hotspot detection and forecast | |
US9990368B2 (en) | System and method for automatic generation of information-rich content from multiple microblogs, each microblog containing only sparse information | |
CN107577759A (en) | User comment auto recommending method | |
CN107862022B (en) | Culture resource recommendation system | |
CN111737495A (en) | Middle-high-end talent intelligent recommendation system and method based on domain self-classification | |
KR101695011B1 (en) | System for Detecting and Tracking Topic based on Topic Opinion and Social-influencer and Method thereof | |
US10387805B2 (en) | System and method for ranking news feeds | |
Weiler et al. | Survey and experimental analysis of event detection techniques for twitter | |
CN106156372B (en) | Classification method and device for internet sites | |
CN108733791B (en) | Network event detection method | |
Sharma et al. | Detecting hate speech and insults on social commentary using nlp and machine learning | |
CN104809253B (en) | Internet data analysis system | |
KR101543680B1 (en) | Entity searching and opinion mining system of hybrid-based using internet and method thereof | |
Hazimeh et al. | SocialMatching++: A Novel Approach for Interlinking User Profiles on Social Networks. | |
CN107330076A (en) | Network public sentiment information display system and method | |
Masood et al. | Semantic analysis to identify students’ feedback | |
CN115018255A (en) | Tourist attraction evaluation information quality validity analysis method based on integrated learning data mining technology | |
CN113971213A (en) | Smart city management public information sharing system | |
Pino et al. | Assessment and visualization of geographically distributed event-related sentiments by mining social networks and news | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | EXSB | Decision made by SIPO to initiate substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | TR01 | Transfer of patent right | Effective date of registration: 20180730. Address after: 510623 room 3301 -3302, 1 Jinsui Road, Tianhe District, Guangzhou, Guangdong (for office use only). Patentee after: GUANGZHOU FENGSHEN NETWORK TECHNOLOGY Co.,Ltd. Address before: 610041 No. 1, No. 3 Shen Xian Nan Road, Chengdu high tech Zone, Sichuan, China. Patentee before: CHENGDU BLTSAFE INFORMATION TECHNOLOGY Co.,Ltd. |
| | PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Internet data analysis system. Effective date of registration: 20210223. Granted publication date: 20171208. Pledgee: Zhujiang Branch of Guangzhou Bank Co.,Ltd. Pledgor: GUANGZHOU FENGSHEN NETWORK TECHNOLOGY Co.,Ltd. Registration number: Y2021980001275 |
| | PC01 | Cancellation of the registration of the contract for pledge of patent right | Date of cancellation: 20220420. Granted publication date: 20171208. Pledgee: Zhujiang Branch of Guangzhou Bank Co.,Ltd. Pledgor: GUANGZHOU FENGSHEN NETWORK TECHNOLOGY Co.,Ltd. Registration number: Y2021980001275 |
| | TR01 | Transfer of patent right | Effective date of registration: 20240226. Address after: Room 499, 4th Floor, No. 89 Yanling Road, Tianhe District, Guangzhou City, Guangdong Province 510000. Self made No. 134 (for office only). Patentee after: Guangzhou Kunchuan Network Technology Co.,Ltd. Country or region after: China. Address before: 510623 room 3301 -3302, 1 Jinsui Road, Tianhe District, Guangzhou, Guangdong (for office use only). Patentee before: GUANGZHOU FENGSHEN NETWORK TECHNOLOGY Co.,Ltd. Country or region before: China |