CN109636352A - Distributed content duplicate-checking early-warning system based on financial big data
- Publication number
- CN109636352A (application CN201811562264.0A)
- Authority
- CN
- China
- Prior art keywords
- content
- early warning
- duplicate checking
- center
- lemma
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a distributed content duplicate-checking early-warning system and method based on financial big data. The system comprises a project application system, a content early-warning model center, a content analysis engine, a big data management platform, an information push center, and a task scheduling center. Its advantages are as follows: a unified application project library and a unified industrial and commercial library are built on the big data management system; the content analysis engine, built on distributed computing technology, supports fast duplicate-checking analysis of large-scale application content against the project library and the industrial and commercial library; and the computing power of multiple servers is used to calculate the similarity value of application content quickly. The system offers high availability, efficient duplicate checking, and safe, reliable results.
Description
Technical field
The present invention relates to a distributed content duplicate-checking early-warning system based on financial big data.
Background technique
With the continuous development of informatization, finance departments have built a number of special-fund management application systems, achieving a leap from paper-based office work to online office work and improving office efficiency. However, as government support for enterprises keeps increasing, finance departments must process a large number of enterprise special-fund support applications and review a large volume of application content. To cope with this, the system needs to be more intelligent: it should run duplicate-checking analysis on application content and warn managers based on the analysis results. Because e-government construction lacked unified planning and mostly followed an independent, decentralized construction model, information resources cannot be effectively shared and used, and the data integration problem is hard to solve by simple upgrades.
With the development of big data and distributed computing technology, establishing a unified project application big data management platform has become the way to address identical project content in special-fund applications. Most existing financial information early-warning platform products can perform duplicate-checking early warning on project application content, deciding whether to send a warning notification by means of a configured content-similarity warning threshold. They have the following main problems: 1) When facing large-scale data content, the computing power of a single server is limited. Even in the simplest case, computing the similarity of two data items only 20 characters long in a loop of 1,000,000 iterations takes at least 4000 ms. If 1,000,000 comparisons are needed per day, merely checking one item against 1,000,000 records for duplicates takes 4 s. Even at 4 s per document, a single thread handles only 15 documents per minute and only 900 per hour; if an application document is large (it may reach several hundred megabytes), efficiency drops further. 2) Data storage is relatively dispersed. Data is not stored centrally on a unified data platform, and information resources cannot be effectively shared and used, so application content data has to go through repeated duplicate-checking computation on the special-fund platforms of finance departments at every level. 3) No unified industrial and commercial database has been established. The legal representative or a shareholder of an applicant may own many enterprises, and several of these enterprises may take part in applying for the same project, so multiple applications for the same project may arise and duplicate applications cannot be fully avoided. In addition, because analysis precision is low and the system architecture stores data on a single node without support for distributed computing, computational efficiency on massive application content is too low; results cannot be fed back to users in time, and problematic approvals easily result.
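The single-thread throughput figures quoted above follow from simple arithmetic. The short check below reproduces them; the variable names are illustrative and not part of the original text.

```python
# Figures quoted in the background section; names are illustrative only.
LOOP_MILLISECONDS = 4000        # time for 1,000,000 similarity comparisons
COMPARISONS_PER_LOOP = 1_000_000
SECONDS_PER_DOCUMENT = 4        # one document checked against 1,000,000 records

# Average cost of a single 20-character comparison, in microseconds.
microseconds_per_comparison = LOOP_MILLISECONDS * 1000 / COMPARISONS_PER_LOOP

# Documents a single thread can check per minute and per hour.
docs_per_minute = 60 // SECONDS_PER_DOCUMENT
docs_per_hour = 3600 // SECONDS_PER_DOCUMENT

print(microseconds_per_comparison, docs_per_minute, docs_per_hour)
```

At this rate a single thread checks fewer than 22,000 documents in a full day, which is the bottleneck the distributed design is meant to remove.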
It is therefore necessary to provide a distributed content duplicate-checking early-warning system and method based on financial big data to solve the above problems.
Summary of the invention
The purpose of the present invention is to provide a more efficient, safer, and more reliable distributed content duplicate-checking early-warning system based on financial big data: an efficient warning information platform that computes content similarity in a distributed manner and analyzes similar content, built on document character image recognition, a Chinese word segmentation algorithm, and financial big data.
One object of the invention is to provide a distributed content duplicate-checking early-warning system based on financial big data, comprising a project application module, a content warning module, a content analysis engine, a financial big database, an information push center, and a task scheduling center, in which:
The project application module is used by users to apply for special-fund projects;
The content warning module sets the warning-line values for content-similarity early warning and the corresponding warning levels;
The content analysis engine currently consists of two parts, a Chinese word segmentation algorithm and a content similarity algorithm. The Chinese word segmentation algorithm splits the sentences of the entire submitted document into words (lemmas, i.e. the words that make up a sentence). The similarity algorithm computes the similarity value of the two application documents being compared; it is the SimHash algorithm;
The financial big database is communicatively connected to the industrial and commercial database and the project application database. It cleans, processes, and classifies the collected industrial and commercial data of applicants and the project application data to form an industrial and commercial theme library and a project theme library;
The information push center pushes warning information precisely according to the different requirements of management;
The task scheduling center schedules the corresponding processing algorithms and functions to execute tasks.
Another object of the present invention is to provide a distributed content duplicate-checking early-warning method based on financial big data that uses the above system, comprising the following steps:
S1 The financial big database is established; using the configured algorithm model, the collected industrial and commercial data and project application data are cleaned, processed, and classified to form the industrial and commercial theme library and the project theme library;
S2 The enterprise writes the special-fund application content through the project application module and submits a special-fund project application request to the server; the server receives the request sent by the client and begins receiving data;
S3 The content analysis engine calls the function interface of the segmentation algorithm to perform lexical analysis on the project application content and split the sentences into lemmas; the storage-layer interface is called to store the resulting lemmas in the financial big database, and the applicant's project application content is stored as documents in HDFS and MongoDB;
S4 The task scheduling center calls its task interface to publish the similarity calculation task and the industrial and commercial library enterprise-relationship link calculation task, then calls the interface of the distributed computing tool Spark to execute the calculation tasks; the computing power of multiple servers is used to quickly perform similarity duplicate-checking analysis of the application content against the project theme library and the industrial and commercial theme library;
S5 The calculation result is fed back to the content early-warning model center, and the model determines whether the result triggers the warning threshold; if the warning value is exceeded, step S6 starts; otherwise the entire project content warning calculation flow ends;
S6 The early-warning model center writes a warning log to the warning table and calls the warning result message push interface; messages are pushed mainly by email, in-site message, SMS, and app, and the push method can be configured dynamically;
S7 The information push center pushes the warning result message; the user opens the message notification and views the duplicate-checking result, in which the duplicated content is marked and displayed.
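The threshold check in S5 and the warning levels set by the content warning module can be pictured with a small lookup. This is only a sketch under assumptions: the numeric warning lines, the level names, and the function name `warning_level` are hypothetical, since the text does not give concrete values.

```python
# Hypothetical warning lines: (similarity threshold, warning level).
# The source states only that warning-line values and levels are configurable.
WARNING_LINES = [(0.9, "red"), (0.8, "orange"), (0.6, "yellow")]


def warning_level(similarity, warning_lines=WARNING_LINES):
    """Return the highest warning level whose line is exceeded, or None.

    None corresponds to ending the flow in S5; any other value
    corresponds to continuing with S6 (log the warning and push it).
    """
    for threshold, level in sorted(warning_lines, reverse=True):
        if similarity > threshold:
            return level
    return None


print(warning_level(0.95))  # highest line exceeded
print(warning_level(0.50))  # no line exceeded
```

Keeping the lines in a plain list makes it easy to load them from configuration, which matches the description of warning lines being set rather than hard-coded.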
The segmentation algorithm in S3 is based on forward matching. Specifically, the intelligent segmentation mode (smart mode) is used: the system's segmentation engine (segmenter) outputs the single segmentation result it considers most reasonable according to its internal method. The algorithm also introduces the concepts of lemma and lemma chain. A lemma chain arranges segmentation results into a chain structure in sequential order; in essence it is an ordered set of intersecting lemmas. Each lemma object records the position of the lemma in the whole chain, which is used for disambiguation.
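As an illustration of forward matching, the sketch below implements plain greedy forward maximum matching against a dictionary. It is a simplified stand-in, not the segmenter described in the text: the smart mode mentioned above additionally resolves ambiguities, and the function name, the `max_len` parameter, and the sample dictionary are assumptions made for this example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until there is a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical sample dictionary of application-related terms.
dictionary = {"专项", "资金", "专项资金", "项目", "申报"}
print(forward_max_match("专项资金项目申报", dictionary))
```

Because the scan always prefers the longest match, "专项资金" is emitted as one lemma rather than as "专项" followed by "资金".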
The similarity duplicate-checking analysis in S4 preferably uses the SimHash similarity algorithm, whose procedure is as follows:
1) Extract keywords from the document Doc (including word segmentation and weight calculation) to obtain n (keyword, weight) pairs, i.e. the (feature, weight) pairs in the figure. Denote them feature_weight_pairs = [fw1, fw2, ..., fwn], where fwn = (feature_n, weight_n);
2) Compute hash_weight_pairs = [(hash(feature), weight) for feature, weight in feature_weight_pairs], giving the (hash, weight) pairs in the figure; assume the hash produces bits_count = 6 bits;
3) Accumulate hash_weight_pairs column by column over the bit positions: if a bit is 1, add weight; if it is 0, subtract weight. This yields bits_count numbers, whose values depend on the hash function used;
4) Convert the resulting numbers to bits, 1 for a positive value and 0 for a negative value, e.g. 110001.
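The four steps can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the text does not say which hash function is used, so MD5 truncated to the requested width stands in for it, and the helper names `feature_hash` and `simhash` as well as the sample keywords and weights are hypothetical.

```python
import hashlib


def feature_hash(feature, bits=64):
    """Hash a keyword to an integer of the given bit width (MD5 is an assumption)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(feature_weight_pairs, bits=64):
    """Compute a SimHash fingerprint from (feature, weight) pairs."""
    totals = [0] * bits
    for feature, weight in feature_weight_pairs:
        h = feature_hash(feature, bits)
        for pos in range(bits):
            # Step 3: add the weight for a 1 bit, subtract it for a 0 bit.
            if (h >> pos) & 1:
                totals[pos] += weight
            else:
                totals[pos] -= weight
    # Step 4: positive totals become 1 bits, all others become 0 bits.
    fingerprint = 0
    for pos, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << pos
    return fingerprint


# Hypothetical keywords and weights for one application document.
pairs = [("专项资金", 5), ("项目", 3), ("申报", 2)]
print(format(simhash(pairs, bits=6), "06b"))
```

With a single feature of positive weight the fingerprint equals that feature's hash, which is a convenient sanity check for any implementation of the accumulation step.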
In the distributed content duplicate-checking early-warning system and method based on financial big data provided by the invention, users submit project application forms through the platform, and all submitted data is stored as documents in HDFS and MongoDB. Using the configured algorithm model (the SimHash similarity algorithm), the submitted project data is cleaned, processed, classified, and stored in structured form so that it can be searched and read efficiently. The invention eliminates the data silos caused by dispersed resources and can track the entire project application cycle, providing full-cycle monitoring. This helps ensure that project funds are used scientifically and reasonably, avoids fraudulent repeated applications as far as possible, and prevents waste of fiscal funds, thereby promoting the rapid development of enterprises.
Brief description of the drawings
Fig. 1 is a system structure diagram of the invention.
Fig. 2 is a distributed computing flowchart of the invention.
Fig. 3 is a schematic diagram of the SimHash calculation of the invention.
Specific embodiment
Fig. 1 shows the system structure of the invention. The distributed content duplicate-checking early-warning system based on financial big data provided by the invention comprises a financial big database, a project application module, an early-warning model center, a content analysis engine, an information push center, and a task scheduling center, in which:
The financial big database is communicatively connected to the industrial and commercial database and the project application database. It cleans, processes, and classifies the collected industrial and commercial data of applicants and the project application data to form an industrial and commercial theme library and a project theme library;
The project application module lets users submit special-fund project applications from a terminal;
The early-warning model center sets the warning-line values for content-similarity early warning and the corresponding warning levels;
The content analysis engine currently consists of two parts, a Chinese word segmentation algorithm and a content similarity algorithm. The Chinese word segmentation algorithm splits the sentences of the entire submitted document into words (lemmas, i.e. the words that make up a sentence). The similarity algorithm computes the similarity value of the two application documents being compared; it is the SimHash algorithm;
The information push center pushes warning information precisely according to the different requirements of management;
The task scheduling center schedules the corresponding processing algorithms and functions to execute tasks.
The distributed content duplicate-checking early-warning method based on financial big data of this embodiment comprises the following steps:
S1 The financial big database is established; using the configured algorithm model, the collected industrial and commercial data and project application data are cleaned, processed, and classified to form the industrial and commercial theme library and the project theme library;
S2 The enterprise writes the special-fund application content through the project application module and submits a special-fund project application request to the server; the server receives the request sent by the client and begins receiving data;
S3 The content analysis engine calls the function interface of the segmentation algorithm to perform lexical analysis on the project application content and split the sentences into lemmas; the storage-layer interface is called to store the resulting lemmas in the financial big database, and the applicant's project application content is stored as documents in HDFS and MongoDB;
S4 The task scheduling center calls its task interface, which includes publishing the similarity calculation task and the industrial and commercial library enterprise-relationship link calculation task, and calls the interface of the distributed computing tool Spark; the similarity calculation engine in the content analysis engine executes the calculation tasks on multiple server nodes (see Fig. 2), quickly performing similarity duplicate-checking analysis of the application content against the project theme library and the industrial and commercial theme library;
S5 The calculation result is fed back to the content early-warning model center, and the model determines whether the result triggers the warning threshold; if the warning value is exceeded, step S6 starts; otherwise the entire project content warning calculation flow ends;
S6 The early-warning model center writes a warning log to the warning table and calls the warning result message push interface; messages are pushed mainly by email, in-site message, SMS, and app, and the push method can be configured dynamically;
S7 The information push center pushes the warning result message; the user opens the message notification and views the duplicate-checking result, in which the duplicated content is marked and displayed.
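The fan-out in S4 can be pictured as splitting the stored fingerprints into partitions, scanning each partition independently, and merging the hits. The sketch below deliberately does not use Spark: a standard-library thread pool stands in for Spark partitions purely to show the shape of the computation, and the function names, the `workers` parameter, and the sample fingerprints are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def hamming_distance(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def find_near_duplicates(new_fp, stored, workers=4, max_distance=3):
    """Return ids of stored documents whose fingerprint is close to new_fp.

    stored maps a document id to its SimHash fingerprint. Each chunk plays
    the role of one partition that a cluster node would scan.
    """
    items = list(stored.items())
    chunks = [items[i::workers] for i in range(workers)]

    def scan(chunk):
        return [doc_id for doc_id, fp in chunk
                if hamming_distance(new_fp, fp) <= max_distance]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = [doc_id for part in pool.map(scan, chunks) for doc_id in part]
    return sorted(hits)


# Hypothetical 6-bit fingerprints of previously stored applications.
stored = {"a": 0b110001, "b": 0b110000, "c": 0b001110}
print(find_near_duplicates(0b110001, stored))
```

Threads in CPython do not speed up this CPU-bound loop; in the system described here the same partition-and-merge pattern would run across server nodes.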
The segmentation algorithm in S3 is based on forward matching. Specifically, the intelligent segmentation mode (smart mode) is used: the system's segmentation engine (segmenter) outputs the single segmentation result it considers most reasonable according to its internal method. The algorithm also introduces the concepts of lemma and lemma chain. A lemma chain arranges segmentation results into a chain structure in sequential order; in essence it is an ordered set of intersecting lemmas. Each lemma object records the position of the lemma in the whole chain, which is used for disambiguation.
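One way to picture a lemma chain is as a group of candidate lemmas whose character ranges overlap, kept in positional order. The sketch below is an illustrative data structure only: the class name `Lemma`, its fields, and the grouping function `build_chains` are assumptions and do not reproduce the classes of any particular segmenter.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Lemma:
    """A candidate word with its position in the source text (fields are hypothetical)."""
    begin: int
    length: int
    text: str = ""

    @property
    def end(self):
        return self.begin + self.length

    def crosses(self, other):
        """True when the two lemmas share at least one character position."""
        return self.begin < other.end and other.begin < self.end


def build_chains(lemmas):
    """Group lemmas into chains of mutually overlapping candidates, in order."""
    chains = []
    for lemma in sorted(lemmas):
        if chains and any(lemma.crosses(member) for member in chains[-1]):
            chains[-1].append(lemma)
        else:
            chains.append([lemma])
    return chains


# Candidate lemmas for the text "专项资金项目".
candidates = [Lemma(0, 2, "专项"), Lemma(0, 4, "专项资金"),
              Lemma(2, 2, "资金"), Lemma(4, 2, "项目")]
for chain in build_chains(candidates):
    print([lemma.text for lemma in chain])
```

Only chains with more than one member are ambiguous; a disambiguation step would choose a non-overlapping subset within each such chain, using the recorded positions.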
In this embodiment, the similarity duplicate-checking analysis in S4 uses the SimHash similarity algorithm (see Fig. 3). The procedure is as follows:
1) Extract keywords from the document Doc (including word segmentation and weight calculation) to obtain n (keyword, weight) pairs, i.e. the (feature, weight) pairs in the figure. Denote them feature_weight_pairs = [fw1, fw2, ..., fwn], where fwn = (feature_n, weight_n);
2) Compute hash_weight_pairs = [(hash(feature), weight) for feature, weight in feature_weight_pairs], giving the (hash, weight) pairs in the figure; assume the hash produces bits_count = 6 bits;
3) Accumulate hash_weight_pairs column by column over the bit positions: if a bit is 1, add weight; if it is 0, subtract weight. This yields bits_count numbers, [13, 108, -22, -5, -32, 55] in the figure; the values depend on the hash function used. For example, with a 64-bit hash and 20 keywords: hash each word to obtain a 64-bit binary value, giving 20 binary sequences of length 64; replace each 1 bit with the positive weight of the word and each 0 bit with the negative weight, giving 20 lists of length 64 of the form [weight, -weight, weight, ..., weight]; then sum the 20 lists column by column to obtain a single list, so that each document yields one list of length 64.
4) Evaluate this list: positive values become 1 and negative values become 0. For example, [13, 108, -22, -5, -32, 55] gives 110001, which is the SimHash value of the document. The SimHash values of two documents are combined with XOR (Hamming distance): if the XOR result contains more than 3 ones, the documents are dissimilar; if it contains 3 or fewer, they are similar.
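The sign step and the Hamming-distance test from step 4) can be checked directly on the numbers given above. The helper names below are illustrative; the threshold of 3 is the one stated in the text, and the comparison fingerprints are made up for the example.

```python
def sign_bits(totals):
    """Map each accumulated value to '1' if positive, else '0'."""
    return "".join("1" if value > 0 else "0" for value in totals)


def hamming_distance(a, b):
    """Count the 1 bits in the XOR of two integer fingerprints."""
    return bin(a ^ b).count("1")


def is_similar(a, b, max_distance=3):
    """Documents are similar when at most max_distance bits differ."""
    return hamming_distance(a, b) <= max_distance


totals = [13, 108, -22, -5, -32, 55]
fingerprint = sign_bits(totals)
print(fingerprint)                                # 110001

doc_a = int(fingerprint, 2)
print(is_similar(doc_a, 0b110110))                # 3 bits differ
print(is_similar(doc_a, 0b001110))                # 6 bits differ
```

A fixed distance of 3 is usually quoted for 64-bit fingerprints; for the 6-bit toy example it is far too permissive and is used here only to exercise the comparison.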
Claims (4)
1. A distributed content duplicate-checking early-warning system based on financial big data, characterized in that it comprises a financial big database, a project application module, a content warning module, a content analysis engine, an information push center, and a task scheduling center, wherein:
the financial big database is communicatively connected to the industrial and commercial database and the project application database, and cleans, processes, and classifies the collected industrial and commercial data of applicants and the project application data to form an industrial and commercial theme library and a project theme library;
the project application module is used by users to apply for special-fund projects;
the content warning module sets the warning-line values for content-similarity early warning and the corresponding warning levels;
the content analysis engine currently consists of two parts, a Chinese word segmentation algorithm and a content similarity algorithm;
the information push center pushes warning information precisely according to the different requirements of management;
the task scheduling center schedules the corresponding processing algorithms and functions to execute tasks.
2. A distributed content duplicate-checking early-warning method based on financial big data, characterized in that it comprises the following steps:
S1 The financial big database is established; using the configured algorithm model, the collected industrial and commercial data and project application data are cleaned, processed, and classified to form the industrial and commercial theme library and the project theme library;
S2 The enterprise writes the special-fund application content through the project application module and submits a special-fund project application request to the server; the server receives the request sent by the client and begins receiving data;
S3 The content analysis engine calls the function interface of the segmentation algorithm to perform lexical analysis on the project application content and split the sentences into lemmas; the storage-layer interface is called to store the resulting lemmas in the financial big database, and the applicant's project application content is stored as documents in HDFS and MongoDB;
S4 The task scheduling center calls its task interface to publish the similarity calculation task and the industrial and commercial library enterprise-relationship link calculation task, then calls the interface of the distributed computing tool Spark to execute the calculation tasks; the computing power of multiple servers is used to quickly perform similarity duplicate-checking analysis of the application content against the project theme library and the industrial and commercial theme library;
S5 The calculation result is fed back to the content early-warning model center, and the model determines whether the result triggers the warning threshold; if the warning value is exceeded, step S6 starts; otherwise the entire project content warning calculation flow ends;
S6 The early-warning model center writes a warning log to the warning table and calls the warning result message push interface; messages are pushed mainly by email, in-site message, SMS, and app, and the push method can be configured dynamically;
S7 The information push center pushes the warning result message; the user opens the message notification and views the duplicate-checking result, in which the duplicated content is marked and displayed.
3. The distributed content duplicate-checking early-warning method based on financial big data according to claim 2, characterized in that the segmentation algorithm in S3 is based on forward matching, specifically: the intelligent segmentation mode (smart mode) is used, in which the system's segmentation engine (segmenter) outputs the single segmentation result it considers most reasonable according to its internal method; the algorithm also introduces the concepts of lemma and lemma chain, where a lemma chain arranges segmentation results into a chain structure in sequential order and is in essence an ordered set of intersecting lemmas, and each lemma object records the position of the lemma in the whole chain, which is used for disambiguation.
4. The distributed content duplicate-checking early-warning method based on financial big data according to claim 2, characterized in that the similarity duplicate-checking analysis in S4 uses the SimHash similarity algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811562264.0A CN109636352A (en) | 2018-12-20 | 2018-12-20 | A kind of distributed content duplicate checking early warning system based on financial big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811562264.0A CN109636352A (en) | 2018-12-20 | 2018-12-20 | A kind of distributed content duplicate checking early warning system based on financial big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109636352A (en) | 2019-04-16 |
Family ID: 66075908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811562264.0A Pending CN109636352A (en) | 2018-12-20 | 2018-12-20 | A kind of distributed content duplicate checking early warning system based on financial big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109636352A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175280A (en) * | 2019-04-30 | 2019-08-27 | 广东鼎义互联科技股份有限公司 | A kind of crawler analysis platform based on government affairs big data |
CN110223048A (en) * | 2019-06-18 | 2019-09-10 | 湖南晖龙集团股份有限公司 | Special fund declares comprehensive management platform system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103455596A (en) * | 2013-09-02 | 2013-12-18 | 广东省计算中心 | Science and technology project establishment evaluation method based on big data |
CN103593338A (en) * | 2013-11-15 | 2014-02-19 | 北京锐安科技有限公司 | Information processing method and device |
CN104133838A (en) * | 2014-06-24 | 2014-11-05 | 国家电网公司 | Data processing method and system with system detection function |
CN105718506A (en) * | 2016-01-04 | 2016-06-29 | 胡新伟 | Duplicate-checking comparison method for science and technology projects |
CN106570055A (en) * | 2016-09-27 | 2017-04-19 | 山东浪潮云服务信息科技有限公司 | Information early-warning platform based on financial big data |
CN106649251A (en) * | 2015-10-30 | 2017-05-10 | 北京国双科技有限公司 | Method and device for Chinese word segmentation |
CN107608968A (en) * | 2017-09-22 | 2018-01-19 | 深圳市易图资讯股份有限公司 | Chinese word cutting method, the device of text-oriented big data |
CN107908796A (en) * | 2017-12-15 | 2018-04-13 | 广州市齐明软件科技有限公司 | E-Government duplicate checking method, apparatus and computer-readable recording medium |
CN108846031A (en) * | 2018-05-28 | 2018-11-20 | 同方知网数字出版技术股份有限公司 | Project similarity comparison method for power industry |
Non-Patent Citations (2)
Title |
---|
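冉崇善 et al.: "Application of the Simhash algorithm in test-question duplicate checking" (Simhash算法在试题查重中的应用), 《软件导刊》 (Software Guide) * |
怎么肥事: "Implementation principles of the IK tokenizer" (IK分词器实现原理), 《HTTPS://BLOG.CSDN.NET/LALA12D/ARTICLE/DETAILS/82776571》 * |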
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Batra et al. | Integrating StockTwits with sentiment analysis for better prediction of stock price movement | |
CN111767403B (en) | Text classification method and device | |
CN110321466B (en) | Securities information duplicate checking method and system based on semantic analysis | |
Krishna et al. | A feature based approach for sentiment analysis using SVM and coreference resolution | |
CN106886579B (en) | Real-time streaming text grading monitoring method and device | |
CN110309234B (en) | Knowledge graph-based customer warehouse-holding early warning method and device and storage medium | |
US10216837B1 (en) | Selecting pattern matching segments for electronic communication clustering | |
CN110990529B (en) | Industry detail dividing method and system for enterprises | |
CN111767725A (en) | Data processing method and device based on emotion polarity analysis model | |
CN110147540B (en) | Method and system for generating business security requirement document | |
Kirchheim et al. | Pytorch-ood: A library for out-of-distribution detection based on pytorch | |
CN112330455A (en) | Method, device, equipment and storage medium for pushing information | |
CN109636352A (en) | A kind of distributed content duplicate checking early warning system based on financial big data | |
Al-Alwani | Improving email response in an email management system using natural language processing based probabilistic methods | |
CN105808602B (en) | Method and device for detecting junk information | |
Jagadeesan et al. | Twitter Sentiment Analysis with Machine Learning | |
CN109299007A (en) | A kind of defect repair person's auto recommending method | |
CN112579781A (en) | Text classification method and device, electronic equipment and medium | |
Giri et al. | SMS spam classification–simple deep learning models with higher accuracy using BUNOW and GloVe word embedding | |
CN115391701A (en) | Internet content risk analysis and early warning method | |
CN110019772B (en) | Text emotion classification method and system | |
Povoda et al. | Genetic optimization of big data sentiment analysis | |
Thushara et al. | A graph-based model for keyword extraction and tagging of research documents | |
Gatchalee et al. | Thai text classification experiment using cnn and transformer models for timely-timeless content marketing | |
Chelyshev et al. | Information System for Automatic News Text Classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190416 |