CN109636352A - A distributed content duplicate-checking early-warning system based on financial big data

Info

Publication number
CN109636352A
Authority
CN
China
Prior art keywords
content
early warning
duplicate checking
center
lemma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811562264.0A
Other languages
Chinese (zh)
Inventor
李景龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Long Hui Group Ltd By Share Ltd
Original Assignee
Hunan Long Hui Group Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Long Hui Group Ltd By Share Ltd
Priority to CN201811562264.0A
Publication of CN109636352A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The invention discloses a distributed content duplicate-checking early-warning system and method based on financial big data, comprising a project application system, a content early-warning model center, a content analysis engine, a big data management platform, an information push center, and a task scheduling center. The advantages of the invention are as follows. Building on the big data management system, it establishes a unified application project library and a unified industrial and commercial library. The content analysis engine, built on distributed computing technology, supports fast duplicate-checking analysis of large-scale application content against the project library and the industrial and commercial library, using the computing power of multiple servers to quickly compute the similarity values of application content. The system is highly available, duplicate checking is efficient, and the results are safe and reliable.

Description

A distributed content duplicate-checking early-warning system based on financial big data
Technical field
The present invention relates to a distributed content duplicate-checking early-warning system based on financial big data.
Background
With the continuing development of the information industry, finance departments have built a number of special fund management application systems. These moved office work from paper to online processing in a single leap and improved efficiency. However, as government support for enterprises keeps increasing, finance departments must handle a large volume of enterprise special fund applications and review a large amount of application content. To cope with this, the system needs to be more intelligent: it should run duplicate-checking analysis on application content and warn managers based on the results. Because e-government construction has lacked unified planning and has mostly been carried out independently and in a scattered way, information resources cannot be shared and used effectively, and the data integration problem is hard to solve by simple upgrades.
With the development of big data and distributed computing technology, establishing a unified project application big data management platform has become the way to address duplicated content in special fund project applications. An existing financial information early-warning platform product can already perform duplicate-checking early warning on project application content, deciding whether to send a warning notification according to a configured content-similarity warning threshold. It has the following main problems:
1) Faced with large-scale data, the computing power of a single server is limited. Even in the simplest case, computing the similarity of two records of only 20 characters each 1,000,000 times in a loop takes at least 4000 ms. Suppose 1,000,000 comparisons are needed per day: merely checking one document against 1,000,000 records for duplicates takes 4 s. At 4 s per document, a single thread handles only 15 documents per minute, or 900 per hour. If a single application document reaches several hundred megabytes in size, efficiency drops further;
2) Data storage is scattered. Data is not stored centrally on a unified data platform, so information resources cannot be shared and used effectively, and the special-purpose finance platforms at every level must each repeat the same duplicate-checking computation on application content;
3) No unified industrial and commercial database has been established. The legal representative or shareholders of an applicant may own several enterprises, and several of those enterprises may apply for the same project. This can lead to multiple applications by related parties, so duplicated application content cannot be fully avoided.
In addition, because analysis precision is low and the system architecture stores data on a single node without support for distributed computing, computation over massive application content is too slow. Results cannot be fed back to users in time, and problematic applications can easily be approved.
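The throughput figures in problem 1) can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text (about 4 s per document on one thread); the `workers` parameter and the assumption of perfectly linear scaling are illustrative and are not claimed by the source.

```python
def documents_per_hour(seconds_per_document, workers=1):
    """Documents checked per hour.

    Assumes (hypothetically) that throughput scales linearly with workers.
    """
    return workers * 3600 // seconds_per_document


SECONDS_PER_DOCUMENT = 4  # from the text: 1,000,000 comparisons take about 4000 ms

# Single thread, as described in the background section.
print(60 // SECONDS_PER_DOCUMENT)                # documents per minute
print(documents_per_hour(SECONDS_PER_DOCUMENT))  # documents per hour

# Hypothetical cluster of 10 workers under ideal linear scaling.
print(documents_per_hour(SECONDS_PER_DOCUMENT, workers=10))
```

Real clusters rarely scale perfectly, so the multi-worker figure should be read as an upper bound.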
It is therefore necessary to provide a distributed content duplicate-checking early-warning system and method based on financial big data that solves the above problems.
Summary of the invention
The object of the present invention is to provide a more efficient, secure, and reliable distributed content duplicate-checking early-warning system based on financial big data: an efficient early-warning information platform that computes content similarity in a distributed manner and analyzes similar content, built on document text and image recognition, a Chinese word segmentation algorithm, and financial big data.
One object of the invention is to provide a distributed content duplicate-checking early-warning system based on financial big data, comprising a project application module, a content early-warning module, a content analysis engine, a financial big data database, an information push center, and a task scheduling center, wherein:
The project application module is used by users to apply for special fund projects.
The content early-warning module sets the warning-line values for content similarity warnings and the corresponding warning levels.
The content analysis engine currently consists of two parts, a Chinese word segmentation algorithm and a content similarity algorithm. The Chinese word segmentation algorithm splits the sentences of the entire application document into words (lemmas, i.e. the words that make up a sentence). The similarity algorithm computes the similarity value of the two application documents being compared; the similarity algorithm is the SimHash algorithm.
The financial big data database is communicatively connected to the industrial and commercial database and the project application database. It cleans, processes, and classifies the collected industrial and commercial data of applicants and the project application data to form an industrial and commercial theme library and a project theme library.
The information push center pushes warning information precisely according to different management requirements.
The task scheduling center schedules the corresponding processing algorithms and functions to execute tasks.
Another object of the invention is to provide a distributed content duplicate-checking early-warning method based on financial big data that uses the above system, comprising the following steps:
S1: Establish the financial big data database and, using the configured algorithm model, clean, process, and classify the collected industrial and commercial data and project application data to form the industrial and commercial theme library and the project theme library;
S2: An enterprise writes its special fund application content in the project application module and submits a special fund project application request to the server; the server receives the request sent by the client and starts receiving data;
S3: The content analysis engine calls the word segmentation function interface to perform lexical analysis on the project application content and split the sentences into lemmas, then calls the storage layer interface to store the lemmas in the financial big data database; the applicant's project application content can be stored as documents in HDFS and MongoDB;
S4: The task scheduling center calls its task interface to publish the similarity calculation task and the industrial and commercial library business relationship link calculation task, then calls the interface of the distributed computing tool Spark to execute the computation; the computing power of multiple servers is used to quickly perform similarity duplicate-checking analysis of the application content against the project theme library and the industrial and commercial theme library;
S5: The results are fed back to the content early-warning model center, which determines whether they trigger the warning threshold; if the threshold is exceeded, step S6 starts; otherwise the early-warning calculation for the project content ends;
S6: The early-warning model center writes a warning log to the warning table and calls the warning result message push interface; messages are pushed mainly by email, in-site message, SMS, and app, and the push method can be configured dynamically;
S7: The information push center pushes the warning result message; the user opens the notification and views the duplicate-checking result, in which duplicated content is marked and displayed.
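Steps S5 and S6 amount to comparing a similarity value with configured warning lines and, on a hit, writing a log row and pushing a message. The following is a minimal sketch under stated assumptions: the warning-line values, the level names, the record layout, and the `push` callback are all hypothetical, since the source does not specify them.

```python
# Hypothetical warning lines, highest first: (similarity that must be exceeded, level).
WARNING_LINES = [(0.95, "red"), (0.85, "orange"), (0.70, "yellow")]


def warning_level(similarity, warning_lines=WARNING_LINES):
    """Return the highest warning level whose line is exceeded, or None."""
    for line, level in warning_lines:
        if similarity > line:
            return level
    return None


def handle_result(doc_id, similarity, warning_table, push):
    """S5: check the threshold. S6: log the warning and call the push interface."""
    level = warning_level(similarity)
    if level is None:
        return False  # below every warning line: the calculation ends here
    warning_table.append({"doc_id": doc_id, "similarity": similarity, "level": level})
    push(f"[{level}] application {doc_id} similarity {similarity:.2f}")
    return True


# Usage: a list stands in for the warning table, list.append for the push channel.
table, sent = [], []
handle_result("P1", 0.90, table, sent.append)
handle_result("P2", 0.50, table, sent.append)
print(table)
print(sent)
```

Passing the push channel in as a callable mirrors the statement that the push method (email, in-site message, SMS, app) can be configured dynamically.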
The word segmentation algorithm in S3 is based on forward matching. Specifically, the intelligent segmentation mode (smart mode) is used, in which the system's segmentation engine (segmenter) outputs the single segmentation result it considers most reasonable according to its internal method. The algorithm also introduces the concepts of the lemma and the lemma chain. A lemma chain is a chain structure formed by arranging segmentation results in order; in essence it is an ordered set of intersecting lemmas. Each lemma object records the lemma's position in the whole chain, which is used for disambiguation.
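To make forward matching concrete, here is a minimal sketch of greedy forward maximum matching, one common form of forward-matching segmentation. It is an illustration only: the source does not give the dictionary or the segmenter's internal method, so the function name, the `max_len` parameter, and the sample dictionary are assumptions.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when nothing matches.
    Returns a list of (lemma, start_offset) pairs.
    """
    lemmas = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                lemmas.append((candidate, i))
                i += size
                break
    return lemmas


# Hypothetical dictionary for illustration.
words = {"财政", "专项", "资金", "专项资金", "申报"}
print(forward_max_match("财政专项资金申报", words))
```

Recording the start offset with each lemma corresponds to the statement that a lemma object stores its position in the chain.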
The similarity duplicate-checking analysis in S4 preferably uses the SimHash similarity algorithm, which proceeds as follows:
1) Extract keywords from the document Doc (including word segmentation and weight calculation) to obtain n (keyword, weight) pairs, i.e. the (feature, weight) pairs in the figure. Denote them feature_weight_pairs = [fw1, fw2, ..., fwn], where fwn = (feature_n, weight_n);
2) Compute hash_weight_pairs = [(hash(feature), weight) for feature, weight in feature_weight_pairs], which yields the (hash, weight) pairs in the figure; assume the hash has bits_count = 6 bits;
3) Accumulate hash_weight_pairs column by column, bit by bit: if a bit is 1, add weight; if it is 0, subtract weight. This produces bits_count numbers, whose values depend on the hash function used;
4) Map the resulting numbers to bits, positive to 1 and negative to 0, giving for example 110001.
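Steps 1) to 4) can be sketched in Python as follows. The source does not name a hash function, so the use of MD5 truncated to `bits_count` bits is an assumption made only to get a stable hash; the helper names are likewise illustrative.

```python
import hashlib


def feature_hash(feature, bits_count=64):
    """Stable hash of a feature truncated to bits_count bits (MD5 is an assumption)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits_count) - 1)


def column_sums(feature_weight_pairs, bits_count=64):
    """Step 3: add weight where the hash bit is 1, subtract it where the bit is 0."""
    sums = [0] * bits_count
    for feature, weight in feature_weight_pairs:
        h = feature_hash(feature, bits_count)
        for pos in range(bits_count):
            bit = (h >> (bits_count - 1 - pos)) & 1  # pos 0 is the leftmost bit
            sums[pos] += weight if bit else -weight
    return sums


def to_fingerprint(sums):
    """Step 4: positive sums become 1, all other sums become 0."""
    return "".join("1" if s > 0 else "0" for s in sums)


def simhash(feature_weight_pairs, bits_count=64):
    """SimHash fingerprint of a document given its (feature, weight) pairs."""
    return to_fingerprint(column_sums(feature_weight_pairs, bits_count))


# Hypothetical (keyword, weight) pairs for one application document.
print(simhash([("专项资金", 3), ("申报", 2), ("项目", 1)], bits_count=6))
```

A column sum of exactly zero is mapped to 0 here; the source does not say how ties are resolved.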
In the distributed content duplicate-checking early-warning system and method based on financial big data provided by the invention, users submit project application forms through the platform, and all application data is stored as documents in HDFS and MongoDB. Using the configured algorithm model and the SimHash similarity algorithm, the application data is cleaned, processed, classified, and stored in structured form so that it can be searched and read efficiently. The invention eliminates the data-island problem caused by scattered resources, tracks the full life cycle of each project application and provides full-cycle monitoring, helps ensure that project funds are used scientifically and reasonably, prevents fraudulent duplicate applications as far as possible, and avoids wasting fiscal funds, thereby promoting the rapid development of enterprises.
Brief description of the drawings
Fig. 1 is a system structure diagram of the invention.
Fig. 2 is a distributed computing flow chart of the invention.
Fig. 3 is a schematic diagram of the SimHash calculation of the invention.
Detailed description of the embodiments
Fig. 1 shows the system structure of the invention. The distributed content duplicate-checking early-warning system based on financial big data provided by the invention comprises a financial big data database, a project application module, an early-warning model center, a content analysis engine, an information push center, and a task scheduling center, wherein:
The financial big data database is communicatively connected to the industrial and commercial database and the project application database. It cleans, processes, and classifies the collected industrial and commercial data of applicants and the project application data to form an industrial and commercial theme library and a project theme library;
The project application module lets users apply for special fund projects from a terminal;
The early-warning model center sets the warning-line values for content similarity warnings and the corresponding warning levels;
The content analysis engine currently consists of two parts, a Chinese word segmentation algorithm and a content similarity algorithm. The Chinese word segmentation algorithm splits the sentences of the entire application document into words (lemmas, i.e. the words that make up a sentence). The similarity algorithm computes the similarity value of the two application documents being compared; the similarity algorithm is the SimHash algorithm;
The information push center pushes warning information precisely according to different management requirements;
The task scheduling center schedules the corresponding processing algorithms and functions to execute tasks.
The distributed content duplicate-checking early-warning method based on financial big data of this embodiment comprises the following steps:
S1: Establish the financial big data database and, using the configured algorithm model, clean, process, and classify the collected industrial and commercial data and project application data to form the industrial and commercial theme library and the project theme library;
S2: An enterprise writes its special fund application content in the project application module and submits a special fund project application request to the server; the server receives the request sent by the client and starts receiving data;
S3: The content analysis engine calls the word segmentation function interface to perform lexical analysis on the project application content and split the sentences into lemmas, then calls the storage layer interface to store the lemmas in the financial big data database; the applicant's project application content can be stored as documents in HDFS and MongoDB;
S4: The task scheduling center calls its task interface, which covers publishing the similarity calculation task and the industrial and commercial library business relationship link calculation task, then calls the interface of the distributed computing tool Spark; the similarity calculation engine in the content analysis engine executes the computation across multiple server nodes (see Fig. 2), quickly performing similarity duplicate-checking analysis of the application content against the project theme library and the industrial and commercial theme library;
S5: The results are fed back to the content early-warning model center, which determines whether they trigger the warning threshold; if the threshold is exceeded, step S6 starts; otherwise the early-warning calculation for the project content ends;
S6: The early-warning model center writes a warning log to the warning table and calls the warning result message push interface; messages are pushed mainly by email, in-site message, SMS, and app, and the push method can be configured dynamically;
S7: The information push center pushes the warning result message; the user opens the notification and views the duplicate-checking result, in which duplicated content is marked and displayed.
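The core of step S4 is a partitioned scan: the library of stored fingerprints is split into parts, each part is compared with the new application independently, and the hits are merged. The sketch below shows that structure with a local thread pool standing in for Spark, which is a deliberate substitution for illustration; the function names, the integer fingerprint representation, and the library layout are assumptions. Python threads do not give real CPU parallelism, so this shows the shape of the computation, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def hamming_distance(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def scan_partition(new_fingerprint, partition, max_distance):
    """Compare the new fingerprint with every (doc_id, fingerprint) in one partition."""
    hits = []
    for doc_id, fingerprint in partition:
        distance = hamming_distance(new_fingerprint, fingerprint)
        if distance <= max_distance:
            hits.append((doc_id, distance))
    return hits


def find_similar(new_fingerprint, library, workers=4, max_distance=3):
    """Split the library into partitions, scan them concurrently, merge the hits."""
    items = list(library.items())
    partitions = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda part: scan_partition(new_fingerprint, part, max_distance),
            partitions,
        )
        hits = [hit for part_hits in results for hit in part_hits]
    # Closest documents first; doc_id breaks ties deterministically.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Hypothetical library of stored fingerprints keyed by application id.
library = {"A": 0b1111, "B": 0b1110, "C": 0b11110000}
print(find_similar(0b1111, library))
```

In a Spark deployment the partitions would be distributed across server nodes and the per-partition scan would run on each node.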
The word segmentation algorithm in S3 is based on forward matching. Specifically, the intelligent segmentation mode (smart mode) is used, in which the system's segmentation engine (segmenter) outputs the single segmentation result it considers most reasonable according to its internal method. The algorithm also introduces the concepts of the lemma and the lemma chain. A lemma chain is a chain structure formed by arranging segmentation results in order; in essence it is an ordered set of intersecting lemmas. Each lemma object records the lemma's position in the whole chain, which is used for disambiguation.
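One way to picture a lemma chain is as a group of candidate lemmas whose character spans intersect, kept in order of position. The sketch below groups candidates this way; it is an illustration of the data structure as described, not the segmenter's actual implementation, and the `Lemma` class and `build_lemma_chains` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lemma:
    text: str
    begin: int  # start offset of the lemma in the sentence

    @property
    def end(self):
        return self.begin + len(self.text)


def build_lemma_chains(lemmas):
    """Group lemmas whose spans intersect into ordered chains."""
    chains = []
    chain_end = -1
    # Order by position; at the same position, longer lemmas come first.
    for lemma in sorted(lemmas, key=lambda l: (l.begin, -len(l.text))):
        if chains and lemma.begin < chain_end:
            chains[-1].append(lemma)  # overlaps the current chain
            chain_end = max(chain_end, lemma.end)
        else:
            chains.append([lemma])    # starts a new chain
            chain_end = lemma.end
    return chains


# Candidate lemmas for the sentence "专项资金申报" (hypothetical dictionary hits).
candidates = [Lemma("专项", 0), Lemma("专项资金", 0), Lemma("资金", 2), Lemma("申报", 4)]
for chain in build_lemma_chains(candidates):
    print([l.text for l in chain])
```

A chain with more than one lemma marks an ambiguous stretch of text; disambiguation then means choosing one non-overlapping subset from each such chain.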
The similarity duplicate-checking analysis in S4 of this embodiment uses the SimHash similarity algorithm (see Fig. 3), which proceeds as follows:
1) Extract keywords from the document Doc (including word segmentation and weight calculation) to obtain n (keyword, weight) pairs, i.e. the (feature, weight) pairs in the figure. Denote them feature_weight_pairs = [fw1, fw2, ..., fwn], where fwn = (feature_n, weight_n);
2) Compute hash_weight_pairs = [(hash(feature), weight) for feature, weight in feature_weight_pairs], which yields the (hash, weight) pairs in the figure; assume the hash has bits_count = 6 bits;
3) Accumulate hash_weight_pairs column by column, bit by bit: if a bit is 1, add weight; if it is 0, subtract weight. This produces bits_count numbers, [13, 108, -22, -5, -32, 55] in the figure; the values depend on the hash function used. In practice, each word is hashed to a 64-bit binary value. With 20 words, for example, this gives 20 binary values of length 64. Each 1 bit is replaced with the positive weight and each 0 bit with the negative weight, giving 20 lists of length 64 of the form [weight, -weight, weight, ..., weight]. Summing the 20 lists column by column gives a single list, so each document yields one list of length 64.
4) Evaluate this list: positive values become 1 and negative values become 0. For example, [13, 108, -22, -5, -32, 55] gives 110001. This is the document's SimHash value. Two SimHash values are compared with an XOR operation (Hamming distance): if the number of 1 bits in the XOR result is more than 3, the documents are dissimilar; if it is 3 or fewer, they are similar.
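The comparison rule in step 4) is short enough to write out directly. This sketch takes fingerprints as bit strings such as 110001 and applies the threshold of 3 stated in the text; the function names are illustrative.

```python
def hamming_distance(fp_a, fp_b):
    """Count the 1 bits in the XOR of two equal-length bit-string fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    xor = int(fp_a, 2) ^ int(fp_b, 2)
    return bin(xor).count("1")


def is_similar(fp_a, fp_b, max_distance=3):
    """Similar when the Hamming distance is 3 or fewer, as stated in step 4)."""
    return hamming_distance(fp_a, fp_b) <= max_distance


# 6-bit fingerprints are used only to keep the example readable.
print(hamming_distance("110001", "010011"))
print(is_similar("110001", "001110"))
```

The threshold of 3 is stated in the source for 64-bit fingerprints; with much shorter fingerprints the same threshold would be far more permissive.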

Claims (4)

1. A distributed content duplicate-checking early-warning system based on financial big data, characterized in that it comprises a financial big data database, a project application module, a content early-warning module, a content analysis engine, an information push center, and a task scheduling center, wherein:
The financial big data database is communicatively connected to the industrial and commercial database and the project application database, and cleans, processes, and classifies the collected industrial and commercial data of applicants and the project application data to form an industrial and commercial theme library and a project theme library;
The project application module is used by users to apply for special fund projects;
The content early-warning module sets the warning-line values for content similarity warnings and the corresponding warning levels;
The content analysis engine currently consists of two parts, a Chinese word segmentation algorithm and a content similarity algorithm;
The information push center pushes warning information precisely according to different management requirements;
The task scheduling center schedules the corresponding processing algorithms and functions to execute tasks.
2. A distributed content duplicate-checking early-warning method based on financial big data, characterized in that it comprises the following steps:
S1: Establish the financial big data database and, using the configured algorithm model, clean, process, and classify the collected industrial and commercial data and project application data to form the industrial and commercial theme library and the project theme library;
S2: An enterprise writes its special fund application content in the project application module and submits a special fund project application request to the server; the server receives the request sent by the client and starts receiving data;
S3: The content analysis engine calls the word segmentation function interface to perform lexical analysis on the project application content and split the sentences into lemmas, then calls the storage layer interface to store the lemmas in the financial big data database; the applicant's project application content can be stored as documents in HDFS and MongoDB;
S4: The task scheduling center calls its task interface to publish the similarity calculation task and the industrial and commercial library business relationship link calculation task, then calls the interface of the distributed computing tool Spark to execute the computation; the computing power of multiple servers is used to quickly perform similarity duplicate-checking analysis of the application content against the project theme library and the industrial and commercial theme library;
S5: The results are fed back to the content early-warning model center, which determines whether they trigger the warning threshold; if the threshold is exceeded, step S6 starts; otherwise the early-warning calculation for the project content ends;
S6: The early-warning model center writes a warning log to the warning table and calls the warning result message push interface; messages are pushed mainly by email, in-site message, SMS, and app, and the push method can be configured dynamically;
S7: The information push center pushes the warning result message; the user opens the notification and views the duplicate-checking result, in which duplicated content is marked and displayed.
3. The distributed content duplicate-checking early-warning method based on financial big data according to claim 2, characterized in that the word segmentation algorithm in S3 is based on forward matching, specifically: the intelligent segmentation mode (smart mode) is used, in which the system's segmentation engine (segmenter) outputs the single segmentation result it considers most reasonable according to its internal method; the algorithm also introduces the concepts of the lemma and the lemma chain, a lemma chain being a chain structure formed by arranging segmentation results in order, in essence an ordered set of intersecting lemmas; each lemma object records the lemma's position in the whole chain, which is used for disambiguation.
4. The distributed content duplicate-checking early-warning method based on financial big data according to claim 2, characterized in that the similarity duplicate-checking analysis in S4 uses the SimHash similarity algorithm.
CN201811562264.0A 2018-12-20 2018-12-20 A kind of distributed content duplicate checking early warning system based on financial big data Pending CN109636352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811562264.0A CN109636352A (en) 2018-12-20 2018-12-20 A kind of distributed content duplicate checking early warning system based on financial big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811562264.0A CN109636352A (en) 2018-12-20 2018-12-20 A kind of distributed content duplicate checking early warning system based on financial big data

Publications (1)

Publication Number Publication Date
CN109636352A true CN109636352A (en) 2019-04-16

Family

ID=66075908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811562264.0A Pending CN109636352A (en) 2018-12-20 2018-12-20 A kind of distributed content duplicate checking early warning system based on financial big data

Country Status (1)

Country Link
CN (1) CN109636352A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175280A (en) * 2019-04-30 2019-08-27 广东鼎义互联科技股份有限公司 A kind of crawler analysis platform based on government affairs big data
CN110223048A (en) * 2019-06-18 2019-09-10 湖南晖龙集团股份有限公司 Special fund declares comprehensive management platform system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455596A (en) * 2013-09-02 2013-12-18 广东省计算中心 Science and technology project establishment evaluation method based on big data
CN103593338A (en) * 2013-11-15 2014-02-19 北京锐安科技有限公司 Information processing method and device
CN104133838A (en) * 2014-06-24 2014-11-05 国家电网公司 Data processing method and system with system detection function
CN105718506A (en) * 2016-01-04 2016-06-29 胡新伟 Duplicate-checking comparison method for science and technology projects
CN106570055A (en) * 2016-09-27 2017-04-19 山东浪潮云服务信息科技有限公司 Information early-warning platform based on financial big data
CN106649251A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for Chinese word segmentation
CN107608968A (en) * 2017-09-22 2018-01-19 深圳市易图资讯股份有限公司 Chinese word cutting method, the device of text-oriented big data
CN107908796A (en) * 2017-12-15 2018-04-13 广州市齐明软件科技有限公司 E-Government duplicate checking method, apparatus and computer-readable recording medium
CN108846031A (en) * 2018-05-28 2018-11-20 同方知网数字出版技术股份有限公司 Project similarity comparison method for power industry

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455596A (en) * 2013-09-02 2013-12-18 广东省计算中心 Science and technology project establishment evaluation method based on big data
CN103593338A (en) * 2013-11-15 2014-02-19 北京锐安科技有限公司 Information processing method and device
CN104133838A (en) * 2014-06-24 2014-11-05 国家电网公司 Data processing method and system with system detection function
CN106649251A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for Chinese word segmentation
CN105718506A (en) * 2016-01-04 2016-06-29 胡新伟 Duplicate-checking comparison method for science and technology projects
CN106570055A (en) * 2016-09-27 2017-04-19 山东浪潮云服务信息科技有限公司 Information early-warning platform based on financial big data
CN107608968A (en) * 2017-09-22 2018-01-19 深圳市易图资讯股份有限公司 Chinese word cutting method, the device of text-oriented big data
CN107908796A (en) * 2017-12-15 2018-04-13 广州市齐明软件科技有限公司 E-Government duplicate checking method, apparatus and computer-readable recording medium
CN108846031A (en) * 2018-05-28 2018-11-20 同方知网数字出版技术股份有限公司 Project similarity comparison method for power industry

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
冉崇善等: "Simhash算法在试题查重中的应用", 《软件导刊》 *
怎么肥事: "IK分词器实现原理", 《HTTPS://BLOG.CSDN.NET/LALA12D/ARTICLE/DETAILS/82776571》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190416