CN115759071A - Government affair sensitive information identification system and method based on big data - Google Patents
- Publication number
- CN115759071A (application CN202211424814.9A)
- Authority
- CN
- China
- Prior art keywords
- sensitive
- sensitive information
- data
- words
- text
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a big-data-based system and method for identifying government affair sensitive information, and belongs to the technical field of big data analysis. The method comprises the following steps: step 1, acquiring text data to be analyzed; step 2, converting the text data into vector form; step 3, constructing a sensitive information identification analysis model and feeding it the vectorized text data; step 4, identifying sensitive information in the text data with the sensitive information identification analysis model; and step 5, outputting the identification analysis result. By identifying sensitive words effectively, the method reduces the likelihood that malicious text spreads widely in practice; it further analyzes the different ways in which text is tampered with, expands the sensitive word database accordingly, and thereby improves the accuracy of comparison-based detection.
Description
Technical Field
The invention belongs to the technical field of big data analysis, and particularly relates to a government affair sensitive information identification system and method based on big data.
Background
With the development of internet technology, electronic data has gradually become the dominant mode of data management. Government information is an important category of information: it is the general term for information, conditions, data, charts, text materials, audio-video materials and the like that reflect government work and related matters in government activities. Government affair information must meet three conditions at once: first, it is held by a government agency, meaning that the agency lawfully generated, collected or integrated it; second, it relates to economic and social management and public services; and third, it is content carried by a specific medium. Because government affair information touches all aspects of society, sensitive words in the government affairs domain are more likely than in other application fields to distort public understanding and shift the direction of public opinion. How to deeply mine and analyze sensitive information in massive texts and improve recognition results is therefore an urgent problem.
In the prior art, there is a technical scheme for identifying and screening government affair sensitive information:
Prior art 1 (CN114386408A) discloses a government affair sensitive information identification method, apparatus, device, medium and program product. Specifically, it obtains at least one government affair statement containing text content associated with government affair data; generates a first sentence vector based on the semantic information of the statement(s); feeds the first sentence vector into an identification model to obtain a classification result; and determines the sensitive information related to the statement(s) from that classification result.
Prior art 2 (CN113792308A) discloses a risk analysis method for security behaviors involving government affairs sensitive data. Specifically, it assesses how sensitive data is used, identifies and automatically sorts sensitive assets, helps judge the nature and motivation of sensitive data flows, and performs risk identification and analysis using associated risk policies and risk rules.
Prior art 3 (CN111782811A) discloses an e-government sensitive text detection method based on a convolutional neural network and a support vector machine. Specifically, it constructs text vectors with TF-IDF weighting, builds a sensitive-field text classification model with a support vector machine through iterative machine-learning training, and uses the model to judge whether a text belongs to the sensitive field.
When the prior art processes government affair sensitive information, the following problems still exist:
1. Prior art 1 discloses semantic extraction on vectors and training of an identification model, but does not disclose the type of the identification model or the identification method, which are the key techniques. This is a common problem of such approaches: they remain at the level of screening sensitive information from semantic vectors. Conventional natural language (NL) techniques based on semantic vectors achieve high accuracy against a standard lexicon, but cannot quickly and accurately identify non-standard vocabulary, especially emotionally charged language, and frequently miss sensitive information that is conveyed mainly through strongly emotional words rather than through its principal content words;
2. Prior art 2 discloses a label-based information detection technique, belonging to the sub-field of label-based detection. However, detection precision drops rapidly when labels are inaccurate, and both detection efficiency and precision are low when large amounts of labeled data are lacking;
3. Prior art 3 belongs to the sub-field of identifying sensitive information with convolution-based algorithms. Such high-complexity models place heavy demands on computing power: when the computing power of the operating system or platform is low, they can handle small amounts of data, but on big data, computational congestion causes identification delays, excessive memory usage and similar problems.
Disclosure of Invention
Purpose of the invention: to provide a big-data-based government affair sensitive information identification system and method that solve the above problems in the prior art. Effective identification of sensitive words reduces the likelihood that malicious text spreads widely in practical applications.
Technical solution: in a first aspect, a big-data-based government affair sensitive information identification method is provided, characterized by comprising the following steps:
step 1, acquiring text data to be analyzed;
step 1.1, preprocessing the text data and extracting the subject, predicate, object, attributive, adverbial, complement and punctuation information of the text;
step 1.2, extracting keywords after preprocessing, using the following keyword extraction expression:
In the formula, one symbol denotes each preprocessed value that carries an emotion degree, with subscript c giving the index of that value; another symbol denotes a parameter determined from a previous sensitive word frequency library, with subscript t giving the index of that emotion-degree parameter; and Z denotes a criticality parameter, namely the frequency with which the keyword appears in the current network popularity ranking;
step 2, converting the text data into a vector form;
step 3, constructing a sensitive information identification analysis model and receiving text data in a vector form;
step 4, identifying the sensitive information in the text data using the sensitive information identification analysis model:
when the sensitive word is of the similar-pronunciation type, the acquired text is first parsed into phonetic codes, and the semantic similarity between the sensitive word and the word to be detected is then obtained by computing the edit distance between the phonetic codes;
when the sensitive word is of the abbreviation type, the initial letters of the word to be analyzed are first extracted and combined, and the combined initials are then used as the target string and template string for matching;
when the sensitive word is of the split type, the split word is first converted into region codes, and the resulting region codes are then matched, thereby matching the word to be analyzed;
step 5, outputting the identification analysis result.
To improve the recognition accuracy for sensitive words during identification with the sensitive information identification analysis model, the sensitive word database that stores sensitive words is further expanded; sensitive words are then detected by classifying them by type and applying a processing mode specific to each type. The sensitive word database is expanded in three ways: with similar-pronunciation sensitive words, with abbreviated sensitive words, and with split-form sensitive words.
When the sensitive word is of the similar-pronunciation type, the acquired text is first parsed into phonetic codes, and the semantic similarity between the sensitive word and the word to be detected is then obtained by computing the edit distance between the phonetic codes.
When the sensitive word is of the abbreviation type, the initials of the word to be analyzed are first extracted and combined, and the result is then used as the target string and template string for matching.
When the type of the sensitive word is the sensitive word in the split form, firstly, the split word is converted into the region code, and then, the obtained region code is matched, so that the matching of the word to be analyzed is realized.
When the sensitive information identification analysis model analyzes sensitive words, it further mines the semantic tendency of the current text, identifies extreme viewpoints by analyzing viewpoint tendency, and forwards them to the responsible person as a basis for subsequent manual monitoring and control of viewpoints.
First, a sensitive word fingerprint library is established; then the semantic similarity distance between each extracted sensitive word and the entries of the fingerprint library is computed by semantic similarity detection. Finally, by comparing the computed distance against a threshold, the viewpoint tendency corresponding to the text data is found in the sensitive word fingerprint library.
Based on the obtained semantic fingerprints, highly similar texts in the text data are rapidly assigned a viewpoint orientation; the corresponding similarity expression is as follows:
where ⊕ denotes the bitwise exclusive-OR operation; numful() is a function that counts the number of bits equal to 1; and F_i and F_j are the two fingerprint values between which the distance is computed.
In a second aspect, a big-data-based government affair sensitive information identification system is provided for implementing the above identification method; the system comprises the following modules:
the data acquisition module is used for reading government affair text data to be analyzed;
a data conversion module configured to convert the read text data into a desired form;
the model construction module is arranged for constructing a sensitive information identification analysis model;
the data analysis module is used for analyzing the read text data by utilizing the sensitive information identification analysis model;
and the data output module is used for outputting the analysis result of the data analysis module.
In some implementations of the second aspect, when massive numbers of sensitive words are analyzed: first, the data acquisition module reads the text data to be analyzed; second, the data conversion module converts the read text data into the required form; third, the model construction module builds the sensitive information identification analysis model; then, the data analysis module analyzes the sensitive information in the read text data using that model; finally, the data output module outputs the analysis result of the data analysis module.
In a third aspect, a big data-based government affairs sensitive information identification device is provided, which includes: a processor and a memory storing computer program instructions.
The processor reads and executes computer program instructions to realize the government affair sensitive information identification method.
In a fourth aspect, a computer-readable storage medium storing computer program instructions is provided. The computer program instructions, when executed by a processor, implement the government affair sensitive information identification method.
Beneficial effects:
1. the invention provides a government affair sensitive information identification system and method based on big data, which reduces the possibility that malicious texts are widely spread in practical application by effectively identifying sensitive words;
2. The method further analyzes different text tampering modes and expands the sensitive word database accordingly, effectively improving the accuracy of comparison-based detection;
3. The sensitive information identification analysis model of the invention further mines the semantic tendency of the current text and identifies extreme viewpoints by analyzing viewpoint tendency, strengthening the monitoring of malicious texts;
4. The invention introduces an emotion-based analysis method that extracts keywords according to emotional degree, improving the precision of sensitive information extraction when sudden public opinion events occur; in particular, because emotional keywords are extracted based on network popularity, the sensitive information related to a sudden public opinion event can be determined more accurately and efficiently.
5. Identification is based on the sensitive word fingerprint library and semantic fingerprints: when users attempt to evade monitoring, for example by using homophones, pinyin, visually similar characters or symbol-separated characters, they may escape traditional lexicon-comparison monitoring, but they cannot completely hide the semantic fingerprint.
Drawings
FIG. 1 is a flow chart of data processing according to the present invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the present invention.
The applicant considers that massive text data is an important building block of the electronic information age, and that sensitivity analysis of web text is one of the important factors in monitoring current public opinion. To avoid threats to social security caused by the spread of certain sensitive words, a big-data-based government affair sensitive information identification system and method are provided, which obtain the emotional tendencies expressed in texts by mining and analyzing massive text data, thereby maintaining the security and stability of the network environment for government affair applications.
Example one
In one embodiment, a big-data-based government affair sensitive information identification method is provided. By mining and analyzing large amounts of government affair data, it uncovers latent relationships between sensitive words and the direction of public opinion, enables supervision of network data at the source, and improves the stability of network security. As shown in FIG. 1, the method specifically comprises the following steps:
step 1, acquiring mass text data to be analyzed;
Specifically, the acquired text data is first preprocessed, i.e., the subject, predicate, object, attributive, adverbial, complement and punctuation information of the text is extracted, which reduces interference from noisy data and improves the accuracy of subsequent text recognition. Keywords are then extracted from the text, and sensitive words are analyzed through the extracted keywords.
Step 2, converting the text data into a vector form;
step 3, constructing a sensitive information identification analysis model and receiving text data in a vector form;
step 4, identifying the sensitive information of the text data by using a sensitive information identification analysis model;
step 5, outputting the identification analysis result.
The embodiment provides a government affair sensitive information identification method based on big data, and the possibility that malicious texts are widely spread in practical application is reduced through effective identification of sensitive words.
Example two
In a further embodiment based on Example one, since the meaning of a sentence is usually carried by its keywords, the data processing further includes a keyword extraction step; the keyword extraction expression is as follows:
In the formula, one symbol denotes each preprocessed value that carries an emotion degree, with subscript c giving the index of that value; another symbol denotes a parameter determined from a previous sensitive word frequency library, with subscript t giving the index of that emotion-degree parameter; and Z denotes a criticality parameter, namely the frequency with which the keyword appears in the current network popularity ranking.
In preprocessing, the text data is first divided by word segmentation, and interfering words such as modal particles are then removed to obtain a cleaner text for analysis.
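The segmentation-and-filtering step can be sketched as follows. This is a minimal illustration only: it assumes the text has already been split into words by some Chinese word segmenter (the source does not name one), and `MODAL_PARTICLES` is a hypothetical, deliberately small stop list.

```python
# Minimal preprocessing sketch. Assumes the input is already word-segmented;
# MODAL_PARTICLES is a hypothetical stop list of modal particles, not from the source.
MODAL_PARTICLES = {"吗", "呢", "吧", "啊", "呀", "嘛", "哦"}


def remove_interfering_words(tokens: list[str]) -> list[str]:
    """Drop blank tokens and modal particles while keeping word order."""
    return [t for t in tokens if t.strip() and t not in MODAL_PARTICLES]


segmented = ["这个", "政策", "真的", "好", "吗"]
print(remove_interfering_words(segmented))  # ['这个', '政策', '真的', '好']
```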
For the analysis of sensitive words, the method further extracts keywords from the text data; extracting the keywords that are most representative of what the text expresses speeds up sensitive word identification and shortens computation time.
EXAMPLE III
In a further embodiment based on the above embodiments, users with improper intentions in practice often disguise sensitive words by altering their form, so that the tampered text evades detection while still conveying semantics similar to the sensitive words to other users. This embodiment therefore classifies sensitive words and applies a corresponding processing mode to each class. Specifically, according to the data analysis requirements, sensitive words are divided into three types: similar-pronunciation sensitive words, abbreviated sensitive words, and split-form sensitive words.
In a further embodiment, because the propagation of meaning in Chinese text depends heavily on pronunciation (pinyin), users with improper intentions often substitute similar-sounding text to evade sensitive word detection. Therefore, when the sensitive word is of the similar-pronunciation type, the acquired text is first parsed into phonetic codes, and the semantic similarity between the sensitive word and the word to be detected is then obtained by computing the edit distance between the phonetic codes.
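A minimal sketch of the similar-pronunciation check follows. It assumes a hypothetical toy character-to-pinyin table (`TOY_PINYIN`) and uses Levenshtein distance over the concatenated pinyin; converting the distance into a 0 to 1 similarity by dividing by the longer code length is also an assumption, since the source does not give that formula.

```python
# Sketch of similar-pronunciation matching. TOY_PINYIN is a hypothetical mini table
# (a real system would use a full pinyin dictionary); the normalisation of the edit
# distance into a similarity score is an assumption, not the patent's formula.
TOY_PINYIN = {"政": "zheng", "正": "zheng", "府": "fu", "腐": "fu", "服": "fu", "务": "wu"}


def phonetic_code(word: str) -> str:
    """Concatenate the pinyin of each character; unknown characters are kept as-is."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in word)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete and substitute each cost 1), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def phonetic_similarity(candidate: str, sensitive_word: str) -> float:
    """1.0 means identical phonetic codes; lower values mean more different pronunciations."""
    a, b = phonetic_code(candidate), phonetic_code(sensitive_word)
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


print(phonetic_similarity("正腐", "政府"))  # 1.0, both encode to "zhengfu"
```

A threshold on this score (its value would have to be tuned) then decides whether the candidate is treated as a disguised form of the sensitive word.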
In a further embodiment, everyday colloquial language often communicates by omitting words; heavy abbreviation makes an expression obscure and thus lowers the chance that sensitive words are detected. Therefore, when the sensitive word is of the abbreviation type, the initials of the words to be analyzed are first extracted and combined, and the result is then used as the target string and template string for matching.
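The abbreviation check can be sketched in the same spirit. `TOY_PINYIN` is again a hypothetical mini table, and reading "extract and combine the initials" as "first pinyin letter of each character, with Latin letters lower-cased" is an assumption about the intended rule.

```python
# Sketch of abbreviation matching. TOY_PINYIN is hypothetical; the initial-extraction
# rule below is an assumed interpretation of the method, not taken from the source.
TOY_PINYIN = {"政": "zheng", "府": "fu", "服": "fu", "务": "wu"}


def initials(word: str) -> str:
    """Map Chinese characters to their first pinyin letter; keep ASCII letters, lower-cased."""
    out = []
    for ch in word:
        if ch in TOY_PINYIN:
            out.append(TOY_PINYIN[ch][0])
        elif ch.isascii() and ch.isalpha():
            out.append(ch.lower())
    return "".join(out)


def match_abbreviation(candidate: str, sensitive_words: list[str]) -> list[str]:
    """Return sensitive words whose initials (template string) equal the candidate's (target string)."""
    target = initials(candidate)
    return [w for w in sensitive_words if target and initials(w) == target]


print(match_abbreviation("ZF", ["政府", "服务"]))  # ['政府']
```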
In a further embodiment, as internet slang grows, words are increasingly split into their character components for entertainment; for example, the character 研 in 研究 ("research") is written as its two components 石 and 开. Users with improper intentions can likewise use this hard-to-detect form to spread malicious semantics. When the sensitive word is of the split type because of such malicious splitting, the split word is first converted into region codes, and the resulting region codes are then matched, thereby matching the word to be analyzed.
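A sketch of the split-form check is given below. It assumes that "region code" refers to the GB2312 area-position (qu-wei) code, obtained by subtracting 0xA0 from each byte of a character's two-byte GB2312 encoding, and it uses a hypothetical table `SPLIT_FORMS` listing 研 with its components 石 and 开. Matching on fixed-width numeric codes rather than raw characters is simply how the method is described; once the component run is restored to the original character, ordinary dictionary matching can proceed.

```python
# Sketch of split-form matching. Assumption: "region code" = GB2312 area-position code.
# SPLIT_FORMS is a hypothetical table: original character -> its component spelling.
SPLIT_FORMS = {"研": "石开"}


def area_position_code(ch: str) -> str:
    """4-digit GB2312 area-position code of one Chinese character, e.g. '啊' -> '1601'."""
    high, low = ch.encode("gb2312")  # raises for characters without a two-byte GB2312 code
    return f"{high - 0xA0:02d}{low - 0xA0:02d}"


# Index each component sequence by its tuple of area-position codes.
SPLIT_INDEX = {tuple(area_position_code(c) for c in parts): whole
               for whole, parts in SPLIT_FORMS.items()}


def restore_split_forms(text: str) -> str:
    """Replace runs of component characters (matched on their codes) with the original character."""
    codes = [area_position_code(c) for c in text]
    out, i = [], 0
    while i < len(text):
        for pattern, whole in SPLIT_INDEX.items():
            if tuple(codes[i:i + len(pattern)]) == pattern:
                out.append(whole)
                i += len(pattern)
                break
        else:  # no split pattern starts here, so keep the character
            out.append(text[i])
            i += 1
    return "".join(out)


print(restore_split_forms("石开究"))  # 研究
```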
In a further embodiment, to improve detection accuracy during sensitive word detection, the sensitive word database used for comparing sensitive words is enriched according to the classification types of the sensitive words.
In this embodiment, the text data is analyzed and the sensitive word database is expanded based on the analysis results, which lays a solid data foundation for subsequent sensitive word detection.
Example four
In a further embodiment based on the above embodiments, when sensitive words are analyzed with the sensitive information identification analysis model, the semantic tendency of the current text is further mined, extreme viewpoints are identified by analyzing viewpoint tendency, and they are forwarded to a user supervisor as a basis for subsequent manual monitoring and control of viewpoints.
Specifically, a sensitive word fingerprint library is established; then the semantic similarity distance between each extracted sensitive word and the entries of the fingerprint library is computed by semantic similarity detection. Finally, by comparing the computed distance against a threshold, the viewpoint tendency corresponding to the text data is found in the sensitive word fingerprint library.
In a further embodiment, when the text data to be analyzed contains many identical texts, repeatedly invoking the sensitive information identification analysis model to analyze them one by one consumes substantial system resources and wastes computation. To address this, semantic fingerprinting is used to rapidly identify highly similar text data. In a preferred embodiment, computing a semantic fingerprint comprises the following steps:
step 1, performing word segmentation on received text data to obtain a word segmentation set;
step 2, identifying the sensitive words and obtaining fingerprint values corresponding to the sensitive words from the existing fingerprint database;
step 3, hashing each word in the word set from step 1 to obtain its binary hash value;
step 4, summing the obtained hash values bit by bit to obtain a sequence of values;
step 5, setting each bit to 1 or 0 according to whether the corresponding value in the sequence is positive or not, which gives the final semantic fingerprint of the text;
step 6, repeating the semantic fingerprint computation until all of the text data has been analyzed.
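Steps 3 to 5 follow the general shape of SimHash. The sketch below is one possible realization under stated assumptions: 64-bit fingerprints, MD5 truncated to 64 bits as the word hash, and optional per-word weights (for example, larger weights for sensitive words found in the fingerprint library in step 2) that default to 1. None of these choices is specified in the source.

```python
import hashlib
from typing import Optional

# SimHash-style semantic fingerprint sketch. Assumed, not from the source: 64 bits,
# MD5 truncated to 8 bytes as the word hash, and per-word weights defaulting to 1.
BITS = 64


def word_hash(word: str) -> int:
    """Step 3: 64-bit binary hash of one word."""
    return int.from_bytes(hashlib.md5(word.encode("utf-8")).digest()[:8], "big")


def semantic_fingerprint(words: list[str], weights: Optional[dict[str, int]] = None) -> int:
    """Steps 3 to 5: hash each word, sum bit by bit (+w for a 1 bit, -w for a 0 bit), keep the sign."""
    weights = weights or {}
    totals = [0] * BITS
    for word in words:
        h = word_hash(word)
        w = weights.get(word, 1)
        for i in range(BITS):
            totals[i] += w if (h >> i) & 1 else -w
    # Step 5: bit i is 1 if its total is positive, otherwise 0.
    return sum(1 << i for i, total in enumerate(totals) if total > 0)
```

Calling `semantic_fingerprint` once per document realizes step 6; documents whose fingerprints coincide or lie within a small distance of an earlier one can reuse that earlier analysis instead of being passed through the model again.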
Based on the obtained semantic fingerprints, the viewpoint tendency of highly similar texts in the text data to be processed is identified rapidly.
The similarity is calculated as follows:
where ⊕ denotes the bitwise exclusive-OR operation; numful() is a function that counts the number of bits equal to 1; and F_i and F_j are the two fingerprint values between which the distance is computed.
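Assuming the expression is the Hamming distance numful(F_i XOR F_j), which is what the listed symbols suggest, the comparison and the threshold lookup against the fingerprint library can be sketched as below. A smaller distance means higher similarity; the dict-based library and the 3-bit default threshold are illustrative assumptions.

```python
from typing import Optional


def numful(x: int) -> int:
    """Count the bits equal to 1."""
    return bin(x).count("1")


def fingerprint_distance(f_i: int, f_j: int) -> int:
    """Number of bit positions in which two fingerprints differ: numful(F_i XOR F_j)."""
    return numful(f_i ^ f_j)


def match_viewpoint(fp: int, library: dict[int, str], threshold: int = 3) -> Optional[str]:
    """Return the viewpoint label of the nearest library fingerprint if within threshold, else None."""
    if not library:
        return None
    nearest = min(library, key=lambda ref: fingerprint_distance(fp, ref))
    return library[nearest] if fingerprint_distance(fp, nearest) <= threshold else None


library = {0b1010: "viewpoint A", 0xFF00: "viewpoint B"}  # hypothetical entries
print(match_viewpoint(0b1011, library))  # viewpoint A (distance 1)
```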
In a further embodiment, the mere presence of sensitive words in a text does not mean that the text expresses a sensitive viewpoint; to overcome this semantic limitation, the proposed sensitive information identification analysis model relies on deep learning.
Because each word contributes differently to the text classification result, this embodiment introduces a self-attention mechanism that learns a weight for each word in the sentence; important words receive higher weights, which highlights their influence on the classification result and further improves the identification accuracy of the model. The main purpose of the self-attention layer is to learn the weight of the word at each position, so that during learning the task focuses on the words that play an important role in the sentence. Since the tasks in multi-task learning share the same input while the importance of each word differs between the two tasks, the word weights are adjusted in the self-attention layer, and the words that play an important role in this embodiment are given larger weights.
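The weighting idea can be illustrated with scaled dot-product self-attention, a standard form of the mechanism. This pure-Python sketch assumes identity query, key and value projections to stay short; a trained layer would learn separate projection matrices, and the source does not say which attention variant it uses.

```python
import math

# Scaled dot-product self-attention over word vectors (pure Python).
# Assumption: identity Q/K/V projections; a real layer learns a matrix for each.


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """X holds one d-dimensional vector per word. Returns (outputs, attention weights);
    row i of the weights says how strongly word i attends to every word in the sentence."""
    d = len(X[0])
    weights = [softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in X])
               for query in X]
    outputs = [[sum(w * value[j] for w, value in zip(row, X)) for j in range(d)]
               for row in weights]
    return outputs, weights


words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(words)
print([round(sum(row), 6) for row in attn])  # [1.0, 1.0, 1.0]
```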
To improve the performance of the sensitive information identification analysis model, the model is optimized with a loss function, expressed as follows:
where y is the actual value (label) of the analyzed text; another symbol denotes the model's predicted output; and s is the likelihood probability distribution over the classes.
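Since only the symbols of the loss are described, the sketch below assumes categorical cross-entropy, a common loss defined by an actual label y and predicted class probabilities; it is not necessarily the patent's exact formula.

```python
import math

# Assumed loss: categorical cross-entropy. eps guards against log(0).


def cross_entropy(y_true: list[float], y_pred: list[float], eps: float = 1e-12) -> float:
    """-sum_k y_k * log(p_k) for a one-hot (or soft) target y and predicted probabilities p."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


print(round(cross_entropy([0, 1, 0], [0.2, 0.5, 0.3]), 4))  # 0.6931
```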
EXAMPLE five
In one embodiment, a big-data-based government affair sensitive information identification system is provided for implementing the big-data-based government affair sensitive information identification method; the system comprises the following modules: a data acquisition module, a data conversion module, a model construction module, a data analysis module and a data output module.
Specifically, the data acquisition module is used for reading mass government affair text data to be analyzed according to analysis requirements; the data conversion module is used for performing form conversion on the read text data according to the file format requirement; the model construction module is used for constructing a sensitive information identification analysis model; the data analysis module is used for analyzing the sensitive information of the converted text data; and the data output module is used for outputting the analysis result obtained by the data analysis module.
In a further embodiment, when massive numbers of sensitive words are analyzed: first, the data acquisition module reads the text data to be analyzed; second, the data conversion module converts the read text data into the required form; third, the model construction module builds the sensitive information identification analysis model; then, the data analysis module analyzes the sensitive information in the read text data using that model; finally, the data output module outputs the analysis result of the data analysis module.
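The module flow can be sketched as a chain of plain functions. All names are hypothetical, and the model built here is a placeholder substring matcher standing in for the sensitive information identification analysis model; the point is only to show the order in which data passes between the five modules.

```python
from typing import Callable, Iterable

# Hypothetical five-module pipeline; the "model" is a placeholder, not the patent's model.


def acquire(source: Iterable[str]) -> list[str]:
    """Data acquisition module: read the non-empty texts to be analyzed."""
    return [t for t in source if t.strip()]


def convert(texts: list[str]) -> list[str]:
    """Data conversion module: normalize each text (here only trimming whitespace)."""
    return [t.strip() for t in texts]


def build_model(sensitive_words: set[str]) -> Callable[[str], bool]:
    """Model construction module: return a placeholder classifier."""
    return lambda text: any(w in text for w in sensitive_words)


def analyze(model: Callable[[str], bool], texts: list[str]) -> list[bool]:
    """Data analysis module: apply the model to every converted text."""
    return [model(t) for t in texts]


def output(texts: list[str], flags: list[bool]) -> list[str]:
    """Data output module: one result line per text."""
    return [f"{'SENSITIVE' if f else 'ok'}: {t}" for t, f in zip(texts, flags)]


texts = convert(acquire(["  alpha ", "", "beta"]))
print(output(texts, analyze(build_model({"bet"}), texts)))  # ['ok: alpha', 'SENSITIVE: beta']
```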
EXAMPLE six
In one embodiment, a big data-based government affairs sensitive information identification device is provided, which comprises: a processor and a memory storing computer program instructions.
Wherein, the processor reads and executes the computer program instructions to realize the government affair sensitive information identification method.
EXAMPLE seven
In one embodiment, a computer-readable storage medium having computer program instructions stored thereon is presented.
Wherein the computer program instructions, when executed by a processor, implement the government affair sensitive information identification method.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A government affair sensitive information identification method based on big data is characterized by comprising the following steps:
step 1, acquiring text data to be analyzed;
step 1.1, preprocessing the text data and extracting the subject, predicate, object, attributive, adverbial, complement and punctuation information of the text;
step 1.2, extracting keywords after preprocessing, using the following keyword extraction expression:
In the formula, one symbol denotes each preprocessed value that carries an emotion degree, with subscript c giving the index of that value; another symbol denotes a parameter determined from a previous sensitive word frequency library, with subscript t giving the index of that emotion-degree parameter; and Z denotes a criticality parameter, namely the frequency with which the keyword appears in the current network popularity ranking;
step 2, converting the text data into a vector form;
step 3, constructing a sensitive information identification analysis model and receiving text data in a vector form;
step 4, identifying the sensitive information in the text data using the sensitive information identification analysis model:
when the sensitive word is of the similar-pronunciation type, first converting the acquired text into phonetic codes, and then calculating the edit distance between the phonetic codes to obtain the similarity between the sensitive word and the word to be detected;
when the sensitive word is of the abbreviation type, first extracting and combining the initial letters of the word to be analyzed, and then using the combined initials as the target string and template string for matching;
when the sensitive word is of the split-character type, first converting the split word into region codes, and then matching the obtained region codes, thereby matching the word to be analyzed;
step 5, outputting the recognition analysis result.
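Claim 1 names three matching strategies for step 4 without giving an implementation. The Python sketch below shows one possible reading under stated assumptions: `PINYIN` is a tiny hypothetical character-to-pinyin table standing in for a full dictionary, a phonetic code is simply the concatenated toneless pinyin, and similarity is taken as one minus the normalized Levenshtein distance. Region codes are derived from the standard GB2312 encoding (each byte minus 0xA0). All function names are hypothetical, and no flagging threshold is implied by the claims.

```python
"""Sketch of the three step-4 matching strategies (illustrative only)."""

# Hypothetical toy table: character -> toneless pinyin.
PINYIN = {
    "政": "zheng", "正": "zheng", "府": "fu", "腐": "fu",
    "敏": "min", "感": "gan", "赶": "gan",
}


def to_phonetic_code(word: str) -> str:
    """Concatenate each character's pinyin; unknown characters pass through."""
    return "".join(PINYIN.get(ch, ch) for ch in word)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def phonetic_similarity(sensitive: str, candidate: str) -> float:
    """1 - (edit distance of phonetic codes / length of the longer code)."""
    a, b = to_phonetic_code(sensitive), to_phonetic_code(candidate)
    return 1.0 - edit_distance(a, b) / (max(len(a), len(b)) or 1)


def initials(word: str) -> str:
    """Abbreviation form: first letter of each character's phonetic code."""
    return "".join(to_phonetic_code(ch)[:1] for ch in word)


def matches_abbreviation(sensitive: str, text: str) -> bool:
    """Search for the sensitive word's initials (template) in the text (target)."""
    template = initials(sensitive)
    return bool(template) and template in text.lower()


def to_region_codes(text: str) -> list[str]:
    """GB2312 region (qu-wei) codes of double-byte characters; others are skipped."""
    codes = []
    for ch in text:
        try:
            raw = ch.encode("gb2312")
        except UnicodeEncodeError:
            continue
        if len(raw) == 2:
            codes.append(f"{raw[0] - 0xA0:02d}{raw[1] - 0xA0:02d}")
    return codes


def matches_split_form(split_form: str, text: str) -> bool:
    """Slide the split form's region-code sequence over the text's sequence."""
    pattern, codes = to_region_codes(split_form), to_region_codes(text)
    n = len(pattern)
    return n > 0 and any(codes[i:i + n] == pattern
                         for i in range(len(codes) - n + 1))
```

For example, `phonetic_similarity("政府", "正腐")` returns 1.0 because both words share the phonetic code `zhengfu`, and `to_region_codes("中")` yields `["5448"]`. The claims do not explain why split forms are matched via region codes rather than by comparing characters directly; the sketch simply follows the described step.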
2. The big-data-based government affairs sensitive information identification method according to claim 1, wherein, when sensitive information is identified by the sensitive information identification analysis model, the sensitive word database that stores the sensitive words is further expanded to improve the recognition accuracy of sensitive words.
3. The big-data-based government affairs sensitive information identification method according to claim 2, wherein expanding the sensitive word database comprises:
expanding sensitive words with similar pronunciations;
expanding sensitive words in abbreviated form;
and expanding sensitive words in split form.
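Claim 3 lists the three expansion directions but not how variants are generated. The Python sketch below is one hypothetical way to do it: `HOMOPHONES`, `INITIALS` and `COMPONENTS` are toy lookup tables invented for illustration (a real system would load full pronunciation and character-decomposition dictionaries), and every generated variant is mapped back to its seed word so a match can be reported against the original sensitive word.

```python
from itertools import product

# Hypothetical lookup tables; a real system would load full dictionaries.
HOMOPHONES = {"政": ["正", "证"], "府": ["腐", "付"]}   # same-pinyin characters
INITIALS = {"政": "z", "府": "f", "张": "z", "三": "s"}  # pinyin initial letters
COMPONENTS = {"张": "弓长"}                             # character -> split parts


def expand(seed: str) -> dict[str, str]:
    """Return {variant: seed} for homophone, abbreviation and split variants."""
    variants: dict[str, str] = {}
    # 1. Similar pronunciation: every combination of homophone substitutions.
    options = [[ch] + HOMOPHONES.get(ch, []) for ch in seed]
    for combo in product(*options):
        variant = "".join(combo)
        if variant != seed:
            variants[variant] = seed
    # 2. Abbreviation: pinyin initials, only if every character is known.
    if seed and all(ch in INITIALS for ch in seed):
        variants["".join(INITIALS[ch] for ch in seed)] = seed
    # 3. Split form: replace characters that have a known decomposition.
    split = "".join(COMPONENTS.get(ch, ch) for ch in seed)
    if split != seed:
        variants[split] = seed
    return variants


def build_database(seeds: list[str]) -> dict[str, str]:
    """Map every seed word and each of its variants back to the seed word."""
    database = {seed: seed for seed in seeds}
    for seed in seeds:
        for variant, origin in expand(seed).items():
            database.setdefault(variant, origin)  # never overwrite an existing entry
    return database
```

Note that the homophone step enumerates a Cartesian product, so the number of variants grows exponentially with word length; a practical system would likely cap or prune it.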
4. The big-data-based government affairs sensitive information identification method according to claim 3, wherein, when sensitive words are analyzed with the sensitive information identification analysis model, the semantic orientation of the current text is further mined, extreme viewpoints are identified by analyzing the orientation of the viewpoints expressed, and the extreme viewpoints are forwarded to the responsible personnel to provide a basis for subsequent manual monitoring and control of viewpoints.
5. A big-data-based government affairs sensitive information identification system for implementing the government affairs sensitive information identification method according to any one of claims 1 to 4, characterized by comprising the following modules:
a data acquisition module, configured to read the government affairs text data to be analyzed;
a data conversion module, configured to convert the read text data into the required form;
a model construction module, configured to construct the sensitive information identification analysis model;
a data analysis module, configured to analyze the read text data using the sensitive information identification analysis model;
and a data output module, configured to output the analysis result of the data analysis module.
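The five modules of claim 5 form a simple pipeline. The Python sketch below wires them together as pluggable callables; it is an architectural illustration only. The character-bigram "vector form" (`char_bigrams`) and the lexicon-lookup model (`lexicon_model_builder`) are hypothetical stand-ins, not the patent's conversion step or analysis model, and the class and field names are invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

Vector = dict[str, int]
Model = Callable[[Vector], list[str]]


def char_bigrams(text: str) -> Vector:
    """Hypothetical 'vector form': counts of adjacent character pairs."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def lexicon_model_builder(lexicon: set[str]) -> Callable[[], Model]:
    """Return a builder for a toy model that flags lexicon entries present
    as bigram keys (so only two-character entries can ever match)."""
    def build() -> Model:
        def analyse(vector: Vector) -> list[str]:
            return sorted(word for word in lexicon if vector.get(word, 0) > 0)
        return analyse
    return build


@dataclass
class SensitiveInfoSystem:
    """Wires the five modules of claim 5 into a single pipeline."""
    acquire: Callable[[], list[str]]                    # data acquisition module
    convert: Callable[[str], Vector]                    # data conversion module
    build_model: Callable[[], Model]                    # model construction module
    output: Callable[[list[list[str]]], None] = print   # data output module

    def run(self) -> list[list[str]]:
        texts = self.acquire()
        model = self.build_model()
        # Data analysis module: apply the model to each converted text.
        results = [model(self.convert(text)) for text in texts]
        self.output(results)
        return results
```

For example, `SensitiveInfoSystem(acquire=lambda: ["测试文本敏感", "普通文本"], convert=char_bigrams, build_model=lexicon_model_builder({"敏感"})).run()` prints and returns `[['敏感'], []]`. Keeping each module as an injected callable lets the conversion and model steps be swapped without touching the pipeline.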
6. A big-data-based government affairs sensitive information identification device, comprising:
a processor and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement the government affairs sensitive information identification method according to any one of claims 1 to 4.
7. A computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the government affairs sensitive information identification method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211424814.9A CN115759071A (en) | 2022-11-14 | 2022-11-14 | Government affair sensitive information identification system and method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211424814.9A CN115759071A (en) | 2022-11-14 | 2022-11-14 | Government affair sensitive information identification system and method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115759071A | 2023-03-07 |
Family
ID=85370785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211424814.9A Withdrawn CN115759071A (en) | 2022-11-14 | 2022-11-14 | Government affair sensitive information identification system and method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115759071A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116628211A (en) * | 2023-07-25 | 2023-08-22 | 中国电信股份有限公司 | Data classification method and device, storage medium and electronic equipment |
CN116628211B (en) * | 2023-07-25 | 2023-11-07 | 中国电信股份有限公司 | Data classification method and device, storage medium and electronic equipment |
CN116939292A (en) * | 2023-09-15 | 2023-10-24 | 天津市北海通信技术有限公司 | Video text content monitoring method and system in rail transit environment |
CN116939292B (en) * | 2023-09-15 | 2023-11-24 | 天津市北海通信技术有限公司 | Video text content monitoring method and system in rail transit environment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20230307 |