CN115828888A - Method for semantic analysis and structurization of various weblogs - Google Patents
- Publication number
- CN115828888A (application CN202211444888.9A)
- Authority
- CN
- China
- Prior art keywords
- log
- logs
- word
- convolution
- weblogs
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for semantic parsing and structuring of various weblogs, comprising: data preprocessing, in which raw log data are processed into the standard input data required by the algorithm through named entity recognition, word segmentation, filtering, case conversion, vectorization and the like; and log source detection, in which logs from different sources are analyzed, their log formats are summarized, regular expressions are extracted, a log format is constructed for the logs of each source, and the log source is detected according to these formats. The method can perform semantic parsing and structured analysis on file/folder operation exceptions, network exceptions, database exceptions, hardware exceptions, system exceptions, other exceptions and the like, and can quickly test logs of components from different sources: with 10000 logs selected for each component, the accuracy reaches 99.94%.
Description
Technical Field
The invention relates to the technical field of weblog analysis, and in particular discloses a method for semantic parsing and structuring of various weblogs.
Background
With the continuous development of information technology, information systems and facilities bring great convenience to production and daily life in every industry. The associated network security has become a key link for public security and even national security, and real-time monitoring of network attacks and illegal behavior has become a necessary measure for protecting critical information infrastructure.
Semantic parsing is the task of converting a natural language question into a logical form. The logical form is a structured semantic representation, usually an executable statement such as a lambda expression or an SQL query, that a program can execute directly to retrieve an answer from a database. Because it is tightly coupled to the knowledge base, semantic parsing is often applied to automatic question answering over knowledge graphs or databases.
To build a semantic parser for a new domain, researchers first need a large amount of training data, and usually start by writing template rules for (standard question, logical form) tuples.
However, when the corpus is generated only from template rules, the data distribution of the standard sentences differs markedly from that of natural sentences, so the resulting naive semantic parser performs poorly on real natural language questions and generalizes poorly. Therefore, a method for semantic parsing and structuring of various weblogs is provided.
Disclosure of Invention
In view of the above-mentioned drawbacks and deficiencies of the prior art, the present application is directed to a method for semantic parsing and structuring multiple weblogs, comprising:
step one, data preprocessing: raw log data are processed into the standard input data required by the algorithm, including named entity recognition, word segmentation, filtering, case conversion, vectorization and the like;
step two, log source detection: logs from different sources are analyzed, their log formats are summarized, regular expressions are extracted, a log format is constructed for the logs of each source, and the log source is detected according to the log formats;
step three, log data are acquired and parsed, and the processed logs are classified by a VCNN (variable convolutional neural network) server based on log semantics and service completion strength;
step four, the VCNN server uses wide convolution, and the convolution result is a two-dimensional feature map; the output vectors for each word-vector component are concatenated to obtain the final output feature map cemw ∈ $\mathbb{R}^{n \times k}$; the variable pooling layer applies max pooling and average pooling to the features extracted by the variable convolution layer, and the two results are combined as the input of the fully connected layer of the convolutional neural network;
step five, the fully connected layer acts as the classifier of the whole convolutional neural network, and through it 5 isomorphic and heterogeneous classification clusters are obtained, ordered by service completion strength from failure to success;
step six, improved Bayesian classification based on the correlation among words is performed: for the 5 isomorphic and heterogeneous classification clusters output by the VCNN server, classification based on online service fault categories is performed on each cluster in turn, correlation analysis is carried out between the classification results and the performance of the online service, and the log source texts related to service exceptions are found;
step seven, the service completion strength level of a log is identified through the above steps; if the level is one with a high service failure rate, the service performance associated with the log is identified; the steps are repeated on continuously collected system logs of the online service to complete online service anomaly detection.
Preferably, named entity recognition needs to identify entities that frequently appear in logs, such as timestamp, url, ip, file, path, number and email.
Preferably, the overall structure of the VCNN server includes an input layer for the word vector matrix, a variable convolution layer, a variable pooling layer, a fully connected layer of the convolutional neural network, and an output layer.
Preferably, the variable convolution layer extracts features along both the sentence length and the word-vector components of the word vector matrix.
Preferably, the input matrix of the variable convolution layer is $s \in \mathbb{R}^{n \times k}$, where $\mathbb{R}$ denotes the real number space, $n$ denotes the length of the input sentence, and $k$ denotes the dimension of the word vectors.
Preferably, in step one, word segmentation takes into account the camel-case expressions common in logs; in the log vectorization process, word vectors are trained on a general corpus, a system/middleware log corpus and a service log corpus, and the resulting word vectors have 200 dimensions with a vocabulary size of 583511.
Preferably, in addition to performing one-dimensional convolution along the sentence length, the VCNN server convolves each word-vector component separately. The convolution kernel size is $w \times 1$, where $w$ is the width of the kernel along the sentence length, and each word-vector component has its own convolution kernel. Let $w_g \in \mathbb{R}^{w \times 1}$ denote the one-dimensional convolution kernel applied to the $g$-th dimension of the input matrix. Along the sentence length, $s_i$ denotes the word vector of the $i$-th word, and $s_{i:g}$ denotes the concatenation matrix of the word vectors from the $i$-th word to the $g$-th word. The kernel $w_g$ is convolved with the word sequence to generate features: it is applied to every possible word window in the $g$-th component of the sentence to generate the corresponding feature map.
Beneficial effects: the method for semantic parsing and structuring of various weblogs classifies log exceptions into 6 types: file/folder operation exceptions, network exceptions, database exceptions, hardware exceptions, system exceptions, and other exceptions, and performs semantic parsing and structured analysis on them. Logs of components from different sources can be tested quickly: with 10000 logs selected for each component, the accuracy reaches 99.94%. For mature systems/middleware, constructing rules for source detection can achieve extremely high accuracy.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a block diagram of a system for semantic parsing and structuring of various weblogs in accordance with the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
In the drawings of the embodiments of the invention, the different types of hatching do not follow the national standard and do not indicate the material of the elements; they only distinguish the cross-sections of the elements in the figures.
Referring to fig. 1, a method for semantic parsing and structuring multiple weblogs includes the following steps:
step one, data preprocessing: raw log data are processed into the standard input data required by the algorithm, including named entity recognition, word segmentation, filtering, case conversion, vectorization and the like;
step two, log source detection: logs from different sources are analyzed, their log formats are summarized, regular expressions are extracted, a log format is constructed for the logs of each source, and the log source is detected according to the log formats;
step three, log data are acquired and parsed, and the processed logs are classified by a VCNN (variable convolutional neural network) server based on log semantics and service completion strength;
step four, the VCNN server uses wide convolution, and the convolution result is a two-dimensional feature map; the output vectors for each word-vector component are concatenated to obtain the final output feature map cemw ∈ $\mathbb{R}^{n \times k}$; the variable pooling layer applies max pooling and average pooling to the features extracted by the variable convolution layer, and the two results are combined as the input of the fully connected layer of the convolutional neural network;
step five, the fully connected layer acts as the classifier of the whole convolutional neural network, and through it 5 isomorphic and heterogeneous classification clusters are obtained, ordered by service completion strength from failure to success;
step six, improved Bayesian classification based on the correlation among words is performed: for the 5 isomorphic and heterogeneous classification clusters output by the VCNN server, classification based on online service fault categories is performed on each cluster in turn, correlation analysis is carried out between the classification results and the performance of the online service, and the log source texts related to service exceptions are found;
step seven, the service completion strength level of a log is identified through the above steps; if the level is one with a high service failure rate, the service performance associated with the log is identified; the steps are repeated on continuously collected system logs of the online service to complete online service anomaly detection.
Named entity recognition needs to identify entities that often appear in logs, such as time, url, ip, file, path, number and email.
The overall structure of the VCNN server comprises an input layer for the word vector matrix, a variable convolution layer, a variable pooling layer, a fully connected layer of the convolutional neural network, and an output layer.
The variable convolution layer extracts features along both the sentence length and the word-vector components of the word vector matrix.
The input matrix of the variable convolution layer is $s \in \mathbb{R}^{n \times k}$, where $\mathbb{R}$ denotes the real number space, $n$ denotes the length of the input sentence, and $k$ denotes the dimension of the word vectors.
Word segmentation needs to take into account the camel-case expressions common in logs; in the log vectorization process, word vectors are trained on a general corpus, a system/middleware log corpus and a service log corpus, and the resulting word vectors have 200 dimensions with a vocabulary size of 583511.
In addition to performing one-dimensional convolution along the sentence length, the VCNN server convolves each word-vector component separately. The convolution kernel size is $w \times 1$, where $w$ is the width of the kernel along the sentence length, and each word-vector component has its own convolution kernel. Let $w_g \in \mathbb{R}^{w \times 1}$ denote the one-dimensional convolution kernel applied to the $g$-th dimension of the input matrix. Along the sentence length, $s_i$ denotes the word vector of the $i$-th word, and $s_{i:g}$ denotes the concatenation matrix of the word vectors from the $i$-th word to the $g$-th word. The kernel $w_g$ is convolved with the word sequence to generate features: it is applied to every possible word window in the $g$-th component of the sentence to generate the corresponding feature map.
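The per-component wide convolution, the combined max/average pooling, and the fully connected classifier described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names, the random weights, the tanh activation and the softmax output are assumptions. It also uses the standard "full" convolution, which yields n + w - 1 rows rather than the n rows stated for the output feature map in the text.

```python
import numpy as np

def wide_conv_per_dimension(s, kernels):
    """Wide 1-D convolution along the sentence length, one kernel per component.

    s:       (n, k) word-vector matrix (n words, k components)
    kernels: (w, k) matrix; column g is the kernel applied to component g
    returns: (n + w - 1, k) feature map
    """
    n, k = s.shape
    w = kernels.shape[0]
    out = np.empty((n + w - 1, k))
    for g in range(k):
        # mode="full" implicitly zero-pads both ends (a "wide" convolution)
        out[:, g] = np.convolve(s[:, g], kernels[:, g], mode="full")
    return out

def pool_max_and_mean(feature_map):
    """Concatenate max pooling and average pooling over the sentence axis."""
    return np.concatenate([feature_map.max(axis=0), feature_map.mean(axis=0)])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n, k, w, num_levels = 6, 200, 3, 5        # 200-dim vectors, 5 completion levels
s = rng.normal(size=(n, k))               # stand-in for a vectorized log line
kernels = rng.normal(size=(w, k))         # untrained, illustrative kernels

feature_map = np.tanh(wide_conv_per_dimension(s, kernels))  # activation is assumed
features = pool_max_and_mean(feature_map)                   # length 2 * k

# Fully connected layer acting as the classifier (random, untrained weights)
W = rng.normal(size=(num_levels, 2 * k)) * 0.01
b = np.zeros(num_levels)
probs = softmax(W @ features + b)

print(feature_map.shape, features.shape, probs.shape)
```

Because each component has its own kernel, the feature map keeps one column per word-vector dimension, which is what allows the two pooling results to be concatenated into a fixed-length vector regardless of sentence length.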
It should be noted that logs record detailed information about a software system during operation, and system developers and operations staff can monitor the system and analyze its abnormal behaviors and errors from the logs. Log anomaly detection can be divided into semantic anomalies (execution results), execution anomalies (execution log sequences), and performance anomalies (execution times).
A log records the operations the system performs at a given point in time and the results of those operations.
The types of exceptions can be broadly categorized, for example as network exceptions, database exceptions, hardware exceptions, I/O exceptions, operating system exceptions, and the like. Each type can be subdivided; taking hardware exceptions as an example, there may be CPU exceptions, insufficient disk space, disk damage, and the like.
The premise of automatically judging the log exception type is to formulate a uniform standard for describing log exception types, together with the fine-grained classes and characteristics within each category.
Logs differ from natural language text in the following ways:
(1) A log is semi-structured text. It usually comprises a log header and log description information. The header often contains fields such as a timestamp, a source and a log level; the description contains the current operation and its result, and is rich in semantic information;
(2) Logs contain a large amount of repetition. The description contains constant information and variable values, and once the variable values are replaced by parameter symbols, a large number of logs can be compressed into a single log template;
(3) Logs contain many camel-case strings written without spaces, which stem from the naming conventions for functions, classes and the like in different programming languages;
(4) The vocabulary contained in the log data of a mature system/middleware is small.
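Points (1) and (2) can be illustrated with a short sketch that splits a log line into header fields and description, then masks variable values so that repeated lines collapse into one template. The header layout, the sample lines, the source names and the `<*>` wildcard are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical header layout: "<timestamp> <source> <LEVEL> <description>"
HEADER = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<source>\S+)\s+(?P<level>[A-Z]+)\s+(?P<description>.*)$"
)

def split_log(line):
    """Return the header fields and description, or None if the line does not fit."""
    m = HEADER.match(line)
    return m.groupdict() if m else None

def to_template(description):
    """Replace digit runs (a stand-in for variable values) with a wildcard."""
    return re.sub(r"\d+", "<*>", description)

# Made-up sample lines from two imaginary components
lines = [
    "2022-11-18 10:00:01 svc-a ERROR upstream timed out after 30 seconds",
    "2022-11-18 10:00:07 svc-a ERROR upstream timed out after 45 seconds",
    "2022-11-18 10:01:12 svc-b WARN slow query took 1200 ms",
]

templates = Counter(to_template(split_log(line)["description"]) for line in lines)
for template, count in templates.items():
    print(count, template)
```

Here the two timeout lines share one template, which is the compression effect described in point (2).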
3. Vectorization of logs
Vectorized representation of logs requires consideration of the following issues:
(1) Before log vectorization, the log description field needs to be extracted and initialized;
(2) The variable values in a log are usually meaningless numbers or varying ip, url, path values and the like, and need to be replaced;
(3) The special writing style of logs requires new rules for word segmentation;
(4) The more repetitive the logs and the more mature the system, the more consistent the format and descriptions are, so the effective log vocabulary is small and out-of-vocabulary (OOV) problems arise later; log data therefore need to be combined with general-purpose data for vectorization training.
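The OOV concern in point (4) can be made concrete with a toy calculation. The vocabularies and the token list below are invented for illustration; the sketch only shows why adding a general corpus to the log corpus lowers the share of unseen tokens.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of tokens that are missing from the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)

# Toy vocabularies (hypothetical): a small log vocabulary and a general one
log_vocab = {"connection", "refused", "timeout", "disk"}
general_vocab = {"the", "request", "was", "rejected", "by", "server"}

# Tokens of a new, previously unseen log description
new_log = ["the", "request", "was", "refused", "by", "gateway"]

rate_log_only = oov_rate(new_log, log_vocab)
rate_combined = oov_rate(new_log, log_vocab | general_vocab)
print(rate_log_only, rate_combined)
```

With the log vocabulary alone, five of the six tokens are unseen; with the combined vocabulary only one is, which is the motivation for training on both kinds of data.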
When semantic parsing is performed on various weblogs, data preprocessing is performed first: raw log data are processed into the standard input data required by the algorithm, including named entity recognition, word segmentation, filtering, case conversion, vectorization, and the like.
Named entity recognition needs to identify entities that frequently appear in logs, such as timestamp, url, ip, file, path, number and email;
word segmentation needs to take into account the camel-case expressions common in logs;
in the log vectorization process, word vectors are trained on a general corpus (wikidata) + system/middleware log corpus + business log corpus, and the resulting word vectors have 200 dimensions with a vocabulary size of 583511.
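The preprocessing steps above (entity recognition, camel-case segmentation and case conversion) can be sketched with regular expressions. The patterns are deliberately simplified, cover only some of the listed entity types, and the placeholder names such as `<ip>` are assumptions rather than anything specified in the text.

```python
import re

# Simplified, illustrative patterns; more specific entities are replaced first
ENTITY_PATTERNS = [
    ("<timestamp>", re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")),
    ("<url>", re.compile(r"https?://\S+")),
    ("<email>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<ip>", re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")),
    ("<path>", re.compile(r"(?:/[\w.-]+)+")),
    ("<number>", re.compile(r"\b\d+\b")),
]

# Splits camel-case strings, keeping acronyms such as "HTTP" together
CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

def split_camel(token):
    return CAMEL.findall(token)

def preprocess(description):
    """Replace entities with placeholders, split camel case, and lowercase."""
    text = description
    for placeholder, pattern in ENTITY_PATTERNS:
        text = pattern.sub(placeholder, text)
    tokens = []
    for raw in text.split():
        if raw.startswith("<") and raw.endswith(">"):
            tokens.append(raw)          # keep entity placeholders intact
        else:
            tokens.extend(part.lower() for part in split_camel(raw))
    return tokens

# Made-up log description
print(preprocess("NullPointerException at /var/app/run.log from 10.0.0.5 after 3 retries"))
```

The resulting token list is what would then be mapped to word vectors; a production tokenizer would also need to handle punctuation attached to placeholders, which this sketch does not.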
Log source detection: logs from different sources are analyzed, their log formats are summarized, regular expressions are extracted, a log format is constructed for the logs of each source, and the log source is detected according to the log formats.
The rule-based log source detection method was tested on logs of components from different sources: with 10000 logs selected for each component, the accuracy reaches 99.94%. For mature systems/middleware, constructing rules for source detection can achieve extremely high accuracy.
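A rule-based source detector of this kind can be sketched as a table of one regular expression per source, applied in order. The three source names, their formats and the sample lines below are hypothetical; real rules would be summarized from each component's actual logs, and the toy accuracy computed here says nothing about the figure reported above.

```python
import re

# Hypothetical per-source format rules
SOURCE_RULES = {
    "web_server": re.compile(r'^\d{1,3}(?:\.\d{1,3}){3} - \S+ \[[^\]]+\] "[A-Z]+ \S+'),
    "app_server": re.compile(
        r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (?:DEBUG|INFO|WARN|ERROR) "
    ),
    "database": re.compile(r"^\d{6} \d{2}:\d{2}:\d{2} \[(?:Note|Warning|ERROR)\] "),
}

def detect_source(line):
    """Return the first source whose format rule matches, else 'unknown'."""
    for source, pattern in SOURCE_RULES.items():
        if pattern.match(line):
            return source
    return "unknown"

def accuracy(labelled_lines):
    """Share of (line, expected_source) pairs that are detected correctly."""
    correct = sum(1 for line, source in labelled_lines if detect_source(line) == source)
    return correct / len(labelled_lines)

# Made-up labelled samples
samples = [
    ('10.0.0.5 - alice [18/Nov/2022:10:00:01 +0800] "GET /index.html HTTP/1.1" 200 512',
     "web_server"),
    ("2022-11-18 10:00:01,123 ERROR connection refused", "app_server"),
    ("221118 10:00:01 [Warning] slow query", "database"),
    ("free-form text with no recognizable header", "unknown"),
]
print(accuracy(samples))
```

Anchoring each pattern at the start of the line keeps the rules cheap to evaluate and makes the header layout, rather than the message body, decide the source.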
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
Claims (7)
1. A method for semantic parsing and structuring of various weblogs, characterized in that the method comprises the following steps:
step one, data preprocessing: raw log data are processed into the standard input data required by the algorithm, including named entity recognition, word segmentation, filtering, case conversion, vectorization and the like;
step two, log source detection: logs from different sources are analyzed, their log formats are summarized, regular expressions are extracted, a log format is constructed for the logs of each source, and the log source is detected according to the log formats;
step three, log data are acquired and parsed, and the processed logs are classified by a VCNN (variable convolutional neural network) server based on log semantics and service completion strength;
step four, the VCNN server uses wide convolution, and the convolution result is a two-dimensional feature map; the output vectors for each word-vector component are concatenated to obtain the final output feature map cemw ∈ $\mathbb{R}^{n \times k}$; the variable pooling layer applies max pooling and average pooling to the features extracted by the variable convolution layer, and the two results are combined as the input of the fully connected layer of the convolutional neural network;
step five, the fully connected layer acts as the classifier of the whole convolutional neural network, and through it 5 isomorphic and heterogeneous classification clusters are obtained, ordered by service completion strength from failure to success;
step six, improved Bayesian classification based on the correlation among words is performed: for the 5 isomorphic and heterogeneous classification clusters output by the VCNN server, classification based on online service fault categories is performed on each cluster in turn, correlation analysis is carried out between the classification results and the performance of the online service, and the log source texts related to service exceptions are found;
step seven, the service completion strength level of a log is identified through the above steps; if the level is one with a high service failure rate, the service performance associated with the log is identified; the steps are repeated on continuously collected system logs of the online service to complete online service anomaly detection.
2. The method for semantic parsing and structuring of various weblogs according to claim 1, characterized in that: named entity recognition needs to identify entities that often appear in logs, such as timestamp, url, ip, file, path, number and email.
3. The method for semantic parsing and structuring of various weblogs according to claim 2, characterized in that: the overall structure of the VCNN server comprises an input layer, a variable convolution layer, a variable pooling layer, a fully connected layer of the convolutional neural network, and an output layer.
4. The method for semantic parsing and structuring of various weblogs according to claim 1, characterized in that: the variable convolution layer extracts features along both the sentence length and the word-vector components of the word vector matrix.
5. The method for semantic parsing and structuring of various weblogs according to claim 1, characterized in that: the input matrix of the variable convolution layer is $s \in \mathbb{R}^{n \times k}$, where $\mathbb{R}$ denotes the real number space, $n$ denotes the length of the input sentence, and $k$ denotes the dimension of the word vectors.
6. The method for semantic parsing and structuring of various weblogs according to claim 1, characterized in that: in step one, word segmentation takes into account the camel-case expressions common in logs; in the log vectorization process, word vectors are trained on a general corpus, a system/middleware log corpus and a service log corpus, and the resulting word vectors have 200 dimensions with a vocabulary size of 583511.
7. The method for semantic parsing and structuring of various weblogs according to claim 4, characterized in that: in addition to performing one-dimensional convolution along the sentence length, the VCNN server convolves each word-vector component separately; the convolution kernel size is $w \times 1$, where $w$ is the width of the kernel along the sentence length; each word-vector component has its own convolution kernel; $w_g \in \mathbb{R}^{w \times 1}$ denotes the one-dimensional convolution kernel applied to the $g$-th dimension of the input matrix; along the sentence length, $s_i$ denotes the word vector of the $i$-th word, and $s_{i:g}$ denotes the concatenation matrix of the word vectors from the $i$-th word to the $g$-th word; the kernel $w_g$ is convolved with the word sequence to generate features, being applied to every possible word window in the $g$-th component of the sentence to generate the corresponding feature map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211444888.9A CN115828888A (en) | 2022-11-18 | 2022-11-18 | Method for semantic analysis and structurization of various weblogs |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115828888A (en) | 2023-03-21 |
Family
ID=85528952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211444888.9A Pending CN115828888A (en) | 2022-11-18 | 2022-11-18 | Method for semantic analysis and structurization of various weblogs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115828888A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116628451A (en) * | 2023-05-31 | 2023-08-22 | 江苏华存电子科技有限公司 | High-speed analysis method for information to be processed |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112182219A (en) * | 2020-10-09 | 2021-01-05 | 杭州电子科技大学 | Online service abnormity detection method based on log semantic analysis |
CN113297051A (en) * | 2021-07-26 | 2021-08-24 | 云智慧(北京)科技有限公司 | Log analysis processing method and device |
CN113377607A (en) * | 2021-05-13 | 2021-09-10 | 长沙理工大学 | Method and device for detecting log abnormity based on Word2Vec and electronic equipment |
US20210357282A1 (en) * | 2020-05-13 | 2021-11-18 | Mastercard International Incorporated | Methods and systems for server failure prediction using server logs |
CN114610515A (en) * | 2022-03-10 | 2022-06-10 | 电子科技大学 | Multi-feature log anomaly detection method and system based on log full semantics |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116628451A (en) * | 2023-05-31 | 2023-08-22 | 江苏华存电子科技有限公司 | High-speed analysis method for information to be processed |
CN116628451B (en) * | 2023-05-31 | 2023-11-14 | 江苏华存电子科技有限公司 | High-speed analysis method for information to be processed |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20230321 |