CN113419720A - Automatic judgment method for necessity of abbreviation expansion for source code - Google Patents

Info

Publication number
CN113419720A
Authority
CN
China
Prior art keywords
abbreviations
abbreviation
expanded
abb
source code
Prior art date
Legal status
Granted
Application number
CN202110762787.5A
Other languages
Chinese (zh)
Other versions
CN113419720B (en)
Inventor
刘辉 (Liu Hui)
罗晓青 (Luo Xiaoqing)
姜艳杰 (Jiang Yanjie)
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology (BIT)
Priority to CN202110762787.5A
Publication of CN113419720A
Application granted
Publication of CN113419720B
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/77 Software metrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for automatically judging whether an abbreviation in source code needs to be expanded, and belongs to the technical field of computer software quality maintenance. First, data mining techniques are used to collect, from a corpus of source code, common abbreviations that many developers frequently use in similar contexts; a given abbreviation is not expanded if it matches at least one of the common abbreviations found. Second, the probability distributions of identifier lengths for different identifier types are calculated on the same corpus; an abbreviation is not expanded if expanding it would make the length of its enclosing identifier less probable. Third, an abbreviation is not expanded if its full term appears in its surrounding context, i.e., on the same source-code line. All remaining abbreviations, which none of these heuristics exempts, should be replaced by their full terms. On the test projects, about 95% of abbreviations are classified correctly and only about 5% are misclassified. The method is particularly accurate in selecting abbreviations that do not need to be expanded, with a precision of 98% and a recall of 96%.

Description

Automatic judgment method for necessity of abbreviation expansion for source code
Technical Field
The invention relates to a method for automatically judging whether an abbreviation needs to be expanded into a complete term, and belongs to the technical field of computer software quality maintenance.
Background
Identifiers account for a large portion (about 70%) of software source code. Because identifiers are composed of natural-language terms, they are a major source of information for understanding software. Meaningful identifiers are very helpful in understanding source code, so well-formed identifiers are particularly important.
Abbreviations are widely used to shorten identifiers: developers often replace a sequence of terms in an identifier with a short abbreviation. For example, "e" is often used to denote "exception", and "XMLParser" is used to denote "ExtensibleMarkupLanguageParser". Appropriate abbreviations can greatly ease typing, typesetting, and reading lengthy source code.
However, improperly used abbreviations can significantly reduce the readability and maintainability of source code. For example, "s" (for "students") and "ds" (for "data sequence") are typical examples of inappropriate abbreviations. Apart from the code's authors, other developers may find it difficult to work out the exact meaning of such abbreviations, which may lead to misunderstanding and misuse of the software.
To address this, prior art has proposed automated methods that recover the full term for a given abbreviation, so that developers can use these tools to replace abbreviations with full terms through renaming. However, no automated method yet exists for determining whether an abbreviation needs to be expanded at all, i.e., whether it should be replaced with its corresponding full term. Making this decision is often challenging for inexperienced developers and maintainers, because there is no quantitative guidance and the decision depends entirely on the developer's experience and intuition.
It is therefore important to find a method that software development tools, software quality maintenance tools, and similar tools can use to automatically determine whether an abbreviation needs to be expanded.
Disclosure of Invention
The technical problem solved by the invention is to automatically judge, during the development and maintenance of computer software, whether an abbreviation needs to be expanded, that is, whether it should be replaced by its corresponding full term.
The rationale of the method is that an abbreviation should not be expanded if expanding it would produce an overly long identifier, or if developers and maintainers could easily infer its meaning (i.e., its full term) from their domain knowledge or from the abbreviation's context.
Based on this rationale, the invention provides a series of heuristics for selecting abbreviations that do not need to be expanded. First, data mining techniques are used to collect, from a corpus of source code, common abbreviations that many developers frequently use in similar contexts. The key to this mining is to transform the problem of mining common abbreviations into the widely studied maximum clique problem. A given abbreviation is not expanded if it matches at least one of the common abbreviations found. Second, the probability distributions of the lengths of different types of identifiers (e.g., variable names and method names) are computed on the same corpus; the distribution specifies the probability that an identifier of type T consists of exactly n characters. The corresponding heuristic is: an abbreviation is not expanded if expanding it would reduce the probability of its enclosing identifier's length. Finally, an abbreviation is not expanded if its full term appears in its surrounding context, i.e., on the same source-code line. All remaining abbreviations, which none of these heuristics exempts, should be replaced by their full terms.
Advantageous effects
The method is the first automated approach to judging whether an abbreviation in source code needs to be expanded. Its heuristics focus on three different aspects of an abbreviation: length, popularity, and context. It has the following beneficial effects:
1. On the test projects, 95% of abbreviations are classified correctly, and only 5% are misclassified;
2. The method is highly accurate in selecting abbreviations that do not need to be expanded, with a precision of 98% and a recall of 96%.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Detailed Description
The invention is further illustrated and described in detail below with reference to the figures and examples.
The present invention is realized by the following technical means.
As shown in FIG. 1, the method for automatically determining whether an abbreviation in source code needs to be expanded consists of an offline mining phase and an online classification phase. In the offline mining phase, the probability distributions of identifier lengths and a set of common abbreviations are learned from a corpus of source code. In the online classification phase, a series of heuristic filters is applied to determine whether a given abbreviation needs to be expanded.
The method specifically comprises the following steps:
Step 1: Analyze identifier lengths.
First, all identifiers (i.e., names of software entities) are extracted from a corpus of software source code and classified according to the type of entity (e.g., variable name, method name, class name, etc.). For each type of identifier, a probability distribution of its length is calculated.
In this step, all extracted identifiers, whether or not they contain abbreviations, are classified into five types: variable names, parameter names, method names, class names, and field names. For each type, the probability distribution P of identifier length is calculated; P(T, n) denotes the probability that an identifier of type T consists of exactly n characters.
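A minimal Python sketch of Step 1, assuming P(T, n) is estimated as the relative frequency of length n among identifiers of type T (the text does not name an estimator). The input format, a list of (type, name) pairs, and the function names `length_distributions` and `prob` are hypothetical, as is the tiny corpus.

```python
from collections import Counter, defaultdict

def length_distributions(identifiers):
    """Estimate P(T, n) for every identifier type T seen in the corpus.

    `identifiers` is an iterable of (type, name) pairs, e.g. ("method", "getName").
    P(T, n) is taken as the fraction of type-T identifiers with exactly n characters.
    """
    counts = defaultdict(Counter)
    for id_type, name in identifiers:
        counts[id_type][len(name)] += 1
    dist = {}
    for id_type, counter in counts.items():
        total = sum(counter.values())
        dist[id_type] = {n: c / total for n, c in counter.items()}
    return dist

def prob(dist, id_type, n):
    """Look up P(T, n); lengths never observed for type T get probability 0."""
    return dist.get(id_type, {}).get(n, 0.0)

# Tiny illustrative corpus (hypothetical data).
corpus = [("variable", "e"), ("variable", "idx"), ("variable", "sum"), ("method", "getName")]
dist = length_distributions(corpus)
print(prob(dist, "variable", 3))  # two of the three variable names have 3 characters
```

Giving unseen lengths probability 0 means an expansion to a length never observed for that type always counts as less probable; a smoothed estimate would be a reasonable alternative.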
Step 2: Extract from the source-code corpus the abbreviations whose maximum clique size (i.e., number of vertices) is not less than a predefined threshold β.
All lexically identical abbreviations, wherever they are extracted from, are placed in one graph: each occurrence is a node, and the weight of an edge is the contextual similarity between the two occurrences it connects. If there is a large subgraph in which every pair of nodes is connected by an edge, this subgraph (a large clique) represents a common abbreviation that is widely used in similar contexts.
In this step, abbreviations are identified and extracted from the source-code corpus using graph-based data mining. Each extracted abbreviation is represented as a tuple containing the abbreviation's text and its context, where the context is the identifier in which the abbreviation appears. The identifier is vectorized with the Paragraph2Vector algorithm. A significant advantage of Paragraph2Vector is that semantically related texts produce similar vectors, so the similarity between the resulting vectors can represent the similarity between the original contexts.
For each group of lexically identical abbreviations, an undirected graph G is constructed, in which nodes represent the abbreviations in the group and edge weights represent the contextual similarity between them. Identifying common abbreviations that different developers often use in similar contexts is thereby translated into the widely studied maximum clique problem. To mine common abbreviations used in highly similar contexts, the edge between two vertices is deleted if the contextual similarity of the two abbreviations they represent is less than a predefined threshold α. The maximum clique of the resulting graph represents a popular abbreviation that is often used in highly similar contexts, and the size of the clique indicates the abbreviation's popularity. To remove less popular abbreviations, only abbreviations whose maximum clique size (i.e., number of vertices) is not less than the predefined threshold β are retained.
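The graph construction and clique filter of Step 2 can be sketched as follows. This is an illustration under stated assumptions: context vectors are taken as already computed (for example by a Paragraph2Vector model, which is not implemented here), similarity is assumed to be cosine similarity, and the maximum clique is found with a basic Bron-Kerbosch search with pivoting, a standard exact algorithm that the text does not name. All function names are hypothetical.

```python
import itertools
import math

def cosine(u, v):
    """Cosine similarity between two context vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_graph(vectors, alpha):
    """Graph over occurrences of one abbreviation; edges below alpha are dropped."""
    adj = {i: set() for i in range(len(vectors))}
    for i, j in itertools.combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= alpha:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def maximum_clique(adj):
    """Largest clique via Bron-Kerbosch with pivoting (exponential in the worst case)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best

def common_abbreviations(groups, alpha, beta):
    """Map each abbreviation text to its maximum clique if that clique has >= beta vertices.

    `groups` maps an abbreviation text to the context vectors of its occurrences.
    """
    result = {}
    for abbr, vectors in groups.items():
        clique = maximum_clique(build_graph(vectors, alpha))
        if len(clique) >= beta:
            result[abbr] = clique
    return result
```

With toy values α = 0.8 and β = 3, four occurrences of "msg" whose first three context vectors point in nearly the same direction form the clique {0, 1, 2}, so "msg" is retained; an abbreviation whose occurrences share no similar contexts is dropped. For corpus-scale graphs, a library clique routine would likely be preferable to this hand-written search.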
Step 3: Apply the length heuristic, using the length distributions calculated for each identifier type in Step 1.
For a given abbreviation in an identifier id of type T, suppose that replacing the abbreviation with its full term would increase the length of id from k characters to j characters. If P(T, j) < P(T, k), the abbreviation is not expanded. The rationale is that expanding the abbreviation would make the identifier's length less common (i.e., less popular), so it is better not to expand it. Otherwise, the remaining heuristics are applied to the abbreviation to reach a final decision.
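The Step 3 rule reduces to a single comparison. The sketch below assumes the abbreviation appears once in the identifier and that expansion only swaps its characters for those of the full term (capitalization is ignored); `dist` uses a `dist[T][n]` layout for P(T, n), and the function name is hypothetical.

```python
def skip_expansion_by_length(dist, id_type, identifier, abbreviation, full_term):
    """Step 3: return True if the abbreviation should NOT be expanded.

    dist[T][n] holds P(T, n). Assumes the abbreviation occurs once in the identifier
    and that expansion replaces its characters with those of full_term.
    """
    k = len(identifier)                          # length before expansion
    j = k - len(abbreviation) + len(full_term)   # length after expansion
    p = dist.get(id_type, {})
    return p.get(j, 0.0) < p.get(k, 0.0)         # P(T, j) < P(T, k)
```

For example, with made-up values P(variable, 3) = 0.2 and P(variable, 5) = 0.1, expanding the variable `idx` to `index` would make its length less probable, so the function returns True and `idx` is kept.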
Step 4: Apply the popularity heuristic, using the set of maximum cliques produced in Step 2 by mining abbreviations from the source-code corpus. Specifically:
Search all abbreviations in the project that contains abbreviation abb_i, and count those that are lexically identical to abb_i. If this count is greater than the threshold γ, the abbreviation is not expanded. In addition, if there is a maximum clique whose abbreviation is lexically identical to abb, and the average contextual similarity between abb and the nodes within the clique is greater than the threshold α, the abbreviation is not expanded.
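A sketch of the two Step 4 checks, assuming each abbreviation occurrence is given as a (text, context) pair, the mined cliques are given as the contexts of their members, and `similarity` is any context-similarity function (for example, cosine similarity of Paragraph2Vector vectors). The similarity check is compared against α, the contextual-similarity threshold defined in the evaluation. Whether the count includes abb_i itself is not specified; this sketch counts it. All names are hypothetical.

```python
def skip_expansion_by_popularity(abbr, project_abbrs, cliques, similarity, gamma, alpha):
    """Step 4: return True if the abbreviation should NOT be expanded.

    abbr:          (text, context) of the abbreviation under test.
    project_abbrs: (text, context) pairs for every abbreviation in the same project,
                   including abbr itself (an assumption; the text does not say).
    cliques:       {text: [context, ...]} members of the maximum cliques mined in Step 2.
    similarity:    function(context_a, context_b) -> float.
    """
    text, ctx = abbr
    # Check 1: the abbreviation is frequent within its own project (domain abbreviation).
    same_text = sum(1 for t, _ in project_abbrs if t == text)
    if same_text > gamma:
        return True
    # Check 2: a mined common abbreviation with the same text is used in similar contexts.
    members = cliques.get(text, [])
    if members:
        avg = sum(similarity(ctx, m) for m in members) / len(members)
        if avg > alpha:
            return True
    return False
```

Both checks only ever exempt an abbreviation; when neither fires, the decision passes to Step 5.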
Step 5: If abbreviation abb and its full term both appear on the source-code line that defines the enclosing identifier, abb is not expanded.
To identify these abbreviations that are easy to interpret from context, Step 5 proceeds as follows:
Step A: For abbreviation abb_i, extract the entire source-code line on which its enclosing identifier is defined. This line is the context of the abbreviation, denoted CTX(abb_i).
Step B: Split the context CTX(abb_i) into a token sequence at spaces, capital letters, and special characters (e.g., "(" and ")"). The resulting token sequence is denoted Seq(abb_i).
Step C: Let the full term of the abbreviation be ⟨ω_1, ..., ω_n⟩. If every word ω_1, ..., ω_n has an equivalent token in Seq(abb_i), the abbreviation is not expanded. Two words are equivalent if they are identical or share the same root. For example, "threads" and "thread" are not identical, but they share the same root ("thread"), so they are considered equivalent.
Steps A, B, and C thus efficiently determine whether the abbreviation and its full term appear on the source-code line that defines the enclosing identifier.
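Steps A to C can be sketched as below, taking the defining source line as already extracted (Step A). Tokenization splits at non-alphanumeric characters and at camelCase boundaries. The text does not say how roots are computed, so `root` is a deliberately crude plural-stripping stand-in for a real stemmer such as Porter's; all function names are hypothetical.

```python
import re

def tokenize(line):
    """Step B: split a source line at spaces, special characters, and capital letters."""
    tokens = []
    for chunk in re.split(r"[^A-Za-z0-9]+", line):
        # Acronym runs (XML), capitalized or lowercase words (Parser, get), digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    return [t.lower() for t in tokens]

def root(word):
    """Crude plural stripping; a stand-in for a real stemmer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def equivalent(a, b):
    """Two words are equivalent if they are identical or share the same root."""
    return a == b or root(a) == root(b)

def skip_expansion_by_context(context_line, full_term):
    """Step C: True if every word of the full term has an equivalent token in the line."""
    tokens = tokenize(context_line)
    return all(any(equivalent(w.lower(), t) for t in tokens) for w in full_term)
```

On the line `List<Thread> thrds = pool.getThreads();`, the abbreviation `thrds` with full term ["threads"] is kept, because the tokens include "thread" and "threads"; on `int s = 0;`, `s` with full term ["students"] is not exempted by this step.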
Step 6: If none of the preceding steps has exempted abb from expansion, abb is expanded.
Steps 1 through 6 thus constitute an automated method for determining whether an abbreviation needs to be expanded.
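Steps 3 to 6 form a short-circuit chain: each heuristic can only exempt an abbreviation, and Step 6 expands whatever is left. A minimal sketch, assuming each heuristic is wrapped as a zero-argument callable (a hypothetical interface) that returns True when it exempts the abbreviation:

```python
from typing import Callable, Iterable, Optional, Tuple

def decide(heuristics: Iterable[Tuple[str, Callable[[], bool]]]) -> Tuple[bool, Optional[str]]:
    """Run exemption heuristics in order (Steps 3-5), then apply Step 6.

    Each heuristic returns True when it exempts the abbreviation from expansion.
    Returns (expand, reason): reason names the first exempting heuristic, or is
    None when no heuristic fired and the abbreviation is expanded (Step 6).
    """
    for name, exempts in heuristics:
        if exempts():
            return False, name  # later heuristics are not evaluated
    return True, None
```

Because the chain stops at the first exemption, later checks run only when the earlier ones do not settle the decision; for example, `decide([("length", lambda: False), ("popularity", lambda: True)])` returns `(False, "popularity")`.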
Examples
This embodiment details the steps and effects of the method for determining whether abbreviations in a project need to be expanded, as applied to five open-source projects on different topics.
The open-source projects listed in Table 2 were tested in the hardware environment shown in Table 1.
Table 1: Hardware environment configuration
Processor model: 3.4 GHz Core i7-6700
Memory: 16 GB
Operating system: 64-bit Windows 10
Table 2: Basic information on the open-source projects
(Table 2 image not reproduced)
Table 3: Numbers of sampled abbreviations per project that need expansion (positive) and that do not (negative)
Project      Samples  Positive  Negative  Positive ratio
DavMail          346        71       275       21%
DocFetcher       346        88       258       25%
DrJava           378        63       315       17%
Dubbo            377        52       325       14%
jEdit            371        75       296       20%
Total          1,818       349     1,469       19%
For the open-source projects listed in Table 2, abbreviations were sampled from each project and a manual decision was made as to which of them need to be expanded.
The sample size for each project depends on the number of abbreviations in that project. The minimum sample size was calculated with a sample size calculator for a 5% margin of error at a 95% confidence level, and all samples were drawn at random. For each of the 1,818 sampled abbreviations, a human decided whether it needs to be expanded. The resulting data set, called the golden set, serves as the benchmark for the subsequent evaluation. The abbreviations in the golden set fall into two classes: positive samples, which need to be expanded, and negative samples, which do not.
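The text does not say which sample size calculator was used. Assuming a common one based on Cochran's formula with a finite population correction, the minimum sample size for a 5% margin of error at 95% confidence (z = 1.96, worst-case proportion p = 0.5) could be computed as follows. The function name is hypothetical, and the per-project abbreviation counts needed to reproduce Table 3 are not given here.

```python
import math

def min_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Cochran's formula with finite population correction (assumed calculator).

    margin=0.05 and z=1.96 correspond to a 5% margin of error at 95% confidence;
    p=0.5 is the most conservative assumed proportion.
    """
    n0 = z * z * p * (1 - p) / (margin * margin)  # infinite-population size, about 384.16
    return math.ceil(n0 / (1 + (n0 - 1) / population))
```

Under these assumptions, a project with 1,000 abbreviations would need at least 278 samples, and the requirement approaches 385 for very large projects.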
The method was then applied to the golden set, and its output was compared with the manual decisions. A suggestion for a given abbreviation is correct if and only if it matches the manual decision for that abbreviation. The classification metrics computed are accuracy, precision, and recall.
The method uses three thresholds, α, β, and γ. α is the minimum contextual similarity between abbreviations in the same clique. β is the minimum size of the maximum clique of a common abbreviation. γ is the minimum number of times a domain-specific common abbreviation must appear in a single project (i.e., the popularity of the least popular domain abbreviation). The evaluation was repeated with different threshold values, changing only one threshold at a time to reveal the effect of each threshold clearly.
Specifically, 1,818 abbreviations from five open-source projects were selected for the experiment. First, three developers manually decided whether each abbreviation should be expanded. All three had more than three years of Java experience; each was asked to expand the abbreviations and decide independently, and any disagreements were resolved through discussion. The method was then applied to these abbreviations, its results were compared with the manual decisions, and its performance was evaluated; the results are shown in Table 4. The table shows that most (95%) abbreviations are classified correctly and only 5% are misclassified. Moreover, the method is highly accurate in selecting abbreviations that do not need to be expanded: when retrieving negative samples, it achieves a precision of 98% and a recall of 96%. Performance varies slightly across projects on different topics; for example, the lowest and highest accuracies are 93% (on DocFetcher) and 96% (on DavMail and DrJava), respectively, indicating that the method is accurate across projects.
Table 4: Performance of the method
(Table 4 image not reproduced)
In Table 4, with TP, FP, TN, and FN denoting the numbers of true positive, false positive, true negative, and false negative samples:
negative-sample precision = TN / (TN + FN);
negative-sample recall = TN / (TN + FP);
positive-sample precision = TP / (TP + FP);
positive-sample recall = TP / (TP + FN).
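The four formulas, together with overall accuracy, can be computed from the confusion-matrix counts as below. Positive means "needs expansion"; the function names and the zero-division guard are illustrative choices, not part of the text.

```python
def _ratio(num, den):
    """Safe division: 0.0 when a class is never predicted or never present."""
    return num / den if den else 0.0

def classification_metrics(tp, fp, tn, fn):
    """Metrics for the golden-set comparison.

    Positive = needs expansion, negative = should not be expanded.
    fp: predicted positive but actually negative; fn: predicted negative but actually positive.
    """
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision_neg": _ratio(tn, tn + fn),
        "recall_neg": _ratio(tn, tn + fp),
        "precision_pos": _ratio(tp, tp + fp),
        "recall_pos": _ratio(tp, tp + fn),
    }
```

For instance, with hypothetical counts TP = 8, FP = 2, TN = 85, FN = 5, accuracy is 0.93 and positive-sample precision is 0.8.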
Varying the thresholds showed that decreasing any threshold lowers the recall of positive-sample retrieval and raises the recall of negative-sample retrieval, whereas increasing any threshold raises the precision of negative-sample retrieval and lowers the precision of positive-sample retrieval. The highest accuracy is obtained at the default threshold values (α = 0.4, β = 15, γ = 25); either decreasing or increasing a threshold reduces the accuracy of the method.
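The one-threshold-at-a-time protocol can be sketched as a generator of configurations that start from the defaults (α = 0.4, β = 15, γ = 25) and change a single threshold. The candidate values in `grid` are hypothetical, since the text does not list the values tried, and the function name is illustrative.

```python
DEFAULTS = {"alpha": 0.4, "beta": 15, "gamma": 25}

def one_at_a_time(grid, defaults=DEFAULTS):
    """Yield (threshold_name, config) pairs that differ from the defaults in one threshold only."""
    for name, values in grid.items():
        for value in values:
            if value == defaults[name]:
                continue  # the all-default configuration is evaluated separately
            config = dict(defaults)  # copy so the defaults are never mutated
            config[name] = value
            yield name, config
```

Each yielded configuration would be evaluated on the golden set with the metrics above, so any change in performance can be attributed to the single threshold that moved.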
The evaluation on 1,818 abbreviations from five open-source applications shows that the method is accurate, achieving an accuracy of 95%.
While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.

Claims (2)

1. A method for automatically determining whether an abbreviation in source code needs to be expanded, characterized by comprising the following steps:
step 1: analyzing the length of the identifier;
firstly, extracting all identifiers from a corpus of software source codes, and classifying the identifiers according to the types of entities, wherein the types comprise variable names, parameter names, method names, class names and field names; for each type of identifier, calculating a probability distribution of its length;
step 2: extracting, from a source-code corpus, the abbreviations whose maximum clique size is not smaller than a predefined threshold β;
step 3: using the probability distribution of identifier length calculated in step 1 for each type of identifier:
for a given abbreviation in an identifier id of type T, if replacing the abbreviation with its full term would increase the length of id from k characters to j characters, where k and j are character counts, and P(T, j) < P(T, k), the abbreviation is not expanded;
step 4: using the set of maximum cliques generated in step 2 by identifying abbreviations in the source-code corpus through data mining:
searching all abbreviations in the project containing abbreviation abb_i, and counting those that are lexically identical to abb_i; if this count is greater than the threshold γ, the abbreviation is not expanded; if there is a maximum clique whose abbreviation is lexically identical to abb, and the average contextual similarity between abb and the nodes within the clique is greater than the threshold α, the abbreviation is not expanded;
step 5: if abbreviation abb and its full term both appear on the source-code line that defines the enclosing identifier, abb is not expanded, as determined by the following steps:
step A: first, for abbreviation abb_i, extracting the entire source-code line on which its enclosing identifier is defined, as the context of the abbreviation, denoted CTX(abb_i);
step B: then splitting the context CTX(abb_i) into a token sequence at spaces, capital letters, and special characters, the resulting token sequence being denoted Seq(abb_i);
step C: letting the full term of the abbreviation be ⟨ω_1, ..., ω_n⟩, if every word ω_1, ..., ω_n has an equivalent token in Seq(abb_i), the abbreviation is not expanded; two words are equivalent if they are identical or share the same root;
whereby steps A, B, and C efficiently determine whether the abbreviation and its full term appear on the source-code line that defines the enclosing identifier;
step 6: if abb has not been exempted from expansion by any of the preceding steps, abb is expanded.
2. The method for automatically determining whether an abbreviation in source code needs to be expanded as claimed in claim 1, wherein in step 2, the abbreviations are recognized and extracted from the source-code corpus using a graph-based data mining technique, and each obtained abbreviation is represented as a tuple comprising the text and the context of the abbreviation; wherein the context consists of an identifier, and the identifier is vectorized using the Paragraph2Vector algorithm;
for each group of abbreviations, constructing an undirected graph G, wherein nodes represent the abbreviations in the group, and weights of edges represent the context similarity between the abbreviations;
deleting the edge between two vertices if the contextual similarity of the two abbreviations they represent is less than a predefined threshold α; the maximum clique of the resulting graph represents a popular abbreviation, and the size of the clique represents the popularity of the abbreviation; only abbreviations whose maximum clique size is not smaller than a predefined threshold β are retained.
CN202110762787.5A 2021-07-06 2021-07-06 Automatic judgment method for necessity of abbreviation expansion for source code Expired - Fee Related CN113419720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110762787.5A CN113419720B (en) 2021-07-06 2021-07-06 Automatic judgment method for necessity of abbreviation expansion for source code

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110762787.5A CN113419720B (en) 2021-07-06 2021-07-06 Automatic judgment method for necessity of abbreviation expansion for source code

Publications (2)

Publication Number Publication Date
CN113419720A 2021-09-21
CN113419720B 2022-01-07

Family

ID=77720356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110762787.5A Expired - Fee Related CN113419720B (en) 2021-07-06 2021-07-06 Automatic judgment method for necessity of abbreviation expansion for source code

Country Status (1)

Country Link
CN (1) CN113419720B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359797A (en) * 2022-08-18 2022-11-18 Beijing Youzhuju Network Technology Co., Ltd. Method, device, equipment and storage medium for voice recognition

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106168946A (en) * 2016-06-24 2016-11-30 Institute of Information Engineering, Chinese Academy of Sciences A kind of method identifying user initials phenomenon
US20170132201A1 (en) * 2014-11-12 2017-05-11 International Business Machines Corporation Contraction aware parsing system for domain-specific languages
US20180276196A1 (en) * 2017-03-27 2018-09-27 International Business Machines Corporation Domain-specific terminology extraction by boosting frequency metrics
CN108628631A (en) * 2018-05-14 2018-10-09 Beijing Institute of Technology A method of the abbreviation in parameter is extended automatically
CN108984159A (en) * 2018-06-15 2018-12-11 Zhejiang Insigma Hengtian Software Ltd. A kind of breviary phrase extended method based on markov language model
CN110069252A (en) * 2019-04-11 2019-07-30 Zhejiang Insigma Hengtian Software Ltd. A kind of source code file multi-service label mechanized classification method
CN110998588A (en) * 2017-08-22 2020-04-10 Microsoft Technology Licensing, LLC Reducing text length while preserving meaning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EMILY HILL et al.: "AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools", MSR '08: Proceedings of the 2008 International Working Conference on Mining Software Repositories *
YANJIE JIANG et al.: "Automated Expansion of Abbreviations Based", IEEE Transactions on Software Engineering *
SHEN XIAOYAN (沈筱彦): "Research on Several Association Mining Problems in Web Information Retrieval" (Web信息检索若干关联挖掘问题的研究), China Doctoral Dissertations Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN113419720B (en) 2022-01-07

Similar Documents

Publication Publication Date Title
CN106649783B (en) Synonym mining method and device
EP3819785A1 (en) Feature word determining method, apparatus, and server
CN107145516B (en) Text clustering method and system
WO2021051864A1 (en) Dictionary expansion method and apparatus, electronic device and storage medium
WO2022121163A1 (en) User behavior tendency identification method, apparatus, and device, and storage medium
CN107688630B (en) Semantic-based weakly supervised microbo multi-emotion dictionary expansion method
CN107193915A (en) A kind of company information sorting technique and device
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
CN111666350A (en) Method for extracting medical text relation based on BERT model
CN112633000A (en) Method and device for associating entities in text, electronic equipment and storage medium
CN111680506A (en) External key mapping method and device of database table, electronic equipment and storage medium
JPH0816620A (en) Data sorting device/method, data sorting tree generation device/method, derivative extraction device/method, thesaurus construction device/method, and data processing system
Jiang et al. Which abbreviations should be expanded?
CN107341142B (en) Enterprise relation calculation method and system based on keyword extraction and analysis
CN114491062B (en) Short text classification method integrating knowledge graph and topic model
CN104572632B (en) A kind of method in the translation direction for determining the vocabulary with proper name translation
CN113419720B (en) Automatic judgment method for necessity of abbreviation expansion for source code
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering
CN112395854B (en) Standard element consistency inspection method
CN108462624A (en) A kind of recognition methods of spam, device and electronic equipment
Putra et al. Document Classification using Naïve Bayes for Indonesian Translation of the Quran
CN116166789A (en) Method naming accurate recommendation and examination method
JPH06282587A (en) Automatic classifying method and device for document and dictionary preparing method and device for classification
CN114610576A (en) Log generation monitoring method and device
CN110069780B (en) Specific field text-based emotion word recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220107
