CN113419720A - Automatic judgment method for necessity of abbreviation expansion for source code - Google Patents
- Publication number
- CN113419720A
- Authority
- CN
- China
- Prior art keywords
- abbreviations
- abbreviation
- expanded
- abb
- source code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/77—Software metrics
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method for automatically judging whether an abbreviation in source code needs to be expanded, and belongs to the technical field of computer software quality maintenance. First, common abbreviations that are frequently used by different developers in similar contexts are collected from a corpus of source code using data mining techniques. A given abbreviation is not expanded if it matches at least one of the common abbreviations found. From the same corpus, probability distributions of identifier lengths are calculated for the different types of identifiers. An abbreviation is also not expanded if its full term is contained in its surrounding context, i.e. in the same line of source code. Abbreviations for which none of these heuristics decides against expansion are expected to be replaced by their full terms. For a given test project, most abbreviations are classified correctly, and only a very small proportion are classified incorrectly. The method is very accurate in selecting abbreviations that do not need to be expanded, with precision as high as 98% and recall as high as 96%.
Description
Technical Field
The invention relates to a method for automatically judging whether an abbreviation needs to be expanded into a complete term, and belongs to the technical field of computer software quality maintenance.
Background
Identifiers account for a large portion (about 70%) of software source code. Composed of natural language terms, these identifiers are a major source of information for understanding software. Meaningful identifiers are very helpful in understanding source code, so high-quality identifiers are particularly important.
Abbreviations are widely used to shorten identifiers. Developers often replace a series of terms in an identifier with a short abbreviation. For example, "e" is often used to denote "exception", "XMLParser" is used to denote "ExtensibleMarkupLanguageParser", and so on. Appropriate abbreviations greatly facilitate the typing, typesetting, and reading of lengthy source code.
However, abbreviations can also significantly reduce the readability and maintainability of source code if used improperly. For example, "s" (for "students") and "ds" (for "data sequence") are inappropriately used abbreviations. Apart from the code authors, other developers may find it difficult to ascertain the exact meaning of such abbreviations, which may lead to misunderstanding and improper use of the software.
To this end, the prior art has proposed automated methods that provide the full term for a given abbreviation. For example, a software developer can use these tools to replace abbreviations with full terms by renaming. However, there has been no automated method for determining whether an abbreviation needs to be expanded, i.e., whether it should be replaced with its corresponding full term. Making such decisions is often challenging for inexperienced software developers and maintainers, because the decisions have no quantitative guidance and depend entirely on the experience and intuition of the developer.
Therefore, it is important to find a method by which software development tools, software quality maintenance tools, and the like can automatically determine whether an abbreviation needs to be expanded.
Disclosure of Invention
The invention aims to solve the technical problem of automatically judging whether an abbreviation needs to be expanded or not in the process of developing and maintaining computer software, namely, automatically judging whether the abbreviation should be replaced by a corresponding complete term or not.
The rationale of the method is that an abbreviation should not be expanded if expanding it would result in an overly long identifier, or if a developer or maintainer could easily work out its meaning (i.e., the full term) from domain knowledge or from the context of the abbreviation.
Following this rationale, the invention provides a series of heuristics for selecting abbreviations that do not need to be expanded. First, common abbreviations that are frequently used by different developers in similar contexts are collected from a corpus of source code using data mining techniques. The key to the mining is to transform the problem of finding common abbreviations into the widely studied maximum clique problem. A given abbreviation is not expanded if it matches at least one of the common abbreviations found. Second, from the same corpus, probability distributions of the lengths of different types of identifiers (e.g., variable names and method names) are computed. The probability distribution specifies the likelihood that an identifier of type T consists of exactly n characters. The heuristic is as follows: an abbreviation is not expanded if expanding it would reduce the length probability of the identifier that contains it (its peripheral identifier). Finally, an abbreviation is not expanded if its full term is contained in its surrounding context, i.e. in the same line of source code. Abbreviations for which none of these heuristics decides against expansion are expected to be replaced with their full terms.
Advantageous effects
The invention provides, for the first time, an automatic method for judging whether an abbreviation in source code needs to be expanded. Its heuristics focus on different aspects of an abbreviation, namely length, popularity and context. The method has the following beneficial effects:
1. for a given test project, most (95%) abbreviations are classified correctly, and only 5% are classified incorrectly;
2. the method is very accurate in selecting abbreviations that do not need to be expanded, with precision as high as 98% and recall as high as 96%.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Detailed Description
The invention is further illustrated and described in detail below with reference to the figures and examples.
The present invention is realized by the following technical means.
As shown in FIG. 1, the method for automatically determining the necessity of expanding an abbreviation in source code includes an offline mining phase and an online classification phase. In the offline mining phase, the probability distribution of identifier lengths and the common abbreviations are learned from a corpus of source code. In the online classification phase, a series of heuristic-based filters is applied to determine whether a given abbreviation needs to be expanded.
The method specifically comprises the following steps:
Step 1: Analyze identifier lengths.
First, all identifiers (i.e., names of software entities) are extracted from a corpus of software source code and classified according to the type of entity (e.g., variable name, method name, class name, etc.). For each type of identifier, a probability distribution of its length is calculated.
In this step, the extracted identifiers are classified into 5 types, namely variable names, parameter names, method names, class names, and field names, regardless of whether they contain abbreviations. For each type of identifier, a probability distribution P of its length is calculated; P(T, n) denotes the probability that an identifier of type T consists of exactly n characters.
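A minimal sketch of how P(T, n) could be estimated, assuming the identifiers have already been extracted as (type, name) pairs. The function names `length_distribution` and `p` and the toy corpus are illustrative assumptions, not part of the patent.

```python
from collections import Counter, defaultdict

def length_distribution(identifiers):
    """Estimate P[T][n]: the fraction of type-T identifiers with exactly n characters."""
    counts = defaultdict(Counter)
    for id_type, name in identifiers:
        counts[id_type][len(name)] += 1
    dist = {}
    for id_type, counter in counts.items():
        total = sum(counter.values())
        dist[id_type] = {n: c / total for n, c in counter.items()}
    return dist

def p(dist, id_type, n):
    """P(T, n); lengths never observed for a type get probability 0."""
    return dist.get(id_type, {}).get(n, 0.0)

# Hypothetical toy corpus of (type, name) pairs.
corpus = [
    ("variable", "idx"), ("variable", "cnt"), ("variable", "index"),
    ("method", "getName"), ("method", "parse"),
]
P = length_distribution(corpus)
```

Here `p(P, "variable", 3)` is the share of variable names that are three characters long. A real corpus would be far larger, and smoothing of unseen lengths may be desirable; neither point is specified in the source.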
Step 2: Extract from the source code corpus the abbreviations whose maximum clique has a size (i.e. number of vertices) not less than a predefined threshold β.
Lexically identical abbreviations, wherever they are extracted from, are represented in one graph: each abbreviation is a node, and the weight of an edge represents the contextual similarity between the two abbreviations it connects. If the graph contains a large subgraph in which every pair of nodes is connected by an edge, that subgraph (called a maximum clique) represents a common abbreviation that is widely used in similar contexts.
In this step, abbreviations are identified and extracted from the source code corpus using graph-based data mining techniques. Each resulting abbreviation is represented as a tuple consisting of the text of the abbreviation and its context, where the context consists of identifiers. The identifiers are vectorized using the Paragraph2Vector algorithm. One significant advantage of the Paragraph2Vector algorithm is that semantically related texts yield similar vectors, so the similarity of the resulting vectors can be used to represent the similarity of the original contexts.
For each group of lexically identical abbreviations, an undirected graph G is constructed, where nodes represent the abbreviations in the group and edge weights represent the contextual similarity between the abbreviations. To identify common abbreviations that are often used by different developers in similar contexts, the problem is translated into the widely studied maximum clique problem. To mine common abbreviations used in highly similar contexts, the edge between two vertices is deleted if the contextual similarity of the two abbreviations (represented by the two vertices) is less than a predefined threshold α. The maximum clique of the resulting graph represents a popular abbreviation that is often used in highly similar contexts, and the size of the clique indicates the popularity of the abbreviation. To remove less popular abbreviations, only the abbreviations whose maximum clique has a size (i.e., number of vertices) not less than the predefined threshold β are retained.
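The mining step can be sketched as follows, under several assumptions that the patent does not state: contexts are already available as numeric vectors, contextual similarity is measured by cosine similarity, and the maximum clique is found by plain Bron-Kerbosch enumeration (exponential in the worst case, so only suitable for small groups). All function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity, assumed here as the contextual similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(vectors, alpha):
    """One node per occurrence; an edge survives only if similarity >= alpha."""
    adj = {i: set() for i in range(len(vectors))}
    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= alpha:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def maximum_clique(adj):
    """Largest clique, found by enumerating maximal cliques (Bron-Kerbosch)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)

def is_common_abbreviation(vectors, alpha, beta):
    """Keep the abbreviation only if its maximum clique has at least beta nodes."""
    return len(maximum_clique(build_graph(vectors, alpha))) >= beta

# Hypothetical context vectors for four occurrences of one abbreviation.
contexts = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.0, 1.0)]
```

With `alpha = 0.4`, the first three occurrences form a clique and the fourth is isolated, so the abbreviation counts as common for `beta = 3` but not for `beta = 4`.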
Step 3: Apply the length heuristic, using the probability distribution of identifier lengths calculated for each type of identifier in step 1.
For a given abbreviation in an identifier id of type T, replacing the abbreviation with its full term increases the length of id from k characters to j characters. If P(T, j) < P(T, k), the abbreviation is not expanded. The rationale of this heuristic is that expanding the abbreviation would make the length of the identifier less acceptable (i.e., less popular), so it is better not to expand. Otherwise, the other heuristics are applied to the abbreviation to obtain a final decision.
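The length heuristic reduces to a single comparison, sketched below. The distribution `P` is a hypothetical hand-written table mapping identifier type to {length: probability}, and the function name is an assumption made for illustration.

```python
def keep_short_by_length(P, id_type, identifier, abbreviation, full_term):
    """Return True if expansion should be skipped because P(T, j) < P(T, k)."""
    k = len(identifier)
    # Replace the first (case-sensitive) occurrence of the abbreviation.
    expanded = identifier.replace(abbreviation, full_term, 1)
    j = len(expanded)
    lengths = P.get(id_type, {})
    return lengths.get(j, 0.0) < lengths.get(k, 0.0)

# Hypothetical probabilities, not taken from the patent.
P = {
    "variable": {6: 0.10, 10: 0.04},
    "parameter": {1: 0.05, 9: 0.08},
}
```

For example, expanding "ctx" in the variable "ctxMap" yields "contextMap" (10 characters), and 0.04 < 0.10, so the heuristic fires and the abbreviation is kept. For the parameter "e" expanded to "exception", 0.08 is not less than 0.05, so the decision is left to the remaining heuristics.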
Step 4: Apply the popularity heuristic, using the set of maximum cliques generated in step 2, where abbreviations in the source code corpus were identified by data mining techniques. The specific steps are as follows:
Search all abbreviations in the project that contains the abbreviation abb_i, and count the abbreviations that are lexically identical to abb_i. If this number is greater than the threshold γ, the abbreviation is not expanded. If there is a maximum clique whose abbreviations are lexically identical to abb_i, and the average contextual similarity between the nodes within the clique and abb_i is greater than the threshold β, the abbreviation is not expanded.
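A simplified sketch of the popularity heuristic. It implements the in-project count against γ as described, but replaces the clique condition with a plain membership test against a set of already mined common abbreviations, omitting the average-similarity check. That simplification, exact string matching as the meaning of "lexically identical", and the function name are all assumptions.

```python
from collections import Counter

def keep_short_by_popularity(abb, project_abbreviations, gamma, common=frozenset()):
    """Return True if expansion should be skipped because abb is popular."""
    # Count occurrences in the project that are lexically identical to abb.
    occurrences = Counter(project_abbreviations)[abb]
    if occurrences > gamma:
        return True
    # Simplification: treat membership in the mined set as a clique match.
    return abb in common

# Hypothetical list of abbreviation occurrences in one project.
project = ["ctx", "ctx", "ctx", "ds"]
```

With `gamma = 2`, "ctx" occurs three times and is kept as is, while "ds" is kept only if it appears in the mined set passed as `common`.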
And 5: if abbreviation abb and its full term appear on the same line of source code that defines the peripheral identifier, it is not expanded;
in step 5, in order to identify these abbreviations that are easy to interpret by context, the following method is employed:
step A: first, for the abbreviation abbiThe entire line of source code is extracted, where its peripheral identifier is defined as the context of an abbreviation, denoted CTX (abb)i);
And B: the context CTX (abb) is then appliedi) The space, capital letters and special characters (e.g., "(" and ")") are broken down into a sequence of tokens, and the resulting sequence token is Seq (abb)i);
And C: let the full name of the abbreviation be < omega1,...,ωnIf Seq (abb)i) Where all words in the text have equivalent designations, abbreviations are not expanded. Two words are equivalent if they are the same or share the same root. For example, "threads" and "threads" are not identical, but they share the same root ("thread"), and thus they are considered equivalent.
Through steps A, B and C, it can be determined efficiently whether the abbreviation and its full term appear on the same line of source code that defines the peripheral identifier.
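Steps A to C can be sketched as below. The patent does not name a stemmer, so `stem` is a deliberately crude suffix stripper used only as a stand-in, and the function names and the sample Java line are illustrative assumptions.

```python
import re

def tokenize(line):
    """Step B: split at non-alphanumeric characters and at capital letters."""
    tokens = []
    for chunk in re.split(r"[^A-Za-z0-9]+", line):
        # Acronym runs (XML), capitalized or lowercase words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    return [t.lower() for t in tokens]

def stem(word):
    """Crude stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def full_term_in_context(full_term_words, context_line):
    """Step C: every word of the full term has an equivalent token in the line."""
    stems = {stem(t) for t in tokenize(context_line)}
    return all(stem(w.lower()) in stems for w in full_term_words)

# Step A is assumed done: this is the line defining the peripheral identifier.
line = "ThreadPool tp = new ThreadPool(threads);"
```

For the abbreviation "tp" with full term <thread, pool>, both words occur in the line, so the abbreviation would not be expanded. For "ds" in `int ds = 0;`, neither "data" nor "sequence" occurs, so this heuristic does not fire.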
Step 6: if none of the abbreviations have been expanded in the previous step, the final expansion abb is performed.
Thus, through steps 1 through 6, an automated method of determining whether an abbreviation needs to be expanded is completed.
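The online classification phase can be pictured as a chain of filters, sketched below. Each heuristic is represented as a hypothetical predicate that returns True when it decides the abbreviation should stay as is; the toy lambdas are placeholders, not the real heuristics.

```python
def needs_expansion(abbreviation, heuristics):
    """Return (expand?, name of the heuristic that blocked expansion or None)."""
    for name, fires in heuristics:
        if fires(abbreviation):
            return False, name
    # No heuristic objected, so the abbreviation is expanded (step 6).
    return True, None

# Toy stand-ins for the length, popularity and context heuristics.
heuristics = [
    ("length", lambda abb: abb == "e"),
    ("popularity", lambda abb: abb in {"ctx", "idx"}),
    ("context", lambda abb: False),
]
```

Returning the name of the heuristic that fired is a design choice made for this sketch, since it makes a decision easier to explain to a developer; the source only requires the final yes or no answer.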
Examples
This embodiment details the steps and effects of the method when it is applied to five open-source projects on different topics to determine whether the abbreviations in those projects need to be expanded.
The open-source software listed in Table 2 was tested in the hardware environment shown in Table 1.
Table 1: hardware environment configuration information table
Hardware environment configuration | Processor model | Memory device | Operating system |
Test environment | 3.4GHz Core i7-6700 | 16G | 64-bit Windows 10 |
Table 2: basic information table of open source software
Table 3: number of abbreviations in a project that need to be expanded versus unexpanded
Project name | Number of samples | Positive samples | Negative samples | Ratio of positive samples |
 | 346 | 71 | 275 | 21% |
Doc | 346 | 88 | 258 | 25% |
DrJ | 378 | 63 | 315 | 17% |
Dubbo | 377 | 52 | 325 | 14% |
jEdit | 371 | 75 | 296 | 20% |
TOTAL | 1,818 | 349 | 1,469 | 19% |
For the open-source projects listed in Table 2, abbreviations are sampled from each project and a manual decision is made as to which of them need to be expanded.
The sample size is determined by the number of abbreviations in the test project. The minimum sample size was calculated using a sample size calculator with a margin of error of 5% and a confidence level of 95%. All samples were drawn randomly. For each of the resulting 1,818 abbreviations, a human determined whether the abbreviation needs to be expanded. The resulting data set (called the golden set) is used as the benchmark in the later evaluation. The abbreviations in the golden set fall into two categories. The first consists of positive samples, i.e., abbreviations that need to be expanded. The others belong to the second category, called negative samples.
The method is applied to the golden set, and the generated results are compared with the manual decisions. A suggestion for a given abbreviation is correct if and only if it is the same as the manual decision for that abbreviation. The classification metrics, namely accuracy, precision and recall, are then calculated.
Three thresholds are used: α, β and γ. α is the minimum contextual similarity between abbreviations in the same clique. β is the minimum size of the maximum clique of a common abbreviation. γ is the minimum number of times a domain-specific common abbreviation should appear in a single project (referred to as the minimum popularity of domain abbreviations). The thresholds are varied and the preceding evaluation is repeated. Note that only a single threshold is changed at a time, so as to reveal the effect of each threshold explicitly.
Specifically, 1,818 abbreviations from five open-source projects were selected for the experiment. First, three developers manually decided whether each abbreviation should be expanded. All three developers had more than three years of Java experience, and they were asked to make their decisions independently. Where they disagreed, they discussed the case together until they reached agreement. Then, the method of the invention was applied to these abbreviations, the results were compared with the manual decisions, and the performance of the method was evaluated; the evaluation results are shown in Table 4. The table shows that most (95%) abbreviations are classified correctly, and only 5% are classified incorrectly. In addition, the method is very accurate in selecting abbreviations that do not need to be expanded: in retrieving negative samples, it achieves a precision of 98% and a recall of 96%. Performance varies only slightly across projects on different topics. For example, the minimum and maximum precisions are 93% (on DocFetcher) and 96% (on DavMail and DrJava), respectively, indicating that the method is accurate.
Table 4: method performance
In Table 4, negative sample precision = number of true negative samples / (number of true negative samples + number of false negative samples);
negative sample recall = number of true negative samples / (number of true negative samples + number of false positive samples);
positive sample precision = number of true positive samples / (number of true positive samples + number of false positive samples);
positive sample recall = number of true positive samples / (number of true positive samples + number of false negative samples).
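The four formulas, together with overall accuracy, can be computed from a confusion matrix as in the sketch below. A positive sample is an abbreviation that needs to be expanded. The counts used here are made up for illustration and are not the patent's experimental results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy plus precision and recall for positive and negative samples."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "positive_precision": tp / (tp + fp),
        "positive_recall": tp / (tp + fn),
        "negative_precision": tn / (tn + fn),
        "negative_recall": tn / (tn + fp),
    }

# Hypothetical counts: 8 TP, 2 FP, 18 TN, 2 FN.
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=2)
```

Note the symmetry: the negative-sample formulas are the positive-sample formulas with the roles of the two classes swapped, so a false negative lowers negative precision and positive recall at the same time.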
Varying the thresholds shows that a decrease in any threshold results in a decrease in recall when retrieving positive samples and an increase in recall when retrieving negative samples. An increase in any threshold results in an increase in precision when retrieving negative samples and a decrease in precision when retrieving positive samples. The highest accuracy is obtained at the default threshold values (α = 0.4, β = 15, γ = 25); decreasing or increasing any threshold reduces the accuracy of the method.
The results of the evaluation of 1818 abbreviations from 5 open source applications show that the method is accurate with up to 95% accuracy.
While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.
Claims (2)
1. A method for automatically judging the expansion necessity of an abbreviation oriented to source codes is characterized by comprising the following steps:
step 1: analyzing the length of the identifier;
firstly, extracting all identifiers from a corpus of software source codes, and classifying the identifiers according to the types of entities, wherein the types comprise variable names, parameter names, method names, class names and field names; for each type of identifier, calculating a probability distribution of its length;
step 2: extracting, from a source code corpus, the abbreviations whose maximum clique has a size not smaller than a predefined threshold β;
step 3: using the probability distribution of identifier lengths calculated for each type of identifier in step 1:
for a given abbreviation in an identifier id of type T, replacing the abbreviation with its full term increases the length of id from k characters to j characters; if P(T, j) < P(T, k), the abbreviation is not expanded;
step 4: using the set of maximum cliques generated in step 2 by identifying abbreviations in the source code corpus with a data mining technology:
searching all abbreviations in the project that contains the abbreviation abb_i, and counting the abbreviations that are lexically identical to abb_i; if this number is greater than the threshold γ, the abbreviation is not expanded; if there is a maximum clique whose abbreviations are lexically identical to abb_i and the average contextual similarity between the nodes within the clique and abb_i is greater than the threshold β, the abbreviation is not expanded;
step 5: if the abbreviation abb_i and its full term appear on the same line of source code that defines the peripheral identifier, the abbreviation is not expanded;
step A: first, for the abbreviation abb_i, the entire line of source code in which its peripheral identifier is defined is extracted as the context of the abbreviation, denoted CTX(abb_i);
step B: the context CTX(abb_i) is then split at spaces, capital letters and special characters into a token sequence, denoted Seq(abb_i);
step C: let the full term of the abbreviation be <ω_1, ..., ω_n>; if every word of the full term has an equivalent token in Seq(abb_i), the abbreviation is not expanded; two words are equivalent if they are the same or share the same root;
through steps A, B and C, it can be determined efficiently whether the abbreviation and its full term appear on the same line of source code that defines the peripheral identifier;
step 6: if none of the preceding steps decides that the abbreviation abb_i should not be expanded, abb_i is finally expanded.
2. The method for automatically determining the necessity of expanding abbreviations oriented to source codes as claimed in claim 1, wherein in step 2, the abbreviations are recognized and extracted from the source code corpus by using a graph-based data mining technology, and each obtained abbreviation is represented as a tuple, wherein the tuple comprises texts and contexts of the abbreviations; wherein, the context consists of an identifier, and the identifier is vectorized by using a Paragraph2Vector algorithm;
for each group of abbreviations, constructing an undirected graph G, wherein nodes represent the abbreviations in the group, and weights of edges represent the context similarity between the abbreviations;
deleting the edge between two vertices if the context similarity of the two abbreviations is less than a predefined threshold α; the maximum clique of the resulting graph represents a popular abbreviation, and the size of the clique represents the popularity of the abbreviation; only abbreviations whose maximum clique has a size not smaller than a predefined threshold β are retained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110762787.5A CN113419720B (en) | 2021-07-06 | 2021-07-06 | Automatic judgment method for necessity of abbreviation expansion for source code |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110762787.5A CN113419720B (en) | 2021-07-06 | 2021-07-06 | Automatic judgment method for necessity of abbreviation expansion for source code |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113419720A true CN113419720A (en) | 2021-09-21 |
CN113419720B CN113419720B (en) | 2022-01-07 |
Family
ID=77720356
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110762787.5A Expired - Fee Related CN113419720B (en) | 2021-07-06 | 2021-07-06 | Automatic judgment method for necessity of abbreviation expansion for source code |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113419720B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115359797A (en) * | 2022-08-18 | 2022-11-18 | 北京有竹居网络技术有限公司 | Method, device, equipment and storage medium for voice recognition |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106168946A (en) * | 2016-06-24 | 2016-11-30 | 中国科学院信息工程研究所 | A kind of method identifying user initials phenomenon |
US20170132201A1 (en) * | 2014-11-12 | 2017-05-11 | International Business Machines Corporation | Contraction aware parsing system for domain-specific languages |
US20180276196A1 (en) * | 2017-03-27 | 2018-09-27 | International Business Machines Corporation | Domain-specific terminology extraction by boosting frequency metrics |
CN108628631A (en) * | 2018-05-14 | 2018-10-09 | 北京理工大学 | A method of the abbreviation in parameter is extended automatically |
CN108984159A (en) * | 2018-06-15 | 2018-12-11 | 浙江网新恒天软件有限公司 | A kind of breviary phrase extended method based on markov language model |
CN110069252A (en) * | 2019-04-11 | 2019-07-30 | 浙江网新恒天软件有限公司 | A kind of source code file multi-service label mechanized classification method |
CN110998588A (en) * | 2017-08-22 | 2020-04-10 | 微软技术许可有限责任公司 | Reducing text length while preserving meaning |
Non-Patent Citations (3)
Title |
---|
EMILY HILL等: "AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools", 《MSR "08: PROCEEDINGS OF THE 2008 INTERNATIONAL WORKING CONFERENCE ON MINING SOFTWARE REPOSITORIES》 * |
YANJIE JIANG等: "Automated Expansion of Abbreviations Based", 《 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING》 * |
沈筱彦: "Web信息检索若干关联挖掘问题的研究", 《中国博士学位论文全文数据库 (信息科技辑)》 * |
Also Published As
Publication number | Publication date |
---|---|
CN113419720B (en) | 2022-01-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220107 |