CN111061864A - Automatic open source community Fork abstract generation method, system and medium based on feature extraction - Google Patents
Automatic open source community Fork abstract generation method, system and medium based on feature extraction
- Publication number
- CN111061864A
- Authority
- CN
- China
- Prior art keywords
- fork
- abstract
- submitted
- data
- open source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method, a system and a medium for automatically generating an open source community Fork abstract based on feature extraction. The method comprises: acquiring input submission data; obtaining the feature classification of the submission data through a machine learning classification model trained in advance, and generating submission content from the submission data; generating a submission summary from the feature classification of the submission data and the submission content; and generating a natural-language Fork abstract from the submission summaries. To address the opacity of Fork information in current open source communities, the invention extracts Fork-related data from open source projects on the basis of a large amount of open source community project data, screens and optimizes the extracted project contribution features, and automatically generates a natural-language Fork abstract through a machine learning algorithm.
Description
Technical Field
The invention relates to the field of open source software development, and in particular to a method, a system and a medium for automatically generating an open source community Fork abstract based on feature extraction. To address the opacity of Fork information in current open source communities, project contribution features are extracted on the basis of a large amount of open source community project data, and a natural-language Fork abstract is generated automatically through a machine learning algorithm.
Background
In Open Source Software (OSS) development, Fork-based development (a Fork being a copy, derivation or branch of a project) has become an important component of collaborative development. The purpose of a Fork is to make a full copy of a code repository, and the Fork mechanism allows developers to copy a repository without its author's consent. Developers are free to Fork public repositories and make changes in their Forked repositories. A Fork is also a way of starting a new project.
However, the rapid development of the OSS community also presents challenges to Fork-based development. On the one hand, the rapid growth in contributors has produced a large number of branches and contributions, especially for popular projects, which has enriched the ecological diversity of open source communities. On the other hand, as the number of Forks increases, existing Fork visualization tools are unable to maintain a good overview of Fork information, especially of the changes in individual Forks. As a result, the development of an open source project cannot draw on the large amount of Fork data as a reference, and because existing tools cannot meet developers' need for transparent Fork information, developers must search Forks manually. In addition, because developers differ widely in experience and habits, a large number of Forks contain incomplete annotations, unclear properties and opaque information. Such Forks cost developers time and effort and make it hard for them to understand the goals and characteristics of other developers' Fork-based contributions. Thus, opaque Fork information and the lack of suitable tools make it difficult to identify the many Forks effectively by manual methods and for core developers to make proper decisions.
Disclosure of Invention
The technical problems to be solved by the invention are as follows: aiming at the problems in the prior art, the invention provides a method, a system and a medium for automatically generating the Fork abstract of the open source community based on feature extraction.
In order to solve the technical problems, the invention adopts the technical scheme that:
an automatic open source community Fork abstract generation method based on feature extraction comprises the following implementation steps:
1) acquiring input submission data;
2) obtaining the feature classification of the submission data through a machine learning classification model trained in advance, and generating submission content from the submission data;
3) generating a submission summary from the feature classification of the submission data and the submission content;
4) generating a natural-language Fork abstract from the submission summaries.
Optionally, step 2) is preceded by a step of training a machine learning classification model, and the detailed steps include:
s1) data preprocessing: first, cleaning the issue data by removing link data, duplicate issue data and non-standard-format data, marking issue data that contains specified special fields, and removing stop words; then labelling the remaining issue data with the three feature classification labels: feature label (feature), bug label (bug) and contribution label (contribution);
s2) converting the preprocessed issue data into a multidimensional vector;
s3) training the machine learning classification model with the multidimensional vectors obtained by conversion and the corresponding feature classification labels.
Optionally, the step S2) of converting the preprocessed data into the multidimensional vector includes:
s2.1) extracting text features from the preprocessed issue data to obtain a word frequency count matrix of the words in the data;
s2.2) evaluating the weight of each word in the word frequency count matrix using the TF-IDF word frequency statistical method, and using the weights to convert the word frequency count matrix into a multidimensional vector in the form of a TF-IDF matrix.
Optionally, the machine learning classification model is a random forest based machine learning classification model.
Optionally, generating the submission content from the submission data in step 2) specifically means generating the corresponding submission content from the submission data using a keyword extraction algorithm.
Optionally, generating the submission summary from the feature classification and the submission content in step 3) specifically means using a specified template to generate a submission summary containing the feature classification of the submission and the generated submission content, where the specified template includes the following information: @commit represents the i-th submission in the Fork; @author represents the submitter; @feature is the obtained feature classification of the submission, which is one of the three feature classification labels: feature label (feature), bug label (bug) and contribution label (contribution); @content is the generated submission content; @status is status information extracted from the submission; @change is change information extracted from the submission.
Optionally, the detailed steps of step 4) include:
3.1) splitting, classifying and counting the data of a plurality of submission summaries to obtain, for each of the three feature classification labels (feature, bug and contribution), the content and number of the corresponding submission summaries;
3.2) placing the submission summaries corresponding to the three feature classification labels (feature, bug and contribution) at the corresponding positions of the Fork abstract template according to preset rules, thereby obtaining the final Fork abstract.
In addition, the invention also provides an open source community Fork abstract automatic generation system based on feature extraction, comprising:
an input program unit for acquiring input submission data;
an input processing program unit for obtaining the feature classification of the submission data through a machine learning classification model trained in advance, and generating submission content from the submission data;
a submission summary generating program unit for generating a submission summary from the feature classification of the submission data and the submission content;
and a Fork abstract generating program unit for generating a natural-language Fork abstract from the submission summaries.
In addition, the invention also provides an open source community Fork abstract automatic generation system based on feature extraction, which comprises a computer device, wherein the computer device is programmed or configured to execute the steps of the open source community Fork abstract automatic generation method based on feature extraction, or a computer program which is programmed or configured to execute the open source community Fork abstract automatic generation method based on feature extraction is stored on a memory of the computer device.
In addition, the invention also provides a computer readable storage medium, which stores a computer program programmed or configured to execute the method for automatically generating the Fork abstract of the open source community based on the feature extraction.
Compared with the prior art, the invention has the following advantages: the invention acquires input submission data; obtains the feature classification of the submission data through a machine learning classification model trained in advance, and generates submission content from the submission data; generates a submission summary from the feature classification of the submission data and the submission content; and generates a natural-language Fork abstract from the submission summaries. In this way, to address the opacity of Fork information in current open source communities, the invention extracts Fork-related data from open source community projects on the basis of a large amount of open source community project data, screens and optimizes the extracted project contribution features, and automatically generates a natural-language Fork abstract through a machine learning algorithm.
Drawings
FIG. 1 is a schematic diagram of the basic principle of the method according to the embodiment of the present invention.
FIG. 2 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a template of submission content generated in the embodiment of the present invention.
FIG. 4 is a schematic flow chart of step 4) in the embodiment of the present invention.
FIG. 5 is a diagram of a Fork abstract template in an embodiment of the present invention.
FIG. 6 is a diagram of rules for matching different default data according to an embodiment of the present invention.
FIG. 7 shows the results of the submitted data classification accuracy and Fork digest accuracy tests in the embodiment of the present invention.
Detailed Description
The method, system and medium for automatically generating the open source community Fork abstract based on feature extraction according to the present invention are described in further detail below, taking the Python programming language as an example. On this basis, a person skilled in the art can also port the embodiment to other programming languages and likewise implement the method, system and medium.
As shown in fig. 1 and fig. 2, the implementation steps of the method for automatically generating the open source community Fork abstract based on feature extraction in this embodiment include:
1) acquiring input submission data;
2) obtaining the feature classification of the submission data through a machine learning classification model trained in advance, and generating submission content from the submission data;
3) generating a submission summary from the feature classification of the submission data and the submission content;
4) generating a natural-language Fork abstract from the submission summaries.
Training the machine learning classification model exploits the relationship between data in GitHub projects that carries feature classification labels and the input submission data, so that submissions can be classified by feature. To this end, the issue data is preprocessed and used as training input for the machine learning classification model. Finally, the submission data input by the user is fed to the trained machine learning classification model to predict its feature classification. In this embodiment, step 2) is preceded by a step of training the machine learning classification model, and the detailed steps include:
s1) data preprocessing: first, cleaning the issue data (issues) by removing link data, duplicate issue data and non-standard-format data, marking issue data that contains specified special fields, and removing stop words; then labelling the remaining issue data with the three feature classification labels: feature label, bug label and contribution label. In this embodiment, the three labels are "feature", "bug" and "contribution", respectively. Common stop words (e.g., "the" and "a"), which occur frequently but contribute little to distinguishing between documents, are removed.
s2) converting the preprocessed issue data into a multidimensional vector;
s3) training the machine learning classification model with the multidimensional vectors obtained by conversion and the corresponding feature classification labels.
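The preprocessing in step s1) can be illustrated with a minimal sketch. It assumes that each issue is a `(text, label)` pair and that cleaning means stripping links, dropping duplicates and removing stop words; the stop-word list, the regular expressions and the function names are illustrative assumptions rather than part of the embodiment, and the marking of specified special fields is not covered.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "it", "when"}
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9]+")
LABELS = {"feature", "bug", "contribution"}


def clean_text(text):
    """Strip links, lowercase, tokenize, and drop stop words."""
    text = URL_RE.sub(" ", text.lower())
    return " ".join(t for t in TOKEN_RE.findall(text) if t not in STOP_WORDS)


def preprocess(issues):
    """issues: iterable of (text, label); returns cleaned, de-duplicated pairs."""
    seen, result = set(), []
    for text, label in issues:
        if label not in LABELS:
            continue  # keep only the three feature classification labels
        cleaned = clean_text(text)
        if not cleaned or cleaned in seen:
            continue  # drop empty and duplicate issues
        seen.add(cleaned)
        result.append((cleaned, label))
    return result


raw = [
    ("Crash when the parser reads a file https://example.com/1", "bug"),
    ("Crash when the parser reads a file", "bug"),
    ("Add an export option", "feature"),
    ("How to install?", "question"),
]
cleaned = preprocess(raw)
```

In this toy input the second issue is dropped as a duplicate of the first once the link is stripped, and the last is dropped because its label is not one of the three used for training.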
In this embodiment, the step S2) of converting the preprocessed data into the multidimensional vector includes:
s2.1) extracting text features from the preprocessed issue data to obtain a word frequency count matrix of the words in the data;
text feature extraction can be performed with any known text feature extraction algorithm as required; for example, in this embodiment a CountVectorizer model is used to convert the words in the text into a word frequency count matrix with elements text[i][j], where text[i][j] is the frequency of word j in the i-th text;
s2.2) evaluating the weight of each word in the word frequency count matrix using the TF-IDF (Term Frequency-Inverse Document Frequency) statistical method, and using the weights to convert the count matrix produced by the CountVectorizer into a normalized TF-IDF matrix, i.e., a multidimensional vector in TF-IDF matrix form.
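To make steps s2.1) and s2.2) concrete, the following plain-Python sketch builds a count matrix and weights it with the textbook formula tf × log(N / df). This is a stand-in for illustration only: the embodiment uses CountVectorizer, and scikit-learn's TF-IDF uses a smoothed and normalized variant, so the numbers would differ. The function names and sample documents are assumptions.

```python
import math
from collections import Counter


def count_matrix(docs):
    """Return (vocab, rows) where rows[i][j] is the count of vocab[j] in docs[i]."""
    vocab = sorted({w for d in docs for w in d.split()})
    rows = []
    for d in docs:
        counts = Counter(d.split())
        rows.append([counts[w] for w in vocab])
    return vocab, rows


def tfidf_matrix(counts):
    """Weight each count by log(N / df), the textbook inverse document frequency."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    # df[j]: number of documents containing term j (always >= 1 by construction)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) for j in range(n_terms)]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in counts]


docs = ["fix crash parser", "add export option", "fix export crash"]
vocab, counts = count_matrix(docs)
weights = tfidf_matrix(counts)
```

A word that appears in only one document, such as "parser", receives a larger weight than one shared by several documents, such as "crash", which is the property the classifier relies on.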
Most of the data to which the invention applies is text and short in length. Given these characteristics, the machine learning classification model in this embodiment is based on random forest (RandomForest), selected on the basis of experimental results. As an optional implementation, this embodiment uses a pipeline to integrate the vectorization, weight-coefficient computation and classification model training modules into a whole, which is executed repeatedly while parameters are tuned in a loop, finally yielding a complete classification model that automatically classifies input submission data.
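One way such a pipeline might look with scikit-learn is sketched below. The step names, the toy training texts and the hyperparameters (`n_estimators=50`, `random_state=0`) are assumptions chosen for illustration; the patent does not state the values it used.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical); real training would use labelled issue text.
texts = [
    "fix crash when parsing empty file",
    "fix null pointer error in parser",
    "add export to csv option",
    "add dark mode support",
    "update readme and contributing guide",
    "improve documentation and examples",
]
labels = ["bug", "bug", "feature", "feature", "contribution", "contribution"]

# Vectorization, TF-IDF weighting and the classifier chained into one estimator.
model = Pipeline([
    ("count", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
model.fit(texts, labels)

# Classify a new piece of submission text.
predicted = model.predict(["fix crash in csv export"])
```

Because the three stages form a single estimator, the whole chain can be refitted in one call each time a parameter is changed, which matches the repeated execution during parameter tuning described above.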
In this embodiment, generating the submission content from the submission data in step 2) specifically means generating the corresponding submission content from the submission data using a keyword extraction algorithm. As an optional implementation, the keyword extraction algorithm in this embodiment is the TextRank algorithm; other well-known keyword extraction algorithms may also be used.
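The idea behind TextRank keyword extraction can be sketched in a few lines of plain Python. This is a simplified version under stated assumptions: an unweighted co-occurrence graph over a sliding window, a fixed number of PageRank-style iterations, and an illustrative stop-word list. The function name and parameters are hypothetical and do not reflect the implementation used in the embodiment.

```python
import re
from collections import defaultdict

# Assumption: a small illustrative stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "on", "and", "of", "to", "in"}


def textrank_keywords(text, top_k=3, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    # Link each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != w:
                neighbors[w].add(other)
                neighbors[other].add(w)
    nodes = sorted(set(words))
    scores = {w: 1.0 for w in nodes}
    for _ in range(iterations):
        updated = {}
        for w in nodes:
            # Each neighbor shares its score equally among its own neighbors.
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            updated[w] = (1 - damping) + damping * rank
        scores = updated
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(nodes, key=lambda w: (-scores[w], w))[:top_k]


message = "Fix parser crash: the parser fails on empty input and the parser crash is fatal"
keywords = textrank_keywords(message, top_k=2)
```

Words that co-occur with many other words, here "parser" and "crash", accumulate the highest scores and would form the submission content for this message.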
In this embodiment, generating the submission summary from the feature classification and the submission content in step 3) specifically means using a specified template to generate a submission summary containing the feature classification of the submission and the generated submission content, where the specified template includes the following information: @commit represents the i-th submission in the Fork; @author represents the submitter; @feature is the obtained feature classification of the submission, which is one of the three feature classification labels: feature label (feature), bug label (bug) and contribution label (contribution); @content is the generated submission content; @status is status information extracted from the submission; @change is change information extracted from the submission. As an alternative embodiment, the form of the template in this embodiment is shown in FIG. 3.
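A template of this kind could be filled in as follows. The field names mirror the template variables listed above, but the sentence layout in `render` is a hypothetical stand-in, since the actual wording of the template is given only in FIG. 3.

```python
from dataclasses import dataclass


@dataclass
class CommitSummary:
    commit: int    # @commit: index i of the submission within the Fork
    author: str    # @author: the submitter
    feature: str   # @feature: "feature", "bug" or "contribution"
    content: str   # @content: generated submission content (keywords)
    status: str    # @status: status information extracted from the submission
    change: str    # @change: change information extracted from the submission

    def render(self):
        # Hypothetical sentence layout; the patent's own template is in FIG. 3.
        return (
            f"Commit {self.commit} by {self.author} [{self.feature}]: "
            f"{self.content} (status: {self.status}; change: {self.change})"
        )


summary = CommitSummary(1, "alice", "bug", "fix parser crash", "merged", "2 files changed")
text = summary.render()
```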
As shown in fig. 4, the detailed steps of step 4) of this embodiment include:
3.1) splitting, classifying and counting the data of a plurality of submission summaries to obtain, for each of the three feature classification labels (feature, bug and contribution), the content and number of the corresponding submission summaries;
3.2) placing the submission summaries corresponding to the three feature classification labels (feature, bug and contribution) at the corresponding positions of the Fork abstract template according to preset rules, thereby obtaining the final Fork abstract.
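The classification and counting in step 3.1) amounts to grouping submission summaries by label. A minimal sketch, assuming each summary is reduced to a `(label, content)` pair and using a hypothetical function name:

```python
from collections import defaultdict

LABELS = ("feature", "bug", "contribution")


def group_by_label(summaries):
    """summaries: iterable of (label, content); returns {label: (count, contents)}."""
    contents = defaultdict(list)
    for label, content in summaries:
        contents[label].append(content)
    # Always report all three labels, even when a label has no summaries.
    return {k: (len(contents[k]), contents[k]) for k in LABELS}


groups = group_by_label([
    ("bug", "fix parser crash"),
    ("feature", "add csv export"),
    ("bug", "fix null check"),
])
```

The count and content list for each label play the roles of the per-label number and content that are later placed into the Fork abstract template.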
The Fork abstract template in this embodiment is shown in FIG. 5 and includes two sub-modules, Template1 and Template2. Sub-module Template1 shows the structure and elements of the final Fork abstract, and sub-module Template2 shows how the content of the Fork is composed. According to a survey of open source community developers, developers generally care about whether the Fork abstract expresses the Fork information accurately, omits no important data, and highlights the changes and contribution features of each submission node.
In sub-module Template1:
@fork_summary is the final result;
@b_commit and @e_commit indicate the start submission data and end submission data selected by the user. For convenience, this embodiment typically uses the last four characters of the SHA checksum of a submission to represent its address.
@fork_name is the name of the Fork obtained from the input data in this embodiment;
@fork_content is the specific content description of the Fork generated by this embodiment.
In sub-module Template2:
k is any one of the three labels feature, bug and contribution. The variables @num_k and @content_k correspond to the number and content of submissions for each k, which are the data obtained in the preceding statistics step.
@feature is a property class of a submission.
@feature_content is the content of each property;
@fork_content is the sum of all properties.
In general, Template2 shows the detailed work done in the Fork on each particular property.
To handle the error cases that can arise when generating the Fork abstract, such as an empty Fork category, an empty Fork feature, or repeated submission data, this embodiment constructs the following rules for matching different default data, so as to ensure that the final Fork abstract reads as fluent natural language, as shown in FIG. 6, where:
Rule1 indicates:
if the Fork class count @num_k = 0, then the sum of all properties @fork_content is null;
Rule2 indicates:
if the Fork feature content @content_k is null, then the sum of all properties @fork_content is null;
Rule3 indicates:
if the submission data @b_commit and the submission data @e_commit are repeated (the characters from the 4th position to the 1st position are the same), the truncated @e_commit is assigned to both the submission data @b_commit and the submission data @e_commit;
Rule4 indicates:
if the sum of the Fork class counts @num_k is 0, the final result @fork_summary is the generated string "pair-missing, non-contributing".
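The four rules can be read as guards applied while the Fork abstract is assembled. The sketch below is one possible interpretation under loud assumptions: the sentence wording, the fallback string `NO_CONTRIBUTION`, the function name, and the treatment of Rule3 as a comparison of the last four characters of the two submission identifiers are all hypothetical, since the actual template and rule table appear only in FIG. 5 and FIG. 6.

```python
# Hypothetical fallback wording; the patent's own string is not reproduced here.
NO_CONTRIBUTION = "This Fork has no contributions."


def build_fork_summary(fork_name, b_commit, e_commit, groups):
    """groups: {label: (num_k, content_k list)} for feature, bug and contribution."""
    # Short identifiers: last four characters of each submission's checksum.
    b_id, e_id = b_commit[-4:], e_commit[-4:]
    # Rule3 (as interpreted here): identical start and end collapse into one identifier.
    span = f"commit {e_id}" if b_id == e_id else f"commits {b_id} to {e_id}"

    parts = []
    for label, (num, contents) in groups.items():
        # Rule1 and Rule2: skip labels with a zero count or empty content.
        if num == 0 or not contents:
            continue
        parts.append(f"{num} {label} change(s): " + "; ".join(contents))

    # Rule4: nothing to report, so emit the fallback sentence instead.
    total = sum(num for num, _ in groups.values())
    if total == 0 or not parts:
        return f"Fork {fork_name} ({span}): {NO_CONTRIBUTION}"
    return f"Fork {fork_name} ({span}): " + ". ".join(parts) + "."


groups = {
    "feature": (1, ["add csv export"]),
    "bug": (0, []),
    "contribution": (2, ["update docs", "fix typos"]),
}
full = build_fork_summary("alice/demo", "9f3c2a1b7d", "04e5aa1c22", groups)
empty = build_fork_summary(
    "alice/demo", "04e5aa1c22", "04e5aa1c22",
    {"feature": (0, []), "bug": (0, []), "contribution": (0, [])},
)
```

Handling the empty cases up front keeps the generated sentence grammatical, which is the stated purpose of the rules.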
To further verify the method for automatically generating the open source community Fork abstract based on feature extraction in this embodiment, 30 groups of manual tests and questionnaire tests were carried out, together with case tests involving 17 GitHub developers; the resulting classification accuracy of submission data and the Fork abstract accuracy are shown in Table 1 and FIG. 7.
Table 1: Classification accuracy of submission data and Fork abstract accuracy.
Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
Contribution | 0.59 | 0.79 | 0.67 | 448 |
Feature | 0.66 | 0.78 | 0.58 | 343 |
Bug | 0.64 | 0.67 | 0.72 | 200 |
In Table 1, Label, Precision, Recall, F1-score and Support respectively denote the label type, precision, recall, the harmonic mean of precision and recall, and the number of samples carrying each label; Contribution, Feature and Bug denote the three feature classification labels: contribution label, feature label and bug label. As can be seen from Table 1 and FIG. 7, the method for automatically generating the open source community Fork abstract based on feature extraction in this embodiment achieves a Fork abstract generation accuracy of 0.672 and is reported to be 47% helpful to developers' development work.
In summary, the method for automatically generating the open source community Fork abstract based on feature extraction generates Fork abstracts automatically: taking a submission address as input, it outputs the Fork abstract immediately after a simple initialization. This embodiment uses the tool to test abstract generation for projects in a real OSS community.
In addition, this embodiment also provides an automatic generation system of open source community Fork abstract based on feature extraction, including:
an input program unit for acquiring input submission data;
an input processing program unit for obtaining the feature classification of the submission data through a machine learning classification model trained in advance, and generating submission content from the submission data;
a submission summary generating program unit for generating a submission summary from the feature classification of the submission data and the submission content;
and a Fork abstract generating program unit for generating a natural-language Fork abstract from the submission summaries.
In addition, the embodiment also provides an open source community Fork abstract automatic generation system based on feature extraction, which includes a computer device, where the computer device is programmed or configured to execute the steps of the aforementioned open source community Fork abstract automatic generation method based on feature extraction, or a memory of the computer device stores a computer program that is programmed or configured to execute the aforementioned open source community Fork abstract automatic generation method based on feature extraction.
In addition, the present embodiment also provides a computer readable storage medium, which stores thereon a computer program programmed or configured to execute the aforementioned automatic generation method of the open source community Fork abstract based on feature extraction.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.
Claims (10)
1. An automatic open source community Fork abstract generation method based on feature extraction is characterized by comprising the following implementation steps:
1) acquiring input submission data;
2) obtaining the feature classification of the submission data through a machine learning classification model trained in advance, and generating submission content from the submission data;
3) generating a submission summary from the feature classification of the submission data and the submission content;
4) generating a natural-language Fork abstract from the submission summaries.
2. The automatic generation method of open source community Fork abstract based on feature extraction according to claim 1, wherein step 2) is preceded by a step of training a machine learning classification model, and the detailed steps include:
s1) data preprocessing: first, cleaning the issue data by removing link data, duplicate issue data and non-standard-format data, marking issue data that contains specified special fields, and removing stop words; then labelling the remaining issue data with the three feature classification labels: feature label (feature), bug label (bug) and contribution label (contribution);
s2) converting the preprocessed issue data into a multidimensional vector;
s3) training the machine learning classification model with the multidimensional vectors obtained by conversion and the corresponding feature classification labels.
3. The automatic open source community Fork abstract generation method based on feature extraction as claimed in claim 2, wherein the step S2) of converting the preprocessed data into the multidimensional vector comprises the following detailed steps:
s2.1) extracting text features from the preprocessed issue data to obtain a word frequency count matrix of the words in the data;
s2.2) evaluating the weight of each word in the word frequency count matrix using the TF-IDF word frequency statistical method, and using the weights to convert the word frequency count matrix into a multidimensional vector in the form of a TF-IDF matrix.
4. The automatic open source community Fork abstract generation method based on feature extraction as claimed in claim 1, wherein the machine learning classification model is a machine learning classification model based on random forest.
5. The method for automatically generating the Fork abstract of the open source community based on the feature extraction as claimed in claim 1, wherein generating the submission content from the submission data in step 2) specifically means generating the corresponding submission content from the submission data by adopting a keyword extraction algorithm.
6. The method for automatically generating the Fork abstract of the open source community based on the feature extraction according to claim 1, wherein generating the submission summary from the feature classification and the submission content in step 3) specifically means that a specified template is adopted to generate a submission summary containing the feature classification of the submission and the generated submission content, and the specified template includes the following information: @commit represents the i-th submission in the Fork; @author represents the submitter; @feature is the obtained feature classification of the submission, which is one of the three feature classification labels: feature label (feature), bug label (bug) and contribution label (contribution); @content is the generated submission content; @status is status information extracted from the submission; @change is change information extracted from the submission.
7. The method for automatically generating the open source community Fork abstract based on the feature extraction as claimed in claim 1, wherein the detailed steps of the step 4) comprise:
3.1) splitting, classifying and counting the data of a plurality of submission summaries to obtain, for each of the three feature classification labels (feature, bug and contribution), the content and number of the corresponding submission summaries;
3.2) placing the submission summaries corresponding to the three feature classification labels (feature, bug and contribution) at the corresponding positions of the Fork abstract template according to preset rules, thereby obtaining the final Fork abstract.
8. An open source community Fork abstract automatic generation system based on feature extraction is characterized by comprising:
an input program unit for acquiring input submission data;
an input processing program unit for obtaining the feature classification of the submission data through a machine learning classification model trained in advance, and generating submission content from the submission data;
a submission summary generating program unit for generating a submission summary from the feature classification of the submission data and the submission content;
and a Fork abstract generating program unit for generating a natural-language Fork abstract from the submission summaries.
9. An open source community Fork abstract automatic generation system based on feature extraction, comprising a computer device, characterized in that the computer device is programmed or configured to execute the steps of the method for automatically generating an open source community Fork abstract based on feature extraction according to any one of claims 1 to 7, or a memory of the computer device stores a computer program programmed or configured to execute the method for automatically generating an open source community Fork abstract based on feature extraction according to any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program programmed or configured to execute the method for automatically generating an open source community Fork abstract based on feature extraction according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911338392.1A CN111061864B (en) | 2019-12-23 | 2019-12-23 | Automatic open source community Fork abstract generation method, system and medium based on feature extraction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911338392.1A CN111061864B (en) | 2019-12-23 | 2019-12-23 | Automatic open source community Fork abstract generation method, system and medium based on feature extraction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111061864A (en) | 2020-04-24 |
CN111061864B (en) | 2022-10-18 |
Family
ID=70300836
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911338392.1A Active CN111061864B (en) | 2019-12-23 | 2019-12-23 | Automatic open source community Fork abstract generation method, system and medium based on feature extraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111061864B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101541170B1 (en) * | 2014-10-21 | 2015-08-03 | (주)센솔로지 | Apparatus and method for summarizing text |
CN107102986A (en) * | 2017-04-23 | 2017-08-29 | 四川用联信息技术有限公司 | Multi-threaded keyword extraction techniques in document |
CN107391542A (en) * | 2017-05-16 | 2017-11-24 | 浙江工业大学 | Open source software community expert recommendation method based on document knowledge graph |
CN108459874A (en) * | 2018-03-05 | 2018-08-28 | 中国人民解放军国防科技大学 | Code automatic summarization method integrating deep learning and natural language processing |
CN108563433A (en) * | 2018-03-20 | 2018-09-21 | 北京大学 | Device for automatic code completion based on LSTM |
US20180373507A1 (en) * | 2016-02-03 | 2018-12-27 | Cocycles | System for generating functionality representation, indexing, searching, componentizing, and analyzing of source code in codebases and method thereof |
Also Published As
Publication number | Publication date |
---|---|
CN111061864B (en) | 2022-10-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108256074B (en) | Verification processing method and device, electronic equipment and storage medium | |
CN102662930B (en) | Corpus tagging method and corpus tagging device | |
De Jonge et al. | An introduction to data cleaning with R | |
CA2775879C (en) | Systems and methods for processing data | |
CN109446221B (en) | Interactive data exploration method based on semantic analysis | |
CN112163553B (en) | Material price accounting method, device, storage medium and computer equipment | |
CN105824791B (en) | A kind of bibliography format checking method | |
CN107844558A (en) | The determination method and relevant apparatus of a kind of classification information | |
CN112052396A (en) | Course matching method, system, computer equipment and storage medium | |
CN109766416A (en) | A kind of new energy policy information abstracting method and system | |
Nikiforova et al. | User-Oriented Approach to Data Quality Evaluation. | |
US20220198133A1 (en) | System and method for validating tabular summary reports | |
Sannier et al. | Legal markup generation in the large: An experience report | |
CN111061864B (en) | Automatic open source community Fork abstract generation method, system and medium based on feature extraction | |
CN114281998B (en) | Event labeling system construction method for multi-level labeling person based on crowdsourcing technology | |
CN114842982B (en) | Knowledge expression method, device and system for medical information system | |
CN115760495A (en) | Method and device for realizing automatic labeling of legal cases | |
CN114118098A (en) | Contract review method, equipment and storage medium based on element extraction | |
CN114860873A (en) | Method, device and storage medium for generating text abstract | |
CN110414819B (en) | Work order scoring method | |
CN115481240A (en) | Data asset quality detection method and detection device | |
Smirnova et al. | Evaluation of embedding models for automatic extraction and classification of acknowledged entities in scientific documents | |
Cholissodin et al. | Audit system development for government institution documents using stream deep learning to support smart governance | |
CN113722421A (en) | Contract auditing method and system and computer readable storage medium | |
CN114492419B (en) | Text labeling method, system and device based on newly added key words in labeling | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |