CN113743096A - Crowdsourcing test report similarity detection method based on natural language processing - Google Patents

Crowdsourcing test report similarity detection method based on natural language processing

Info

Publication number
CN113743096A
Authority
CN
China
Prior art keywords
similarity
word
crowdsourcing
report
reports
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010487202.9A
Other languages
Chinese (zh)
Inventor
房春荣
曹振飞
王旭
虞圣呈
恽叶霄
李彤宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202010487202.9A
Publication of CN113743096A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3692 Test management for test results analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

A method for detecting the similarity of crowdsourcing test reports based on natural language processing. The method applies natural language processing to the complex test reports submitted by crowdsourcing workers. The reports are first preprocessed (Chinese word segmentation, stop word removal, and so on). The resulting word groups are then converted into word vectors using Word2Vec, and cosine similarity is chosen to measure the distance between the vectors. A semantic model trained on a large amount of past crowdsourcing report data is applied, and the resulting vectors are used as the input of K-Means cluster analysis, which groups similar reports into the same class according to a set similarity threshold, so that the similarity between crowdsourcing test reports can be measured accurately.

Description

Crowdsourcing test report similarity detection method based on natural language processing
Technical Field
The invention belongs to the field of software engineering and relates to the application of natural language processing in that field. It is used for detecting the similarity of crowdsourcing test reports.
Background
Detecting similar crowdsourcing test reports is a key technology for improving the utilization of crowdsourcing reports and reducing the time testers spend reading repeated reports. A crowdsourcing test report is the result returned to the tester after a crowdsourcing worker completes a task defined by the initiator, and the tester uses the report to guide the reproduction and localization of a Bug. If many of the submitted test reports describe the same Bug, the tester cannot know in advance whether the Bug in a given report has already been reported, and therefore wastes a large amount of time reading duplicate reports that do not help reproduce or locate the Bug. Researchers are therefore very interested in detecting similar crowdsourcing test reports in order to make such reports more useful.
The similarity of the crowdsourcing test reports may be due to two reasons:
1) The first reason: since every crowdsourcing worker can take part in every testing task, it is inevitable that several workers find the same Bug and describe it with similar but differently worded phrases and sentences, which results in duplicate content across multiple crowdsourcing reports.
2) The second reason: since crowdsourced testing offers monetary incentives, malicious workers may copy other workers' test reports to obtain rewards by cheating.
Reports that are similar for the second reason usually have very high similarity, with most of the text being identical, so traditional text similarity analysis detects them well. For the first reason, however, the words and sentences in the reports are similar in meaning but not identical, and traditional plain-text similarity analysis performs poorly on such reports.
Many methods for detecting similar Bug reports exist today. Runeson et al. noted that, because defect reports are mostly written in natural language, it is difficult to compare the similarity of two reports with a canonical method, so they identified duplicate content in the reports using plain-text natural language processing techniques. Sun et al. proposed a retrieval tool for measuring the similarity between two reports, which exploits not only the textual similarity of the summary and description fields but also the similarity of non-textual fields such as product, component, and version. The tool also extends BM25F, an effective similarity formula from the information retrieval community, and uses a two-round stochastic gradient descent to automatically optimize the retrieval process for a specific Bug repository by supervised learning, further improving the accuracy of similar report detection. The first method works well for similar reports whose text is mostly identical, but it does not address the first reason. The second method does address the first reason, but because the Bug repository it is tuned for is not specific to crowdsourcing reports, its similarity detection for crowdsourcing reports is still not ideal.
Disclosure of Invention
The problem to be solved by the invention is that current similarity detection for crowdsourcing test reports performs poorly on similar reports that have the same meaning but different text.
The technical scheme of the invention is as follows: a method for detecting the similarity of crowdsourcing test reports based on natural language processing, comprising the following steps:
1) First, a crowdsourcing test corpus is established, and a supervised semantic model is trained on the basis of it:
1.1) Typical Bug scenarios are picked from a large amount of past crowdsourcing test report data. For each scenario, an expert team first gives a typical description of the Bug, and two types of descriptions of the scenario are then collected. Descriptions of the first type are not worded identically to the typical description but clearly convey the concrete behavior of the Bug in the scenario through expressions of similar meaning. Descriptions of the second type share most of their text with the typical description but describe a completely different Bug because a small amount of content (e.g., the subject or predicate) has been modified.
1.2) From the collected data, a corpus of proper nouns for crowdsourcing testing is constructed, and a crowdsourcing test synonym library is compiled.
1.3) The collected descriptions are manually labeled as similar or not similar, and a crowdsourcing test report semantic model is trained based on a neural network computation model. After multiple iterations and parameter adjustments, the model achieves the desired detection performance on the test set, and training of the semantic model is complete.
2) Next, the input is processed. The input crowdsourcing test report is first preprocessed by removing stop words using the previously compiled stop word list. The report is then segmented with the Chinese word segmentation tool jieba. From the segmentation result, only words contained in the corpus compiled in step 1) are kept, and synonyms are then replaced according to the synonym library. This completes input preprocessing and outputs the word group corresponding to the report.
3) The word group list of each report is then taken as input, and Word2Vec is used to represent the segmented word groups in a form a computer can calculate with, by computing a word embedding vector for each word group.
4) Features such as word frequency, n-grams, and part of speech are selected and vectorized (Word2Vec itself exploits the co-occurrence of text within a window) and used as input to the semantic model.
5) The word embedding vectors and feature vectors corresponding to the word groups are fed into the semantic model trained in step 1). Cosine similarity is used as the measure of distance between vectors, and the distance between vectors, i.e., the similarity between reports, is calculated.
6) Based on the calculated results, cluster analysis is performed with the K-means method, reports with high similarity are grouped into the same class, and the clustering result and the clusters of similar reports are finally obtained.
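The preprocessing in step 2) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `preprocess_report`, the toy word lists, and the whitespace tokenizer (standing in for jieba so the example needs no third-party package) are all assumptions, and stop words are dropped after segmentation for simplicity.

```python
from typing import Callable, Dict, Iterable, List, Set


def preprocess_report(
    text: str,
    stop_words: Set[str],
    corpus_vocab: Set[str],
    synonyms: Dict[str, str],
    tokenize: Callable[[str], Iterable[str]] = str.split,
) -> List[str]:
    """Segment a report, drop stop words, keep corpus words, normalize synonyms."""
    word_group = []
    for token in tokenize(text):
        if token in stop_words:
            continue  # stop word removal
        if token not in corpus_vocab:
            continue  # keep only words found in the crowdsourcing test corpus
        # Map each word to the canonical entry of its synonym set, if any.
        word_group.append(synonyms.get(token, token))
    return word_group


# Hypothetical toy resources; a real system would load the compiled lists.
STOP_WORDS = {"the", "after", "i"}
CORPUS_VOCAB = {"app", "crashes", "tap", "login", "button"}
SYNONYMS = {"tap": "click", "crashes": "crash"}

words = preprocess_report(
    "the app crashes after i tap the login button",
    STOP_WORDS, CORPUS_VOCAB, SYNONYMS,
)
print(words)
```

For Chinese reports, a segmenter such as `jieba.lcut` could be passed as `tokenize` in place of `str.split`.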
The invention is characterized by the following: 1. A crowdsourcing test corpus and a synonym library are constructed by collecting typical Bug scenarios and descriptions from the field of crowdsourcing test reports. 2. A supervised, neural-network-based semantic model for crowdsourcing test reports is trained. 3. Word embedding vectors are computed from the word groups using Word2Vec. 4. The similarity between reports is calculated using cosine similarity. 5. Cluster analysis is performed with the K-means method to derive clusters of highly similar reports.
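The cosine similarity named in feature 4 can be sketched as follows. The function name and the zero-vector convention (returning 0.0) are assumptions made for this illustration. The measure itself is the standard one: the dot product of two vectors divided by the product of their lengths.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 1.0 means identical direction."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Parallel vectors score close to 1, orthogonal vectors score 0.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because the measure depends only on the angle, two report vectors of different lengths but the same direction are treated as maximally similar.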
The beneficial effects of the invention are as follows: with the compiled typical Bug scenarios and descriptions from the field of crowdsourcing test reports and the supervised, neural-network-based semantic model, similarity between multiple reports that differ in wording but describe the same Bug can be identified effectively.
Drawings
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 is a partial data example of a crowdsourcing report data set in accordance with the present invention.
FIGS. 3-6 are examples of clusters of partially similar report classes after cluster analysis of a data set in accordance with the present invention.
Detailed Description
The invention relies on several key technologies: jieba word segmentation, the Word2Vec model, K-means clustering, and the LSTM-DSSM deep learning model.
1. jieba word segmentation
jieba is a widely used Chinese word segmentation component for Python with three main characteristics: 1. It supports three word segmentation modes: precise mode, full mode, and search engine mode. 2. It supports segmentation of traditional Chinese text. 3. It supports custom dictionaries.
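The three modes and the custom dictionary can be exercised roughly as shown below. This is a hedged sketch: the wrapper names `segment` and `register_terms` and the sample sentence are assumptions for illustration, while `jieba.lcut`, `jieba.lcut_for_search`, and `jieba.add_word` are calls from the jieba package. If jieba is not installed, the sketch falls back to splitting into single characters so that it still runs.

```python
from typing import Iterable, List

try:
    import jieba  # third-party package: pip install jieba
except ImportError:  # keep the sketch runnable without jieba
    jieba = None

MODES = ("precise", "full", "search")


def register_terms(terms: Iterable[str]) -> None:
    """Add domain terms (e.g. from the proper-noun corpus) to the dictionary."""
    if jieba is not None:
        for term in terms:
            jieba.add_word(term)


def segment(text: str, mode: str = "precise") -> List[str]:
    """Segment Chinese text in one of jieba's three modes."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if jieba is None:
        return list(text)  # fallback: one token per character
    if mode == "precise":
        return jieba.lcut(text, cut_all=False)
    if mode == "full":
        return jieba.lcut(text, cut_all=True)
    return jieba.lcut_for_search(text)


register_terms(["闪退"])  # hypothetical crowdsourcing-test term ("crash")
report = "登录页面点击按钮后应用闪退"
print(segment(report))
```

Precise mode is the natural choice for the preprocessing step because it partitions the sentence without overlapping tokens, whereas full mode and search engine mode may emit overlapping candidates.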
2. Word2Vec model
Word2Vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words: the network takes a word as input and predicts the words in adjacent positions. Under the bag-of-words assumption in Word2Vec, the order of the words is unimportant. After training, the Word2Vec model can map each word to a vector that represents word-to-word relationships. This vector is taken from the hidden layer of the neural network.
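To make "predicting the words in adjacent positions" concrete, the sketch below generates the (center word, context word) training pairs that a skip-gram model learns from. It is a simplified illustration under stated assumptions: the function name is hypothetical, a fixed symmetric window is used, and the sampling tricks of real Word2Vec implementations are omitted. Only pair generation is shown, not the neural network training itself.

```python
from typing import List, Sequence, Tuple


def skipgram_pairs(tokens: Sequence[str], window: int = 5) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


word_group = ["app", "crash", "click", "login"]
print(skipgram_pairs(word_group, window=1))
```

With `window=5`, each word is paired with up to 5 preceding and 5 following words, which matches the context size used in the embodiment described later.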
3. K-means clustering
The K-means clustering algorithm is an iterative cluster analysis algorithm. It randomly selects K objects as initial cluster centers, then calculates the distance between each object and each cluster center and assigns each object to the nearest center. A cluster center together with the objects assigned to it represents a cluster. Each time a sample is assigned, the center of its cluster is recalculated from the objects currently in the cluster. This process is repeated until a termination condition is met, for example that no object is reassigned to a different cluster, that no cluster center changes, or that the sum of squared errors reaches a local minimum.
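The iteration described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: it uses the common batch variant (all points are assigned first, then all centers are recomputed), squared Euclidean distance, and a fixed random seed. The function names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Cluster `points` into k groups; returns (labels, centers)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Randomly pick k objects as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # termination: no object was reassigned
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


vectors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(vectors, k=2)
print(labels)
```

The result depends on the random initial centers, so implementations often run the algorithm several times and keep the clustering with the lowest sum of squared errors.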
4. LSTM-DSSM deep learning model
LSTM-DSSM builds on the RNN architecture and mainly addresses the inability of CNN-DSSM to capture context features over longer distances. It is a variant of the DSSM model and is mainly used to compute semantic similarity in natural language processing.
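For orientation, the similarity scoring commonly used by DSSM-family models is summarized below. This is the general formulation from the DSSM literature, given here as background and not taken from the patent. The symbols are assumptions for illustration: $y_Q$ and $y_D$ are the semantic vectors the network produces for two texts $Q$ and $D$ (for LSTM-DSSM, derived from the LSTM's hidden state), $\gamma$ is a smoothing factor, and $\mathbf{D}$ is the set of candidate texts compared against $Q$.

```latex
% Relevance of D to Q: cosine similarity of their semantic vectors
R(Q, D) = \cos(y_Q, y_D) = \frac{y_Q^{\top} y_D}{\lVert y_Q \rVert \, \lVert y_D \rVert}

% Posterior probability of D given Q: softmax over the candidate set
P(D \mid Q) = \frac{\exp\bigl(\gamma \, R(Q, D)\bigr)}{\sum_{D' \in \mathbf{D}} \exp\bigl(\gamma \, R(Q, D')\bigr)}

% Training loss over labeled similar pairs (Q, D^{+})
L = -\log \prod_{(Q, D^{+})} P(D^{+} \mid Q)
```

Under this formulation, the manually labeled similar and dissimilar descriptions would play the roles of positive and negative candidates during supervised training.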
The following describes the steps of the method with a specific example and shows the results.
We picked 20000 Bug reports of different content written in a uniform format.
The experimental environment is as follows: Ubuntu 16.04 LTS, 8 GB of memory, and a 512 GB SSD.
The overall process of the invention is shown in fig. 1, and the specific implementation steps are as follows:
1) Input preprocessing is performed. Stop words are first removed from the 20000 crowdsourcing test reports. The reports are then segmented with the Chinese word segmentation tool jieba, only words included in the corpus are retained, synonyms are replaced, and the word group corresponding to each report is output;
2) The word group of each report is used as input to the word2vec algorithm, and the word embedding vector corresponding to each report is calculated. The generated word vectors are 200-dimensional, the context window covers the 5 preceding and the 5 following words (10 words in total), and the skip-gram method is used;
3) Feature selection is performed with a variance-based method: the variance of each feature is calculated, features whose variance exceeds a threshold are selected, and features whose values change little are removed. A statistical indicator is calculated for each variable independently, and the indicators are used to decide which variables are important and which are to be removed. After feature selection, the vectorized feature vector is output;
4) The word embedding vector and the feature vector of each report are fed into the trained semantic model;
5) Cosine similarity is chosen as the similarity measure. The distance between two vectors is measured by the angle between them: the smaller the angle, the smaller the distance, i.e., the more similar the two vectors are. The angle between the word embedding vectors of two reports is therefore calculated to obtain the similarity of the two reports;
6) The result of the model is used as the input of the K-means clustering method. The K value is first chosen according to an empirical formula:
[Formula image: Figure BSA0000210360510000041]
Since our experimental sample contains 20000 reports, the K value is 100. We therefore randomly select 100 points as cluster centers, calculate the distance from each point to the 100 cluster centers, and assign each point to the nearest center, which forms 100 clusters. The mean of each cluster is then recalculated. These steps are repeated until the means no longer change. Examples of the clusters output after clustering are shown in FIGS. 3-6.
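The empirical formula itself survives only as an image reference in this text. One widely used rule of thumb, K ≈ sqrt(n / 2) for n samples, happens to reproduce the stated value of 100 for 20000 reports. The sketch below assumes that rule purely for illustration. Whether it is the formula in the original figure is an assumption, and the function name is hypothetical.

```python
import math


def rule_of_thumb_k(n_samples: int) -> int:
    """Assumed heuristic: K is about sqrt(n / 2), with a minimum of 1."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    return max(1, round(math.sqrt(n_samples / 2)))


print(rule_of_thumb_k(20000))
```

Heuristics of this kind only give a starting point. In practice K is often refined by comparing clustering quality across several candidate values.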

Claims (5)

1. A method for detecting the similarity of crowdsourcing test reports based on natural language processing, characterized by: constructing a crowdsourcing test corpus and training a supervised semantic model on the basis of it; performing input preprocessing using corpus words, stop words, and a synonym library; calculating the word embedding vectors corresponding to the reports using Word2Vec; selecting and vectorizing features; calculating report similarity with cosine similarity as the measure; and performing cluster analysis with K-means to finally obtain clusters of highly similar reports.
2. The method for detecting the similarity of crowdsourcing test reports based on natural language processing as claimed in claim 1, wherein a crowdsourcing test corpus is constructed and a supervised semantic model is trained on the basis of it, mainly through the following steps:
1) first, typical Bug scenarios are picked from a large amount of past crowdsourcing test report data; for each scenario, an expert team first gives a typical description of the Bug, and two types of descriptions of the scenario are collected; descriptions of the first type are not worded identically to the typical description but clearly convey the concrete behavior of the Bug in the scenario through expressions of similar meaning; descriptions of the second type share most of their text with the typical description but describe a completely different Bug because a small amount of content (e.g., the subject or predicate) has been modified;
2) from the collected data, a corpus of proper nouns for crowdsourcing testing is constructed, and a crowdsourcing test synonym library is compiled;
3) the collected descriptions are manually labeled as similar or not similar, and a crowdsourcing test report semantic model is trained based on a neural network computation model; after multiple iterations and parameter adjustments, the model achieves the desired detection performance on the test set, and training of the semantic model is complete.
3. The method for detecting the similarity of crowdsourcing test reports based on natural language processing as claimed in claim 1, wherein input preprocessing is performed using corpus words, stop words, and a synonym library, and the word embedding vectors corresponding to the reports are calculated using Word2Vec; the input crowdsourcing test report is first preprocessed by removing stop words using the previously compiled stop word list; the report is then segmented with the Chinese word segmentation tool jieba, only words contained in the compiled corpus are kept from the segmentation result, and synonyms are then replaced according to the synonym library; this completes input preprocessing and outputs the word group corresponding to the report; the word group of each report is used as input to the word2vec algorithm, and the word embedding vector corresponding to each report is calculated, wherein the generated word vectors are 200-dimensional, the context window covers the 5 preceding and the 5 following words (10 words in total), and the skip-gram method is used.
4. The method for detecting the similarity of crowdsourcing test reports based on natural language processing as claimed in claim 1, wherein features are selected and vectorized and report similarity is calculated with cosine similarity as the measure; feature selection is performed with a variance-based method, in which the variance of each feature is calculated, features whose variance exceeds a threshold are selected, and features whose values change little are removed; a statistical indicator is calculated for each variable independently, and the indicators are used to decide which variables are important and which are to be removed; after feature selection, the vectorized feature vector is output; report similarity is calculated with cosine similarity as the measure, the distance between two vectors being measured by the angle between them, wherein the smaller the angle, the smaller the distance, i.e., the more similar the two vectors are; the angle between the word embedding vectors of two reports is therefore calculated to obtain the similarity of the two reports.
5. The method for detecting the similarity of crowdsourcing test reports based on natural language processing as claimed in claim 1, wherein K-means is used for cluster analysis to obtain clusters of highly similar reports; the result of the model is used as the input of the K-means clustering method, and the K value is first chosen according to an empirical formula:
[Formula image: Figure FSA0000210360500000021]
Since the experimental sample contains 20000 reports, the K value is 100; therefore, 100 points are randomly selected as cluster centers, the distance from each point to the 100 cluster centers is calculated, and each point is assigned to the nearest center, which forms 100 clusters; the mean of each cluster is then recalculated; these steps are repeated until the means no longer change.
CN202010487202.9A 2020-05-27 2020-05-27 Crowdsourcing test report similarity detection method based on natural language processing Pending CN113743096A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010487202.9A CN113743096A (en) 2020-05-27 2020-05-27 Crowdsourcing test report similarity detection method based on natural language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010487202.9A CN113743096A (en) 2020-05-27 2020-05-27 Crowdsourcing test report similarity detection method based on natural language processing

Publications (1)

Publication Number Publication Date
CN113743096A true CN113743096A (en) 2021-12-03

Family

ID=78727943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010487202.9A Pending CN113743096A (en) 2020-05-27 2020-05-27 Crowdsourcing test report similarity detection method based on natural language processing

Country Status (1)

Country Link
CN (1) CN113743096A (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970666A (en) * 2014-05-29 2014-08-06 重庆大学 Method for detecting repeated software defect reports
CN107273426A (en) * 2017-05-18 2017-10-20 四川新网银行股份有限公司 A kind of short text clustering method based on deep semantic route searching
US20180349105A1 (en) * 2017-06-05 2018-12-06 Devfactory Fz-Llc Method and System for Arbitrary-Granularity Execution Clone Detection
CN107562919A (en) * 2017-09-13 2018-01-09 云南大学 A kind of more indexes based on information retrieval integrate software component retrieval method and system
CN108427720A (en) * 2018-02-08 2018-08-21 中国科学院计算技术研究所 System log sorting technique
US20190325300A1 (en) * 2018-04-19 2019-10-24 Siemens Healthcare Gmbh Artificial intelligence querying for radiology reports in medical imaging
CN109165382A (en) * 2018-08-03 2019-01-08 南京工业大学 A kind of similar defect report recommended method that weighted words vector sum latent semantic analysis combines
US20200133756A1 (en) * 2018-10-26 2020-04-30 EMC IP Holding Company LLC Method, apparatus and computer storage medium for error diagnostics of an application
CN110377901A (en) * 2019-06-20 2019-10-25 湖南大学 A kind of text mining method for making a report on case for distribution line tripping
CN110597997A (en) * 2019-07-19 2019-12-20 中国人民解放军国防科技大学 Military scenario text event extraction corpus iterative construction method and device
CN110363248A (en) * 2019-07-22 2019-10-22 苏州大学 The computer identification device and method of mobile crowdsourcing test report based on image
CN110390363A (en) * 2019-07-29 2019-10-29 上海海事大学 A kind of Image Description Methods
CN110502361A (en) * 2019-08-29 2019-11-26 扬州大学 Fine granularity defect positioning method towards bug report
CN110928764A (en) * 2019-10-10 2020-03-27 中国人民解放军陆军工程大学 Automated mobile application crowdsourcing test report evaluation method and computer storage medium
CN111104510A (en) * 2019-11-15 2020-05-05 南京中新赛克科技有限责任公司 Word embedding-based text classification training sample expansion method
CN111177374A (en) * 2019-12-13 2020-05-19 航天信息股份有限公司 Active learning-based question and answer corpus emotion classification method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
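冯样: Method for constructing highly trustworthy crowdsourcing groups (高可信众包群体构建方法), Scientia Sinica Informationis (中国科学:信息科学), vol. 49, no. 11, pages 1412 - 1426 *
胡龙茂 et al.: Identification of identical product features based on multi-dimensional similarity and sentiment word expansion (基于多维相似度和情感词扩充的相同产品特征识别), Journal of Shandong University (山东大学学报), vol. 50, no. 2, pages 50 - 59 *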

Similar Documents

Publication Publication Date Title
CN108519971B (en) Cross-language news topic similarity comparison method based on parallel corpus
CN107844533A (en) A kind of intelligent Answer System and analysis method
CN111427775B (en) Method level defect positioning method based on Bert model
CN111124487B (en) Code clone detection method and device and electronic equipment
CN110162630A (en) A kind of method, device and equipment of text duplicate removal
CN111222330B (en) Chinese event detection method and system
CN110728313B (en) Classification model training method and device for intention classification recognition
CN113672931B (en) Software vulnerability automatic detection method and device based on pre-training
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN111930933A (en) Detection case processing method and device based on artificial intelligence
CN113407721A (en) Method, device and computer storage medium for detecting log sequence abnormity
CN112989813A (en) Scientific and technological resource relation extraction method and device based on pre-training language model
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering
CN116150757A (en) Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model
CN115526234A (en) Cross-domain model training and log anomaly detection method and device based on transfer learning
CN113138920B (en) Software defect report allocation method and device based on knowledge graph and semantic role labeling
Skoumas et al. On quantifying qualitative geospatial data: A probabilistic approach
CN112286799B (en) Software defect positioning method combining sentence embedding and particle swarm optimization algorithm
CN113742396A (en) Mining method and device for object learning behavior pattern
CN111339258B (en) University computer basic exercise recommendation method based on knowledge graph
CN114969334B (en) Abnormal log detection method and device, electronic equipment and readable storage medium
Revindasari et al. Traceability between business process and software component using Probabilistic Latent Semantic Analysis
CN112685374A (en) Log classification method and device and electronic equipment
CN115858785A (en) Sensitive data identification method and system based on big data
CN115936003A (en) Software function point duplicate checking method, device, equipment and medium based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination