CN112905186A - High signal-to-noise ratio code classification method and device suitable for open-source software supply chain - Google Patents


Info

Publication number
CN112905186A
CN112905186A (Application CN202110168454.XA)
Authority
CN
China
Prior art keywords
node
path
ast
code
syntax tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110168454.XA
Other languages
Chinese (zh)
Other versions
CN112905186B (en)
Inventor
李浩晨
吴敬征
武延军
罗天悦
杨牧天
崔星
段旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS filed Critical Institute of Software of CAS
Priority to CN202110168454.XA priority Critical patent/CN112905186B/en
Publication of CN112905186A publication Critical patent/CN112905186A/en
Application granted granted Critical
Publication of CN112905186B publication Critical patent/CN112905186B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/42: Syntactic analysis
    • G06F8/427: Parsing
    • G06F8/43: Checking; Contextual analysis
    • G06F8/436: Semantic checking
    • G06F8/437: Type checking
    • G06F8/44: Encoding
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a high signal-to-noise-ratio code classification method and device suitable for an open-source software supply chain. The method comprises the following steps: convert the code to be predicted into a PE-AST, digitize each node, extract the PE-AST paths, convert each PE-AST path into an operable tuple, calculate the correlation coefficient WS, update the path representations, and predict from the PE-AST feature vector. The invention improves the signal-to-noise ratio of the code representation process, thereby improving the accuracy of machine code classification; the resulting classification improves programmers' working efficiency in code understanding and code maintenance.

Description

High signal-to-noise ratio code classification method and device suitable for open-source software supply chain
Technical Field
The invention belongs to the technical field of computers, and relates to a high signal-to-noise ratio code classification method and device suitable for an open source software supply chain.
Background
Over the past decade a large amount of open-source software has emerged, and it forms the core of the open-source software supply chain. For a programmer, correctly classifying the large amount of source code contained in open-source software helps improve work efficiency. First, grouping applications with similar functionality makes it more convenient for programmers to find the functions they need to implement among applications belonging to the same group or category. Second, the same code vulnerabilities often exist widely in code implementing the same type of function; that is, code of the same type often shares common vulnerabilities, so when a programmer finds a vulnerability in one piece of code, other places where a similar error may occur can be located quickly, which improves maintenance efficiency.
Currently, code classification methods are generally based on neural networks: through learning from a large number of samples, the neural network model finds specific rules in the data and then classifies code according to those rules in practical use. However, if the samples and the program under test are complex, that is, the number of tokens (the words obtained by segmenting the program's statements) is large, the noise contained in the code increases significantly, and such methods cannot fully capture the effective rules applicable to code classification, which reduces classification accuracy. In this case, the signal-to-noise ratio should be increased to obtain a semantic representation of the code that contains the program's key information, thereby avoiding the loss of accuracy.
In summary, the prior art suffers from insufficient code-classification accuracy for the current open-source software supply chain.
Disclosure of Invention
The invention aims to provide a high signal-to-noise ratio code classification method and device suitable for an open source software supply chain.
In order to achieve the purpose, the invention adopts the following technical scheme:
a high signal-to-noise ratio code classification method suitable for an open source software supply chain comprises the following steps:
1) Parse the syntax tree of the program to be predicted to generate an abstract syntax tree T_AST of the code of the program to be predicted, and construct the tuple <T_AST, pos>, where T_AST = (N, T, X, s, δ, φ): N is the non-end node set, T is the end node set, X is the actual value of each node in the abstract syntax tree, s is the root node, δ is the correspondence between parent and child nodes in the abstract syntax tree, φ is the correspondence between each node in the abstract syntax tree and its actual value, and pos is the position coordinate of each node in the abstract syntax tree;
2) Input the tuple <T_AST, pos> into a code classification model to obtain the classification prediction result of the program to be predicted;
wherein the code classification model is trained by a deep-learning method on the classification indexes of a plurality of sample programs and their corresponding tuples <T′_AST, pos′>; the code classification model parses the tuple <T_AST, pos> as follows:
a) encode the correspondence φ(n) of each node, map the encoded result to a vector space, and obtain the final vector representation v(n) of each node from the resulting node vector and the distance from the corresponding position coordinate pos to the root node s;
b) extract paths from the abstract syntax tree T_AST according to the non-end node set N, the end node set T, the root node s and the correspondence δ, and combine the final vector representations v(n) to construct the vector representation emb(L_i) of each path L_i, where i is the end-node number;
c) update the vector representations emb(L_i) by computing the correlation coefficient WS between each path and the other paths to obtain the path representations z_i, and apply max pooling to the path representations z_i to obtain the final vector representation e_code of the program code to be predicted;
d) obtain the classification index from the final vector representation e_code, yielding the classification prediction result of the program to be predicted.
Further, the method for parsing the syntax tree includes: using the javalang package in Python.
Further, the position coordinates pos are obtained as follows:
1) compute the coordinate x of node n from the depth n_depth of node n in the abstract syntax tree T_AST and the depth T_depth of the abstract syntax tree: x = n_depth / T_depth;
2) compute the coordinate y of node n from the x value of the parent of node n, the number of sibling nodes, and the position n_q of node n among its siblings (the exact formula appears only as an image in the original);
3) obtain the position coordinate pos of node n from the coordinates x and y.
Further, the framework for training the code classification model includes: the PyTorch framework.
Further, the encoded result is E(φ(n)) = W_E · φ(n), where W_E ∈ R^{N′×E}, N′ is the number of distinct node types, and E is the embedding dimension. The final vector representation v(n) weights E(φ(n)) by the distance of node n from the root (the exact formula appears only as an image in the original), where (x, y) is the coordinate of node n in the coordinate system with the root node s as the origin.
Further, the correlation coefficient WS is calculated as follows:
1) for any two paths L_i and L_j, compute the end-node semantic similarity WS_token from the end-node vectors, where v(L_i1) denotes the end node of path L_i, v(L_j1) denotes the end node of path L_j, j is the serial number of the path, and j ≠ i (the exact formula appears only as an image in the original);
2) for the two paths L_i and L_j, compute the path semantic similarity WS_path = sigmoid(W_path · [emb(L_i), emb(L_j)] + b_path), where W_path ∈ R^{6E×1}, E is the embedding dimension used when encoding φ(n), and b_path is a bias;
3) the correlation coefficient is WS_{i,j} = α · WS_token + β · WS_path, where α is the first coefficient and β is the second coefficient.
Further, the path representation z_i is computed from emb(L_i), the correlation coefficients WS_{i,j}, and a linear transformation W_v (the exact formula appears only as an image in the original), where N_L is the set of paths other than path L_i.
Further, the classification index is obtained as follows:
1) apply linear and nonlinear transformations to the final vector representation e_code;
2) classify the transformed result with the Softmax() function to obtain the probability distribution P_d of the predicted result;
3) select the index corresponding to the maximum value in the probability distribution P_d to obtain the classification index of the program to be predicted.
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above-mentioned method when executed.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method as described above.
Compared with the prior art, the invention has the following advantages:
1) the signal-to-noise ratio in the code representation process can be improved, so that the accuracy of machine classification codes is improved;
2) according to the classification of the codes, the working efficiency of programmers in the aspects of code understanding and code maintenance is improved.
Drawings
FIG. 1 is a flow chart of a high signal-to-noise ratio code classification method suitable for use in an open source software supply chain.
Fig. 2 is a diagram illustrating a PE-AST path corresponding to a code.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features, and advantages of the present invention more comprehensible, the technical core of the present invention is described in further detail below with reference to the accompanying drawings and examples.
The general flow of the high signal-to-noise-ratio code classification method of this embodiment is shown in FIG. 1 and mainly includes the following steps:
1. Convert the code to be predicted into a PE-AST. Specifically:
1a) Perform AST analysis on the Java program with the javalang package in Python to generate the abstract syntax tree of the code segment, denoted T_AST = (N, T, X, s, δ, φ): N is the set of non-end nodes, T is the set of end nodes, X is the actual value of each node, s is the root node, δ is the correspondence between parent and child nodes, and φ is the correspondence between nodes and actual values. Go to 1b).
1b) Traverse T_AST and generate the position coordinates pos = (x, y) of each node. For a node n, the coordinate x is
x = n_depth / T_depth,
where n_depth is the depth of node n in T_AST and T_depth is the depth of T_AST (i.e., the depth of the deepest node in T_AST). The coordinate y is computed from x_p, the x value of the parent node of n, n_num, the number of sibling nodes of n, and n_i, the position of node n among its siblings, where positions start at 1 from the left and increase (the exact formula appears only as an image in the original). Go to 1c).
1c) Construct the tuple <T_AST, pos> as the PE-AST of the code segment.
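Step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the tree is a hand-built toy stand-in for a javalang AST, x = n_depth / T_depth follows the text, and since the y formula is given only as an image in the source, the y used below (parent's x plus the node's normalized sibling position) is purely an assumption for demonstration.

```python
# Sketch of step 1: attach position coordinates pos = (x, y) to each AST node.
# The tree is a hand-built toy stand-in for a javalang AST. x = n_depth / T_depth
# follows the text; the y formula is NOT recoverable from the source, so the
# variant below (parent x plus normalized sibling position) is an assumption.

class Node:
    def __init__(self, kind, children=None):
        self.kind = kind
        self.children = children or []
        self.pos = None  # (x, y), filled in by annotate_positions

def tree_depth(node, depth=0):
    if not node.children:
        return depth
    return max(tree_depth(c, depth + 1) for c in node.children)

def annotate_positions(node, t_depth, depth=0, parent_x=0.0, sib_idx=1, n_sib=1):
    x = depth / t_depth                      # normalized depth, per the patent text
    y = parent_x + sib_idx / n_sib           # ASSUMED form of the y coordinate
    node.pos = (x, y)
    k = len(node.children)
    for i, child in enumerate(node.children, start=1):
        annotate_positions(child, t_depth, depth + 1, x, i, k)

# Toy AST for a statement like: int a = 1;
root = Node("MethodDecl", [
    Node("VarDecl", [Node("Type"), Node("Name"), Node("Literal")]),
])
annotate_positions(root, tree_depth(root))
print(root.pos)              # root sits at depth 0, so x = 0.0
print(root.children[0].pos)  # depth 1 of 2, so x = 0.5
```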
2. Digitize each node. Specifically:
2a) Encode φ(n) with a matrix W_E ∈ R^{N×E}, i.e.:
E(φ(n)) = W_E · φ(n),
where N is the number of distinct node types (the types include initialization, declaration, identification, and many others) and E is the embedding dimension. Go to 2b).
2b) Compute an embedding weight coefficient from the distance of node n to the origin and multiply it by the result of 2a) to obtain the final node vector representation v(n), where the origin is the root node of the AST (the exact formula appears only as an image in the original). Go to 3.
3. Extract the PE-AST paths from the PE-AST. A PE-AST path is a sequence n_1 … n_{k-1} s of length k, where n_1 is an end node of the PE-AST and s is the root node; for i ∈ [2, k-1], every n_i is a non-end node. A PE-AST path is written in the form <n_1, p, s>, where p denotes the sequence with n_1 and s removed. A PE-AST path thus starts at an end node, traverses a series of non-end nodes, and ends at the root node. For example, the portion enclosed by the dashed box in FIG. 2 is an example of a PE-AST path.
4. Convert each PE-AST path into an operable tuple of the form <v(n_1), v(p), v(s)>. Specifically:
4a) v(p) is computed by summing the vectors of the nodes on the PE-AST path other than the start and end points, i.e.:
v(p) = Σ_{i=2}^{k-1} v(n_i).
Go to 4b).
4b) Construct the triple <v(n_1), v(p), v(s)> as the vector representation of the PE-AST path for subsequent calculation.
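Steps 3 and 4 together can be sketched as follows: walk every end node up to the root, then form <v(n_1), v(p), v(s)> with v(p) as the sum of the interior node vectors, exactly as the text describes. The 2-D node vectors here are toy stand-ins for the v(n) produced in step 2.

```python
# Sketch of steps 3-4: extract each PE-AST path (end node up to the root) and
# convert it into an operable triple <v(n1), v(p), v(s)>, where v(p) sums the
# vectors of the interior (non-end) nodes, as the text describes.

def leaf_to_root_paths(node, trail=None):
    """Yield every PE-AST path as a list [end node, ..., root]."""
    trail = (trail or []) + [node]
    if not node.children:
        yield list(reversed(trail))  # end node first, root last
    for child in node.children:
        yield from leaf_to_root_paths(child, trail)

def vec_sum(vectors, dim):
    out = [0.0] * dim
    for v in vectors:
        for i, x in enumerate(v):
            out[i] += x
    return out

class Node:
    def __init__(self, kind, vec, children=None):
        self.kind, self.vec, self.children = kind, vec, children or []

# Toy tree: node vectors are stand-ins for the v(n) of step 2.
root = Node("Root", [1.0, 0.0], [
    Node("Stmt", [0.0, 1.0], [Node("Name", [2.0, 2.0]), Node("Literal", [3.0, 1.0])]),
])

triples = []
for path in leaf_to_root_paths(root):
    n1, s = path[0], path[-1]
    interior = path[1:-1]  # everything between the end node and the root
    triples.append((n1.vec, vec_sum([n.vec for n in interior], 2), s.vec))

print(len(triples))  # one triple per end node
```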
5. Select any PE-AST path and compute the correlation coefficient WS between the other PE-AST paths and this path. The correlation coefficient indicates the degree to which different PE-AST paths in the same code segment act in concert. The steps are as follows:
5a) For two PE-AST paths L_1 and L_2, compute the end-node semantic similarity WS_token, where v(L_11) denotes the end node of L_1 and v(L_21) denotes the end node of L_2 (the exact formula appears only as an image in the original). Go to 5b).
5b) For the two PE-AST paths L_1 and L_2, compute the path semantic similarity WS_path, i.e.:
WS_path = sigmoid(W_path · [emb(L_1), emb(L_2)] + b_path),
where W_path ∈ R^{6E×1} and b_path is the bias. Go to 5c).
5c) Average the end-node semantic similarity WS_token and the path semantic similarity WS_path to obtain the path correlation coefficient WS, i.e.:
WS = 0.5 · WS_token + 0.5 · WS_path.
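Step 5 can be sketched as below. WS_path and the 0.5/0.5 averaging follow the formulas in the text; WS_token, however, is shown only as an image in the source, so the cosine similarity of the two end-node vectors used here is an assumed stand-in, and w_path / b_path are illustrative rather than learned values.

```python
# Sketch of step 5: the correlation coefficient WS between two PE-AST paths.
# WS_path = sigmoid(W_path . [emb(L1), emb(L2)] + b_path) follows the text;
# WS_token is shown only as an image in the source, so cosine similarity of
# the two end-node vectors is used here as an ASSUMED stand-in.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ws(emb1, emb2, tok1, tok2, w_path, b_path, alpha=0.5, beta=0.5):
    """WS = alpha * WS_token + beta * WS_path (the text uses 0.5 / 0.5)."""
    ws_token = cosine(tok1, tok2)          # assumed form of WS_token
    concat = emb1 + emb2                   # [emb(L1), emb(L2)]
    ws_path = sigmoid(sum(w * x for w, x in zip(w_path, concat)) + b_path)
    return alpha * ws_token + beta * ws_path

# Two tiny paths with 3-D embeddings, so w_path has 6 entries (cf. R^{6E x 1}).
score = ws([1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
           [1.0, 0.0], [1.0, 0.0],
           w_path=[0.1] * 6, b_path=0.0)
print(score)
```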
6. Update the path representations. Take the set of PE-AST path representations {emb(L_1), …, emb(L_{N_L})} as input, where N_L is the number of PE-AST paths. With emb(L) as input and the updated path representation z as output, z is computed from the correlation coefficients and a linear transformation W_v, implemented with a 1×1 convolution kernel, where i denotes a given PE-AST path and j ranges over the remaining PE-AST paths (the exact formula appears only as an image in the original).
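Step 6 can be sketched as below. The exact update formula appears only as an image in the source, so the variant here, which keeps each path's own embedding and adds in the other paths' embeddings transformed by W_v and weighted by WS_{i,j}, is an assumed form; the scalar w_v stands in for the 1×1-convolution linear transform.

```python
# Sketch of step 6: update each path representation using the correlation
# coefficients. The exact formula is shown only as an image in the patent; the
# ASSUMED variant below keeps each path's own embedding and adds the other
# paths' embeddings, transformed by W_v and weighted by WS_ij.

def update_paths(embs, ws_matrix, w_v):
    """embs: list of path embeddings; ws_matrix[i][j]: WS between paths i and j;
    w_v: scalar stand-in for the 1x1-convolution linear transform W_v."""
    updated = []
    for i, emb_i in enumerate(embs):
        z = list(emb_i)  # start from the path's own representation (assumption)
        for j, emb_j in enumerate(embs):
            if j == i:
                continue
            for k, x in enumerate(emb_j):
                z[k] += ws_matrix[i][j] * w_v * x
        updated.append(z)
    return updated

embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ws_matrix = [[0.0, 0.5, 0.2],
             [0.5, 0.0, 0.4],
             [0.2, 0.4, 0.0]]
zs = update_paths(embs, ws_matrix, w_v=1.0)
print(zs[0])  # path 0 blended with paths 1 and 2
```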
7. Predict from the PE-AST feature vector. The steps are as follows:
7a) Apply max pooling over all updated PE-AST path representations to obtain the final vector representation of the whole code segment, i.e. e_code = [max(z_{i,1}), max(z_{i,2}), …, max(z_{i,E})], where i ∈ [1, N_p]. Go to 7b).
7b) Take e_code as the feature vector of the whole code segment, apply linear and nonlinear transformations, and use the Softmax() function to obtain the probability distribution P_d of the prediction result; select the index corresponding to the maximum value as the final prediction, giving the classification prediction result, i.e. P_d = Softmax(ReLU(W_code · e_code + b_code)), where N_r is the number of possible answers and b_code is the bias (the dimensions of W_code and b_code appear only as images in the original).
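Step 7 can be sketched end to end: element-wise max pooling over the path representations, then P_d = Softmax(ReLU(W_code · e_code + b_code)) and an argmax. Only the shapes and the formula follow the text; W_code and b_code below are random and zero stand-ins for learned parameters.

```python
# Sketch of step 7: max-pool the updated path representations z_i into e_code,
# then classify with P_d = Softmax(ReLU(W_code . e_code + b_code)). W_code and
# b_code are stand-ins for learned parameters; only the shapes follow the text
# (E-dimensional e_code, N_r possible answers).
import math
import random

random.seed(0)
E, N_R = 3, 4  # embedding dimension and number of possible answers

def max_pool(paths):
    """e_code = [max_i z_{i,1}, ..., max_i z_{i,E}]"""
    return [max(z[k] for z in paths) for k in range(len(paths[0]))]

def classify(e_code, w_code, b_code):
    logits = [max(0.0, sum(w * x for w, x in zip(row, e_code)) + b)  # ReLU
              for row, b in zip(w_code, b_code)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # numerically stable Softmax
    total = sum(exps)
    probs = [e / total for e in exps]         # probability distribution P_d
    return probs, probs.index(max(probs))     # prediction = argmax of P_d

paths = [[0.2, -1.0, 0.5], [0.9, 0.1, -0.3], [0.4, 0.8, 0.0]]
e_code = max_pool(paths)                      # element-wise maxima
w_code = [[random.uniform(-1, 1) for _ in range(E)] for _ in range(N_R)]
b_code = [0.0] * N_R
p_d, label = classify(e_code, w_code, b_code)
print(label)
```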
8. Train the model described in the above steps on a data set to obtain a trained deep-learning model; the PyTorch framework is used during training.
The inventors trained the model described above on the java14m data set. Its data are drawn from 10,072 GitHub projects and comprise 12,636,998 training samples, 371,362 validation samples and 368,445 test samples, with cloned code removed, which gives the data set strong specificity and rigor. An Adam optimizer was used with an initial learning rate of 0.01, and a trained deep-learning model was obtained after 10 passes over the full data set.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A high signal-to-noise ratio code classification method suitable for an open source software supply chain comprises the following steps:
1) Parse the syntax tree of the program to be predicted to generate an abstract syntax tree T_AST of the code of the program to be predicted, and construct the tuple <T_AST, pos>, where T_AST = (N, T, X, s, δ, φ): N is the non-end node set, T is the end node set, X is the actual value of each node in the abstract syntax tree, s is the root node, δ is the correspondence between parent and child nodes in the abstract syntax tree, φ is the correspondence between each node in the abstract syntax tree and its actual value, and pos is the position coordinate of each node in the abstract syntax tree;
2) Input the tuple <T_AST, pos> into a code classification model to obtain the classification prediction result of the program to be predicted;
wherein the code classification model is trained by a deep-learning method on the classification indexes of a plurality of sample programs and their corresponding tuples <T′_AST, pos′>; the code classification model parses the tuple <T_AST, pos> as follows:
a) encode the correspondence φ(n) of each node, map the encoded result to a vector space, and obtain the final vector representation v(n) of each node from the resulting node vector and the distance from the corresponding position coordinate pos to the root node s;
b) extract paths from the abstract syntax tree T_AST according to the non-end node set N, the end node set T, the root node s and the correspondence δ, and combine the final vector representations v(n) to construct the vector representation emb(L_i) of each path L_i, where i is the end-node number;
c) update the vector representations emb(L_i) by computing the correlation coefficient WS between each path and the other paths to obtain the path representations z_i, and apply max pooling to the path representations z_i to obtain the final vector representation e_code of the program code to be predicted;
d) obtain the classification index from the final vector representation e_code to obtain the classification prediction result of the program to be predicted.
2. The method of claim 1, wherein the method of syntax tree parsing comprises: using the javalang package in Python.
3. The method of claim 1, wherein the position coordinates pos are obtained by:
1) computing the coordinate x of node n from the depth n_depth of node n in the abstract syntax tree T_AST and the depth T_depth of the abstract syntax tree: x = n_depth / T_depth;
2) computing the coordinate y of node n from the x value of the parent of node n, the number of sibling nodes, and the position n_q of node n among its siblings (the exact formula appears only as an image in the original);
3) obtaining the position coordinate pos of node n from the coordinates x and y.
4. The method of claim 1, wherein the framework for training the code classification model comprises: the PyTorch framework.
5. The method of claim 1, wherein the encoded result is E(φ(n)) = W_E · φ(n) and the final vector representation v(n) weights E(φ(n)) by the distance of node n from the root (the exact formula appears only as an image in the original), where W_E ∈ R^{N′×E}, N′ is the number of distinct node types, E is the embedding dimension, and (x, y) is the coordinate of node n in the coordinate system with the root node s as the origin.
6. The method of claim 1, wherein the correlation coefficient WS is calculated by:
1) for any two paths L_i and L_j, computing the end-node semantic similarity WS_token, where v(L_i1) denotes the end node of path L_i, v(L_j1) denotes the end node of path L_j, j is the serial number of the path, and j ≠ i (the exact formula appears only as an image in the original);
2) for the two paths L_i and L_j, computing the path semantic similarity WS_path = sigmoid(W_path · [emb(L_i), emb(L_j)] + b_path), where W_path ∈ R^{6E×1}, E is the embedding dimension used when encoding φ(n), and b_path is a bias;
3) the correlation coefficient is WS_{i,j} = α · WS_token + β · WS_path, where α is the first coefficient and β is the second coefficient.
7. The method of claim 6, wherein the path representation z_i is computed from emb(L_i), the correlation coefficients WS_{i,j}, and a linear transformation W_v (the exact formula appears only as an image in the original), where N_L is the set of paths other than path L_i.
8. The method of claim 1, wherein the classification index is obtained by:
1) applying linear and nonlinear transformations to the final vector representation e_code;
2) classifying the transformed result with the Softmax() function to obtain the probability distribution P_d of the predicted result;
3) selecting the index corresponding to the maximum value in the probability distribution P_d to obtain the classification index of the program to be predicted.
9. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when run, perform the method of any of claims 1-8.
10. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-8.
CN202110168454.XA 2021-02-07 2021-02-07 High signal-to-noise ratio code classification method and device suitable for open-source software supply chain Active CN112905186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110168454.XA CN112905186B (en) 2021-02-07 2021-02-07 High signal-to-noise ratio code classification method and device suitable for open-source software supply chain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110168454.XA CN112905186B (en) 2021-02-07 2021-02-07 High signal-to-noise ratio code classification method and device suitable for open-source software supply chain

Publications (2)

Publication Number Publication Date
CN112905186A true CN112905186A (en) 2021-06-04
CN112905186B CN112905186B (en) 2023-04-07

Family

ID=76123652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110168454.XA Active CN112905186B (en) 2021-02-07 2021-02-07 High signal-to-noise ratio code classification method and device suitable for open-source software supply chain

Country Status (1)

Country Link
CN (1) CN112905186B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222419A1 (en) * 2005-12-06 2009-09-03 National Ict Australia Limited Succinct index structure for xml
CN101697121A (en) * 2009-10-26 2010-04-21 哈尔滨工业大学 Method for detecting code similarity based on semantic analysis of program source code
US9262406B1 (en) * 2014-05-07 2016-02-16 Google Inc. Semantic frame identification with distributed word representations
CN107729925A (en) * 2017-09-26 2018-02-23 中国科学技术大学 The automatic method classified with scoring is done according to solution approach to program competition type source code
US20190005163A1 (en) * 2017-06-29 2019-01-03 International Business Machines Corporation Extracting a knowledge graph from program source code
CN109445834A (en) * 2018-10-30 2019-03-08 北京计算机技术及应用研究所 The quick comparative approach of program code similitude based on abstract syntax tree
CN110597735A (en) * 2019-09-25 2019-12-20 北京航空航天大学 Software defect prediction method for open-source software defect feature deep learning
CN112181428A (en) * 2020-09-28 2021-01-05 北京航空航天大学 Abstract syntax tree-based open-source software defect data classification method and system


Also Published As

Publication number Publication date
CN112905186B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
CN114169330B (en) Chinese named entity recognition method integrating time sequence convolution and transform encoder
CN109063021B (en) Knowledge graph distributed expression method capable of coding relation semantic diversity structure
CN110673840A (en) Automatic code generation method and system based on tag graph embedding technology
CN109871454B (en) Robust discrete supervision cross-media hash retrieval method
US11900250B2 (en) Deep learning model for learning program embeddings
CN113987174A (en) Core statement extraction method, system, equipment and storage medium for classification label
CN113412492A (en) Quantum algorithm for supervised training of quantum Boltzmann machine
CN115658846A (en) Intelligent search method and device suitable for open-source software supply chain
CN115617614A (en) Log sequence anomaly detection method based on time interval perception self-attention mechanism
Jahanshahi et al. nTreeClus: A tree-based sequence encoder for clustering categorical series
CN117077586B (en) Register transmission level resource prediction method, device and equipment for circuit design
CN111737694B (en) Malicious software homology analysis method based on behavior tree
CN113076545A (en) Deep learning-based kernel fuzzy test sequence generation method
CN112905186B (en) High signal-to-noise ratio code classification method and device suitable for open-source software supply chain
CN116861373A (en) Query selectivity estimation method, system, terminal equipment and storage medium
CN117271701A (en) Method and system for extracting system operation abnormal event relation based on TGGAT and CNN
CN116226864A (en) Network security-oriented code vulnerability detection method and system
CN116361788A (en) Binary software vulnerability prediction method based on machine learning
CN112735604B (en) Novel coronavirus classification method based on deep learning algorithm
CN113392929A (en) Biological sequence feature extraction method based on word embedding and self-encoder fusion
CN112381280A (en) Algorithm prediction method based on artificial intelligence
Su et al. A wavelet transform based protein sequence similarity model
Zhang et al. Reducing Test Cases with Attention Mechanism of Neural Networks
Wu et al. Discovering Mathematical Expressions Through DeepSymNet: A Classification-Based Symbolic Regression Framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant