CN112905186B - High signal-to-noise ratio code classification method and device suitable for open-source software supply chain - Google Patents


Info

Publication number
CN112905186B (application CN202110168454.XA)
Authority
CN
China
Prior art keywords
node
path
ast
code
vector representation
Prior art date
Legal status
Active
Application number
CN202110168454.XA
Other languages
Chinese (zh)
Other versions
CN112905186A (en)
Inventor
李浩晨
吴敬征
武延军
罗天悦
杨牧天
崔星
段旭
Current Assignee
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN202110168454.XA
Publication of CN112905186A
Application granted
Publication of CN112905186B
Legal status: Active

Classifications

    • G06F8/427 Parsing (under G06F8/42 Syntactic analysis, G06F8/41 Compilation, G06F8/40 Transformation of program code, G06F8/00 Arrangements for software engineering)
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (under G06F18/24 Classification techniques, G06F18/20 Analysing, G06F18/00 Pattern recognition)
    • G06F8/437 Type checking (under G06F8/436 Semantic checking, G06F8/43 Checking; Contextual analysis)
    • G06F8/44 Encoding (under G06F8/41 Compilation)
    • G06N3/047 Probabilistic or stochastic networks (under G06N3/04 Architecture, G06N3/02 Neural networks, G06N3/00 Computing arrangements based on biological models)
    • G06N3/08 Learning methods (under G06N3/02 Neural networks)


Abstract

The invention discloses a high signal-to-noise ratio code classification method and device suitable for an open-source software supply chain. The method comprises the following steps: converting the code to be predicted into a PE-AST, digitizing all nodes, extracting the PE-AST paths, converting each PE-AST path into an operable tuple, calculating the correlation coefficients WS, updating the path representations, and predicting from the PE-AST feature vector. The invention improves the signal-to-noise ratio of the code representation process and thereby the accuracy of machine code classification; the resulting classification improves programmers' working efficiency in code understanding and code maintenance.

Description

High signal-to-noise ratio code classification method and device suitable for open-source software supply chain
Technical Field
The invention belongs to the technical field of computers, and relates to a high signal-to-noise ratio code classification method and device suitable for an open source software supply chain.
Background
Over the past decade a large amount of open-source software has emerged, and it forms the core of the open-source software supply chain. Correctly classifying the large volume of source code it contains greatly improves programmers' working efficiency. First, grouping applications with similar functionality makes it easier for programmers to find a needed function among applications of the same group or category. Second, the same vulnerabilities often recur widely across code with the same function; once a programmer finds a vulnerability in one piece of code, other places where a similar error may occur can be located quickly, improving maintenance efficiency.
Current code classification methods are generally based on neural networks: by learning from a large number of samples, the model finds specific rules in the data and then classifies code according to those rules in practical use. However, when the samples and the program under test are complex, that is, when they contain many tokens (the words obtained by segmenting program statements), the noise contained in the code increases significantly, and such methods cannot fully capture the effective rules relevant to code classification, reducing classification accuracy. The signal-to-noise ratio should therefore be increased so that the semantic representation of the code retains the program's key information, avoiding this loss of accuracy.
In summary, the prior art suffers from insufficient code-classification accuracy for the current open-source software supply chain.
Disclosure of Invention
The invention aims to provide a high signal-to-noise ratio code classification method and device suitable for an open source software supply chain.
In order to achieve the purpose, the invention adopts the following technical scheme:
a high signal-to-noise ratio code classification method suitable for an open source software supply chain comprises the following steps:
1) Parse the syntax tree of the program to be predicted to generate the abstract syntax tree T_AST of its code and construct the tuple <T_AST, pos>, where T_AST = (N, T, X, s, δ, φ): N is the non-end node set, T the end node set, s the root node, X the actual value of each node in the abstract syntax tree, δ the correspondence between parent and child nodes in the abstract syntax tree, φ the correspondence between each node in the abstract syntax tree and its actual value, and pos the position coordinates of each node in the abstract syntax tree;
2) Input the tuple <T_AST, pos> into a code classification model to obtain the classification prediction result of the program to be predicted;
wherein the code classification model is obtained by deep-learning training on the classification indexes of a number of sample programs and their corresponding tuples <T′_AST, pos′>; the code classification model parses the tuple <T_AST, pos> by:
a) encoding the correspondence φ(n) of each node, mapping the encoded result into a vector space, and obtaining the final vector representation v(n) of each node from the obtained node vector representation and the distance from the corresponding position coordinate pos to the root node s;
b) extracting paths from the abstract syntax tree T_AST according to the non-end node set N, the end node set T, the root node s and the correspondence δ, and constructing the vector representation emb(L_i) of each path L_i in combination with the final vector representations v(n), where i is the end-node number;
c) updating the vector representations emb(L_i) by calculating the correlation coefficient WS of each path with the other paths to obtain the path representations z_i, and max-pooling each path representation z_i to obtain the final vector representation e_code of the program code to be predicted;
d) obtaining a classification index from the final vector representation e_code to obtain the classification prediction result of the program to be predicted.
Further, the syntax tree parsing method comprises: using the javalang package in Python.
Further, the position coordinates pos are obtained by:
1) computing the coordinate x of node n from its depth n_depth in the abstract syntax tree T_AST and the depth T_depth of T_AST: x = n_depth / T_depth;
2) computing the coordinate y of node n from the x value of the parent node of node n, the number of siblings and the position n_q of node n among the siblings (the original equation image is not preserved in this extraction);
3) obtaining the position coordinate pos of node n from the coordinate x and the coordinate y.
Further, the framework for training the code classification model comprises: the PyTorch framework.
Further, the encoded result is E(φ(n)) = W_E · φ(n), where W_E ∈ R^{N′*E}, N′ is the number of different node types and E is the embedding dimension.
Further, the final vector representation v(n) weights E(φ(n)) according to the distance of node n from the origin (the original equation image is not preserved in this extraction), where (x, y) are the coordinates of node n in the coordinate system with the root node s as the origin.
Further, the correlation coefficient WS is calculated by:
1) for any two paths L_i and L_j, computing the end-node semantic similarity WS_token from v(L_i1), the end node of path L_i, and v(L_j1), the end node of path L_j (the original equation image is not preserved in this extraction), where j is the path number and j ≠ i;
2) for the two paths L_i and L_j, computing the path semantic similarity WS_path = sigmoid(W_path · [emb(L_i), emb(L_j)] + b_path), where W_path ∈ R^{6E*1}, E is the embedding dimension used when encoding the correspondence φ(n), and b_path is an offset;
3) computing the correlation coefficient WS_{i,j} = α·WS_token + β·WS_path, where α is the first coefficient and β is the second coefficient.
Further, the path representation is z_i = Σ_{j∈N_L} WS_{i,j} · W_v · emb(L_j), where W_v is a linear transformation and N_L is the set of paths other than the path L_i.
Further, the classification index is obtained by:
1) applying linear and nonlinear transformations to the final vector representation e_code;
2) classifying the transformed result through a Softmax() function to obtain the probability distribution P_d of the prediction result;
3) selecting the index corresponding to the maximum value in the probability distribution P_d to obtain the classification index of the program to be predicted.
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above-mentioned method when executed.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method described above.
Compared with the prior art, the invention has the following advantages:
1) The signal-to-noise ratio in the code representation process can be improved, so that the accuracy of machine classification codes is improved;
2) According to the classification of the codes, the working efficiency of programmers in the aspects of code understanding and code maintenance is improved.
Drawings
FIG. 1 is a flow chart of a high signal-to-noise ratio code classification method suitable for use in an open source software supply chain.
Fig. 2 is a diagram illustrating a PE-AST path corresponding to a code.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features, and advantages of the present invention more comprehensible, the technical core of the present invention is described in further detail below with reference to the accompanying drawings and examples.
The general flow of the high signal-to-noise ratio code classification method of this embodiment is shown in FIG. 1 and mainly comprises the following steps:
1. Convert the code to be predicted into a PE-AST. Specifically:
1a) Use the javalang package in Python to perform AST parsing on the Java program and generate the abstract syntax tree of the code segment, denoted T_AST, with T_AST = (N, T, X, s, δ, φ), where N is the set of non-end nodes, T the set of end nodes, X the actual value of each node, s the root node, δ the correspondence between parent and child nodes, and φ the correspondence between nodes and actual values. Go to 1b).
1b) Traverse T_AST and generate the position coordinates pos = (x, y) of each node. For a node n,
x = n_depth / T_depth,
where n_depth is the depth of node n in T_AST and T_depth is the depth of T_AST, i.e. the depth of its deepest node. The coordinate y is obtained from x_p, the x value of the parent node of n, n_num, the number of siblings of n, and n_i, the position of node n among its siblings, counted from 1 starting at the left (the original equation image for y is not preserved in this extraction). Go to 1c).
1c) Construct the tuple <T_AST, pos> as the PE-AST of the code segment.
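Step 1 above can be sketched in a few lines of Python. This is an illustration only, not the patented implementation: Python's built-in `ast` module stands in for the Java-specific javalang package, and the formula for y is an assumption (the patent derives y from the parent's x value, the sibling count and the sibling position, but the exact equation image did not survive this extraction).

```python
# Sketch of step 1 (PE-AST construction). Python's built-in ast module
# stands in for javalang; the y formula is an assumed form.
import ast

def build_pe_ast(source: str):
    tree = ast.parse(source)

    def depth(node):
        kids = list(ast.iter_child_nodes(node))
        return 1 + max((depth(k) for k in kids), default=0)

    t_depth = depth(tree)          # T_depth: depth of the deepest node
    pos = {}

    def walk(node, d, parent_x, idx, n_siblings):
        x = d / t_depth                    # x = n_depth / T_depth
        y = parent_x + idx / n_siblings    # ASSUMED form of the y formula
        pos[id(node)] = (x, y)
        kids = list(ast.iter_child_nodes(node))
        for i, k in enumerate(kids, start=1):
            walk(k, d + 1, x, i, len(kids))

    walk(tree, 1, 0.0, 1, 1)
    return tree, pos               # the tuple <T_AST, pos>

tree, pos = build_pe_ast("a = 1 + 2")
print(len(pos), pos[id(tree)])
```

With a real Java input, `javalang.parse.parse` would replace `ast.parse`, but the traversal logic is the same.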
2. Digitize each node. Specifically:
2a) Use a matrix W_E ∈ R^{N*E} to encode φ(n):
E(φ(n)) = W_E · φ(n),
where N is the number of different node types (the types include initialization, stabilization, identification, among many others) and E is the embedding dimension. Go to 2b).
2b) Compute an embedding weight coefficient from the distance of node n relative to the origin, where the origin is the root node of the AST, and multiply it with the result of 2a) to obtain the final node vector representation v(n) (the original equation image for v(n) is not preserved in this extraction).
3. Extract the PE-AST paths in the PE-AST. A PE-AST path is a sequence n_1 … n_{k-1} s of length k, where n_1 is an end node in the PE-AST and s is the root node, and the nodes n_i for i ∈ [2, k−1] are all non-end nodes. A PE-AST path is written in the form <n_1, p, s>, where p denotes the sequence with n_1 and s removed. A PE-AST path thus starts at an end node, traverses a series of non-end nodes, and ends at the root node. For example, the portion enclosed by the dashed box in FIG. 2 is an example of a PE-AST path.
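Path extraction as described above can be sketched over a toy tree. The nested-dict AST and the node names (`decl`, `id`, `init`, …) are hypothetical stand-ins for illustration, not the patent's actual data structure.

```python
# Sketch of step 3 (PE-AST path extraction): each path runs from an end
# (leaf) node through non-end nodes to the root s.
def extract_paths(tree, name="s"):
    """Return every leaf-to-root node-name sequence n_1 ... n_{k-1} s."""
    paths = []

    def walk(node, label, trail):
        if isinstance(node, dict) and node:
            for child_label, child in node.items():
                walk(child, child_label, [label] + trail)
        else:
            paths.append([label] + trail)  # leaf reached: record full path

    walk(tree, name, [])
    return paths

ast_like = {"decl": {"id": {}, "init": {"lit1": {}, "lit2": {}}}}
for p in extract_paths(ast_like):
    print(p)
```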
4. Convert each PE-AST path into an operable tuple of the form <v(n_1), v(p), v(s)>. Specifically:
4a) v(p) is computed by summing the vectors of the nodes on the PE-AST path other than the start and end points:
v(p) = Σ_{i=2}^{k−1} v(n_i).
Go to 4b).
4b) Construct the triple <v(n_1), v(p), v(s)> as the vector representation of the PE-AST path for subsequent calculation.
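The tuple construction of step 4 is a direct sum over the interior node vectors; a minimal sketch (with random stand-in node vectors):

```python
# Sketch of step 4: a path n_1 ... n_{k-1} s becomes <v(n_1), v(p), v(s)>,
# where v(p) sums the vectors of all interior nodes.
import numpy as np

rng = np.random.default_rng(1)
E = 4
path_vectors = [rng.standard_normal(E) for _ in range(5)]  # v(n_1)...v(s)

def path_tuple(vs):
    v_n1, v_s = vs[0], vs[-1]
    v_p = np.sum(vs[1:-1], axis=0)     # v(p) = sum of interior node vectors
    return v_n1, v_p, v_s

v_n1, v_p, v_s = path_tuple(path_vectors)
print(v_p.shape)
```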
5. Select any PE-AST path and calculate the correlation coefficient WS between the other PE-AST paths and this path. The correlation coefficient expresses the degree to which different PE-AST paths within the same code segment act together. The steps are as follows:
5a) For two PE-AST paths L_1 and L_2, calculate the end-node semantic similarity WS_token from v(L_11), the end node of L_1, and v(L_21), the end node of L_2 (the original equation image is not preserved in this extraction). Go to 5b).
5b) For the two PE-AST paths L_1 and L_2, calculate the path semantic similarity
WS_path = sigmoid(W_path · [emb(L_1), emb(L_2)] + b_path),
where W_path ∈ R^{6E*1} and b_path is the offset. Go to 5c).
5c) Average the end-node semantic similarity WS_token and the path semantic similarity WS_path to obtain the path correlation coefficient
WS = 0.5·WS_token + 0.5·WS_path.
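A sketch of step 5 in NumPy. WS_path and the 0.5/0.5 average follow the text, with each emb(L) taken as the 3E-dimensional concatenation <v(n_1), v(p), v(s)> (consistent with W_path ∈ R^{6E*1}). WS_token is shown here as a cosine similarity of the end-node vectors, which is an assumption: the original equation image for WS_token did not survive this extraction.

```python
# Sketch of step 5 (correlation coefficient WS). WS_token's cosine form
# is an assumption; WS_path and the averaging follow the text.
import numpy as np

rng = np.random.default_rng(2)
E = 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ws(emb1, emb2, W_path, b_path):
    v1, v2 = emb1[:E], emb2[:E]        # end-node parts v(L_11), v(L_21)
    ws_token = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    ws_path = float(sigmoid(np.concatenate([emb1, emb2]) @ W_path + b_path))
    return 0.5 * ws_token + 0.5 * ws_path  # WS = 0.5*WS_token + 0.5*WS_path

emb1, emb2 = rng.standard_normal(3 * E), rng.standard_normal(3 * E)
W_path, b_path = rng.standard_normal(6 * E), 0.0
print(ws(emb1, emb2, W_path, b_path))
```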
6. Update the path representations. Take the PE-AST path set emb(L) = {emb(L_1), …, emb(L_{N_L})} as input, where N_L is the number of PE-AST paths. With emb(L) as input and the updated path representation z as output, z can be calculated as
z_i = Σ_{j≠i} WS_{i,j} · W_v · emb(L_j),
where W_v is a linear transformation implemented with a 1×1 convolution kernel, i denotes a given PE-AST path and j ranges over the remaining PE-AST paths.
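A sketch of step 6: each updated representation z_i aggregates the other paths' representations, weighted by WS_{i,j} and passed through a linear map W_v. The weighted-sum form is a reading of the surrounding description (the exact equation image did not survive this extraction), and the dimensions are illustrative.

```python
# Sketch of step 6 (path-representation update). The weighted-sum form
# over the other paths is an assumed reading of the text.
import numpy as np

rng = np.random.default_rng(3)
N_L, D, E_out = 3, 6, 4                  # paths, emb dim, output dim
emb = rng.standard_normal((N_L, D))      # emb(L_1) ... emb(L_{N_L})
WS = rng.uniform(size=(N_L, N_L))        # pairwise correlation coefficients
W_v = rng.standard_normal((D, E_out))    # 1x1-convolution-style linear map

def update_paths(emb, WS, W_v):
    z = np.zeros((emb.shape[0], W_v.shape[1]))
    for i in range(emb.shape[0]):
        for j in range(emb.shape[0]):
            if j != i:                   # sum over the other paths only
                z[i] += WS[i, j] * (emb[j] @ W_v)
    return z

z = update_paths(emb, WS, W_v)
print(z.shape)
```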
7. Predict from the PE-AST feature vectors. The steps are as follows:
7a) Apply maximum pooling over all updated PE-AST path representations to obtain the final vector representation of the whole code segment, e_code = [max(z_{i,1}), max(z_{i,2}), …, max(z_{i,E})], where i ∈ [1, N_p]. Go to 7b).
7b) Take e_code as the feature vector of the whole code segment; after linear and nonlinear transformation, obtain the probability distribution P_d of the prediction result with a Softmax() function and select the index corresponding to the maximum value as the final prediction, i.e.
P_d = Softmax(ReLU(W_code · e_code + b_code)),
where W_code ∈ R^{E*N_r}, N_r is the number of possible answers and b_code is the offset.
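The prediction head of step 7 (max-pooling, ReLU-activated linear layer, Softmax, argmax) can be sketched directly; the dimensions below are illustrative stand-ins.

```python
# Sketch of step 7 (prediction): max-pool z_i into e_code, apply
# ReLU(W_code · e_code + b_code) and Softmax, take the argmax as the class.
import numpy as np

rng = np.random.default_rng(4)
N_p, E, N_r = 5, 4, 3                    # paths, feature dim, classes

def predict(z, W_code, b_code):
    e_code = z.max(axis=0)               # element-wise max over all paths
    logits = np.maximum(W_code.T @ e_code + b_code, 0.0)   # ReLU
    exp = np.exp(logits - logits.max())
    p_d = exp / exp.sum()                # Softmax probability distribution
    return int(np.argmax(p_d)), p_d

z = rng.standard_normal((N_p, E))
W_code = rng.standard_normal((E, N_r))
b_code = np.zeros(N_r)
label, p_d = predict(z, W_code, b_code)
print(label, p_d.sum())
```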
8. Train the model described in the above steps on a data set to obtain the trained deep learning model; the PyTorch framework is used in the training process.
The inventors trained the model on the java14m data set, whose data come from 10,072 GitHub projects and comprise 12,636,998 training samples, 371,362 validation samples and 368,445 test samples, with cloned code removed, which makes the data set particularly rigorous. An Adam optimizer with an initial learning rate of 0.01 was used, and the trained deep learning model was obtained after training over the whole data set 10 times.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A high signal-to-noise ratio code classification method suitable for an open source software supply chain comprises the following steps:
1) Parse the syntax tree of the program to be predicted to generate the abstract syntax tree T_AST of its code and construct the tuple <T_AST, pos>, wherein T_AST = (N, T, X, s, δ, φ), N is the non-end node set, T is the end node set, s is the root node, X is the actual value of each node in the abstract syntax tree, δ is the correspondence between parent and child nodes in the abstract syntax tree, φ is the correspondence between each node in the abstract syntax tree and its actual value, and pos is the position coordinate of each node in the abstract syntax tree;
2) Input the tuple <T_AST, pos> into a code classification model to obtain the classification prediction result of the program to be predicted;
wherein the code classification model is obtained by deep-learning training on the classification indexes of a number of sample programs and their corresponding tuples <T′_AST, pos′>; the code classification model parses the tuple <T_AST, pos> by:
a) encoding the correspondence φ(n) of each node, mapping the encoded result into a vector space, and obtaining the final vector representation v(n) of each node from the obtained node vector representation and the distance from the corresponding position coordinate pos to the root node s;
b) extracting paths from the abstract syntax tree T_AST according to the non-end node set N, the end node set T, the root node s and the correspondence δ, and constructing the vector representation emb(L_i) of each path L_i in combination with the final vector representations v(n), wherein i is the end-node number;
c) updating the vector representations emb(L_i) by calculating the correlation coefficient WS of each path with the other paths to obtain the path representations z_i, and max-pooling each path representation z_i to obtain the final vector representation e_code of the program code to be predicted; wherein updating the vector representations emb(L_i) by calculating the correlation coefficient WS of each path with the other paths to obtain the path representations z_i comprises:
for any two paths L_i and L_j, computing the end-node semantic similarity WS_token from v(L_i1), the end node of path L_i, and v(L_j1), the end node of path L_j (the original equation image is not preserved in this extraction), wherein j is the path number and j ≠ i;
for the two paths L_i and L_j, computing the path semantic similarity WS_path = sigmoid(W_path · [emb(L_i), emb(L_j)] + b_path), wherein W_path ∈ R^{6E*1}, E is the embedding dimension used when encoding the correspondence φ(n), and b_path is an offset;
computing the correlation coefficient WS_{i,j} = α·WS_token + β·WS_path, wherein α is a first coefficient and β is a second coefficient;
using the correlation coefficient WS_{i,j} to update the vector representation emb(L_i) to obtain the path representation z_i = Σ_{j∈N_L} WS_{i,j} · W_v · emb(L_j), wherein W_v is a linear transformation and N_L is the set of paths other than the path L_i;
d) obtaining a classification index from the final vector representation e_code to obtain the classification prediction result of the program to be predicted.
2. The method of claim 1, wherein the syntax tree parsing method comprises: using the javalang package in Python.
3. The method of claim 1, wherein the position coordinates pos are obtained by:
1) computing the coordinate x of node n from its depth n_depth in the abstract syntax tree T_AST and the depth T_depth of T_AST: x = n_depth / T_depth;
2) computing the coordinate y of node n from the x value of the parent node of node n, the number of siblings and the position n_q of node n among the siblings (the original equation image is not preserved in this extraction);
3) obtaining the position coordinate pos of node n from the coordinate x and the coordinate y.
4. The method of claim 1, wherein the framework for training the code classification model comprises: the PyTorch framework.
5. The method of claim 1, wherein the encoded result is E(φ(n)) = W_E · φ(n), and the final vector representation v(n) weights E(φ(n)) according to the distance of node n from the origin (the original equation image is not preserved in this extraction), wherein W_E ∈ R^{N′*E}, N′ is the number of different node types, E is the embedding dimension, and (x, y) are the coordinates of node n in the coordinate system with the root node s as the origin.
6. The method of claim 1, wherein the classification index is obtained by:
1) applying linear and nonlinear transformations to the final vector representation e_code;
2) classifying the transformed result through a Softmax() function to obtain the probability distribution P_d of the prediction result;
3) selecting the index corresponding to the maximum value in the probability distribution P_d to obtain the classification index of the program to be predicted.
7. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when executed, perform the method of any of claims 1-6.
8. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-6.
CN202110168454.XA 2021-02-07 2021-02-07 High signal-to-noise ratio code classification method and device suitable for open-source software supply chain Active CN112905186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110168454.XA CN112905186B (en) 2021-02-07 2021-02-07 High signal-to-noise ratio code classification method and device suitable for open-source software supply chain


Publications (2)

Publication Number Publication Date
CN112905186A 2021-06-04
CN112905186B 2023-04-07

Family

ID=76123652


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9262406B1 (en) * 2014-05-07 2016-02-16 Google Inc. Semantic frame identification with distributed word representations

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
AU2006322637B2 (en) * 2005-12-06 2011-07-28 National Ict Australia Limited A succinct index structure for XML
CN101697121A (en) * 2009-10-26 2010-04-21 哈尔滨工业大学 Method for detecting code similarity based on semantic analysis of program source code
US11334692B2 (en) * 2017-06-29 2022-05-17 International Business Machines Corporation Extracting a knowledge graph from program source code
CN107729925B (en) * 2017-09-26 2020-03-31 中国科学技术大学 Method for automatically classifying and scoring program competition type source codes according to problem solving method
CN109445834B (en) * 2018-10-30 2021-04-30 北京计算机技术及应用研究所 Program code similarity rapid comparison method based on abstract syntax tree
CN110597735B (en) * 2019-09-25 2021-03-05 北京航空航天大学 Software defect prediction method for open-source software defect feature deep learning
CN112181428B (en) * 2020-09-28 2021-10-22 北京航空航天大学 Abstract syntax tree-based open-source software defect data classification method and system



Similar Documents

Publication Publication Date Title
CN114169330B (en) Chinese named entity recognition method integrating time sequence convolution and transform encoder
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
Wilson et al. Quantum kitchen sinks: An algorithm for machine learning on near-term quantum computers
CN111368920B (en) Quantum twin neural network-based classification method and face recognition method thereof
CN109871454B (en) Robust discrete supervision cross-media hash retrieval method
CN112199670B (en) Log monitoring method for improving IFOREST (entry face detection sequence) to conduct abnormity detection based on deep learning
CN113987174A (en) Core statement extraction method, system, equipment and storage medium for classification label
CN113076545A (en) Deep learning-based kernel fuzzy test sequence generation method
CN115658846A (en) Intelligent search method and device suitable for open-source software supply chain
CN117633811A (en) Code vulnerability detection method based on multi-view feature fusion
CN116361788A (en) Binary software vulnerability prediction method based on machine learning
Ye et al. PTaRL: Prototype-based tabular representation learning via space calibration
CN117077586B (en) Register transmission level resource prediction method, device and equipment for circuit design
CN111737694B (en) Malicious software homology analysis method based on behavior tree
CN112381280B (en) Algorithm prediction method based on artificial intelligence
CN113392929A (en) Biological sequence feature extraction method based on word embedding and self-encoder fusion
Wu et al. Discovering Mathematical Expressions Through DeepSymNet: A Classification-Based Symbolic Regression Framework
CN112905186B (en) High signal-to-noise ratio code classification method and device suitable for open-source software supply chain
CN117271701A (en) Method and system for extracting system operation abnormal event relation based on TGGAT and CNN
CN116861373A (en) Query selectivity estimation method, system, terminal equipment and storage medium
KR20220129120A (en) Using genetic programming to create generic building blocks
CN116483437A (en) Cross-language or cross-library application program interface mapping method based on representation learning
CN110825707A (en) Data compression method
CN116226864A (en) Network security-oriented code vulnerability detection method and system
CN114565063A (en) Software defect prediction method based on multi-semantic extractor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant