CN112328743A - Code searching method and device, readable storage medium and electronic equipment

Info

Publication number
CN112328743A
Authority
CN
China
Prior art keywords
code
feature
search
candidate
features
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011211683.7A
Other languages
Chinese (zh)
Inventor
任雷鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202011211683.7A
Publication of CN112328743A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a code searching method, a code searching device, a readable storage medium and electronic equipment. A plurality of candidate code segments and search information describing the corresponding target code segment are determined, and each candidate code segment is split to obtain its method name, calling sequence and symbol marks. Features are then extracted from the method name, the calling sequence and the symbol marks respectively. At the same time, features are extracted from the search information to obtain search features. Candidate code features corresponding to each candidate code segment are determined from the extraction results for the method name, calling sequence and symbol marks together with the search features, and the matching target code segment is determined according to the similarity between the candidate code features and the search features. By splitting the candidate code segments, extracting candidate code features, extracting search features from the search information, and searching on the basis of both, the embodiment of the invention improves the accuracy of the code search result.

Description

Code searching method and device, readable storage medium and electronic equipment
Technical Field
The present invention relates to the field of data processing, and in particular, to a code search method, apparatus, readable storage medium, and electronic device.
Background
With the rapid development of computer science and technology, research and applications in every field are growing deeper and broader. As in other disciplines, theoretical research in computer science depends on rigorous and effective experiments, and open source code, as a key part of those experiments, has become an important bridge for academic exchange and plays a very important role in the development and progress of the discipline. Open source services make it convenient for people to learn and communicate, but they also cause users considerable trouble: most typically, users must spend a great deal of time and effort filtering the recommended code returned by search engines. How to evaluate the relevance between code and a retrieval query reasonably and effectively is therefore an urgent problem in the field of code search.
Disclosure of Invention
In view of this, embodiments of the present invention provide a code search method, a code search device, a readable storage medium, and an electronic device, and aim to provide an efficient code search method and improve accuracy of a code search result.
In a first aspect, an embodiment of the present invention discloses a code searching method, where the method includes:
determining search information and a plurality of candidate code segments, wherein the search information is used for describing corresponding target code segments;
determining a method name, a calling sequence and a symbol mark corresponding to each candidate code segment;
for each candidate code segment, respectively performing feature extraction on the corresponding method name, calling sequence and symbol mark to determine a first code feature, a second code feature and a third code feature;
performing feature extraction on the search information to determine search features;
determining candidate code features corresponding to the candidate code segments according to the corresponding first code features, second code features, third code features and search features;
and determining the similarity between each candidate code feature and the search features so as to determine the target code segment matched with the search information.
In a second aspect, an embodiment of the present invention provides a code searching apparatus, where the apparatus includes:
the information determining module is used for determining search information and a plurality of candidate code segments, wherein the search information is used for describing corresponding target code segments;
the information splitting module is used for determining the method name, the calling sequence and the symbol mark corresponding to each candidate code segment;
the first extraction module is used for respectively performing feature extraction on the corresponding method name, calling sequence and symbol mark for each candidate code segment so as to determine a first code feature, a second code feature and a third code feature;
the second extraction module is used for performing feature extraction on the search information to determine search features;
the feature determining module is used for determining candidate code features corresponding to the candidate code segments according to the corresponding first code features, second code features, third code features and search features;
and the information matching module is used for determining the similarity between each candidate code feature and the search features so as to determine the target code segment matched with the search information.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer program instructions, which when executed by a processor implement the method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, wherein the memory is configured to store one or more computer program instructions, and wherein the one or more computer program instructions are executed by the processor to implement the method according to the first aspect.
According to the embodiment of the invention, a plurality of candidate code segments and search information describing the corresponding target code segment are determined, and each candidate code segment is split to obtain its method name, calling sequence and symbol marks. Features are then extracted from the method name, the calling sequence and the symbol marks respectively. At the same time, features are extracted from the search information to obtain search features. Candidate code features corresponding to each candidate code segment are determined from the extraction results for the method name, calling sequence and symbol marks together with the search features, and the matching target code segment is determined according to the similarity between the candidate code features and the search features. By splitting the candidate code segments, extracting candidate code features, extracting search features from the search information, and searching on the basis of both, the embodiment of the invention improves the accuracy of the code search result.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a code search method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a candidate code segment splitting process according to an embodiment of the present invention;
FIG. 3 is a flow chart of a candidate code feature determination process according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a candidate code segment and search information similarity determination process according to an embodiment of the present invention;
FIG. 5 is a diagram of a code search apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
The code searching method provided by the embodiment of the invention can be applied to any device, such as a terminal device or a server, on which a search framework that applies the method can be deployed. When the server or terminal device on which the search framework is deployed receives input search information, it searches for matching code segments and then displays or outputs the code segments found. The terminal device may be a general-purpose data processing terminal with an acceleration sensor, such as a smart phone or a tablet computer, capable of running a computer program. The server may be a single server or a server cluster configured in a distributed manner.
Fig. 1 is a flowchart of a code search method according to an embodiment of the present invention. As shown in fig. 1, the code search method according to the embodiment of the present invention includes the following steps:
Step S100, determining search information and a plurality of candidate code segments.
Specifically, the search information describes the corresponding target code segment, and the candidate code segments are the content to be searched, each implementing a corresponding function. Optionally, each candidate code segment includes at least one function for implementing the corresponding function. During a code search, the search information is information input by a user to describe the target code segment that the user wants to obtain. Optionally, the search information may be the function performed by the target code segment, such as "serialization of an image object", or one or more function names included in the target code segment, such as "number form. A candidate code segment may be a code segment pre-stored in the terminal device or server that executes the code search method, a code segment stored in a database connected to the terminal device or server, or a code segment acquired from an open source website.
Step S200, determining the method name, calling sequence and symbol mark corresponding to each candidate code segment.
Specifically, each candidate code segment includes a corresponding method name, calling sequence and symbol marks. After the plurality of candidate code segments are determined, each candidate code segment is split to determine its method name, calling sequence and symbol marks. The method name represents the function implemented by the candidate code segment; the calling sequence represents the order in which the functions included in the candidate code segment are called through application program interfaces; and the symbol marks represent the content of the candidate code segment other than the method name and calling sequence, for example a keyword set, where the keywords are obtained by segmenting all functions and parameters in the candidate code segment into words. The following code segment is taken as an example of a candidate code segment in the embodiment of the present invention:
private static NumberFormat getPercentInstance(
        int minimumFractionDigits, Locale locale) {
    NumberFormat nf = NumberFormat.getPercentInstance(locale);
    nf.setMinimumFractionDigits(minimumFractionDigits);
    return nf;
}
In the embodiment of the present invention, after the candidate code segment is split, the corresponding method name is determined to be "getPercentInstance", and the calling sequence is "number command.
Fig. 2 is a schematic diagram of a candidate code fragment splitting process according to an embodiment of the present invention. As shown in fig. 2, the candidate code segment 20 includes a plurality of feature parameters, and the plurality of feature parameters may be split into a method name 21, a calling order 22, and a symbol mark 23.
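The splitting step can be illustrated with a minimal Python sketch. The patent does not say how the split is implemented, so everything here is an assumption made for illustration: the helper names `split_camel_case` and `split_code_segment` are hypothetical, the regular expressions only approximate a real Java parser, and the call sequence is reported as written in the source (receiver name, not resolved type).

```python
import re

# Example candidate code segment (the Java method from the description).
JAVA_SNIPPET = """
private static NumberFormat getPercentInstance(
        int minimumFractionDigits, Locale locale) {
    NumberFormat nf = NumberFormat.getPercentInstance(locale);
    nf.setMinimumFractionDigits(minimumFractionDigits);
    return nf;
}
"""


def split_camel_case(identifier):
    """Split an identifier such as 'getPercentInstance' into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", identifier)
    return [part.lower() for part in parts]


def split_code_segment(source):
    """Return (method_name, call_sequence, tokens) for one method-sized segment.

    Hypothetical, regex-based approximation; a production system would
    more likely use a real parser.
    """
    brace = source.index("{")
    header, body = source[:brace], source[brace + 1:]

    # Method name: the identifier directly before the first '(' in the header.
    method_name = re.search(r"(\w+)\s*\(", header).group(1)

    # Calling sequence: 'receiver.method(' occurrences in source order.
    call_sequence = [
        f"{receiver}.{method}"
        for receiver, method in re.findall(r"(\w+)\.(\w+)\s*\(", body)
    ]

    # Symbol marks: words from every identifier, deduplicated in first-seen order.
    tokens = []
    for identifier in re.findall(r"[A-Za-z_]\w*", source):
        for word in split_camel_case(identifier):
            if word not in tokens:
                tokens.append(word)
    return method_name, call_sequence, tokens


name, calls, tokens = split_code_segment(JAVA_SNIPPET)
print(name)
print(calls)
print(tokens)
```

Because the token list is built from every identifier, language keywords such as `private` and `return` also appear in it; whether those should be filtered out is a design choice the description leaves open.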
Step S300, for each candidate code segment, respectively performing feature extraction on the corresponding method name, calling order, and symbol mark to determine a first code feature, a second code feature, and a third code feature.
Specifically, after the method name, calling sequence and symbol marks corresponding to each candidate code segment are determined, feature extraction is performed on each of them to obtain the first code feature, the second code feature and the third code feature. To improve the accuracy of feature extraction, the three kinds of information may be processed by three different feature extraction methods, or by separately trained instances of the same feature extraction model.
In the embodiment of the present invention, the feature extraction process may consist of inputting the method name, the calling sequence and the symbol marks into a pre-trained first feature extraction layer, second feature extraction layer and third feature extraction layer, respectively, to output the first code feature, the second code feature and the third code feature, where at least one of the three layers is a bidirectional long short-term memory network. Optionally, all three layers are bidirectional long short-term memory networks, which may be trained separately or jointly. A bidirectional long short-term memory network comprises a forward long short-term memory network and a backward long short-term memory network. The forward network computes once in the forward direction, from time t_0 to time t_n, according to the input information and the state information of the previous time, and stores the output of the forward hidden layer at each time. The backward network likewise computes once in the reverse direction, from time t_n to time t_0, according to the input information and the state information of the previous time, and stores the output of the backward hidden layer at each time. The final output is obtained by combining the outputs of the forward and backward hidden layers at each time.
The following description takes as an example the case in which the first, second and third feature extraction layers are all bidirectional long short-term memory networks. During feature extraction, the method name, calling sequence and symbol marks corresponding to each candidate code segment are each input into an embedding layer to obtain a corresponding first vector representation, second vector representation and third vector representation. The first vector representation is then used as both the forward input and the backward input of the first feature extraction layer, which outputs a first feature vector corresponding to the method name. The second vector representation is used as both the forward input and the backward input of the second feature extraction layer, which outputs a second feature vector corresponding to the calling sequence. The third vector representation is used as both the forward input and the backward input of the third feature extraction layer, which outputs a third feature vector corresponding to the symbol marks.
Further, in the embodiment of the present invention, the features obtained after the feature extraction is performed by the first feature extraction layer, the second feature extraction layer, and the third feature extraction layer also need to be input into the hidden layer, so that the first code feature, the second code feature, and the third code feature with specific dimensions are output after the conversion by the hidden layer.
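The forward and backward passes described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the network of the embodiment: a scalar tanh recurrence stands in for the long short-term memory cell (the gates are omitted), the inputs are scalar stand-ins for embedding vectors, and the names `rnn_pass` and `bidirectional_pass` and the weights `w_in` and `w_rec` are hypothetical. What it does show is the data flow: one pass from t_0 to t_n, one pass from t_n to t_0, and the two hidden outputs combined at each time step.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a scalar tanh recurrence and return the hidden output at every step."""
    state, outputs = 0.0, []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        state = math.tanh(w_in * x + w_rec * state)
        outputs.append(state)
    return outputs


def bidirectional_pass(inputs, w_in=0.5, w_rec=0.3):
    """Combine a forward (t_0 -> t_n) and a backward (t_n -> t_0) pass per time step."""
    forward = rnn_pass(inputs, w_in, w_rec)
    # Process the reversed sequence, then flip the outputs back to input order.
    backward = rnn_pass(inputs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


for step, (fwd, bwd) in enumerate(bidirectional_pass([1.0, 2.0, 3.0])):
    print(step, round(fwd, 4), round(bwd, 4))
```

In a real implementation the per-step pairs would be vectors, and a following hidden layer would map them to a feature of fixed dimension, as the description states.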
Step S400, performing feature extraction on the search information to determine search features.
Specifically, feature extraction on the search information consists of inputting the search information into a trained fourth feature extraction layer to output the corresponding search features. In an embodiment of the present invention, the fourth feature extraction layer is a bidirectional long short-term memory network. During feature extraction, the search information is converted by an embedding layer into a corresponding fourth vector representation, which is used as both the forward input and the backward input of the fourth feature extraction layer, and the layer outputs the search features corresponding to the search information. Further, in the embodiment of the present invention, the features extracted by the fourth feature extraction layer also need to be input into a hidden layer, so that search features of a specific dimension are output after conversion by the hidden layer.
Step S500, determining candidate code features corresponding to the candidate code segments according to the corresponding first code features, second code features, third code features and search features.
Specifically, after the search features of the search information and the first, second and third code features corresponding to each candidate code segment are determined, an attention mechanism may be introduced to improve the correlation between the features of the candidate code segments and the search features: features related to the search features are extracted from at least one of the first, second and third code features of each candidate code segment. The candidate code features corresponding to each candidate code segment are then determined from the features output by the attention mechanism.
Fig. 3 is a flowchart of a candidate code feature determination process according to an embodiment of the present invention. As shown in fig. 3, the process of determining candidate code features includes the following steps:
Step S510, inputting the search features and each third code feature into an attention mechanism layer to determine a fourth code feature corresponding to each candidate code segment.
Specifically, in the embodiment of the present invention, the search features and the third code feature corresponding to each candidate code segment are input into the attention mechanism layer, which outputs the fourth code feature corresponding to each candidate code segment. An attention mechanism selects the important information in its input while ignoring useless information. For example, when an attention mechanism is introduced into a long short-term memory network model, it assigns an attention weight to the output item corresponding to each input item; the different weights reflect how much different parts of the input contribute during training, so that the model can capture the important features of the input and distinguish between inputs. That is, after the search features and the third code feature extracted from the symbol marks are input into the attention mechanism layer, the layer retains the information in the third code feature that is more strongly correlated with the search features and discards the information that is less correlated, and its final output is the fourth code feature. The fourth code feature is therefore the feature information, among the symbol marks of the corresponding candidate code segment, that is more strongly correlated with the search information.
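As an illustration of how an attention layer can keep the symbol-mark information most related to the query, the sketch below uses dot-product attention with a softmax over the scores. The description does not specify the form of the attention mechanism, so the scoring function is an assumption, as is the representation of the third code feature as one vector per token; the names `softmax` and `attend` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(search_feature, token_features):
    """Weight each token vector by its relevance to the search feature.

    Returns the attention-pooled vector (standing in for the fourth code
    feature) and the attention weights.
    """
    # Relevance score: dot product between the query and each token vector.
    scores = [
        sum(q * k for q, k in zip(search_feature, token))
        for token in token_features
    ]
    weights = softmax(scores)
    dim = len(search_feature)
    pooled = [
        sum(weight * token[d] for weight, token in zip(weights, token_features))
        for d in range(dim)
    ]
    return pooled, weights


# Two toy token vectors: the first points the same way as the query.
pooled, weights = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
print(weights)
print(pooled)
```

Tokens whose vectors align with the search feature receive larger weights, so they dominate the pooled output, while weakly related tokens are attenuated rather than removed outright.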
Step S520, determining candidate code features according to the corresponding first code features, second code features and fourth code features.
Specifically, the candidate code features corresponding to a candidate code segment may be obtained by directly inputting the corresponding first code feature, second code feature and fourth code feature into a feature splicing layer, which splices (concatenates) them. Furthermore, because the first, second and fourth code features respectively represent the method name feature, the calling sequence feature and the symbol mark feature, and each plays a different role in the search process, the code features may each be weighted before splicing; the weighted first, second and fourth code features are then input into the feature splicing layer to obtain the candidate code features of the candidate code segment.
Further, the candidate code features may also be determined in other manners, for example, the first code feature, the second code feature, and the fourth code feature are input to a neural network layer obtained by pre-training, and corresponding candidate code features are output.
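A minimal Python sketch of the weighted splicing option follows. The function name `splice_features` and the example weight values are hypothetical; the description does not state how the weights are chosen or whether they are learned.

```python
def splice_features(first, second, fourth, weights=(1.0, 1.0, 1.0)):
    """Scale each code feature by its weight, then concatenate the results.

    With the default weights this reduces to direct splicing.
    """
    spliced = []
    for weight, feature in zip(weights, (first, second, fourth)):
        spliced.extend(weight * value for value in feature)
    return spliced


# Direct splicing of three small example features.
print(splice_features([1.0, 2.0], [3.0], [4.0, 5.0]))
# Weighted splicing: down-weight the method name, up-weight the calling sequence.
print(splice_features([1.0, 2.0], [3.0], [4.0, 5.0], weights=(0.5, 2.0, 1.0)))
```

Concatenation preserves each feature in its own slice of the output vector, so the weights only change how much each slice contributes to a later similarity computation.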
Step S600, determining the similarity between each candidate code feature and the search features to determine the target code segment matched with the search information.
Specifically, after the candidate code features corresponding to each candidate code segment are determined, the candidate code segments are screened by the similarity between their candidate code features and the search features, and the candidate code segments whose similarity meets a preset condition are determined to be the target code segments matched with the search information. The similarity may be calculated by directly computing the cosine similarity between each candidate code feature and the search features and using that cosine similarity as the corresponding similarity. Alternatively, each candidate code feature and the search features may be input into a similarity calculation layer, where they are processed by an activation function (softmax) to obtain the corresponding similarity.
After determining the similarity corresponding to each candidate code feature and the search feature, the candidate code segment with the corresponding similarity greater than the threshold may be further determined as the target code segment matched with the search information. Alternatively, the target code segment may be searched for among the candidate code segments by other filtering conditions. For example, the candidate code segments are sorted from large to small according to the corresponding similarity, so as to determine that the N candidate code segments with the maximum similarity are the target code segments matched with the search information.
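The cosine-similarity option and the two screening conditions (a similarity threshold, or the N most similar candidates) can be sketched as follows. The names `cosine_similarity` and `rank_candidates`, the dictionary of candidate features and the example vectors are all hypothetical, and the sketch assumes that candidate code features and search features have the same dimension.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(search_feature, candidate_features, threshold=None, top_n=None):
    """Score every candidate, sort by descending similarity, then screen.

    candidate_features maps a candidate identifier to its feature vector.
    """
    scored = [
        (name, cosine_similarity(feature, search_feature))
        for name, feature in candidate_features.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        # Keep only candidates whose similarity exceeds the threshold.
        scored = [pair for pair in scored if pair[1] > threshold]
    if top_n is not None:
        # Keep only the N most similar candidates.
        scored = scored[:top_n]
    return scored


candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(rank_candidates([1.0, 0.0], candidates, top_n=2))
print(rank_candidates([1.0, 0.0], candidates, threshold=0.9))
```

The threshold and top-N conditions can also be combined, in which case the result contains at most N candidates, all above the threshold.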
Fig. 4 is a schematic diagram of a candidate code segment and search information similarity determination process according to an embodiment of the present invention. As shown in fig. 4, when code is searched with the code searching method of the embodiment of the present invention, a plurality of candidate code segments 40 to be searched and search information 41 describing a target code segment are determined. Each candidate code segment 40 is split to obtain a corresponding method name 42, calling sequence 43 and symbol marks 44. Each piece of split information is converted into a corresponding vector representation by an embedding layer and input into a first feature extraction layer 45, a second feature extraction layer 46 and a third feature extraction layer 47, respectively, and the corresponding outputs are converted by a hidden layer into a first code feature, a second code feature and a third code feature of a specific dimension. Meanwhile, the search information 41 is converted into a corresponding vector representation by an embedding layer and input into a fourth feature extraction layer 48, and the corresponding output is converted by a hidden layer into search features of a specific dimension. In the embodiment of the present invention, the search features and each third code feature are also input into the attention mechanism layer 49 to output fourth code features that are strongly correlated with the search information. For each candidate code segment, the corresponding first code feature, second code feature and fourth code feature are input into the feature splicing layer 4A, either after weighting or directly, to obtain the corresponding candidate code features.
And inputting the candidate code characteristics corresponding to the candidate code segments and the search characteristics corresponding to the search information into the similarity calculation layer 4B together to obtain corresponding similarity 4C. And further screening the candidate code segments according to the corresponding similarity to obtain the target code segments matched with the search information.
The code searching method of the embodiment of the present invention extracts a search feature from the search information, splits each candidate code segment into several pieces of code information, and extracts features from each piece separately. Based on these extracted features and the search feature, it determines, for each candidate code segment, a candidate code feature that is more strongly correlated with the search information. Because the search is performed on the search feature and on candidate code features that are highly correlated with the search information, the accuracy of the code search results is improved.
Fig. 5 is a schematic diagram of a code search apparatus according to an embodiment of the present invention. As shown in fig. 5, the code search apparatus includes an information determination module 50, an information splitting module 51, a first extraction module 52, a second extraction module 53, a feature determination module 54, and an information matching module 55.
Specifically, the information determination module 50 is configured to determine search information and a plurality of candidate code segments, the search information describing a corresponding target code segment. The information splitting module 51 is configured to determine a method name, a calling sequence and a symbol mark corresponding to each candidate code segment. The first extraction module 52 is configured to, for each candidate code segment, perform feature extraction on the corresponding method name, calling sequence and symbol mark to determine a first code feature, a second code feature and a third code feature. The second extraction module 53 is configured to perform feature extraction on the search information to determine a search feature. The feature determination module 54 is configured to determine the candidate code feature corresponding to each candidate code segment according to the corresponding first code feature, second code feature, third code feature and the search feature. The information matching module 55 is configured to determine the similarity between each candidate code feature and the search feature so as to determine a target code segment matching the search information.
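As an illustration of what the information splitting module might produce, the sketch below splits one Python function into a method name, a calling sequence and a list of symbol marks. It is a sketch under stated assumptions: the candidate code is taken to be Python source, "symbol marks" are interpreted as the distinct non-keyword identifiers in the segment, and the helper names `call_name` and `split_segment` are hypothetical. It uses only the standard-library `ast`, `tokenize`, `keyword` and `io` modules.

```python
import ast
import io
import keyword
import tokenize

SOURCE = '''def read_lines(path):
    handle = open(path)
    text = handle.read()
    return text.splitlines()
'''

def call_name(call):
    """Return the called name for simple calls such as f(x) or obj.f(x)."""
    if isinstance(call.func, ast.Name):
        return call.func.id
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    return None  # e.g. a call on a subscript or another call's result

def split_segment(source):
    """Split a segment into (method name, calling sequence, symbol marks)."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    # ast.walk is breadth-first, so sort calls back into source order.
    calls = [n for n in ast.walk(func) if isinstance(n, ast.Call)]
    calls.sort(key=lambda n: (n.lineno, n.col_offset))
    call_sequence = [name for name in map(call_name, calls) if name]
    # Distinct identifiers of the whole segment, in order of first appearance.
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            if tok.string not in tokens:
                tokens.append(tok.string)
    return func.name, call_sequence, tokens

name, call_sequence, tokens = split_segment(SOURCE)
```

Each of the three outputs would then be embedded and passed to its own feature extraction layer, as described for the first extraction module 52.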
The code searching device of the embodiment of the present invention extracts a search feature from the search information, splits each candidate code segment into several pieces of code information, and extracts features from each piece separately. Based on these extracted features and the search feature, it determines, for each candidate code segment, a candidate code feature that is more strongly correlated with the search information. Because the search is performed on the search feature and on candidate code features that are highly correlated with the search information, the accuracy of the code search results is improved.
Fig. 6 is a schematic diagram of an electronic device according to an embodiment of the invention. As shown in fig. 6, the electronic device is a general-purpose device with a general computer hardware structure that includes at least a processor 60 and a memory 61. The processor 60 and the memory 61 are connected by a bus 62. The memory 61 is adapted to store instructions or programs executable by the processor 60. The processor 60 may be a stand-alone microprocessor or a collection of one or more microprocessors. The processor 60 thus processes data and controls other devices by executing the instructions stored in the memory 61, thereby performing the method flows of the embodiments of the present invention described above. The bus 62 connects the above components together and also connects them to a display controller 63, a display device and input/output (I/O) devices 64. The input/output (I/O) devices 64 may be a mouse, keyboard, modem, network interface, touch input device, motion sensing input device, printer or other devices known in the art. Typically, the input/output devices 64 are connected to the system through an input/output (I/O) controller 65.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device) or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may employ a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations of methods, apparatus (devices) and computer program products according to embodiments of the application. It will be understood that each flow in the flow diagrams can be implemented by computer program instructions.
These computer program instructions may be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows.
Another embodiment of the invention is directed to a non-transitory storage medium storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as those skilled in the art will understand, all or part of the steps of the methods in the embodiments described above may be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (16)

1. A method of code searching, the method comprising:
determining search information and a plurality of candidate code segments, wherein the search information is used for describing corresponding target code segments;
determining a method name, a calling sequence and a symbol mark corresponding to each candidate code segment;
for each candidate code segment, respectively performing feature extraction on the corresponding method name, calling sequence and symbol mark to determine a first code feature, a second code feature and a third code feature;
performing feature extraction on the search information to determine search features;
determining candidate code characteristics corresponding to the candidate code segments according to the corresponding first code characteristics, second code characteristics, third code characteristics and search characteristics;
and determining the similarity of each candidate code feature and the search feature so as to determine the target code segment matched with the search information.
2. The method according to claim 1, wherein determining the method name, the calling sequence and the symbol mark corresponding to each candidate code segment specifically comprises:
splitting each candidate code segment to determine the corresponding method name, calling sequence and symbol mark.
3. The method according to claim 1, wherein performing feature extraction on the corresponding method name, calling sequence and symbol mark to determine the first code feature, the second code feature and the third code feature specifically comprises:
inputting the corresponding method name, calling sequence and symbol mark into a trained first feature extraction layer, second feature extraction layer and third feature extraction layer, respectively, to output the first code feature, the second code feature and the third code feature, wherein at least one of the first feature extraction layer, the second feature extraction layer and the third feature extraction layer is a bidirectional long short-term memory network.
4. The method according to claim 1, wherein performing feature extraction on the search information to determine the search feature specifically comprises:
inputting the search information into a trained fourth feature extraction layer to output the corresponding search feature, wherein the fourth feature extraction layer is a bidirectional long short-term memory network.
5. The method of claim 1, wherein determining the candidate code feature corresponding to each of the candidate code segments based on the corresponding first code feature, second code feature, third code feature and the search feature comprises:
inputting the search features and the third code features into an attention mechanism layer to determine fourth code features corresponding to the candidate code segments;
and determining candidate code features according to the corresponding first code features, second code features and fourth code features.
6. The method according to claim 5, wherein the determining candidate code features from the corresponding first, second and fourth code features is specifically:
inputting the corresponding first code features, second code features and fourth code features into the feature concatenation layer to determine candidate code features.
7. The method of claim 1, wherein determining a similarity between each of the candidate code features and the search feature to determine a target code segment matching the search information comprises:
calculating the cosine similarity between each candidate code feature and the search feature as the corresponding similarity;
and determining the candidate code segments whose corresponding similarity is greater than a threshold as the target code segments matched with the search information.
8. A code search apparatus, characterized in that the apparatus comprises:
the information determining module is used for determining search information and a plurality of candidate code segments, wherein the search information is used for describing corresponding target code segments;
the information splitting module is used for determining the method name, the calling sequence and the symbol mark corresponding to each candidate code segment;
the first extraction module is used for respectively extracting the characteristics of the corresponding method name, calling sequence and symbol mark for each candidate code segment so as to determine a first code characteristic, a second code characteristic and a third code characteristic;
the second extraction module is used for extracting the characteristics of the search information to determine the search characteristics;
the characteristic determining module is used for determining candidate code characteristics corresponding to the candidate code segments according to the corresponding first code characteristics, second code characteristics, third code characteristics and searching characteristics;
and the information matching module is used for determining the similarity of each candidate code characteristic and the search characteristic so as to determine a target code segment matched with the search information.
9. The apparatus of claim 8, wherein the information splitting module comprises:
and the information splitting submodule is used for splitting each candidate code segment so as to determine the corresponding method name, calling sequence and symbol mark.
10. The apparatus according to claim 8, wherein the first extraction module is specifically:
and the first extraction submodule is used for respectively inputting corresponding method names, calling sequences and symbol marks into a first feature extraction layer, a second feature extraction layer and a third feature extraction layer obtained by training so as to output a first code feature, a second code feature and a third code feature, and at least one of the first feature extraction layer, the second feature extraction layer and the third feature extraction layer is a bidirectional long-short term memory network.
11. The apparatus according to claim 8, wherein the second extraction module is specifically:
and the second extraction submodule is used for inputting the search information into a fourth feature extraction layer obtained by training so as to output corresponding search features, and the fourth feature extraction layer is a bidirectional long-short term memory network.
12. The apparatus of claim 8, wherein the feature determination module comprises:
a first feature determining sub-module, configured to input the search feature and each third code feature into an attention mechanism layer to determine a fourth code feature corresponding to each candidate code segment;
and the second characteristic determining submodule is used for determining candidate code characteristics according to the corresponding first code characteristics, second code characteristics and fourth code characteristics.
13. The apparatus according to claim 12, wherein the second characteristic determining submodule is specifically:
and the feature splicing unit is used for inputting the corresponding first code feature, the second code feature and the fourth code feature into the feature splicing layer so as to determine the candidate code feature.
14. The apparatus of claim 8, wherein the information matching module comprises:
the similarity calculation sub-module is used for calculating the cosine similarity between each candidate code feature and the search feature as the corresponding similarity;
and the information matching sub-module is used for determining the candidate code segments whose corresponding similarity is greater than a threshold as the target code segments matched with the search information.
15. A computer readable storage medium storing computer program instructions which, when executed by a processor, implement the method of any one of claims 1-7.
16. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-7.
CN202011211683.7A 2020-11-03 2020-11-03 Code searching method and device, readable storage medium and electronic equipment Pending CN112328743A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011211683.7A CN112328743A (en) 2020-11-03 2020-11-03 Code searching method and device, readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011211683.7A CN112328743A (en) 2020-11-03 2020-11-03 Code searching method and device, readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112328743A true CN112328743A (en) 2021-02-05

Family

ID=74324568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011211683.7A Pending CN112328743A (en) 2020-11-03 2020-11-03 Code searching method and device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112328743A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344023A (en) * 2021-03-25 2021-09-03 Suning Financial Technology (Nanjing) Co., Ltd. Code recommendation method, device and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106462399A (en) * 2014-06-30 2017-02-22 Microsoft Technology Licensing, LLC Code recommendation
CN108491407A (en) * 2018-01-24 2018-09-04 Dalian University of Technology Query expansion method for code-oriented retrieval
CN109062792A (en) * 2018-07-21 2018-12-21 Southeast University Open source code detection method based on string matching and feature matching
CN110716749A (en) * 2019-09-03 2020-01-21 Southeast University Code searching method based on function similarity matching
CN111142850A (en) * 2019-12-23 2020-05-12 Nanjing University of Aeronautics and Astronautics Code segment recommendation method and device based on deep neural network
CN111191002A (en) * 2019-12-26 2020-05-22 Wuhan University Neural code searching method and device based on hierarchical embedding
US20200327118A1 (en) * 2020-06-27 2020-10-15 Intel Corporation Similarity search using guided reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI XUAN; WANG QIANXIANG; JIN ZHI: "Description Reinforcement Based Code Search", JOURNAL OF SOFTWARE, 1 June 2017 (2017-06-01), pages 1405 - 1417 *
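ZHANG KAILE: "A Survey of Machine-Learning-Based Code Search Methods", WIRELESS COMMUNICATION TECHNOLOGY, 31 March 2020 (2020-03-31), pages 48 - 53 *
WANG TING; MU YONGMIN; ZHANG ZHIHUA: "Research Progress on Code Clone Detection Methods", MODERN COMPUTER, no. 13, 5 May 2019 (2019-05-05), pages 32 - 38 *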

Similar Documents

Publication Publication Date Title
CN111753060B (en) Information retrieval method, apparatus, device and computer readable storage medium
CN103699625B (en) Method and device for retrieving based on keyword
CN111291210B (en) Image material library generation method, image material recommendation method and related devices
US8577882B2 (en) Method and system for searching multilingual documents
JP5212610B2 (en) Representative image or representative image group display system, method and program thereof, and representative image or representative image group selection system, method and program thereof
CN112463976B (en) Knowledge graph construction method taking crowd sensing task as center
US9251270B2 (en) Grouping search results into a profile page
CN111190997A (en) Question-answering system implementation method using neural network and machine learning sequencing algorithm
JP6053131B2 (en) Information processing apparatus, information processing method, and program
CN111368048A (en) Information acquisition method and device, electronic equipment and computer readable storage medium
CN106096028A (en) Historical relic indexing means based on image recognition and device
KR20200087977A (en) Multimodal ducument summary system and method
CN111666766A (en) Data processing method, device and equipment
CN103399862A (en) Method and equipment for confirming searching guide information corresponding to target query sequences
CN112347223A (en) Document retrieval method, document retrieval equipment and computer-readable storage medium
CN116975340A (en) Information retrieval method, apparatus, device, program product, and storage medium
CN111259115B (en) Training method and device for content authenticity detection model and computing equipment
CN113515589A (en) Data recommendation method, device, equipment and medium
CN113449066A (en) Method, processor and storage medium for storing cultural relic data by using knowledge graph
CN117763126A (en) Knowledge retrieval method, device, storage medium and apparatus
CN112328743A (en) Code searching method and device, readable storage medium and electronic equipment
CN116226526A (en) Intellectual property intelligent retrieval platform and method
CN115186240A (en) Social network user alignment method, device and medium based on relevance information
CN116257877A (en) Data classification grading method for privacy calculation
KR102609616B1 (en) Method and apparatus for image processing, electronic device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination