CN115658067A - Leakage code retrieval method and device and computer readable storage medium - Google Patents

Publication number
CN115658067A
Authority
CN
China
Prior art keywords
retrieval
sensitive
code
statement
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211296187.5A
Other languages
Chinese (zh)
Inventor
裴伟伟
万振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seczone Technology Co Ltd
Original Assignee
Seczone Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seczone Technology Co Ltd
Priority to CN202211296187.5A
Publication of CN115658067A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a leaked code retrieval method and device and a computer-readable storage medium. The method comprises the following steps: when a retrieval instruction corresponding to a leaked code keyword is received, crawling project information corresponding to the leaked code keyword; constructing a retrieval statement corresponding to the type of the retrieval request according to the type of the retrieval request; and retrieving the retrieval statement in the project information to obtain a retrieval result corresponding to the retrieval statement. By constructing the retrieval statement according to the type of the retrieval request, retrieving within the project information corresponding to the leaked code keyword, and obtaining the position information of the leaked code from the retrieval result, the scheme effectively improves the efficiency of leaked code retrieval.

Description

Leakage code retrieval method and device and computer readable storage medium
Technical Field
The present application relates to the field of information security technologies, and in particular to a leaked code retrieval method and device and a computer-readable storage medium.
Background
At present, when leaked code is monitored and detected, for example on the GitHub platform, the related art generally calls the API of the GitHub platform and uses the search function of that interface to find open source projects and project code related to given keywords.
This approach is constrained by the GitHub API. For example, code search through the GitHub API is subject to a strict rate limit (only 30 requests per minute after authentication), which is too low for fast retrieval. In actual searches, the number of results for some project keywords may exceed 100, yet even with the per_page parameter set the API displays at most 100 results, so the search results cannot be shown in full. Owing to the limited functionality of the interface, only the project information related to the keyword that the interface provides can be detected, and the code context surrounding the keyword within a project is not available. In addition, parallel search is not possible, which greatly reduces the efficiency of keyword search.
Disclosure of Invention
The embodiment of the application provides a leaked code retrieval method, a leaked code retrieval device and a computer readable storage medium, which can at least solve the problems that the related technology is low in efficiency and cannot perform parallel retrieval when leaked codes are retrieved through keywords.
A first aspect of an embodiment of the present application provides a leaked code retrieval method, including:
when a retrieval instruction corresponding to a leakage code keyword is received, crawling project information corresponding to the leakage code keyword;
constructing a retrieval statement corresponding to the type of the retrieval request according to the type of the retrieval request; wherein the retrieval request type comprises: a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request;
retrieving the retrieval statement in the item information to obtain a retrieval result corresponding to the retrieval statement; and the retrieval result is position information corresponding to the leakage code keyword.
A second aspect of the present embodiment provides a leaked code retrieving apparatus, including:
the crawling module is used for crawling project information corresponding to the leakage code keyword when a retrieval instruction of the leakage code keyword is received;
the construction module is used for constructing a retrieval statement corresponding to the type of the retrieval request according to the type of the retrieval request; wherein the retrieval request type comprises: a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request;
the retrieval module is used for retrieving the retrieval statement in the item information to obtain a retrieval result corresponding to the retrieval statement; and the retrieval result is position information corresponding to the leakage code keyword.
A third aspect of the embodiments of the present application provides an electronic apparatus, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the leaked code retrieval method provided in the first aspect of the embodiments of the present application.
A fourth aspect of the present embodiment provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the leaked code retrieval method provided in the first aspect of the present embodiment.
As can be seen from the above, according to the leaked code retrieval method, the leaked code retrieval device and the computer-readable storage medium provided by the scheme of the present application, when a retrieval instruction corresponding to a leaked code keyword is received, project information corresponding to the leaked code keyword is crawled; a retrieval statement corresponding to the type of the retrieval request is constructed according to that type, where the retrieval request types include a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request; and the retrieval statement is retrieved in the project information to obtain the corresponding retrieval result, which is the position information corresponding to the leaked code keyword. Because the retrieval statement is constructed according to the type of the retrieval request and retrieval is performed within the project information corresponding to the leaked code keyword, the position information of the leaked code is obtained from the retrieval result and the efficiency of leaked code retrieval is effectively improved.
Drawings
Fig. 1 is a basic flowchart of a leaked code retrieval method according to a first embodiment of the present application;
Fig. 2 is a detailed flowchart of a leaked code retrieval method according to a second embodiment of the present application;
Fig. 3 is a schematic block diagram of a leaked code retrieval apparatus according to a third embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To address the problems in the related art that searching for leaked code through the API interface is restricted and that parallel retrieval is not possible, a first embodiment of the present application provides a leaked code retrieval method, which is applied to the GitHub platform. Fig. 1 is a basic flowchart of the leaked code retrieval method provided in this embodiment, which includes the following steps:
Step 101: when a retrieval instruction corresponding to a leaked code keyword is received, crawl the project information corresponding to the leaked code keyword.
Specifically, in this embodiment, all projects related to a keyword that the user has entered for the leaked code are crawled, and the project information corresponding to the keyword, such as the project name and the project creator information, is obtained.
In some implementations of this embodiment, before the step of crawling the project information corresponding to the leaked code keyword, the method further includes: acquiring an account name and an account password, and performing identity authentication with the account name and the account password through crawler technology; and, when the identity authentication is passed, executing the step of crawling the project information corresponding to the leaked code keyword.
Specifically, in this embodiment, identity authentication is performed before the project information corresponding to the leaked code is crawled. The account name and account password for logging in to the GitHub platform are read from a pre-configured file, such as a Config file, and an authentication request is sent through crawler technology. When the authentication is passed, the step of crawling the project information corresponding to the leaked code keyword is executed.
In other implementations of this embodiment, after the step of crawling the project information corresponding to the leaked code keyword, the method further includes: deduplicating the crawled project information; and generating a project list based on the deduplicated project information.
Specifically, in this embodiment, the project information crawled for a leaked code keyword may contain duplicates, so the crawled project information can be deduplicated and the project list total_project_list can be generated from the deduplicated project information.
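The deduplication step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the application does not say which fields identify a project, so the sketch assumes that the creator and project name together do, and the record layout and sample data are hypothetical. Only the name total_project_list comes from the text.

```python
from typing import Dict, List


def deduplicate_projects(crawled: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated project records while keeping first-seen order."""
    seen = set()
    unique_projects = []
    for project in crawled:
        # Assumption: creator + project name uniquely identifies a project.
        key = (project["creator"].lower(), project["name"].lower())
        if key not in seen:
            seen.add(key)
            unique_projects.append(project)
    return unique_projects


# Hypothetical crawl output in which one project appears twice.
crawled = [
    {"name": "pay-gateway", "creator": "alice"},
    {"name": "internal-tools", "creator": "bob"},
    {"name": "pay-gateway", "creator": "alice"},
]
total_project_list = deduplicate_projects(crawled)
```

Keeping first-seen order means later retrieval steps visit projects in the order in which they were crawled.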
Step 102: construct a retrieval statement corresponding to the type of the retrieval request according to the type of the retrieval request.
Specifically, the types of retrieval request in this embodiment include a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request. In actual retrieval, several pieces of information in a retrieval result often need further processing to yield more detailed information, so retrieval statements can be constructed separately for the different types of retrieval request to obtain more accurate information.
In some implementations of this embodiment, the step of constructing, according to the type of the retrieval request, a retrieval statement corresponding to the type of the retrieval request includes: acquiring a preset storage file, where the storage file stores the retrieval terms corresponding to the type of the retrieval request and the retrieval terms include sensitive path names, sensitive code keywords and sensitive file names; reading, according to the type of the retrieval request, the retrieval terms in the storage file corresponding to that type; and splicing the retrieval terms with a retrieval syntax to construct the retrieval statement corresponding to the type of the retrieval request, where the retrieval syntax includes public retrieval syntax and undisclosed retrieval syntax.
Specifically, in this embodiment, the retrieval statement for a retrieval request type is obtained by splicing the retrieval terms of that type with a retrieval syntax, and the retrieval terms of that type are obtained by reading the contents of a pre-configured storage file. A technology company usually has its own naming conventions and coding standards. Storing the designated retrieval information in corresponding files and using the retrieval terms in those configured files narrows the retrieval range, which speeds up retrieval and makes the retrieval information easier for the company to manage. The retrieval syntax includes syntax disclosed by the GitHub platform, such as filename and in:file, and undisclosed syntax, such as -repo, OR, AND and NOT, which can be used for batch retrieval and for excluding project information in the project list.
Further, in some implementations of this embodiment, the step of splicing the retrieval terms with the retrieval syntax to construct the retrieval statement corresponding to the type of the retrieval request includes: splicing the sensitive path name with the public retrieval syntax to construct a sensitive path retrieval statement; splicing the sensitive code keywords with the undisclosed retrieval syntax to construct a sensitive code keyword retrieval statement; and splicing the sensitive file name with the public retrieval syntax to construct a sensitive file name retrieval statement.
Specifically, in this embodiment, the sensitive path retrieval statement is constructed by splicing a sensitive path name read from the path.db file with the public retrieval syntax filename, the sensitive code keyword retrieval statement is constructed by splicing sensitive code keywords read from the info.db file with the undisclosed retrieval syntax OR, and the sensitive file name retrieval statement is constructed by splicing a sensitive file name read from the file.db file with the public retrieval syntax filename.
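The splicing described above can be illustrated with a short Python sketch. It is not the applicant's implementation: the function name build_statements, the request type labels and the exact spliced form ("filename:" followed by the term, and keywords joined by " OR ") are assumptions, and the retrieval terms are passed in as lists because the text does not describe the format of path.db, info.db or file.db.

```python
from typing import List

# Hypothetical labels for the three retrieval request types named in the text.
SENSITIVE_PATH = "sensitive_path"
SENSITIVE_CODE_KEYWORD = "sensitive_code_keyword"
SENSITIVE_FILE_NAME = "sensitive_file_name"


def build_statements(request_type: str, terms: List[str]) -> List[str]:
    """Splice retrieval terms with a retrieval syntax, one list of statements per type."""
    if request_type in (SENSITIVE_PATH, SENSITIVE_FILE_NAME):
        # The text splices both of these with the "filename" syntax;
        # the "qualifier:term" layout is an assumption.
        return [f"filename:{term}" for term in terms]
    if request_type == SENSITIVE_CODE_KEYWORD:
        # Keywords are combined with OR so one statement covers several of them.
        return [" OR ".join(terms)]
    raise ValueError(f"unknown retrieval request type: {request_type}")


path_statements = build_statements(SENSITIVE_PATH, ["config/prod"])
keyword_statements = build_statements(SENSITIVE_CODE_KEYWORD, ["db_password", "secret_key"])
file_statements = build_statements(SENSITIVE_FILE_NAME, ["id_rsa", "settings.py"])
```

Dispatching on the request type in one function keeps the mapping between request types and syntax in a single place, which mirrors how the text ties each type to one storage file and one syntax.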
Step 103: retrieve the retrieval statement in the project information to obtain the retrieval result corresponding to the retrieval statement.
Specifically, the retrieval result in this embodiment is the position information corresponding to the leaked code keyword. The retrieval statement is retrieved in the obtained project information to obtain the position information of the leaked code keyword, for example the path of the file containing the leaked code, the name of that file, the position of the leaked code within the whole code content, and the code fragments immediately before and after that position.
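The kind of position information listed above can be pictured with a short Python sketch. It is illustrative only: the function name locate_keyword, the record fields and the sample file content are assumptions, and the sketch scans an in-memory string rather than querying the GitHub platform.

```python
from typing import Dict, List


def locate_keyword(path: str, content: str, keyword: str, context: int = 2) -> List[Dict]:
    """Return one record per line of `content` that contains `keyword`."""
    lines = content.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if keyword in line:
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append({
                "path": path,                           # path of the file with the leak
                "file_name": path.rsplit("/", 1)[-1],   # name of that file
                "line_number": index + 1,               # position in the code content
                "context": lines[start:end],            # code before and after the hit
            })
    return hits


# Hypothetical file content used only for this example.
sample = "import os\nDB_HOST = 'db.internal'\nDB_PASSWORD = 'hunter2'\nprint(DB_HOST)\n"
hits = locate_keyword("config/settings.py", sample, "DB_PASSWORD", context=1)
```

Here the single hit is reported on line 3 of settings.py together with one line of context on each side.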
In some implementations of this embodiment, the step of retrieving the retrieval statement in the project information includes: retrieving the retrieval statement in the project information in the project list.
Specifically, the retrieval statement in this embodiment is retrieved within the scope of the project information in the project list obtained after deduplication.
Further, in some implementations of this embodiment, the step of retrieving the retrieval statement in the project information to obtain the retrieval result corresponding to the retrieval statement includes: retrieving the sensitive path retrieval statement in the project information to obtain the retrieval result corresponding to the sensitive path retrieval statement; retrieving the sensitive file name retrieval statement in the retrieval result of the sensitive path retrieval statement to obtain the retrieval result corresponding to the sensitive file name retrieval statement; and retrieving the sensitive code keyword retrieval statement in the retrieval result of the sensitive file name retrieval statement to obtain the retrieval result corresponding to the sensitive code keyword retrieval statement.
Specifically, in this embodiment, retrieving the sensitive path retrieval statement within the project information in the project list yields the sensitive path information related to the leaked code. Retrieving the sensitive file name retrieval statement in that result yields the sensitive file information related to the leaked code, such as the sensitive file name. Retrieving the sensitive code keyword retrieval statement in the result of the sensitive file name retrieval yields the position of the sensitive code keyword within the whole code content and the code fragments before and after that position. In this way, the different types of retrieval statement narrow the search step by step: first to the sensitive paths related to the leaked code, then to the sensitive file names, and finally to the sensitive code content.
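The three-stage narrowing can be sketched in Python over in-memory file records. This is an assumption-laden illustration rather than the described crawler: the record fields, the substring and exact-name matching rules, and the sample data are all hypothetical, and no request is sent to the GitHub platform.

```python
from typing import Dict, List

FileRecord = Dict[str, str]


def filter_by_path(files: List[FileRecord], sensitive_paths: List[str]) -> List[FileRecord]:
    # Stage 1: keep files whose path contains a sensitive path name.
    return [f for f in files if any(p in f["path"] for p in sensitive_paths)]


def filter_by_file_name(files: List[FileRecord], names: List[str]) -> List[FileRecord]:
    # Stage 2: within the stage 1 result, keep files with a sensitive file name.
    return [f for f in files if f["path"].rsplit("/", 1)[-1] in names]


def filter_by_keyword(files: List[FileRecord], keywords: List[str]) -> List[FileRecord]:
    # Stage 3: within the stage 2 result, keep files containing a sensitive keyword.
    return [f for f in files if any(k in f["content"] for k in keywords)]


def cascade(files, paths, names, keywords) -> List[FileRecord]:
    stage1 = filter_by_path(files, paths)
    stage2 = filter_by_file_name(stage1, names)
    return filter_by_keyword(stage2, keywords)


files = [
    {"project": "alice/pay-gateway", "path": "config/prod/db.yml", "content": "password: hunter2"},
    {"project": "alice/pay-gateway", "path": "docs/readme.md", "content": "password policy"},
    {"project": "bob/tools", "path": "config/prod/app.yml", "content": "debug: true"},
]
leaks = cascade(files, ["config/prod"], ["db.yml", "app.yml"], ["password"])
```

Each stage only sees the output of the previous one, so the readme that mentions "password" outside a sensitive path never reaches the keyword stage.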
Still further, in some implementations of this embodiment, the step of retrieving the sensitive code keyword retrieval statement in the retrieval result of the sensitive file name retrieval statement to obtain the retrieval result corresponding to the sensitive code keyword retrieval statement includes: extracting, batch by batch, a preset number of sensitive code keywords from all the sensitive code keywords that have been read; and splicing each extracted batch with the undisclosed retrieval syntax and retrieving the batches in turn in the retrieval result of the sensitive file name retrieval statement to obtain the retrieval result corresponding to the sensitive code keyword retrieval statement.
Specifically, in this embodiment, when the sensitive code keyword retrieval statement is retrieved in the retrieval result of the sensitive file name retrieval statement, a preset number of sensitive code keywords is spliced with the undisclosed retrieval syntax each time. For example, 5 sensitive code keywords are extracted at a time and spliced with the undisclosed retrieval syntax OR, and the resulting statements are retrieved in turn in the retrieval result of the sensitive file name retrieval statement, yielding the position of each sensitive code keyword within the whole code content and the code fragments before and after that position.
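The batching of keywords can be sketched as a small Python generator. The batch size of 5 and the OR syntax come from the text; the function name, the " OR " separator layout and the sample keywords are assumptions made for illustration.

```python
from typing import Iterator, List


def batched_keyword_statements(keywords: List[str], batch_size: int = 5) -> Iterator[str]:
    """Yield one OR-joined retrieval statement per batch of keywords."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(keywords), batch_size):
        # Each slice holds at most batch_size keywords; the last one may be shorter.
        yield " OR ".join(keywords[start:start + batch_size])


# Twelve hypothetical keywords produce batches of 5, 5 and 2.
keywords = [f"secret_{i}" for i in range(12)]
statements = list(batched_keyword_statements(keywords))
```

Grouping keywords this way reduces the number of statements that have to be retrieved in turn, at the cost of each statement matching any of the keywords in its batch.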
Based on the technical scheme of this embodiment of the present application, when a retrieval instruction corresponding to a leaked code keyword is received, project information corresponding to the leaked code keyword is crawled; a retrieval statement corresponding to the type of the retrieval request is constructed according to that type, where the retrieval request types include a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request; and the retrieval statement is retrieved in the project information to obtain the corresponding retrieval result, which is the position information corresponding to the leaked code keyword. Because the retrieval statement is constructed according to the type of the retrieval request and retrieval is performed within the project information corresponding to the leaked code keyword, the position information of the leaked code is obtained from the retrieval result and the efficiency of leaked code retrieval is effectively improved.
Fig. 2 shows a refined leaked code retrieval method provided in a second embodiment of the present application. The leaked code retrieval method includes:
Step 201: when a retrieval instruction corresponding to a leaked code keyword is received, crawl the project information corresponding to the leaked code keyword.
In this embodiment, all projects related to a keyword that the user has entered for the leaked code are crawled, and the project information corresponding to the keyword, such as the project name and the project creator information, is obtained.
Step 202: splice the sensitive path name with the public retrieval syntax to construct a sensitive path retrieval statement.
Step 203: splice the sensitive code keywords with the undisclosed retrieval syntax to construct a sensitive code keyword retrieval statement.
Step 204: splice the sensitive file name with the public retrieval syntax to construct a sensitive file name retrieval statement.
In this embodiment, the sensitive path retrieval statement is constructed by splicing a sensitive path name read from the path.db file with the public retrieval syntax file, the sensitive code keyword retrieval statement is constructed by splicing sensitive code keywords read from the info.db file with the undisclosed retrieval syntax OR, and the sensitive file name retrieval statement is constructed by splicing a sensitive file name read from the file.db file with the public retrieval syntax file.
Step 205: retrieve the sensitive path retrieval statement in the project information to obtain the retrieval result corresponding to the sensitive path retrieval statement.
Step 206: retrieve the sensitive file name retrieval statement in the retrieval result of the sensitive path retrieval statement to obtain the retrieval result corresponding to the sensitive file name retrieval statement.
Step 207: retrieve the sensitive code keyword retrieval statement in the retrieval result of the sensitive file name retrieval statement to obtain the retrieval result corresponding to the sensitive code keyword retrieval statement.
In this embodiment, retrieving the sensitive path retrieval statement within the project information in the project list yields the sensitive path information related to the leaked code. Retrieving the sensitive file name retrieval statement in that result yields the sensitive file information related to the leaked code, such as the sensitive file name. Retrieving the sensitive code keyword retrieval statement in the result of the sensitive file name retrieval yields the position of the sensitive code keyword within the whole code content and the code fragments before and after that position. In this way, the different types of retrieval statement narrow the search step by step: first to the sensitive paths related to the leaked code, then to the sensitive file names, and finally to the sensitive code content.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an order of execution. The order in which the steps are executed should be determined by their functions and internal logic, and the numbering should not constitute any limitation on the implementation of the embodiments of the present application.
Based on the technical scheme of this embodiment of the present application, when a retrieval instruction corresponding to a leaked code keyword is received, project information corresponding to the leaked code keyword is crawled; a retrieval statement corresponding to the type of the retrieval request is constructed according to that type, where the retrieval request types include a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request; and the retrieval statement is retrieved in the project information to obtain the corresponding retrieval result, which is the position information corresponding to the leaked code keyword. Because the retrieval statement is constructed according to the type of the retrieval request and retrieval is performed within the project information corresponding to the leaked code keyword, the position information of the leaked code is obtained from the retrieval result and the efficiency of leaked code retrieval is effectively improved.
Fig. 3 shows a leaked code retrieval apparatus according to a third embodiment of the present application. The leaked code retrieval apparatus can be used to implement the leaked code retrieval method in the foregoing embodiments. As shown in Fig. 3, the leaked code retrieval apparatus mainly includes:
A crawling module 301, configured to crawl the project information corresponding to a leaked code keyword when a retrieval instruction for the leaked code keyword is received.
A building module 302, configured to construct a retrieval statement corresponding to the type of the retrieval request according to the type of the retrieval request, where the retrieval request types include a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request.
A retrieval module 303, configured to retrieve the retrieval statement in the project information and obtain the retrieval result corresponding to the retrieval statement, where the retrieval result is the position information corresponding to the leaked code keyword.
In some implementations of this embodiment, the leaked code retrieval apparatus further includes a generating module, configured to: deduplicate the crawled project information after the project information corresponding to the leaked code keyword has been crawled; and generate a project list based on the deduplicated project information.
In some implementations of this embodiment, the retrieval module 303 is specifically configured to: retrieve the retrieval statement in the project information.
In some implementations of this embodiment, the building module 302 is specifically configured to: acquire a preset storage file, where the storage file stores the retrieval terms corresponding to the type of the retrieval request and the retrieval terms include sensitive path names, sensitive code keywords and sensitive file names; read, according to the type of the retrieval request, the retrieval terms in the storage file corresponding to that type; and splice the retrieval terms with a retrieval syntax to construct the retrieval statement corresponding to the type of the retrieval request, where the retrieval syntax includes public retrieval syntax and undisclosed retrieval syntax.
Further, in some implementations of this embodiment, the building module 302 is further configured to: splice the sensitive path name with the public retrieval syntax to construct a sensitive path retrieval statement; splice the sensitive code keywords with the undisclosed retrieval syntax to construct a sensitive code keyword retrieval statement; and splice the sensitive file name with the public retrieval syntax to construct a sensitive file name retrieval statement.
In other implementations of this embodiment, the retrieval module 303 is specifically configured to: retrieve the sensitive path retrieval statement in the project information in the project list to obtain the retrieval result corresponding to the sensitive path retrieval statement; retrieve the sensitive file name retrieval statement in the retrieval result of the sensitive path retrieval statement to obtain the retrieval result corresponding to the sensitive file name retrieval statement; and retrieve the sensitive code keyword retrieval statement in the retrieval result of the sensitive file name retrieval statement to obtain the retrieval result corresponding to the sensitive code keyword retrieval statement.
Further, in other implementations of this embodiment, the retrieval module 303 is further configured to: extract, batch by batch, a preset number of sensitive code keywords from all the sensitive code keywords that have been read; and splice each extracted batch with the undisclosed retrieval syntax and retrieve the batches in turn in the retrieval result of the sensitive file name retrieval statement to obtain the retrieval result corresponding to the sensitive code keyword retrieval statement.
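The division into modules can be sketched in Python as a small composition of three callables. This is only one possible reading of the logical division: the class name, the callable signatures and the stand-in functions are assumptions, and the stand-ins return fixed in-memory data instead of crawling the GitHub platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Project = Dict[str, str]


@dataclass
class LeakedCodeRetrievalApparatus:
    crawl: Callable[[str], List[Project]]                    # crawling module 301
    build: Callable[[str], str]                              # building module 302
    retrieve: Callable[[str, List[Project]], List[Project]]  # retrieval module 303

    def run(self, keyword: str, request_type: str) -> List[Project]:
        projects = self.crawl(keyword)            # crawl project information
        statement = self.build(request_type)      # construct the retrieval statement
        return self.retrieve(statement, projects)  # retrieve within that information


# Hypothetical stand-ins so the sketch runs without network access.
def fake_crawl(keyword: str) -> List[Project]:
    return [
        {"name": "alice/pay-gateway", "path": "config/db.yml"},
        {"name": "bob/tools", "path": "README.md"},
    ]


def fake_build(request_type: str) -> str:
    return {"sensitive_file_name": "db.yml"}[request_type]


def fake_retrieve(statement: str, projects: List[Project]) -> List[Project]:
    return [p for p in projects if p["path"].endswith(statement)]


apparatus = LeakedCodeRetrievalApparatus(fake_crawl, fake_build, fake_retrieve)
results = apparatus.run("pay-gateway", "sensitive_file_name")
```

Passing the modules in as callables keeps the division purely logical, so any module can be replaced or merged with another without changing run.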
It should be noted that, the leaked code retrieval methods in the first and second embodiments can be implemented based on the leaked code retrieval device provided in this embodiment, and it can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process of the leaked code retrieval device described in this embodiment may refer to the corresponding process in the foregoing method embodiment, and details are not repeated here.
According to the leaked code retrieval apparatus provided by this embodiment, when a retrieval instruction corresponding to a leaked code keyword is received, project information corresponding to the leaked code keyword is crawled; a retrieval statement corresponding to the type of the retrieval request is constructed according to that type, where the retrieval request types include a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request; and the retrieval statement is retrieved in the project information to obtain the corresponding retrieval result, which is the position information corresponding to the leaked code keyword. Because the retrieval statement is constructed according to the type of the retrieval request and retrieval is performed within the project information corresponding to the leaked code keyword, the position information of the leaked code is obtained from the retrieval result and the efficiency of leaked code retrieval is effectively improved.
Referring to fig. 4, fig. 4 is an electronic device according to a fourth embodiment of the present disclosure. The electronic device may be used to implement the leaked code retrieval method in the foregoing embodiments. As shown in fig. 4, the electronic device mainly includes:
A memory 401, a processor 402 and a computer program 403 stored in the memory 401 and executable on the processor 402, the memory 401 and the processor 402 being connected by a bus. The processor 402, when executing the computer program 403, implements the leaked code retrieval method in the foregoing embodiments. The number of processors may be one or more.
The memory 401 may be a random access memory (RAM) or a non-volatile memory, such as a disk memory. The memory 401 is used to store executable program code, and the processor 402 is coupled to the memory 401.
Further, an embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium may be provided in an electronic device in the foregoing embodiments, and the computer-readable storage medium may be the memory in the foregoing embodiment shown in fig. 4.
The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the leaked code retrieval method in the foregoing embodiments. Further, the computer-readable storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a RAM, a magnetic disk or an optical disk.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules; they may be located in one place or distributed across a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
If the integrated module is implemented in the form of a software functional module and sold or used as a separate product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a readable storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned readable storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, and other media capable of storing program code.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of combined acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, since some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In view of the above description of the leaked code retrieval method, apparatus and computer-readable storage medium provided by the present application, those skilled in the art will recognize that changes may be made to the specific embodiments and the scope of application according to the ideas of the embodiments of the present application.
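As an illustration only, the de-duplication of crawled project information and the splicing of search terms with a retrieval grammar could be sketched in Python as follows. The grammar templates (`path:`, `filename:`, a quoted phrase), the request type names and the function names are all hypothetical placeholders; the application does not specify the syntax of the public or undisclosed retrieval grammar, and this sketch does not represent the applicant's implementation.

```python
# Hypothetical grammar templates, one per retrieval request type.
# The real public/undisclosed retrieval grammar is not given in the
# application; these strings are placeholders for illustration.
GRAMMAR_TEMPLATES = {
    "sensitive_path": "path:{term}",
    "sensitive_filename": "filename:{term}",
    "sensitive_code_keyword": '"{term}"',
}


def deduplicate_projects(projects):
    """Remove duplicate project entries, keeping first-seen order."""
    return list(dict.fromkeys(projects))


def build_statements(request_type, terms):
    """Splice each search term with the grammar for the request type."""
    template = GRAMMAR_TEMPLATES[request_type]
    return [template.format(term=term) for term in terms]


if __name__ == "__main__":
    # Assumed input: project names returned by a crawl, with repeats.
    project_list = deduplicate_projects(["org/a", "org/b", "org/a"])
    print(project_list)
    print(build_statements("sensitive_path", ["config/secrets"]))
```

Keeping the templates in a single mapping means a new retrieval request type only needs one new entry, which mirrors the idea of reading search terms per request type from a preset storage file.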

Claims (10)

1. A leaked code retrieval method, comprising:
when a retrieval instruction corresponding to a leakage code keyword is received, crawling project information corresponding to the leakage code keyword;
constructing a retrieval statement corresponding to the type of the retrieval request according to the type of the retrieval request; wherein the retrieval request type comprises: a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request;
retrieving the retrieval statement in the project information to obtain a retrieval result corresponding to the retrieval statement; wherein the retrieval result is position information corresponding to the leakage code keyword.
2. The leaked code retrieval method according to claim 1, further comprising, after the step of crawling project information corresponding to the leakage code keyword:
de-duplicating the obtained project information;
generating a project list based on the de-duplicated project information;
wherein the step of retrieving the retrieval statement in the project information includes:
retrieving the retrieval statement in the project information in the project list.
3. The leaked code retrieving method according to claim 1, wherein the step of constructing the retrieval statement corresponding to the type of the retrieval request according to the type of the retrieval request comprises:
acquiring a preset storage file; the storage file is used for storing search terms corresponding to the retrieval request type; the search terms comprise: sensitive path names, sensitive code keywords and sensitive file names;
reading, according to the type of the retrieval request, the search terms in the storage file corresponding to that type;
splicing the search terms with a retrieval grammar to construct a retrieval statement corresponding to the retrieval request type; wherein the retrieval grammar comprises: a public retrieval grammar and an undisclosed retrieval grammar.
4. The leaked code retrieval method according to claim 3, wherein the step of building a retrieval statement corresponding to the type of the retrieval request by splicing the retrieval word with a retrieval grammar comprises:
splicing the sensitive path name and the public retrieval grammar to construct a sensitive path retrieval statement;
splicing the sensitive code keyword and the undisclosed retrieval grammar to construct a sensitive code keyword retrieval statement;
and splicing the sensitive file name and the public retrieval grammar to construct a sensitive file name retrieval statement.
5. The leaked code retrieval method according to claim 4, wherein the step of retrieving the retrieval statement in the project information to obtain a retrieval result corresponding to the retrieval statement includes:
retrieving the sensitive path retrieval statement in the project information to obtain a retrieval result corresponding to the sensitive path retrieval statement;
retrieving the sensitive file name retrieval statement in the retrieval result of the sensitive path retrieval statement to obtain the retrieval result corresponding to the sensitive file name retrieval statement;
and retrieving the sensitive code keyword retrieval statement in the retrieval result of the sensitive file name retrieval statement to obtain the retrieval result corresponding to the sensitive code keyword retrieval statement.
6. The leaked code retrieval method according to claim 5, wherein the step of retrieving the sensitive code keyword retrieval statement in the retrieval result of the sensitive filename retrieval statement to obtain the retrieval result corresponding to the sensitive code keyword retrieval statement comprises:
sequentially extracting a preset number of sensitive code keywords from all the read sensitive code keywords;
and splicing the extracted sensitive code keywords with the undisclosed retrieval grammar in sequence, and retrieving in the retrieval results of the sensitive file name retrieval statements in sequence to obtain the retrieval results corresponding to the sensitive code keywords.
7. The leaked code retrieval method according to any one of claims 1 to 6, further comprising, before the step of crawling project information corresponding to the leakage code keyword:
acquiring an account name and an account password, and performing identity authentication on the account name and the account password by means of a crawler technology;
and executing the step of crawling the project information corresponding to the leakage code keyword when the identity authentication passes.
8. A leaked code retrieving apparatus, comprising:
the crawling module is used for crawling project information corresponding to the leakage code keyword when a retrieval instruction of the leakage code keyword is received;
the construction module is used for constructing a retrieval statement corresponding to the type of the retrieval request according to the type of the retrieval request; wherein the retrieval request type comprises: a sensitive path retrieval request, a sensitive code keyword retrieval request and a sensitive file name retrieval request;
the retrieval module is used for retrieving the retrieval statement in the project information to obtain a retrieval result corresponding to the retrieval statement; wherein the retrieval result is position information corresponding to the leakage code keyword.
9. An electronic device, comprising: a memory and a processor, wherein:
the processor is configured to execute a computer program stored on the memory;
the processor, when executing the computer program, performs the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
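For illustration, the cascaded retrieval of claims 5 and 6 (sensitive path first, then sensitive file name within those results, then sensitive code keywords taken a preset number at a time) might be sketched in Python as below. This is a simplified local model under stated assumptions: project information is represented as an in-memory mapping from file path to file content, matching is plain substring matching rather than a hosting platform's search grammar, and all names (`cascade_search`, `batched`, `batch_size`) are hypothetical.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def cascade_search(files, sensitive_paths, sensitive_filenames,
                   keywords, batch_size=2):
    """Return (path, line_number, keyword) for each keyword hit.

    `files` maps a file path to its text content (assumed model of
    the crawled project information).
    """
    # Stage 1: keep files whose path contains a sensitive path name.
    stage1 = {p: c for p, c in files.items()
              if any(sp in p for sp in sensitive_paths)}
    # Stage 2: within stage 1, keep files with a sensitive file name.
    stage2 = {p: c for p, c in stage1.items()
              if p.rsplit("/", 1)[-1] in sensitive_filenames}
    # Stage 3: search keywords batch by batch within stage 2.
    hits = []
    for batch in batched(keywords, batch_size):
        for path, content in stage2.items():
            for line_no, line in enumerate(content.splitlines(), start=1):
                for keyword in batch:
                    if keyword in line:
                        hits.append((path, line_no, keyword))
    return hits


if __name__ == "__main__":
    sample = {
        "repo/config/db.properties": "user=admin\npassword=hunter2",
        "repo/src/main.py": "password = 1",
        "repo/config/readme.md": "password",
    }
    print(cascade_search(sample, ["config/"], ["db.properties"],
                         ["password", "secret", "token"]))
```

Each stage searches only the results of the previous one, so the keyword search in stage 3 touches a much smaller set of files than the full project list.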
CN202211296187.5A 2022-10-21 2022-10-21 Leakage code retrieval method and device and computer readable storage medium Pending CN115658067A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211296187.5A CN115658067A (en) 2022-10-21 2022-10-21 Leakage code retrieval method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211296187.5A CN115658067A (en) 2022-10-21 2022-10-21 Leakage code retrieval method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115658067A true CN115658067A (en) 2023-01-31

Family

ID=84989278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211296187.5A Pending CN115658067A (en) 2022-10-21 2022-10-21 Leakage code retrieval method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115658067A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112182338A (en) * 2020-11-02 2021-01-05 国网北京市电力公司 Monitoring method and device for hosting platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination