CN115225413A - Method and device for extracting defect index, electronic equipment and storage medium - Google Patents

Method and device for extracting defect index, electronic equipment and storage medium

Info

Publication number
CN115225413A
Authority
CN
China
Prior art keywords
index
collapse
defect
sample
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211140696.9A
Other languages
Chinese (zh)
Other versions
CN115225413B (en)
Inventor
王鹏云
樊兴华
薛锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ThreatBook Technology Co Ltd
Original Assignee
Beijing ThreatBook Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ThreatBook Technology Co Ltd filed Critical Beijing ThreatBook Technology Co Ltd
Priority to CN202211140696.9A priority Critical patent/CN115225413B/en
Publication of CN115225413A publication Critical patent/CN115225413A/en
Application granted granted Critical
Publication of CN115225413B publication Critical patent/CN115225413B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Abstract

The embodiments of the application provide a method and a device for extracting indicators of compromise (IOCs), electronic equipment, and a storage medium. The method comprises: acquiring sample data containing IOCs; preprocessing the sample data to obtain positive samples and negative samples; inputting the positive samples and negative samples into a pre-trained semantic detection model for training to obtain an IOC detection model; matching data to be detected against a pre-established trusted threat intelligence whitelist to obtain IOCs; and inputting the IOCs into the IOC detection model to obtain the IOCs that qualify as threat intelligence. Implementing the embodiments can improve detection efficiency, avoid missing IOCs, reduce false detections, remove the reliance on manual detection, and reduce labor cost.

Description

Method and device for extracting indicators of compromise, electronic equipment and storage medium
Technical Field
The present application relates to the field of network security, and in particular to a method and an apparatus for extracting indicators of compromise (IOCs), an electronic device, and a computer-readable storage medium.
Background
As threat intelligence matures, more and more threat intelligence is produced from different data sources. Because threat intelligence articles differ in format, the traditional way to collect open-source intelligence is to screen articles manually, remove the unrelated domain names and IP addresses they contain, and extract the indicators of compromise (IOCs) by hand for integration. This causes two problems. On the one hand, human capacity is limited: the whole process from article publication through manual screening to IOC extraction usually takes one to two days at best, so a quick response is impossible, and manual screening easily misses IOCs. On the other hand, traditional automatic batch extraction often extracts many IOCs that are unrelated to the article and that produce false positives when used. Such a program cannot automatically judge which articles are security articles and which are not, and therefore cannot determine which IOCs should be extracted as security intelligence and which should not.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for extracting indicators of compromise (IOCs), an electronic device, and a computer-readable storage medium, which can improve detection efficiency, avoid missing IOCs, reduce false detections, remove the reliance on manual detection, and reduce labor cost.
In a first aspect, an embodiment of the present application provides a method for extracting indicators of compromise (IOCs), where the method includes:
acquiring sample data containing IOCs;
preprocessing the sample data to obtain positive samples and negative samples;
inputting the positive samples and the negative samples into a pre-trained semantic detection model for training to obtain an IOC detection model;
matching data to be detected against a pre-established trusted threat intelligence whitelist to obtain IOCs;
and inputting the IOCs into the IOC detection model to obtain the IOCs that qualify as threat intelligence.
In this implementation, the semantic detection model is trained on the positive and negative samples, which improves the IOC detection model's ability to extract IOCs, and matching against the trusted threat intelligence whitelist further improves the accuracy of IOC detection. Detection efficiency improves, IOCs are not missed, false detections are reduced, and manual detection is no longer needed, which lowers labor cost.
Further, after the step of inputting the indicators of compromise (IOCs) into the IOC detection model to obtain the IOCs that qualify as threat intelligence, the method further includes:
verifying the IOCs to obtain a verification result;
obtaining verification positive samples and verification negative samples according to the verification result;
and inputting the verification positive samples and the verification negative samples into the IOC detection model for secondary training, so as to improve the precision of the IOC detection model.
In this implementation, the IOCs are verified after they are obtained, and secondary training on the verification positive and negative samples improves the robustness and accuracy of the IOC detection model.
Further, the step of preprocessing the sample data to obtain positive samples and negative samples includes:
filtering the sample data;
extracting IOCs from the filtered sample data with regular expressions;
judging whether each domain-name IOC among the extracted IOCs is malicious;
if yes, obtaining a positive sample from the sample data of the domain-name IOC judged to be malicious;
if not, obtaining a negative sample from the filtered sample data.
In this implementation, the positive and negative samples are derived from the filtered sample data, so they carry more of the characteristics of IOCs, which helps the IOC detection model detect IOCs quickly.
Further, the step of inputting the positive samples and the negative samples into a pre-trained semantic detection model for training to obtain an IOC detection model includes:
inputting the positive samples and the negative samples into the pre-trained semantic detection model for training to obtain an initial IOC detection model;
obtaining test positive samples and test negative samples;
inputting the test positive samples and the test negative samples into the initial IOC detection model for iterative training to obtain the IOC detection model.
In this implementation, an initial IOC detection model is first obtained from the positive and negative samples, and the test positive and negative samples are then fed into it for iterative training, which raises the model's recognition rate for IOCs and shortens detection time.
Further, the step of matching the data to be detected against a pre-established trusted threat intelligence whitelist to obtain indicators of compromise (IOCs) includes:
extracting host information from the data to be detected;
judging whether the host information matches the trusted threat intelligence whitelist;
if so, extracting an initial IOC from the data to be detected;
and verifying the initial IOC, and if it passes verification, taking it as the IOC.
In this implementation, the host information is matched against the trusted threat intelligence whitelist and the IOC is also verified, which ensures the accuracy of the IOC and reduces errors.
Further, the step of verifying the initial IOC and taking it as the IOC if it passes verification includes:
verifying the initial IOC with a plurality of search engines to obtain a plurality of verification results;
and if the number of malicious verdicts among the verification results reaches a threshold, treating the corresponding initial IOC as valid and taking it as the IOC.
In this implementation, verifying the initial IOC through the search engines reduces the time needed for verification and improves verification accuracy.
Further, after the step of verifying the initial IOC and taking it as the IOC if it passes verification, the method further includes:
acquiring domain name information of the IOC;
performing a false-positive judgment on the IOC according to the domain age and the page-access data in the domain name information;
and if the IOC is a false positive, not taking it as an IOC that qualifies as threat intelligence.
In this implementation, the false-positive judgment on the IOC further reduces errors.
In a second aspect, an embodiment of the present application further provides an apparatus for extracting indicators of compromise (IOCs), where the apparatus includes:
an acquisition module, used for acquiring sample data containing IOCs;
a preprocessing module, used for preprocessing the sample data to obtain positive samples and negative samples;
a model training module, used for inputting the positive samples and the negative samples into a pre-trained semantic detection model for training to obtain an IOC detection model;
a matching module, used for matching data to be detected against a pre-established trusted threat intelligence whitelist to obtain IOCs;
and a detection module, used for inputting the IOCs into the IOC detection model to obtain the IOCs that qualify as threat intelligence.
In this implementation, the semantic detection model is trained on the positive and negative samples, which improves the IOC detection model's ability to extract IOCs, and matching against the trusted threat intelligence whitelist further improves the accuracy of IOC detection. Detection efficiency improves, IOCs are not missed, false detections are reduced, and manual detection is no longer needed, which lowers labor cost.
In a third aspect, an electronic device provided in an embodiment of the present application includes: memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having instructions stored thereon, which, when executed on a computer, cause the computer to perform the method according to any one of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to perform the method according to any one of the first aspect.
Additional features and advantages of the disclosure will be set forth in the description which follows, or in part may be learned by the practice of the above-described techniques of the disclosure, or may be learned by practice of the disclosure.
The present invention can be implemented in accordance with the teachings of the specification, which is to be read in conjunction with the following detailed description of the presently preferred embodiments of the invention.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flow chart of a method for extracting indicators of compromise (IOCs) according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an apparatus for extracting IOCs according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not construed as indicating or implying relative importance.
The following detailed description of embodiments of the present application will be described in conjunction with the accompanying drawings and examples. The following examples are intended to illustrate the present application, but are not intended to limit the scope of the present application.
Example one
Fig. 1 is a schematic flow chart of a method for extracting indicators of compromise (IOCs) provided in an embodiment of the present application. As shown in Fig. 1, the method includes:
S1, acquiring sample data containing IOCs;
S2, preprocessing the sample data to obtain positive samples and negative samples;
S3, inputting the positive samples and the negative samples into a pre-trained semantic detection model for training to obtain an IOC detection model;
S4, matching data to be detected against a pre-established trusted threat intelligence whitelist to obtain IOCs;
S5, inputting the IOCs into the IOC detection model to obtain the IOCs that qualify as threat intelligence.
In this implementation, the semantic detection model is trained on the positive and negative samples, which improves the IOC detection model's ability to extract IOCs, and matching against the trusted threat intelligence whitelist further improves the accuracy of IOC detection. Detection efficiency improves, IOCs are not missed, false detections are reduced, and manual detection is no longer needed, which lowers labor cost.
Further, after S5, the method further includes:
verifying the IOCs to obtain a verification result;
obtaining verification positive samples and verification negative samples according to the verification result;
and inputting the verification positive samples and the verification negative samples into the IOC detection model for secondary training, so as to improve the precision of the IOC detection model.
In this implementation, the IOCs are verified after they are obtained, and secondary training on the verification positive and negative samples improves the robustness and accuracy of the IOC detection model.
In S1, a crawler may be used to crawl threat intelligence articles that contain indicators of compromise (IOCs), for example from news reports and mainstream threat intelligence sites. Elements such as the article title, category, abstract, main content, and access link are extracted and stored as JSON-format data, which serves as the sample data.
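The storage format described for S1 can be sketched as follows. This is only an illustration: the field names (`title`, `category`, `abstract`, `content`, `link`) and the helper functions are assumptions chosen to mirror the listed elements, not names taken from the patent.

```python
import json


def make_sample(title, category, abstract, content, link):
    """Bundle the crawled article elements into one record (hypothetical field names)."""
    return {
        "title": title,
        "category": category,
        "abstract": abstract,
        "content": content,
        "link": link,
    }


def to_json_line(sample):
    """Serialize one record as a single JSON line; keep non-ASCII text readable."""
    return json.dumps(sample, ensure_ascii=False)


record = make_sample(
    title="New phishing campaign",
    category="threat-intelligence",
    abstract="Short summary of the campaign.",
    content="The malware beacons to evil-c2.example every hour.",
    link="https://blog.example.org/post/1",
)
line = to_json_line(record)
```

Storing one JSON object per article keeps the later filtering and extraction steps independent of the original page layout.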
Further, S2 includes:
filtering the sample data;
extracting indicators of compromise (IOCs) from the filtered sample data with regular expressions;
judging whether each domain-name IOC among the extracted IOCs is malicious;
if yes, obtaining a positive sample from the sample data of the domain-name IOC judged to be malicious;
and if not, obtaining a negative sample from the filtered sample data.
In this implementation, the positive and negative samples are derived from the filtered sample data, so they carry more of the characteristics of IOCs, which helps the IOC detection model detect IOCs quickly.
After filtering, indicators of compromise (IOCs) are extracted from the sample data with regular expressions, and the sample data and the extracted IOCs are stored as negative samples. Domain-name IOCs are validated against the top-level domain (TLD); those that pass are checked against the intelligence databases of mainstream threat intelligence vendors, and if a domain-name IOC is judged to be malicious, the corresponding sample data and IOC are stored as a positive sample. For example, a context paragraph of a certain length (e.g. 20 characters) may be extracted for each IOC, so that the document is converted into IOCs and their corresponding context paragraphs, which serve as part of the sample data. Specifically, the 20 characters before and the 20 characters after the IOC's position in the article are combined into one context paragraph, and these paragraphs form a paragraph set.
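A minimal sketch of the regular-expression extraction and the 20-character context window is given below. The two patterns are deliberately simplified assumptions (the patent does not give its expressions), and the function name is hypothetical. The context is built from the text before and after the indicator, excluding the indicator itself, which is one reading of the description.

```python
import re

# Simplified, assumed patterns; production extraction would need stricter rules.
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_context_paragraphs(text, window=20):
    """Return (indicator, context) pairs for every domain or IPv4 match.

    The context is the `window` characters before the match joined with
    the `window` characters after it.
    """
    results = []
    for pattern in (DOMAIN_RE, IPV4_RE):
        for match in pattern.finditer(text):
            before = text[max(0, match.start() - window):match.start()]
            after = text[match.end():match.end() + window]
            results.append((match.group(0), before + after))
    return results


pairs = extract_context_paragraphs(
    "The malware beacons to evil-c2.example every hour."
)
```

With a window of 20 on each side, every context paragraph is at most 40 characters long, which matches the sentence length used later for encoding.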
Specifically, filtering the sample data further includes judging the language of the sample data based on Unicode: if Chinese Unicode code points appear in the sample data, it is identified as a Chinese article; if another language appears, it is assigned to the corresponding other category. The embodiments of this application use two mainstream languages, Chinese and English, as two sets of sample data and train on each language separately. Paragraphs that fall within the code-point range of Chinese characters (U+4E00 to U+9FA5) are classified as Chinese sample data, and paragraphs that consist entirely of English-encoded characters are classified as English sample data.
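The language split can be sketched with a direct code-point test. Treating "entirely English-encoded" as "entirely ASCII" is an assumption made for this illustration, and the function names and return labels are hypothetical.

```python
def contains_chinese(text):
    """True if any character lies in the range U+4E00 to U+9FA5."""
    return any("\u4e00" <= ch <= "\u9fa5" for ch in text)


def classify_language(text):
    """Label a paragraph as 'zh', 'en', or 'other'.

    Assumption: a paragraph counts as English only if every character is ASCII.
    """
    if contains_chinese(text):
        return "zh"
    if text and all(ord(ch) < 128 for ch in text):
        return "en"
    return "other"
```

Paragraphs labelled `zh` and `en` would then go to separate training sets, and `other` would be dropped or handled separately.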
Symbols other than common ones (such as commas, periods, and semicolons) are given a generalized encoding, which reduces the chance that special symbols lower the accuracy of the semantic recognition task.
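One possible reading of this generalized encoding is to map every uncommon symbol to a single placeholder token. The patent does not specify the encoding, so the placeholder `[SYM]`, the set of "common" punctuation, and the function name below are all assumptions.

```python
# Assumed set of "common" punctuation, including a few full-width Chinese marks.
COMMON_PUNCTUATION = set(",.;:!?'\"()-/，。；：！？、")
PLACEHOLDER = "[SYM]"  # hypothetical generalized token


def generalize_symbols(text):
    """Replace every character that is not alphanumeric, whitespace,
    or common punctuation with one placeholder token."""
    out = []
    for ch in text:
        if ch.isalnum() or ch.isspace() or ch in COMMON_PUNCTUATION:
            out.append(ch)
        else:
            out.append(PLACEHOLDER)
    return "".join(out)
```

Periods and hyphens are kept as common symbols here so that domain names and IP addresses survive the step unchanged.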
Special types of articles in the sample data are also filtered out, removing documents that have no real semantic content or that consist entirely of garbled text.
Optionally, the indicator of compromise (IOC) is stored with the sample data as its label.
For example, 30,000 science articles, 30,000 humanities articles, 30,000 news articles, and 30,000 technical articles may be selected from the sample data to form a negative sample set of 120,000 articles, and 120,000 threat intelligence articles may be collected to form a positive sample set.
Further, S3 includes:
inputting the positive samples and the negative samples into a pre-trained semantic detection model for training to obtain an initial indicator-of-compromise (IOC) detection model;
obtaining test positive samples and test negative samples;
inputting the test positive samples and the test negative samples into the initial IOC detection model for iterative training to obtain the IOC detection model.
In this implementation, an initial IOC detection model is first obtained from the positive and negative samples, and the test positive and negative samples are then fed into it for iterative training, which raises the model's recognition rate for IOCs and shortens detection time.
Subsequently crawled data is used as test positive samples and test negative samples for model verification, and the IOCs predicted by the initial IOC detection model are checked against an intelligence database. The wrongly predicted sample data, together with the test positive and negative samples, is added to the training set for iterative training, and verification is repeated until the accuracy of the resulting IOC detection model meets the requirement, at which point the model is saved.
For example, the embodiments of the present application semantically encode the positive and negative samples with the BERT tokenizer. Some of the parameters are as follows: the sentence length is set to 40 (the same as the paragraph length), the encoded paragraphs are input into the pre-trained semantic detection model for training, the Adam algorithm is used as the optimizer, and the number of training epochs is 3. The trained model is then verified for accuracy with the test positive and negative samples. The final metrics are a precision of 97.45% and an accuracy of 96.98%, which is a good result.
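For reference, the two reported metrics can be computed from binary labels as sketched below, using their standard definitions: precision is TP / (TP + FP) and accuracy is (TP + TN) / total. The function name and the toy labels are illustrative only and do not reproduce the patent's evaluation data.

```python
def precision_and_accuracy(y_true, y_pred):
    """Compute precision and accuracy for binary labels (1 = threat intelligence)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against division by zero when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, accuracy


# Toy example: 2 true positives, 1 false positive, 1 true negative, 1 false negative.
precision, accuracy = precision_and_accuracy([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```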
Further, S4 includes:
extracting host information from the data to be detected;
judging whether the host information matches the trusted threat intelligence whitelist;
if yes, extracting an initial indicator of compromise (IOC) from the data to be detected;
and verifying the initial IOC, and if it passes verification, taking it as the IOC.
In this implementation, the host information is matched against the trusted threat intelligence whitelist and the IOC is also verified, which ensures the accuracy of the IOC and reduces errors.
Further, the step of verifying the initial indicator of compromise (IOC) and taking it as the IOC if it passes verification includes:
verifying the initial IOC with a plurality of search engines to obtain a plurality of verification results;
and if the number of malicious verdicts among the verification results reaches a threshold, treating the corresponding initial IOC as valid and taking it as the IOC.
In this implementation, verifying the initial IOC through the search engines reduces the time needed for verification and improves verification accuracy.
The extracted initial IOC is verified automatically with the threat intelligence engines of several mainstream security vendors. If more than half of the engines identify it as malicious, the initial IOC is judged valid and is stored in the intelligence database as an IOC for use by products.
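The "more than half of the engines" rule can be sketched as a simple vote. The verdicts are passed in as booleans because the engines' actual interfaces are not described; the function name and the optional explicit threshold are assumptions.

```python
def is_valid_ioc(verdicts, threshold=None):
    """Decide whether an initial indicator is valid from per-engine verdicts.

    verdicts: list of booleans, True meaning that engine flagged it as malicious.
    threshold: optional minimum count of malicious verdicts; when omitted,
               a strict majority of the engines is required.
    """
    if not verdicts:
        return False
    malicious = sum(1 for v in verdicts if v)
    if threshold is None:
        return malicious > len(verdicts) / 2
    return malicious >= threshold
```

For example, two malicious verdicts out of three engines pass the default rule, while a one-to-one tie between two engines does not.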
Further, after the step of verifying the initial indicator of compromise (IOC) and taking it as the IOC if it passes verification, the method further includes:
acquiring domain name information of the IOC;
performing a false-positive judgment on the IOC according to the domain age and the page-access data in the domain name information;
and if the IOC is a false positive, not taking it as an IOC that qualifies as threat intelligence.
In this implementation, the false-positive judgment on the IOC further reduces errors.
A trusted threat intelligence whitelist is established and stored in a database. Mainstream threat intelligence websites are crawled, and information such as article titles, links, abstracts, and contents is stored as the data to be detected. Host information is extracted from the data to be detected and compared with the trusted threat intelligence whitelist in the database; if the host is not on the whitelist, the data is discarded.
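The whitelist comparison can be sketched with the standard library's URL parser. The whitelist is modelled here as an in-memory set of host names rather than a database table, and the `link` field name and function names are assumptions.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if none can be parsed."""
    return (urlparse(url).hostname or "").lower()


def filter_by_whitelist(articles, whitelist):
    """Keep only the articles whose link points at a whitelisted host."""
    allowed = {host.lower() for host in whitelist}
    return [a for a in articles if host_of(a["link"]) in allowed]


articles = [
    {"title": "Campaign report", "link": "https://Blog.Example.org/post/1"},
    {"title": "Unrelated page", "link": "https://random.example.net/x"},
]
kept = filter_by_whitelist(articles, {"blog.example.org"})
```

Matching on the exact host is the simplest choice; whether subdomains of a whitelisted site should also match is left open by the description.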
Indicators of compromise (IOCs) are extracted from the data to be detected by regular-expression scanning, and a certain number of characters before and after each IOC are extracted as its context paragraph. For domain-name IOCs, the domain is checked for validity based on its top-level domain (TLD), and the IOC is discarded if the domain is not valid.
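A TLD check of this kind might look like the sketch below. The tiny `KNOWN_TLDS` set is a stand-in assumption; a real deployment would presumably load the full list of delegated top-level domains. The function name is hypothetical.

```python
# Stand-in for the full list of valid top-level domains (assumption).
KNOWN_TLDS = {"com", "net", "org", "cn", "io"}


def has_valid_tld(domain):
    """True if the candidate has at least two non-empty labels and a known TLD."""
    labels = domain.lower().rstrip(".").split(".")
    return len(labels) >= 2 and all(labels) and labels[-1] in KNOWN_TLDS
```

This step is what rejects strings such as file names (`config.json`) that a permissive domain pattern would otherwise pick up.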
The indicator of compromise (IOC) and its context are input into the IOC detection model, which judges from the context whether the IOC meets the relevant standards of threat intelligence.
If it does, a false-positive judgment is performed: the domain is visited to check whether it serves a normal page, and it is checked whether the domain is identified as belonging to the trusted threat intelligence whitelist. If the IOC is judged to be a false positive, the IOC and its context are discarded and are not taken as an IOC that qualifies as threat intelligence.
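The description names domain age and page-access data as inputs to the false-positive judgment but does not state the decision rule. The sketch below therefore uses a purely assumed rule (a long-registered domain that serves a normal page is treated as a likely false positive) and an assumed age threshold; neither comes from the patent.

```python
from datetime import date


def is_false_positive(registered_on, checked_on, serves_normal_page,
                      min_age_days=3650):
    """Assumed heuristic, not the patent's rule.

    registered_on / checked_on: datetime.date values for the domain's
    registration date and the date of the check.
    serves_normal_page: whether visiting the domain returned a normal page.
    min_age_days: hypothetical age threshold (about ten years).
    """
    age_days = (checked_on - registered_on).days
    return age_days >= min_age_days and serves_normal_page
```

The intuition behind this assumed rule is that freshly registered domains without ordinary content are more typical of attack infrastructure than old, actively served sites.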
An indicator of compromise (IOC) that passes verification can be re-labeled as a positive sample, and an IOC judged to be a false positive can be re-labeled as a negative sample. Once a certain number of these positive and negative samples has accumulated, they are added to the original positive and negative samples for retraining, which improves the accuracy of the model.
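The accumulate-then-retrain loop can be sketched as a small buffer. The class name, the threshold parameter, and the representation of a sample as a `(paragraph, label)` tuple are assumptions; the actual retraining call is left out because the model interface is not described.

```python
class FeedbackBuffer:
    """Collect re-labeled samples until enough exist to justify retraining."""

    def __init__(self, retrain_threshold):
        self.retrain_threshold = retrain_threshold  # assumed, tunable
        self.samples = []

    def add(self, paragraph, is_threat):
        """Store a context paragraph with label 1 (verified) or 0 (false positive)."""
        self.samples.append((paragraph, 1 if is_threat else 0))

    def ready(self):
        """True once the accumulated samples reach the threshold."""
        return len(self.samples) >= self.retrain_threshold

    def merge_into(self, training_set):
        """Return the original training set plus the buffered samples, then reset."""
        merged = training_set + self.samples
        self.samples = []
        return merged
```

A caller would check `ready()` after each verification batch and, when it returns true, retrain on the result of `merge_into(original_samples)`.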
The embodiments of the application address the problems of the prior art, in which manual judgment is required, standards are inconsistent because they depend on individual experience and energy, and response is slow. Semantics-based automatic judgment is not bounded by the limits of human effort and is significantly faster than manual work: extraction can be completed within seconds while accuracy is maintained, achieving intelligent identification of indicators of compromise (IOCs).
In addition, the embodiments screen data sources against a whitelist database. Compared with blindly crawling the whole web, the collected articles are more relevant to threat intelligence and of higher quality, and a large amount of filtering work is avoided.
Meanwhile, semantic recognition is performed on paragraphs formed from a fixed length of context around each IOC. Compared with full-text recognition, this ties the result more closely to the corresponding IOC, lowers the computational complexity, and reduces the influence of irrelevant sentences elsewhere in the text. Moreover, compared with traditional manual verification, automatically verifying results with threat intelligence engines gives more accurate results and handles a larger volume, so it can meet the sample requirements of large-scale training and analysis.
Example two
In order to implement a corresponding method of the above embodiment to achieve corresponding functions and technical effects, the following provides an apparatus for extracting a defect index, as shown in fig. 2, the apparatus including:
the acquisition module 1 is used for acquiring sample data containing a defect index;
the preprocessing module 2 is used for preprocessing the sample data to obtain a positive sample and a negative sample;
the model training module 3 is used for inputting the positive sample and the negative sample into a pre-trained semantic detection model for training to obtain a collapse index detection model;
the matching module 4 is used for matching the data to be detected with a pre-established credible threat information white list to obtain a collapse index;
and the detection module 5 is used for inputting the defect index into the defect index detection model to obtain the defect index which accords with threat information.
In the implementation process, training the semantic detection model with the positive sample and the negative sample improves the ability of the defect index detection model to extract defect indexes. Matching against the trusted threat intelligence white list further improves detection accuracy and efficiency, avoids missed defect indexes, and reduces false detections. No manual detection is needed, which reduces labor cost.
Further, the apparatus also includes a verification module to:
verifying the defect index to obtain a verification result;
obtaining a verification positive sample and a verification negative sample according to the verification result;
and inputting the verification positive sample and the verification negative sample into the defect index detection model for secondary training so as to improve the precision of the defect index detection model.
In the implementation process, after the defect index is obtained, the defect index is verified, secondary training is carried out according to the verification positive sample and the verification negative sample, and the robustness and the accuracy of the defect index detection model are improved.
Further, the preprocessing module 2 is further configured to:
filtering the sample data;
performing regular expression extraction on the filtered sample data to obtain defect indexes;
judging whether a domain name type defect index among the defect indexes is a malicious defect index;
if yes, obtaining a positive sample according to the sample data of the domain name type defect index judged to be a malicious defect index;
if not, obtaining a negative sample according to the filtered sample data.
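The preprocessing steps above can be sketched as follows, under stated assumptions. The regular expression is a simplified domain pattern (it ignores defanged forms such as `evil[.]example`), filtering is reduced to dropping empty texts, and `is_malicious` is a hypothetical callable standing in for whatever lookup judges a domain to be malicious; none of these details come from the application.

```python
import re

# Simplified domain pattern (assumption): dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def extract_domains(text):
    """Regular expression extraction of domain-type indicators from one sample."""
    return sorted({match.lower() for match in DOMAIN_RE.findall(text)})


def label_samples(samples, is_malicious):
    """Split samples into positives (contain a malicious domain) and negatives."""
    positives, negatives = [], []
    for text in samples:
        if not text.strip():
            continue  # stand-in for the filtering step
        if any(is_malicious(domain) for domain in extract_domains(text)):
            positives.append(text)
        else:
            negatives.append(text)
    return positives, negatives


# Hypothetical data: a set of known-bad domains plays the role of the malicious check.
known_bad = {"evil.example.com"}
samples = [
    "C2 server at evil.example.com was seen.",
    "Download the update from good.example.org today.",
    "",
]
positives, negatives = label_samples(samples, known_bad.__contains__)
```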
Further, the model training module 3 is further configured to:
inputting the positive sample and the negative sample into a pre-trained semantic detection model for training to obtain an initial defect index detection model;
obtaining a test positive sample and a test negative sample;
inputting the test positive sample and the test negative sample into the initial defect index detection model for iterative training to obtain the defect index detection model.
Further, the matching module 4 is further configured to:
extracting host information in the data to be detected;
judging whether the host information matches an entry in the trusted threat intelligence white list;
if so, extracting an initial defect index in the data to be detected;
and verifying the initial defect index, and if the initial defect index passes the verification, obtaining the defect index.
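The host matching step can be illustrated with a short sketch. The white list entries and the rule that subdomains of a listed host also match are assumptions made for the example; the application only states that host information is matched against the white list.

```python
from urllib.parse import urlsplit

# Hypothetical white list of trusted threat intelligence sources.
TRUSTED_HOSTS = {"blog.vendor.example", "research.lab.example"}


def host_of(url):
    """Extract the lower-cased host from a URL, or an empty string if there is none."""
    return (urlsplit(url).hostname or "").lower()


def is_trusted_source(url, whitelist=TRUSTED_HOSTS):
    """True if the URL's host equals a white-listed host or is a subdomain of one."""
    host = host_of(url)
    return any(host == entry or host.endswith("." + entry) for entry in whitelist)
```

Matching on the parsed host rather than on a substring of the URL avoids accepting look-alike addresses such as `blog.vendor.example.attacker.test`.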
Further, the matching module 4 is further configured to:
verifying the initial defect index by utilizing a plurality of search engines to obtain a plurality of verification results;
and if the number of malicious verification results among the plurality of verification results reaches a threshold value, determining that the initial defect index is valid and taking the initial defect index as the defect index.
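The threshold rule above amounts to a simple vote count. In the sketch below each engine is modeled as a hypothetical callable that returns `True` for a malicious verdict; the real engines, their interfaces, and the threshold value are not specified in this section.

```python
def count_malicious(indicator, engines):
    """Query every engine and count how many return a malicious verdict."""
    return sum(1 for engine in engines if engine(indicator))


def is_valid_indicator(indicator, engines, threshold=2):
    """The initial indicator is kept only if enough engines judge it malicious."""
    return count_malicious(indicator, engines) >= threshold


# Hypothetical engines used only to exercise the rule.
example_engines = [
    lambda indicator: True,
    lambda indicator: False,
    lambda indicator: indicator.endswith(".bad.example"),
]
```

With these three engines and `threshold=2`, `c2.bad.example` is accepted (two malicious verdicts) while `cdn.good.example` is rejected (one).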
Further, the apparatus further includes a misjudgment module, configured to:
acquiring domain name information of a defect index;
performing false alarm judgment on the defect index according to the domain name age and the access page data in the domain name information;
if the defect index is a false alarm, the defect index is not used as a defect index that conforms to threat intelligence.
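A false alarm check of this kind might look like the sketch below. It rests on several assumptions that this section does not confirm: "access page data" is reduced to a single page visit count, a long-registered and heavily visited domain is treated as a likely false alarm, and both thresholds are invented for illustration.

```python
from datetime import date


def is_false_alarm(registered_on, page_visits, today,
                   min_age_days=3650, min_page_visits=100_000):
    """Treat an old, heavily visited domain as a probable false alarm (assumed rule)."""
    age_days = (today - registered_on).days
    return age_days >= min_age_days and page_visits >= min_page_visits


# Hypothetical domain records.
old_popular = is_false_alarm(date(2005, 1, 1), 2_000_000, today=date(2022, 9, 20))
new_obscure = is_false_alarm(date(2022, 8, 1), 12, today=date(2022, 9, 20))
```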
The above apparatus for extracting a defect index can implement the method of the first embodiment. The options of the first embodiment also apply to this embodiment. For all other details, refer to the first embodiment; they are not repeated here.
Example three
An embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory is used to store a computer program, and the processor runs the computer program to enable the electronic device to execute the method for extracting a defect index of the first embodiment.
Alternatively, the electronic device may be a server.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device may include a processor 31, a communication interface 32, a memory 33, and at least one communication bus 34. Wherein the communication bus 34 is used for realizing direct connection communication of these components. The communication interface 32 of the device in the embodiment of the present application is used for performing signaling or data communication with other node devices. The processor 31 may be an integrated circuit chip having signal processing capabilities.
The processor 31 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 31 may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor 31 may be any conventional processor or the like.
The memory 33 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 33 stores computer-readable instructions which, when executed by the processor 31, enable the electronic device to perform the steps of the method embodiment of fig. 1 described above.
Optionally, the electronic device may further include a memory controller and an input/output unit. The memory 33, the memory controller, the processor 31, the peripheral interface, and the input/output unit are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, these components may be electrically connected to each other via one or more communication buses 34. The processor 31 is adapted to execute executable modules stored in the memory 33, such as software functional modules or computer programs included in the device.
The input/output unit allows a user to create a task and to set an optional start period or a preset execution time for the created task, thereby realizing interaction between the user and the server. The input/output unit may be, but is not limited to, a mouse, a keyboard, and the like.
It will be appreciated that the configuration shown in fig. 3 is merely illustrative and that the electronic device may include more or fewer components than shown in fig. 3 or have a different configuration than shown in fig. 3. The components shown in fig. 3 may be implemented in hardware, software, or a combination thereof.
In addition, an embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method for extracting a defect index according to the embodiment.
Embodiments of the present application further provide a computer program product, which when running on a computer, causes the computer to execute the method described in the method embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It should be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for extracting a defect index is characterized by comprising the following steps:
acquiring sample data containing a defect index;
preprocessing the sample data to obtain a positive sample and a negative sample;
inputting the positive sample and the negative sample into a pre-trained semantic detection model for training to obtain a defect index detection model;
matching the data to be detected with a pre-established trusted threat intelligence white list to obtain a defect index;
and inputting the defect index into the defect index detection model to obtain the defect index that conforms to threat intelligence.
2. The method for extracting a defect index according to claim 1, wherein after the step of inputting the defect index into the defect index detection model to obtain the defect index that conforms to threat intelligence, the method further comprises:
verifying the defect index to obtain a verification result;
obtaining a verification positive sample and a verification negative sample according to the verification result;
and inputting the verification positive sample and the verification negative sample into the defect index detection model for secondary training.
3. The method for extracting a defect index according to claim 2, wherein the step of preprocessing the sample data to obtain a positive sample and a negative sample comprises:
filtering the sample data;
performing regular expression extraction on the filtered sample data to obtain defect indexes;
judging whether a domain name type defect index among the defect indexes is a malicious defect index;
if yes, obtaining the positive sample according to the sample data of the domain name type defect index judged to be a malicious defect index;
and if not, obtaining the negative sample according to the filtered sample data.
4. The method for extracting a defect index according to claim 3, wherein the step of inputting the positive sample and the negative sample into a pre-trained semantic detection model for training to obtain a defect index detection model comprises:
inputting the positive sample and the negative sample into a pre-trained semantic detection model for training to obtain an initial defect index detection model;
obtaining a test positive sample and a test negative sample;
inputting the test positive sample and the test negative sample into the initial defect index detection model for iterative training to obtain the defect index detection model.
5. The method for extracting a defect index according to claim 1, wherein the step of matching the data to be detected with a pre-established trusted threat intelligence white list to obtain the defect index comprises:
extracting host information in the data to be detected;
judging whether the host information matches an entry in the trusted threat intelligence white list;
if so, extracting an initial defect index in the data to be detected;
and verifying the initial defect index, and if the initial defect index passes the verification, obtaining the defect index.
6. The method for extracting a defect index according to claim 5, wherein the step of verifying the initial defect index and obtaining the defect index if the initial defect index passes the verification comprises:
verifying the initial defect index by utilizing a plurality of search engines to obtain a plurality of verification results;
and if the number of malicious verification results among the plurality of verification results reaches a threshold value, determining that the initial defect index is valid and taking the initial defect index as the defect index.
7. The method for extracting a defect index according to claim 5, wherein after the step of verifying the initial defect index and obtaining the defect index if the verification is passed, the method further comprises:
acquiring domain name information of the defect index;
performing false alarm judgment on the defect index according to the domain name age and the access page data in the domain name information;
and if the defect index is a false alarm, not taking the defect index as the defect index that conforms to threat intelligence.
8. An apparatus for extracting a defect index, the apparatus comprising:
the acquisition module is used for acquiring sample data containing a defect index;
the preprocessing module is used for preprocessing the sample data to obtain a positive sample and a negative sample;
the model training module is used for inputting the positive sample and the negative sample into a pre-trained semantic detection model for training to obtain a defect index detection model;
the matching module is used for matching the data to be detected with a pre-established trusted threat intelligence white list to obtain a defect index;
and the detection module is used for inputting the defect index into the defect index detection model to obtain the defect index that conforms to threat intelligence.
9. An electronic device, comprising a memory for storing a computer program and a processor for executing the computer program to make the electronic device execute the method for extracting a defect index according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the method for extracting a defect index according to any one of claims 1 to 7.
CN202211140696.9A 2022-09-20 2022-09-20 Method and device for extracting defect index, electronic equipment and storage medium Active CN115225413B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211140696.9A CN115225413B (en) 2022-09-20 2022-09-20 Method and device for extracting defect index, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115225413A true CN115225413A (en) 2022-10-21
CN115225413B CN115225413B (en) 2022-12-23

Family

ID=83616877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211140696.9A Active CN115225413B (en) 2022-09-20 2022-09-20 Method and device for extracting defect index, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115225413B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant