CN113468315A - Vulnerability vendor name matching method - Google Patents

Publication number
CN113468315A
Authority
CN
China
Prior art keywords
vulnerability
manufacturer
words
abbreviation
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111027098.6A
Other languages
Chinese (zh)
Other versions
CN113468315B (en)
Inventor
卢敏
沈传宝
吴璇
万会来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huayuan Information Technology Co Ltd
Original Assignee
Beijing Huayuan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huayuan Information Technology Co Ltd
Priority to CN202111027098.6A
Publication of CN113468315A
Application granted
Publication of CN113468315B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure provide a method, an apparatus, a device, and a computer-readable storage medium for matching vulnerability vendor names. The method comprises: obtaining vulnerability description information input by a user; performing word segmentation on the vulnerability description information to obtain one or more identification words; determining the similarity between the identification words and the abbreviations in a constructed set of vulnerability vendor abbreviations; determining the vendor abbreviation with the highest score based on the similarity and the weight value of each vendor abbreviation; and taking the vendor full name corresponding to the highest-scoring abbreviation as the vendor full name for the vulnerability description information. In this way, the vendor and product to which a vulnerability belongs are matched automatically, which greatly reduces the workload of manual matching.

Description

Vulnerability vendor name matching method
Technical Field
Embodiments of the present disclosure relate generally to the field of big data technology, and more particularly, to a method, an apparatus, a device, and a computer-readable storage medium for matching vulnerability vendor names.
Background
In practice, long names are usually shortened for brevity, especially in texts with strict length limits. For example, "Tencent" is the short form of Tencent Computer System Co., Ltd.; because this short form is a contiguous part of the full company name, it is easily matched by checking whether one string contains the other. However, many company abbreviations take other forms. For example, Bank of China Co., Ltd. is abbreviated as "Zhonghang" (中行), and China Petrochemical Co., Ltd. as "Zhongshihua" (中石化). Because such abbreviations are spliced together from different parts of the full company name, direct fuzzy string matching rarely gives good results. In addition, one company may have several short names; China Eastern Airlines, for example, is known by more than one abbreviated form.
In summary, the variety of short forms makes it difficult for string matching to identify a company correctly. The traditional solution is to maintain the mapping between each company's full name and its short names in a common knowledge base. If the solution relies only on such a knowledge base, however, maintaining and updating it becomes a major problem, because companies are numerous and change over time.
Disclosure of Invention
According to the embodiment of the disclosure, a matching scheme of vulnerability vendor names is provided.
In a first aspect of the disclosure, a method for matching vulnerability vendor names is provided. The method comprises the following steps: acquiring vulnerability description information input by a user;
performing word segmentation processing on the vulnerability description information to obtain one or more identification words;
determining the similarity between the identification words and the abbreviations in a constructed set of vulnerability vendor abbreviations;
determining the vendor abbreviation with the highest score based on the similarity and the weight value of each vendor abbreviation; and taking the vendor full name corresponding to the highest-scoring abbreviation as the vendor full name for the vulnerability description information.
Further, the set of vulnerability vendor abbreviations comprises a first vendor abbreviation set and a second vendor abbreviation set, and includes a weight value for each vendor abbreviation;
the first vendor abbreviation set is constructed as follows:
acquiring a vulnerability vendor name sample data set; the vulnerability vendor name sample data set comprises at least one vulnerability vendor full name;
performing word segmentation on the vulnerability vendor name sample data set to obtain one or more words;
applying a preset extraction rule to the one or more words to obtain the first vendor abbreviation set;
the second vendor abbreviation set is constructed from a vulnerability common knowledge base, which comprises vendor abbreviations and the vendor full name corresponding to each abbreviation.
Further, the performing word segmentation on the vulnerability vendor name sample data set to obtain one or more words includes:
dividing the vulnerability manufacturer name in the vulnerability manufacturer name sample data set into a plurality of words through a preset word segmentation dictionary;
constructing a directed acyclic graph of segmented words based on the plurality of words;
and constructing a segmentation combination of the vulnerability vendor name sample based on the directed acyclic graph to obtain one or more words.
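The dictionary-based segmentation steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy dictionary `TOY_DICT` and the function names `build_dag` and `all_segmentations` are hypothetical, and a production segmenter would normally pick one path through the graph by word frequency instead of enumerating every path.

```python
from typing import Dict, List, Set


def build_dag(text: str, dictionary: Set[str]) -> Dict[int, List[int]]:
    """Map each start index to the (exclusive) end indices of dictionary words starting there."""
    dag: Dict[int, List[int]] = {}
    for start in range(len(text)):
        ends = [
            end
            for end in range(start + 1, len(text) + 1)
            if text[start:end] in dictionary
        ]
        # Fall back to a single character so every position has an outgoing edge.
        dag[start] = ends or [start + 1]
    return dag


def all_segmentations(text: str, dag: Dict[int, List[int]], start: int = 0) -> List[List[str]]:
    """Enumerate every path through the DAG; each path is one segmentation combination."""
    if start == len(text):
        return [[]]
    results: List[List[str]] = []
    for end in dag[start]:
        for rest in all_segmentations(text, dag, end):
            results.append([text[start:end]] + rest)
    return results


# Hypothetical toy dictionary: China, bank, Bank of China, shares, limited company.
TOY_DICT = {"中国", "银行", "中国银行", "股份", "有限公司"}
name = "中国银行股份有限公司"

dag = build_dag(name, TOY_DICT)
segmentations = all_segmentations(name, dag)
for words in segmentations:
    print(" / ".join(words))
```

Edges always point from a smaller index to a larger one, so the graph is acyclic by construction and the recursion terminates.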
Further, the performing word segmentation processing on the vulnerability description information to obtain one or more identification words includes:
preprocessing the vulnerability description information, removing interference information and unifying text formats;
and performing word segmentation processing on the preprocessed vulnerability description information through a word segmentation tool to obtain one or more identification words.
Further, the preset extraction rule includes:
determining the part-of-speech combination of the one or more words, and setting the corresponding extraction rules according to that combination;
if the part-of-speech combination is distinguishing word, noun, and other, extracting the first character of the distinguishing word together with the word that follows the distinguishing word;
extracting all characters of the distinguishing word together with the word that follows it; or
extracting only the word that follows the distinguishing word;
if the part-of-speech combination is distinguishing word, noun, qualifier, and other, extracting the noun and the qualifier;
extracting the distinguishing word and the noun; or
extracting the noun;
if the part-of-speech combination is noun, first qualifier, second qualifier, distinguishing word, and other, extracting the first character of the noun and the first qualifier;
extracting the noun, the first qualifier, and the second qualifier; or
extracting the distinguishing word and the noun.
Further, the weight value corresponding to each vendor abbreviation is determined as follows:
matching the abbreviations in the vendor abbreviation set against a preset text library, determining the word frequency of each abbreviation in the text library, and determining the weight of each abbreviation according to its word frequency.
Further, the method also includes:
verifying the matched vendor full name, and adjusting the weight value of the corresponding vendor abbreviation according to the verification result.
In a second aspect of the present disclosure, a matching apparatus for vulnerability vendor names is provided. The device includes:
the acquisition module is used for acquiring vulnerability description information input by a user;
the word segmentation module is used for carrying out word segmentation processing on the vulnerability description information to obtain one or more identification words;
the determining module is used for determining the similarity between the identification words and the abbreviation in the constructed abbreviation set of the vulnerability vendor;
the matching module is used for determining the vendor abbreviation with the highest score based on the similarity and the weight value of each vendor abbreviation, and for taking the vendor full name corresponding to the highest-scoring abbreviation as the vendor full name for the vulnerability description information.
In a third aspect of the disclosure, an electronic device is provided. The electronic device includes: a memory having a computer program stored thereon and a processor implementing the method as described above when executing the program.
In a fourth aspect of the present disclosure, a computer readable storage medium is provided, having stored thereon a computer program, which when executed by a processor, implements a method as in accordance with the first aspect of the present disclosure.
According to the vulnerability vendor name matching method of the present disclosure, vulnerability description information input by a user is obtained; word segmentation is performed on the vulnerability description information to obtain one or more identification words; the similarity between the identification words and the abbreviations in the constructed set of vulnerability vendor abbreviations is determined; and the vendor full name corresponding to the vulnerability description information is determined based on the similarity and the weight value of each vendor abbreviation. The vendor and product to which a vulnerability belongs are thus matched automatically, which greatly reduces the workload of manual matching.
It should be understood that what is described in this section is not intended to identify key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 illustrates a flow diagram of a vulnerability vendor name matching method according to an embodiment of the present disclosure;
FIG. 2 illustrates an example diagram of vulnerability details according to an embodiment of the present disclosure;
FIG. 3 shows a code rule diagram according to an embodiment of the present disclosure;
FIG. 4 shows a block diagram of a vulnerability vendor name matching apparatus according to an embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of an exemplary electronic device capable of implementing embodiments of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
Fig. 1 shows a flowchart of a vulnerability vendor name matching method 100 according to an embodiment of the present disclosure, where the method 100 includes:
S110, acquiring vulnerability description information input by a user.
In some embodiments, the vulnerability description information is acquired from description information input by a user, locally stored description information, vulnerability information captured by a crawler or other means, and the like.
The vulnerability description information includes the vulnerability name, vulnerability number, vulnerability content, vulnerability author, vulnerability capture time, vulnerability capture source, and/or vulnerability access link, etc. FIG. 2 shows the vulnerability description information for the vulnerability numbered CVE-2020-36372.
S120, performing word segmentation processing on the vulnerability description information to obtain one or more identification words.
In some embodiments, the vulnerability description information is preprocessed, so that interference information in the vulnerability description information is removed, and a text format of the vulnerability description information is unified, so that word segmentation processing can be performed better in the following process. For example, punctuation marks, special marks, redundant spaces and the like in the vulnerability description information are removed.
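The preprocessing described above might look like the following minimal sketch. The function name `preprocess` and the exact cleaning rules (Unicode NFKC normalization, keeping hyphens so identifiers such as CVE numbers survive) are assumptions made for illustration; the source only states that punctuation, special marks, and redundant spaces are removed and the text format is unified.

```python
import re
import unicodedata


def preprocess(description: str) -> str:
    """Remove interference characters and unify the text format before segmentation."""
    # Unify full-width and half-width forms (assumed normalization step).
    text = unicodedata.normalize("NFKC", description)
    # Replace punctuation and special marks with spaces; word characters,
    # whitespace, and hyphens are kept.
    text = re.sub(r"[^\w\s-]", " ", text)
    # Collapse redundant whitespace.
    return re.sub(r"\s+", " ", text).strip()


raw = "  Buffer   overflow!! in Foo-Bar（remote）； "
cleaned = preprocess(raw)
print(cleaned)
```

The cleaned string can then be passed to whichever word segmentation tool is in use.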
In some embodiments, the vulnerability description information may be segmented by a word segmentation tool (e.g., jieba) to obtain one or more segmented words, where one segmented word may consist of multiple characters; e.g., the regional word "Beijing" (北京) consists of the characters 北 ("bei") and 京 ("jing").
Referring to fig. 2, the vulnerability description information with the vulnerability label CVE-2020-.
S130, determining the similarity between the identification words and the short names in the constructed short name set of the vulnerability vendors.
In some embodiments, the set of vulnerability vendor abbreviations includes a first vendor abbreviation set and a second vendor abbreviation set, and includes a weight value for each vendor abbreviation;
the first vendor abbreviation set can be constructed as follows:
acquiring a vulnerability vendor name sample data set from the network, from various databases, and/or by manual generation; the sample data set comprises at least one vendor full name, and usually comprises all the vendor full names required;
further, word segmentation can be performed on the sample data set using a word segmentation dictionary, splitting each vendor name in the sample data set into a plurality of words; a directed acyclic graph of the segmented words is constructed from the plurality of words; and the segmentation combinations of the vendor name sample are constructed from the directed acyclic graph to obtain one or more words. For example, "Bank of China Co., Ltd." (中国银行股份有限公司) is split into "China", "Bank", "Shares", and "Limited Company", and the resulting segmentation combinations include "China Bank", "China Shares", "China Limited Company", "Bank Shares", etc.; the word segmentation dictionary can be built from historical segmentation records and/or manually entered experience;
further, if no word segmentation dictionary can currently be called, or the existing dictionary cannot segment the sample data set (that is, it contains no segmentation rules applicable to the sample data set), the sample data set can be segmented as follows:
labeling the sample data set according to the position of each character in the vendor name to obtain a sequence labeling result; for example, "Bank of China Co., Ltd." is split by character position into 中, 国, 银, 行, 股, 份, 有, 限, 公, 司, numbered 1 through 10;
processing the sequence labeling result with a hidden Markov model to obtain one or more words;
further, applying a preset extraction rule to the one or more words to obtain the first vendor abbreviation set;
specifically, the part-of-speech combination of the one or more words is determined by semantic analysis, and the corresponding extraction rules are set according to that combination;
if the part-of-speech combination is distinguishing word, noun, and other, where "other" usually covers words describing the nature of the company such as joint-stock, limited company, limited partnership, and limited liability, for example China (distinguishing word), Bank (noun), Co., Ltd. (other), the extraction methods are:
a, extracting the first character of the distinguishing word and joining it to the word that follows the distinguishing word, i.e., "Zhonghang" (中行);
b, extracting all characters of the distinguishing word and joining them to the word that follows it, i.e., "Bank of China" (中国银行);
c, extracting only the word that follows the distinguishing word, i.e., "Bank" (银行);
if the part-of-speech combination is distinguishing word, noun, qualifier (adverb), and other, the extraction methods are:
a, extracting the noun and the qualifier;
b, extracting the distinguishing word and the noun;
c, extracting the noun;
if the part-of-speech combination is noun, qualifier (first qualifier), qualifier (second qualifier), distinguishing word, and other, the extraction methods are:
a, extracting the first character of the noun and the first qualifier;
b, extracting the noun, the first qualifier, and the second qualifier;
c, extracting the distinguishing word and the noun;
if the distinguishing word is denoted by the letter R, the noun by the letter X, the qualifier by the letter I, and other by the letter O, the extraction rules above can be written as the code rules shown in FIG. 3.
For example, the part-of-speech combination distinguishing word, noun, and other can be denoted RXO, and its extraction rules include "R[1-1]X", "RX", and "X"; here "R[1-1]" denotes the first character of the first distinguishing word. For instance, the abbreviation of China Mobile Limited Liability Company that joins the first character of "China" to "Mobile" can be expressed as RXO => R[1-1]X;
if the part-of-speech combination is distinguishing word, noun, qualifier (adverb), and other, it can be denoted RXIO, and its extraction rules include "XI", "RX", and "X";
if the part-of-speech combination is noun, qualifier (first qualifier), qualifier (second qualifier), distinguishing word, and other, it can be denoted XIIRO, and its extraction rules include "XI[1]", "XII", "RX", and "X";
the second vendor abbreviation set can be constructed from a vulnerability common knowledge base, which comprises vendor abbreviations and the vendor full name corresponding to each abbreviation; the common knowledge base can be an existing vulnerability database such as the China National Vulnerability Database of Information Security (CNNVD), or can be built from vulnerability information of the required type collected according to user requirements.
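The RXO code rules described above can be illustrated with a small interpreter. This is a sketch under stated assumptions: the function `apply_rule` and the regular expression are hypothetical, and the bracket suffix `[a-b]` is read here as "characters a through b of that word (1-indexed)", which is one plausible reading of "R[1-1]" as the first character of the distinguishing word. Only the three RXO rules are covered, since the source does not spell out the bracket syntax of the other combinations.

```python
import re
from typing import Dict, List

# One rule token: a part-of-speech letter with an optional [first-last] character range.
RULE_TOKEN = re.compile(r"([A-Z])(?:\[(\d+)-(\d+)\])?")

RXO_RULES: List[str] = ["R[1-1]X", "RX", "X"]


def apply_rule(rule: str, words: Dict[str, str]) -> str:
    """Build one candidate abbreviation from a code rule and the tagged words."""
    parts = []
    for code, first, last in RULE_TOKEN.findall(rule):
        word = words[code]
        if first:
            # Assumed meaning: keep characters first..last (1-indexed, inclusive).
            word = word[int(first) - 1:int(last)]
        parts.append(word)
    return "".join(parts)


# China Mobile Limited Liability Company, tagged as
# R (distinguishing word), X (noun), O (other).
words = {"R": "中国", "X": "移动", "O": "有限责任公司"}
candidates = [apply_rule(rule, words) for rule in RXO_RULES]
print(candidates)
```

Each candidate produced this way would be added to the first vendor abbreviation set together with the full name it was derived from.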
In some embodiments, the weight value corresponding to each vendor abbreviation may be determined as follows:
acquiring texts (a text library) that contain vendor abbreviations. Specifically, a web crawler can be used to collect the content from major websites (web pages), or the content can be taken from an existing database.
The abbreviations in the vendor abbreviation set are matched against the preset text library, the word frequency of each abbreviation in the text library is determined, and the weight of each abbreviation is determined according to its word frequency; the text library comprises a plurality of texts containing vendor abbreviations. For example, if a text about Shenzhen Tencent Computer Systems Co., Ltd. is matched from the text library, all abbreviations of that company in the vendor abbreviation set (e.g., "Tencent", "Shenzhen Tencent", etc.) are extracted and compared with the text to count how often each abbreviation occurs, i.e., to determine its word frequency; the corresponding weight is then determined from the word frequency of each abbreviation, where a higher word frequency gives a greater weight.
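A minimal sketch of the word-frequency weighting follows. The source only requires that a higher word frequency yields a greater weight; dividing each count by the total so the weights sum to 1 is an assumption made here, as are the function name `abbreviation_weights` and the two sample texts.

```python
from typing import Dict, List


def abbreviation_weights(abbreviations: List[str], corpus: List[str]) -> Dict[str, float]:
    """Weight each abbreviation by its share of all abbreviation occurrences in the corpus."""
    counts = {
        abbr: sum(text.count(abbr) for text in corpus)  # plain substring counting
        for abbr in abbreviations
    }
    total = sum(counts.values())
    if total == 0:
        return {abbr: 0.0 for abbr in abbreviations}
    # Assumed normalization: higher frequency gives a proportionally higher weight.
    return {abbr: count / total for abbr, count in counts.items()}


# Hypothetical text library about one vendor.
corpus = [
    "Tencent released a patch.",
    "A flaw in Tencent QQ was reported by Shenzhen Tencent.",
]
weights = abbreviation_weights(["Tencent", "Shenzhen Tencent"], corpus)
print(weights)
```

Note that substring counting lets the shorter abbreviation also score inside the longer one; a stricter variant would count whole segmented words only.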
In some embodiments, the similarity between the identification word obtained in step S120 and the abbreviation in the constructed set of abbreviation of vulnerability vendor may be calculated by a shortest edit distance matching method.
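The edit-distance similarity can be sketched as below. The Levenshtein distance itself is standard; converting it to a similarity in [0, 1] by dividing by the longer string's length is an assumption, since the source does not state how the distance is turned into a similarity. The function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for entirely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))
print(similarity("tencent", "tencent"))
```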
S140, determining the vendor abbreviation with the highest score based on the similarity and the weight value of each vendor abbreviation; and taking the vendor full name corresponding to the highest-scoring abbreviation as the vendor full name for the vulnerability description information.
In some embodiments, the score of each vulnerability abbreviation may be determined by a product of the similarity and a weight value corresponding to the manufacturer abbreviation, where the larger the product is, the higher the score is, and the manufacturer full name corresponding to the abbreviation with the highest score is taken as the vulnerability manufacturer full name corresponding to the vulnerability description information.
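The scoring step can be sketched as a search for the largest similarity-times-weight product. To keep the example self-contained, the standard-library `difflib.SequenceMatcher` ratio stands in for the edit-distance similarity; the table layout, the function name `best_vendor`, and the "acme" entry are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, Optional, Tuple


def ratio(a: str, b: str) -> float:
    """Stand-in similarity measure (not the edit-distance method named in the text)."""
    return SequenceMatcher(None, a, b).ratio()


def best_vendor(
    tokens: Iterable[str],
    abbr_table: Dict[str, Tuple[str, float]],
    similarity: Callable[[str, str], float] = ratio,
) -> Tuple[Optional[str], float]:
    """Return the full name whose abbreviation maximizes similarity * weight."""
    best_name: Optional[str] = None
    best_score = 0.0
    for token in tokens:
        for abbr, (full_name, weight) in abbr_table.items():
            score = similarity(token, abbr) * weight
            if score > best_score:
                best_name, best_score = full_name, score
    return best_name, best_score


# abbreviation -> (vendor full name, weight); the second entry is made up.
table = {
    "tencent": ("Shenzhen Tencent Computer Systems Co., Ltd.", 0.75),
    "acme": ("Acme Networks Co., Ltd.", 0.9),
}
name, score = best_vendor(["overflow", "tencent", "qq"], table)
print(name, score)
```

An exact match on a moderately weighted abbreviation beats a poor match on a heavily weighted one, which is the intended effect of multiplying the two factors.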
Further, the method also includes:
verifying the matched vendor full name, and adjusting the weight value of the corresponding vendor abbreviation according to the verification result.
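The source does not say how the weight is adjusted after verification. One possible scheme, shown purely as an assumption, raises the weight by a fixed fraction when the match is confirmed and lowers it when the match is rejected, clamped to a fixed range; the function name `adjust_weight` and the `step`, `floor`, and `ceiling` parameters are all hypothetical.

```python
def adjust_weight(
    weight: float,
    confirmed: bool,
    step: float = 0.1,
    floor: float = 0.0,
    ceiling: float = 1.0,
) -> float:
    """Hypothetical update rule: scale the weight up or down by `step`, then clamp."""
    factor = 1.0 + step if confirmed else 1.0 - step
    return min(ceiling, max(floor, weight * factor))


print(adjust_weight(0.5, confirmed=True))
print(adjust_weight(0.5, confirmed=False))
```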
According to the embodiment of the disclosure, the following technical effects are achieved:
by constructing the vendor abbreviation set and defining the extraction rules, the vendor and product to which a vulnerability belongs are matched automatically and accurately, which greatly reduces the workload of manual matching.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below.
Fig. 4 shows a block diagram of a vulnerability vendor name matching apparatus 400 according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus 400 includes:
an obtaining module 410, configured to obtain vulnerability description information input by a user;
a word segmentation module 420, configured to perform word segmentation processing on the vulnerability description information to obtain one or more identification words;
a determining module 430, configured to determine similarity between the identification word and an abbreviation in the constructed abbreviation set of vulnerability vendors;
a matching module 440, configured to determine the vendor abbreviation with the highest score based on the similarity and the weight value of each vendor abbreviation, and to take the vendor full name corresponding to the highest-scoring abbreviation as the vendor full name for the vulnerability description information.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the described module may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
FIG. 5 shows a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure. As shown, device 500 includes a Central Processing Unit (CPU) 501 that may perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 502 or loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The processing unit 501 performs the various methods and processes described above, such as the method 100. For example, in some embodiments, the method 100 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the CPU 501, one or more steps of the method 100 described above may be performed. Alternatively, in other embodiments, the CPU 501 may be configured to perform the method 100 in any other suitable manner (e.g., by way of firmware).
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on a Chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A vulnerability vendor name matching method, characterized by comprising the following steps:
acquiring vulnerability description information input by a user;
performing word segmentation on the vulnerability description information to obtain one or more identification words;
determining the similarity between the identification words and the abbreviations in a constructed vulnerability vendor abbreviation set;
determining the vendor abbreviation with the highest score based on the similarity and the weight value of the corresponding vendor abbreviation; and taking the full name of the vendor with the highest score as the vendor full name corresponding to the vulnerability description information.
2. The method of claim 1, wherein the vulnerability vendor abbreviation set includes a first vulnerability vendor abbreviation set and a second vulnerability vendor abbreviation set; the vulnerability vendor abbreviation set comprises a weight value corresponding to each vulnerability vendor abbreviation;
the first vulnerability vendor abbreviation set is constructed as follows:
acquiring a vulnerability vendor name sample data set; the vulnerability vendor name sample data set comprises at least one vulnerability vendor full name;
performing word segmentation on the vulnerability vendor name sample data set to obtain one or more words;
extracting from the one or more words according to a preset extraction rule to obtain the first vulnerability vendor abbreviation set;
the second vulnerability vendor abbreviation set is constructed from a vulnerability common knowledge base; the vulnerability common knowledge base comprises vulnerability vendor abbreviations and the vulnerability vendor full names corresponding to those abbreviations.
3. The method of claim 2, wherein performing word segmentation on the vulnerability vendor name sample data set to obtain one or more words comprises:
dividing each vulnerability vendor name in the vulnerability vendor name sample data set into a plurality of words using a preset word segmentation dictionary;
constructing a directed acyclic graph of the segmented words based on the plurality of words;
and constructing segmentation combinations of the vulnerability vendor name sample based on the directed acyclic graph to obtain the one or more words.
4. The method of claim 3, wherein performing word segmentation on the vulnerability description information to obtain one or more identification words comprises:
preprocessing the vulnerability description information to remove interference information and unify the text format;
and performing word segmentation on the preprocessed vulnerability description information with a word segmentation tool to obtain the one or more identification words.
5. The method of claim 4, wherein the preset extraction rules comprise:
determining the part-of-speech combination of the one or more words, and setting corresponding extraction rules according to the part-of-speech combination;
if the part-of-speech combination is distinguishing word, noun and others, extracting the first character of the distinguishing word and the word following the distinguishing word;
extracting all characters of the distinguishing word and the word following the distinguishing word; or
extracting the word following the distinguishing word;
if the part-of-speech combination is distinguishing word, noun, qualifier and others, extracting the noun and the qualifier;
extracting the distinguishing word and the noun; or
extracting the noun;
if the part-of-speech combination is noun, first qualifier, second qualifier, distinguishing word and others, extracting the first character of the noun and the first qualifier;
extracting the noun, the first qualifier and the second qualifier; or
extracting the distinguishing word and the noun.
6. The method of claim 5, wherein the weight value corresponding to each vulnerability vendor abbreviation is determined by:
matching the abbreviations in the vulnerability vendor abbreviation set against a preset text library, determining the word frequency of each abbreviation in the preset text library, and determining the weight of each vulnerability vendor abbreviation according to the word frequency.
7. The method of claim 6, further comprising:
verifying the matched vulnerability vendor full name, and adjusting the weight value of the corresponding vulnerability vendor abbreviation according to the verification result.
8. A vulnerability vendor name matching device, comprising:
an acquisition module, used for acquiring vulnerability description information input by a user;
a word segmentation module, used for performing word segmentation on the vulnerability description information to obtain one or more identification words;
a determining module, used for determining the similarity between the identification words and the abbreviations in a constructed vulnerability vendor abbreviation set;
a matching module, used for determining the vendor abbreviation with the highest score based on the similarity and the weight value of the corresponding vendor abbreviation, and taking the full name of the vendor with the highest score as the vendor full name corresponding to the vulnerability description information.
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, wherein the processor, when executing the program, implements the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
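
As a rough illustration of the scoring step of claim 1 and the frequency-based weights of claim 6, the following Python sketch may help. It is not the claimed implementation: the similarity measure (`difflib.SequenceMatcher`), the add-one smoothed relative frequency used as the weight, the product of similarity and weight used as the score, and all function names and vendor names are assumptions made only for illustration. Word segmentation is assumed to have been done already, so the input is a list of identification words.

```python
from collections import Counter
from difflib import SequenceMatcher


def abbreviation_weights(abbreviations, corpus_tokens):
    """Derive a weight per abbreviation from its word frequency in a text library.

    Assumption: weight = add-one smoothed relative frequency. The claims only
    state that the weight is determined "according to the word frequency".
    """
    counts = Counter(corpus_tokens)
    total = sum(counts[a] for a in abbreviations) + len(abbreviations)
    return {a: (counts[a] + 1) / total for a in abbreviations}


def match_vendor(identification_words, abbreviation_to_full_name, weights):
    """Return (vendor full name, score) for the best-scoring abbreviation, or None.

    Assumption: score = string similarity * weight of the abbreviation.
    """
    best_score, best_abbreviation = 0.0, None
    for word in identification_words:
        for abbreviation in abbreviation_to_full_name:
            similarity = SequenceMatcher(
                None, word.lower(), abbreviation.lower()
            ).ratio()
            score = similarity * weights.get(abbreviation, 0.0)
            if score > best_score:
                best_score, best_abbreviation = score, abbreviation
    if best_abbreviation is None:
        return None
    return abbreviation_to_full_name[best_abbreviation], best_score


if __name__ == "__main__":
    # Hypothetical abbreviation set and text library, for illustration only.
    full_names = {
        "acme": "Acme Software Co., Ltd.",
        "globex": "Globex Networks Inc.",
    }
    corpus = ["acme", "router", "acme", "globex", "overflow"]
    weights = abbreviation_weights(full_names, corpus)
    print(match_vendor(["globex", "firewall", "overflow"], full_names, weights))
```

Under these assumptions, the verification step of claim 7 would amount to raising or lowering entries of `weights` after a match is confirmed or rejected.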
CN202111027098.6A 2021-09-02 2021-09-02 Vulnerability vendor name matching method Active CN113468315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111027098.6A CN113468315B (en) 2021-09-02 2021-09-02 Vulnerability vendor name matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111027098.6A CN113468315B (en) 2021-09-02 2021-09-02 Vulnerability vendor name matching method

Publications (2)

Publication Number Publication Date
CN113468315A true CN113468315A (en) 2021-10-01
CN113468315B CN113468315B (en) 2021-12-10

Family

ID=77867447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111027098.6A Active CN113468315B (en) 2021-09-02 2021-09-02 Vulnerability vendor name matching method

Country Status (1)

Country Link
CN (1) CN113468315B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093478A (en) * 2007-07-25 2007-12-26 中国科学院计算技术研究所 Method and system for identifying Chinese full name based on Chinese shortened form of entity
CN101930435A (en) * 2009-10-27 2010-12-29 深圳市北科瑞声科技有限公司 Method and system for retrieving organization names
CN104035918A (en) * 2014-06-12 2014-09-10 华东师范大学 Chinese organization name abbreviation recognition system adopting context feature matching
CN108829661A (en) * 2018-05-09 2018-11-16 成都信息工程大学 A kind of subject of news title extracting method based on fuzzy matching
CN109635285A (en) * 2018-11-26 2019-04-16 平安科技(深圳)有限公司 Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium
CN110096571A (en) * 2019-04-10 2019-08-06 北京明略软件系统有限公司 A kind of mechanism name abbreviation generation method and device, computer readable storage medium
CN110895961A (en) * 2019-10-29 2020-03-20 泰康保险集团股份有限公司 Text matching method and device in medical data
CN111191242A (en) * 2019-08-09 2020-05-22 腾讯科技(深圳)有限公司 Vulnerability information determination method and device, computer readable storage medium and equipment
CN111507108A (en) * 2020-04-17 2020-08-07 腾讯科技(深圳)有限公司 Alias generation method and device, electronic equipment and computer readable storage medium
CN111949774A (en) * 2020-07-08 2020-11-17 深圳鹏锐信息技术股份有限公司 Intelligent question answering method and system
CN112613299A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Method and device for constructing enterprise synonym library and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
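J. MIAO et al.: "The Recognition Research of the Personal Name Abbreviations Based on the SGU Frame of the HNC Theory", 2009 SECOND INTERNATIONAL SYMPOSIUM ON KNOWLEDGE ACQUISITION AND MODELING *
郭晖 et al.: "Research on a Chinese Abbreviation Matching Algorithm Based on Attribute Association Similarity", Computer and Digital Engineering (《计算机与数字工程》) *
钟良伍 et al.: "Research on Retrieval Methods Based on Chinese Organization Names", Journal of Chinese Information Processing (《中文信息学报》) *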

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330331A (en) * 2021-12-27 2022-04-12 北京天融信网络安全技术有限公司 Method and device for determining importance of word segmentation in link
CN114330331B (en) * 2021-12-27 2022-09-16 北京天融信网络安全技术有限公司 Method and device for determining importance of word segmentation in link

Also Published As

Publication number Publication date
CN113468315B (en) 2021-12-10

Similar Documents

Publication Publication Date Title
US10095780B2 (en) Automatically mining patterns for rule based data standardization systems
CN112445775B (en) Fault analysis method, device, equipment and storage medium of photoetching machine
CN113986864A (en) Log data processing method and device, electronic equipment and storage medium
CN112507118A (en) Information classification and extraction method and device and electronic equipment
CN111178080B (en) Named entity identification method and system based on structured information
CN113468315B (en) Vulnerability vendor name matching method
CN111985212A (en) Text keyword recognition method and device, computer equipment and readable storage medium
CN113836316B (en) Processing method, training method, device, equipment and medium for ternary group data
CN115017898A (en) Sensitive text recognition method and device, electronic equipment and storage medium
CN115098657A (en) Method, apparatus and medium for natural language translation database query
CN114444465A (en) Information extraction method, device, equipment and storage medium
CN114092948A (en) Bill identification method, device, equipment and storage medium
CN113360685A (en) Method, device, equipment and medium for processing note content
CN112667208A (en) Translation error recognition method and device, computer equipment and readable storage medium
CN116578700A (en) Log classification method, log classification device, equipment and medium
CN114662469B (en) Emotion analysis method and device, electronic equipment and storage medium
CN115906817A (en) Keyword matching method and device for cross-language environment and electronic equipment
CN115600592A (en) Method, device, equipment and medium for extracting key information of text content
CN114647727A (en) Model training method, device and equipment applied to entity information recognition
CN115294593A (en) Image information extraction method and device, computer equipment and storage medium
CN115017385A (en) Article searching method, device, equipment and storage medium
CN111199170B (en) Formula file identification method and device, electronic equipment and storage medium
US20090150141A1 (en) Method and system for learning second or foreign languages
CN112541075A (en) Method and system for extracting standard case time of warning situation text
CN111191095A (en) Webpage data acquisition method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant