CN114006765A - Method and device for detecting sensitive information in message and electronic equipment - Google Patents

Method and device for detecting sensitive information in message and electronic equipment

Info

Publication number
CN114006765A
Authority
CN
China
Prior art keywords
sensitive information
information
message
server
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111291135.4A
Other languages
Chinese (zh)
Inventor
徐雅静
叶红
金驰
韩玮祎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202111291135.4A
Publication of CN114006765A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0245 Filtering by information in the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies

Abstract

The present disclosure provides a method, an apparatus, and an electronic device for detecting sensitive information in a message, which can be used in the field of information security or finance. The method includes: in response to obtaining a message, obtaining the information sent by a server in the message; traversing a sensitive information database using the information sent by the server, wherein the sensitive information database includes a tree structure for sensitive information; and if the information sent by the server contains sensitive information, outputting detection result association information according to a preset policy.

Description

Method and device for detecting sensitive information in message and electronic equipment
Technical Field
The present disclosure relates to the field of information security technologies and finance, and in particular, to a method and an apparatus for detecting sensitive information in a packet, and an electronic device.
Background
In the related art, with the continuous development of information technology, a financial institution (such as a bank) accumulates a large amount of data in its daily business activities. Besides supporting the operation of front-desk business processes in a bank, the data are increasingly used in fields such as financial statements, product pricing, user information, performance assessment, and decision support. The essence of a financial institution's daily business decision process is the generation, storage, transfer, and use of data. However, if the data is leaked during transmission, the financial institution or the user suffers losses.
Disclosure of Invention
In view of this, the present disclosure provides a method, an apparatus, and an electronic device for detecting sensitive information in a message, so as to at least partially solve the problem that data leaked in a transmission link causes losses to a financial institution or a user.
One aspect of the present disclosure provides a method for detecting sensitive information in a message, including: in response to obtaining a message, obtaining the information sent by a server in the message; traversing a sensitive information database using the information sent by the server, wherein the sensitive information database includes a tree structure for sensitive information; and if the information sent by the server contains sensitive information, outputting detection result association information according to a preset policy.
In certain embodiments, the above method further comprises: and constructing a sensitive information database based on the acquired sensitive information.
In some embodiments, constructing the sensitive information database based on the collected sensitive information comprises: acquiring sensitive information, wherein the sensitive information comprises at least one of server sensitive information or user sensitive information; for each piece of sensitive information, generating a prefix tree corresponding to the sensitive information, wherein each prefix tree comprises a root node and M child nodes, the root node and each child node each indicate a character string, and the character string comprises at least one of digits, letters, words, Chinese characters, or phrases; and constructing a set comprising N prefix trees, wherein M and N are integers greater than 0.
In some embodiments, for each sensitive information, generating the prefix tree corresponding to the sensitive information comprises: creating a root node, wherein the root node corresponds to a first character string of the sensitive information; and sequentially inserting child nodes corresponding to the character strings behind the newly created node according to the sequence of the character strings in the sensitive information from front to back until the child node corresponding to the last character string in the sensitive information completes the insertion operation.
In some embodiments, the sensitive information has categories including: at least one of a database version category, an application system information category, a middleware category, an internal port category, an address category, a certificate number category, or a contact category. Correspondingly, the method further comprises the following steps: after generating a prefix tree corresponding to the sensitive information, combining the prefix trees with the sensitive information of the same category to obtain a category prefix tree; and constructing a set comprising N prefix trees comprises: a set comprising Q category prefix trees is constructed, where Q is a positive integer less than or equal to N.
In some embodiments, traversing the sensitive information database with information sent by the server comprises: performing word segmentation on the information sent by the server to obtain at least one character string; traversing each prefix tree in the set using the at least one character string, until either all prefix trees have been traversed or a bottom-layer child node of one of the prefix trees is reached; if all prefix trees have been traversed without reaching a bottom-layer child node, determining that the message does not include sensitive information; and if a bottom-layer child node of a prefix tree is reached, determining that the message includes sensitive information.
In some embodiments, the prefix tree has categories. Correspondingly, the method further comprises the following steps: and if the information sent by the server contains the sensitive information, outputting the category to which the prefix tree matched with the sensitive information belongs.
In some embodiments, outputting the detection result association information according to the preset policy includes: determining the sensitivity level and/or threat level of the sensitive information; and outputting the sensitivity level and/or the threat degree of the sensitive information, and outputting a message based on the sensitivity level and/or the threat degree of the sensitive information.
In some embodiments, obtaining the information sent by the server in the message includes: filtering general non-server information from the message.
One aspect of the present disclosure provides an apparatus for detecting sensitive information in a message, including an information acquisition module, an information traversal module, and a message output module. The information acquisition module is configured to obtain, in response to an obtained message, the information sent by the server in the message. The information traversal module is configured to traverse the sensitive information database using the information sent by the server, wherein the sensitive information database includes a tree structure for the sensitive information. The message output module is configured to output the detection result association information according to a preset policy if the information sent by the server contains sensitive information.
Another aspect of the present disclosure provides an electronic device including one or more processors and a storage device, where the storage device is configured to store executable instructions, and the executable instructions, when executed by the processors, implement the method for detecting sensitive information in a message as above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the above method for detecting sensitive information in a message when the instructions are executed.
Another aspect of the present disclosure provides a computer program comprising computer executable instructions for implementing the method of detecting sensitive information in a message as above when executed.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an exemplary system architecture to which a method for detecting sensitive information in a message and an apparatus for detecting sensitive information in a message may be applied according to an embodiment of the present disclosure;
fig. 2 schematically shows a flowchart of a method for detecting sensitive information in a message according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a sensitive information database diagram according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a diagram of a prefix tree according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a diagram of a category prefix tree, according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow diagram of a method of traversing sensitive information, in accordance with an embodiment of the present disclosure;
FIG. 7 is a block diagram that schematically illustrates an apparatus for detecting sensitive information in a message, in accordance with an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a system for detecting sensitive information in a message according to an embodiment of the disclosure; and
fig. 9 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
Under the vigorous development of modern information technology construction, financial institutions such as banks accumulate a large amount of data in daily operation activities. Taking a bank as an example, in addition to supporting the operation of business processes in the front desk of the bank, the data is increasingly used in fields such as financial statements, product pricing, customer information, performance assessment, decision support and the like. The essence behind the decision making process of daily operation of a bank is the process of data generation, storage, transmission and utilization. The use value of the data assets is fully mined, and great economic benefit and competitive advantage can be brought to financial production. However, once these business data are lost, damaged or leaked, there is a possibility of causing huge economic loss or serious adverse effect on the financial institution in society, law, credit, brand. Balancing the convenience and security of data usage has become a significant challenge in the transformation and development of financial institutions.
In terms of data security, national legislation, such as the enacted "network security law" and the data security laws and regulations that may be issued in the future, requires the protection of data-security-related information at the level of the legal system. Industry regulations such as the "data governance guide of banking financial institutions", the "information technology management approach of securities fund operation institutions", the "data classification and grading guide of securities future industry" and the "technical specification of personal financial information protection" set out the data security work of each industry according to the characteristics of that industry. Relevant national ministries, associations, and the like have published national standards such as the "temporary regulations on the trade secret protection of central enterprises" and the "maturity model of data security capability of information security technology". Policies and regulations, industry standards, and national regulations jointly form the regulatory system for data security.
In order to realize safe management and operation, a financial institution needs to meet the requirements of data security and compliance construction required by regulatory agencies such as relevant ministries of the country and well protect data. The method, the device and the electronic equipment for detecting the sensitive information in the message can quickly detect the sensitive information from a large number of messages and provide a favorable basis for preventing the sensitive information from being leaked subsequently.
In the technical solution of the present disclosure, the acquisition, storage, application, and the like of the personal information of the users involved all comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Fig. 1 schematically shows an exemplary system architecture to which a method for detecting sensitive information in a message and an apparatus for detecting sensitive information in a message may be applied according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. It should be noted that the method, the apparatus, and the electronic device for detecting sensitive information in a message provided in the embodiments of the present disclosure may be used in the information security field for detecting the relevant aspects of the sensitive information in the message, may also be used in various fields other than the information security field, such as the field of detecting the sensitive information in the message, and may also be applied in the financial field. The method, the device and the electronic device for detecting sensitive information in a message provided by the embodiment of the disclosure are not limited in application field.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 may include a plurality of gateways, routers, hubs, network wires, etc. to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with other terminal devices and the server 105 via the network 104 to receive or send information, etc., such as sending service requests, service data, user identity information, etc. The terminal devices 101, 102, 103 may be installed with various communication client applications, such as web browser applications, banking-like applications, e-commerce-like applications, search-like applications, office-like applications, instant messaging tools, mailbox clients, social platform software, etc. (just examples).
The terminal devices 101, 102, 103 include, but are not limited to, smart phones, augmented reality devices, tablets, laptop portable computers, electronic devices with network connectivity, etc.
Server 105 may receive a request to detect sensitive information in a message, etc., and process the request. For example, server 105 may be a background management server, a cluster of servers, and the like. The background management server can analyze and process the received service request, information request, and the like, and feed back the processing result to the terminal device.
It should be noted that the method for detecting sensitive information in a message provided by the embodiment of the present disclosure may be executed by the server 105. Accordingly, the apparatus for detecting sensitive information in a message provided by the embodiment of the present disclosure may be disposed in the server 105. It should be understood that the number of terminal devices, networks, and servers are merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flowchart of a method for detecting sensitive information in a message according to an embodiment of the present disclosure.
As shown in fig. 2, the method for detecting sensitive information in a message may include operations S210 to S230.
In operation S210, information transmitted by the server in the message is acquired in response to the acquired message.
In this embodiment, the message may be generated and transmitted by at least one of a server, a proxy, a gateway, or the like. Taking a hypertext transfer protocol (HTTP) message as an example, the HTTP message may include a header and a data portion. The header is a meta-information header in text form, which describes the content and meaning of the message. These messages flow between the client, the server, and the proxy.
The HTTP message may include a method (method), a request URL (request-URL), a version (version), a status code (status-code), a reason phrase (reason-phrase), headers (header), and an entity body (entity-body).
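As an illustration of these parts only, the following minimal Python sketch splits a raw HTTP/1.x message into its start line, headers, and entity body. The function name `parse_http_message`, the dictionary keys, and the sample message are hypothetical and are not taken from the disclosure; a production system would normally rely on an established HTTP parser.

```python
def parse_http_message(raw: str) -> dict:
    """Split a raw HTTP/1.x message into start-line fields, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    parts = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if parts[0].startswith("HTTP/"):
        # Response start line: version, status-code, reason-phrase
        fields = {
            "kind": "response",
            "version": parts[0],
            "status_code": int(parts[1]),
            "reason_phrase": parts[2] if len(parts) > 2 else "",
        }
    else:
        # Request start line: method, request-URL, version
        fields = {
            "kind": "request",
            "method": parts[0],
            "request_url": parts[1],
            "version": parts[2],
        }
    return {**fields, "headers": headers, "entity_body": body}


# Made-up sample response used only for illustration.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: ExampleServer/1.0\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "hello"
)
message = parse_http_message(raw_response)
request = parse_http_message("GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
```

Here `message["kind"]` distinguishes a response from a request, which is one simple way a detector could tell which messages originate from the server.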
For an HTTP request, an HTTP message is sent from the client to the proxy, which then forwards the HTTP message to the server. After the server finishes processing the HTTP request, an HTTP message containing the processing result information is sent from the server to the proxy, and is then sent to the client by the proxy.
This embodiment mainly addresses the problem that a message sent by a server may include sensitive information, so that the risk of sensitive information leakage is high. It is therefore necessary to filter the server-related information out of the message and inspect that information. Because the risk of sensitive information leakage in messages generated and sent by the gateway and the client is low, only the messages returned by the server may be inspected in order to improve the detection speed.
In some embodiments, in order to further improve the sensitive information detection efficiency, noise information in the message sent by the server may be filtered.
For example, obtaining the information sent by the server in the message may include: filtering general non-server information from the message. Wherein, the general non-server information includes but is not limited to: interface information, gateway information, etc.
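The filtering step might be sketched as follows. This is only an assumption-laden illustration: the `source` field, the `GENERIC_HEADERS` set, and the header names in it are hypothetical stand-ins for whatever "interface information" and "gateway information" mean in a concrete deployment.

```python
# Hypothetical header names treated as general non-server information.
GENERIC_HEADERS = {"via", "x-gateway-id", "x-interface-name"}


def server_information(messages):
    """Keep only information sent by the server, dropping generic noise."""
    collected = []
    for msg in messages:
        if msg["source"] != "server":
            continue  # messages from clients and gateways are not inspected
        for name, value in msg["headers"].items():
            if name.lower() not in GENERIC_HEADERS:
                collected.append(f"{name}: {value}")
        if msg.get("body"):
            collected.append(msg["body"])
    return collected


# Made-up sample traffic used only for illustration.
sample_messages = [
    {"source": "client", "headers": {"Host": "example.com"}, "body": ""},
    {
        "source": "server",
        "headers": {"Server": "ExampleServer/1.0", "Via": "1.1 example-gateway"},
        "body": "internal error on port 8081",
    },
]
server_info = server_information(sample_messages)
```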
In operation S220, a sensitive information database is traversed using information transmitted by a server, wherein the sensitive information database includes a tree structure for sensitive information.
In this embodiment, the sensitive information database may include a plurality of pieces of sensitive information, at least a part of which may be stored in a tree structure. Unlike the simple information matching used in the related art, the present disclosure can perform matching using the tree structure, which effectively improves the matching speed. For example, the tree structure is hierarchical, and matching proceeds level by level from the root node down to the bottom-layer child nodes. If any one child node in the tree structure fails to match, the subsequent child nodes of that tree structure need not be matched, which effectively reduces the number of comparisons and improves the matching efficiency.
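The early-termination behaviour described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that a tree is a single root-to-leaf chain represented as a list of strings and that the message has already been segmented; `match_levels` is a hypothetical name.

```python
def match_levels(tokens, levels):
    """Compare tokens with tree levels in order, stopping at the first mismatch.

    Returns (matched, comparisons) so the saving from early exit is visible.
    """
    comparisons = 0
    for depth, expected in enumerate(levels):
        comparisons += 1
        if depth >= len(tokens) or tokens[depth] != expected:
            return False, comparisons  # deeper levels are never examined
    return True, comparisons


levels = ["China", "Beijing City", "Chaoyang district"]
hit = match_levels(["China", "Beijing City", "Chaoyang district"], levels)
miss = match_levels(["China", "Shanghai City", "Pudong district"], levels)
```

In the mismatching case the comparison stops at the second level, so the third level is never compared.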
FIG. 3 schematically shows a sensitive information database diagram according to an embodiment of the disclosure.
As shown in fig. 3, if interface address 1, IP address 1, identification number 1, and the like are set as sensitive information, a tree structure for each piece of sensitive information can be constructed in the sensitive information database. Matching the sensitive information with the tree structure can effectively improve the efficiency of identifying sensitive information in a message. The tree structure may be a prefix tree (Trie), a hash tree, or the like, which is not limited herein.
In operation S230, if it is determined that the information sent by the server includes sensitive information, the detection result association information is output according to a preset policy.
In this embodiment, when it is determined that the information sent by the server includes the sensitive information, the prompt information may be output, or the sensitive information in the message may be processed and then output. For example, the message may be intercepted and a prompt message may be sent to the client with the auditing authority to determine whether the message may be passed. For example, sensitive information in the message may be deleted, replaced with characters, and the like, and the processed message may be output.
In some embodiments, the sensitivity level and the threat level may be further determined, and the manner of outputting the message may be determined according to the sensitivity level and the threat level.
For example, outputting the detection result association information according to the preset policy may include the following operations. First, the sensitivity level and/or threat level of sensitive information is determined. And then outputting the sensitivity level and/or the threat degree of the sensitive information, and outputting a message based on the sensitivity level and/or the threat degree of the sensitive information.
Wherein, the sensitivity level and the threat degree can be determined by means of data matching. For example, the sensitivity level database includes a mapping between the sensitivity information and the sensitivity level. For example, the threat level database includes a mapping between sensitive information and threat levels.
For sensitive information with low sensitivity and a low threat degree, the message can be processed automatically, for example by performing character replacement on the sensitive information. For sensitive information with high sensitivity and a high threat degree, the message can be intercepted and warning information sent to a client with processing authority, so that managers and developers can coordinate a service upgrade and reduce the risk of the sensitive information being leaked. For sensitive information with high sensitivity and a low threat degree, character replacement can be performed on the sensitive information, and prompt information can be sent to a client with processing authority for continued statistical analysis and the like.
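One possible reading of this policy is sketched below in Python. The two lookup tables stand in for the sensitivity level database and the threat level database; their contents, the placeholder strings, and the function name `apply_policy` are all hypothetical. The text does not specify the low-sensitivity, high-threat case, so the sketch conservatively intercepts whenever the threat degree is high, which is an assumption.

```python
# Hypothetical mappings from sensitive information to its levels.
SENSITIVITY_LEVELS = {
    "db-version-string": "low",
    "internal-port-8081": "high",
    "id-number-placeholder": "high",
}
THREAT_LEVELS = {
    "db-version-string": "low",
    "internal-port-8081": "high",
    "id-number-placeholder": "low",
}


def apply_policy(message: str, sensitive: str) -> dict:
    """Choose how to output a message that contains `sensitive`."""
    # Unknown items default to "high" (an assumption made for safety).
    sensitivity = SENSITIVITY_LEVELS.get(sensitive, "high")
    threat = THREAT_LEVELS.get(sensitive, "high")
    masked = message.replace(sensitive, "*" * len(sensitive))

    if threat == "high":
        # Intercept the message and warn a client with processing authority.
        action, output, notify = "intercept", None, True
    elif sensitivity == "high":
        # Replace characters and send a prompt for later statistical analysis.
        action, output, notify = "replace", masked, True
    else:
        # Low sensitivity and low threat: replace characters automatically.
        action, output, notify = "replace", masked, False
    return {
        "sensitivity": sensitivity,
        "threat": threat,
        "action": action,
        "output": output,
        "notify": notify,
    }


low_low = apply_policy("backend runs db-version-string", "db-version-string")
high_high = apply_policy("listening on internal-port-8081", "internal-port-8081")
high_low = apply_policy("holder id-number-placeholder", "id-number-placeholder")
```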
According to the method for detecting the sensitive information in the message, whether the sensitive information is leaked from the server of the application system is identified based on the tree structure model, whether the information transmitted by the server contains the sensitive information can be effectively detected, and the threat degree of the sensitive information leaked from the server is evaluated. In addition, favorable information is provided for further avoiding the leakage risk of sensitive information of the server, the risk possibly suffered by the system is reduced, and the application system is protected from being damaged.
In some embodiments, the sensitive information database may be pre-constructed and updated.
Specifically, the method further includes: and constructing a sensitive information database based on the acquired sensitive information. It should be noted that the sensitive information database is not required to be reconstructed every time the sensitive information is detected, and is a database that can be directly called. In addition, the sensitive information database may be updated when new sensitive data is present.
In some embodiments, building a sensitive information database based on the collected sensitive information may include the following operations.
First, sensitive information is obtained, wherein the sensitive information comprises at least one of server sensitive information or user sensitive information. For example, the sensitive information may be set according to policy requirements, social public opinion, hot events, enterprise architecture, laws and regulations, and the like. Specifically, messages may be acquired by a data acquisition method, and possible sensitive information may then be screened from the messages. Examples include forbidden words, insulting forms of address, confidential information, company confidential information, and the like.
Then, for each piece of sensitive information, a prefix tree corresponding to the sensitive information is generated, wherein each prefix tree includes a root node and M child nodes, and the root node and each child node each indicate a character string. The character string comprises at least one of digits, letters, words, Chinese characters, or phrases. For example, for a residential address, adjacent digits may be grouped into the same character string, while for a telephone number, each individual digit may form its own character string.
Next, a set is constructed that includes N prefix trees, where M and N are integers greater than 0.
The sensitive information database can be constructed in the above way.
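A minimal Python sketch of this construction follows, under simplifying assumptions: each prefix tree is represented only by its root-to-leaf sequence of character strings, the regular expression is an illustrative segmentation rule rather than the dictionary-based segmentation of the disclosure, and the names `segment_address`, `segment_phone`, and `build_database` are hypothetical. The sample data is made up.

```python
import re


def segment_address(text: str):
    """Group adjacent digits into one string; other runs of text stay together."""
    return re.findall(r"\d+|[^\d\s]+", text)


def segment_phone(text: str):
    """Treat every individual digit as its own string."""
    return [ch for ch in text if ch.isdigit()]


def build_database(items):
    """Build a set of N chains, one per piece of sensitive information."""
    database = []
    for kind, text in items:
        tokens = segment_phone(text) if kind == "phone" else segment_address(text)
        if len(tokens) >= 2:  # a root node plus at least one child node (M > 0)
            database.append(tokens)
    return database


database = build_database([
    ("address", "Building 12 Room 304"),
    ("phone", "555-0100"),
])
```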
Fig. 4 schematically illustrates a diagram of a prefix tree according to an embodiment of the present disclosure.
The prefix tree, also called a word lookup tree or Trie, is a tree structure and is a variation of a hash tree. The prefix tree may be used to count, sort, and store a large number of character strings (but is not limited to character strings). In addition, it can be used by a search engine system for text word frequency statistics. The Trie reduces query time by exploiting the common prefixes of character strings, so that meaningless character string comparisons are minimized and the query efficiency is higher than that of the hash tree. Specifically, the Trie has 3 basic properties: (1) the root node contains no character, and every node other than the root node contains exactly one character; (2) concatenating the characters along the path from the root node to a given node yields the character string corresponding to that node; and (3) all the child nodes of any given node contain different characters.
As shown in fig. 4, if the sensitive information is an address consisting of China, Beijing City, Chaoyang district, Laiyuangyuan street, and Qingyuan street number X, the sensitive information can be segmented based on a dedicated dictionary (such as a place name dictionary or an address dictionary) or a general dictionary, to obtain the following segmentation results: China, Beijing City, Chaoyang district, Laiyuangyuan street, Qingyuan street number X. The constructed prefix tree may include a root node and 4 child nodes. The character strings corresponding to the nodes are, in order: China, Beijing City, Chaoyang district, Laiyuangyuan street, Qingyuan street number X.
In some embodiments, the prefix tree may be constructed as follows. For each sensitive information, generating a prefix tree corresponding to the sensitive information may include the following operations.
First, a root node is created, which corresponds to the first string of sensitive information. Referring to fig. 4, the character string corresponding to the root node may be china.
Then, following the front-to-back order of the character strings in the sensitive information, the child node corresponding to each character string is inserted in turn after the most recently created node, until the child node corresponding to the last character string in the sensitive information has been inserted. Referring to fig. 4, there are 4 child nodes after the root node, and the character strings corresponding to the 4 child nodes are: Beijing City, Chaoyang district, Laiyuangyuan street, Qingyuan street number X.
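The two steps above can be sketched in Python as follows. The `Node` class and the function names are hypothetical, and the segmentation of the address is assumed to have been done already; this is an illustration of the described procedure rather than the implementation of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    string: str
    children: list = field(default_factory=list)


def build_prefix_tree(strings):
    """Create the root from the first string, then append each later string
    as a child of the most recently created node."""
    if not strings:
        raise ValueError("sensitive information must contain at least one string")
    root = Node(strings[0])
    newest = root
    for s in strings[1:]:
        child = Node(s)
        newest.children.append(child)
        newest = child
    return root


def count_child_nodes(root):
    """Count the nodes below the root along the single chain."""
    count, node = 0, root
    while node.children:
        node = node.children[0]
        count += 1
    return count


address = ["China", "Beijing City", "Chaoyang district",
           "Laiyuangyuan street", "Qingyuan street number X"]
tree = build_prefix_tree(address)
```

For the five strings of the address example, the sketch yields a root node for "China" followed by 4 child nodes.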
Specifically, a Trie contains a set of character strings. The nodes are numbered for ease of description. Each edge of the tree is labeled with a character, which may be a character in any character set. For example, for strings consisting entirely of lowercase letters, the character set is 'a' to 'z'; for strings consisting entirely of digits, the character set is '0' to '9'; and for binary strings, the character set is 0 and 1. Consider how to create a Trie for a given set of character strings W1, W2, W3, …, WN. The Trie is created simply by inserting W1, W2, W3, …, WN in sequence into a Trie that initially has only a root node. The key to creating a Trie is therefore the insert operation. Specifically, a Trie generally supports two operations: Insert(W), which adds a character string W to the set; and Search(S), which queries whether a character string S is in the set. A Trie can be created through the insert operation.
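The Insert(W) and Search(S) operations can be sketched at character level as below. This is a generic textbook Trie given for illustration only, not code from the disclosure; the class name `Trie` and the `is_end` flag are assumptions.

```python
from typing import Dict


class Trie:
    """Character-level Trie supporting Insert(W) and Search(S)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}   # edge label -> child node
        self.is_end = False                     # True if a stored string ends here

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            # Follow the edge labeled ch, creating it when it is missing.
            node = node.children.setdefault(ch, Trie())
        node.is_end = True

    def search(self, word: str) -> bool:
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end


trie = Trie()
for w in ["in", "inn", "tea"]:                  # W1, W2, W3 inserted in sequence
    trie.insert(w)
print(trie.search("inn"), trie.search("te"))
```

The `is_end` flag distinguishes a stored string from a mere prefix of one, which is one common way to decide set membership.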
In some embodiments, in order to reduce the storage space occupied by the sensitive information and to reduce the number of Trie trees, the Trie trees may be merged according to a certain rule. Alternatively, the same effect can be achieved when creating a Trie tree by preferentially inserting child nodes into an existing Trie tree.
For example, assume that the character string "in" is to be inserted. The procedure starts at the root node, i.e., node No. 0, which may be represented by P = 0. First, the existing prefix tree is traversed to determine whether P has an edge labeled i connecting it to a child node. No such edge exists, so a new node, node No. 1, is created and set as a child node of P = 0 (i.e., node No. 0), and the connecting edge is labeled i. The procedure then moves to node No. 1, i.e., P = 1, and is repeated for the next character, with the new edge labeled n.
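The numbered-node insertion walk can be sketched with a list of edge tables, where the list index is the node number and P is the current node. The representation and the function name `insert_numbered` are assumptions made for this illustration.

```python
from typing import Dict, List


def insert_numbered(edges: List[Dict[str, int]], word: str) -> List[int]:
    """Insert word and return the node numbers visited, starting at P = 0.

    edges[p] maps an edge label (one character) to the child node number.
    """
    p = 0                                   # start at the root, node No. 0
    path = [p]
    for ch in word:
        if ch not in edges[p]:              # no edge labeled ch leaves P
            edges.append({})                # create the next numbered node
            edges[p][ch] = len(edges) - 1   # label the new edge with ch
        p = edges[p][ch]                    # move P to the child
        path.append(p)
    return path


edges: List[Dict[str, int]] = [{}]          # a Trie with only node No. 0
print(insert_numbered(edges, "in"))         # creates nodes No. 1 and No. 2
print(insert_numbered(edges, "it"))         # reuses edge i, creates one new node
```

Inserting a second string that shares the prefix "i" reuses node No. 1 and only adds the node for the differing character.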
Specifically, the sensitive information has categories including: at least one of a database version category, an application system information category, a middleware category, an internal port category, an address category, a certificate number category, or a contact category.
Accordingly, after generating the prefix tree corresponding to the sensitive information, the method may further include the following operation.
First, the prefix trees with the same category of sensitive information are merged to obtain a category prefix tree.
Then, constructing a set comprising N prefix trees comprises: a set comprising Q category prefix trees is constructed, where Q is a positive integer less than or equal to N.
Fig. 5 schematically illustrates a category prefix tree according to an embodiment of the present disclosure.
Referring to fig. 5, the sensitive information includes: Qingyuan Road No. X, Laiguangying Street, Chaoyang District, Beijing City, China; Qingheying Street No. Y, Laiguangying Street, Chaoyang District, Beijing City, China; Asian Village Street, Chaoyang District, Beijing City, China; and the like. These pieces of sensitive information are all residential addresses or office addresses and share the prefix "Chaoyang District, Beijing City, China", so their root nodes, first child nodes and second child nodes can be merged. In addition, the third child nodes of the first two addresses, which both correspond to Laiguangying Street, can be merged.
In this way the number of prefix trees is effectively reduced, and the storage space occupied by the sensitive information database is reduced. The merging can be implemented in at least two ways. In the first, two trees are constructed first, and then the root nodes and child nodes having the same character strings are merged. In the second, a first prefix tree for first sensitive information is constructed first, and then child nodes are inserted into the first prefix tree only for those character strings of second sensitive information that differ from the character strings of the first sensitive information.
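The second merging approach, inserting later sensitive information into an existing tree, can be sketched with nested dictionaries. The segmentation of the three addresses and the function names are assumptions for illustration, so the node counts below need not match the numbering in fig. 5.

```python
from typing import Dict, List

Tree = Dict[str, dict]      # character string -> subtree; {} marks a bottom node


def insert(tree: Tree, segments: List[str]) -> None:
    """Insert one segmented piece of sensitive information, reusing shared nodes."""
    node = tree
    for seg in segments:
        node = node.setdefault(seg, {})


def count_nodes(tree: Tree) -> int:
    return sum(1 + count_nodes(child) for child in tree.values())


prefix = ["China", "Beijing City", "Chaoyang District"]
addresses = [
    prefix + ["Laiguangying Street", "Qingyuan Road No. X"],
    prefix + ["Laiguangying Street", "Qingheying Street No. Y"],
    prefix + ["Asian Village Street"],
]

category_tree: Tree = {}
for address in addresses:
    insert(category_tree, address)

separate = sum(len(address) for address in addresses)   # one tree per address
print(separate, count_nodes(category_tree))
```

With this toy segmentation, three separate trees would hold 14 nodes in total, while the merged category prefix tree holds 7.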
FIG. 6 schematically shows a flow diagram of a method of traversing sensitive information, in accordance with an embodiment of the present disclosure.
As shown in fig. 6, traversing the sensitive information database using the information transmitted by the server may include operations S610 to S640.
In operation S610, word segmentation is performed on the information sent by the server to obtain at least one character string. Specifically, a specialized dictionary or a general dictionary may be used to segment the information in the message into a plurality of character strings.
In operation S620, for each prefix tree in the set, the prefix tree is traversed using at least one character string until all prefix trees are traversed or the bottom-level child nodes of the prefix tree are traversed.
In operation S630, if all prefix trees have been traversed, it is determined that the message does not include sensitive information. That is, if no prefix tree is traversed to a bottom-level node, the current character string is not sensitive information.
In operation S640, if the traversal reaches a bottom-level child node of a prefix tree, it is determined that the message includes sensitive information. Referring to fig. 5, the three Chaoyang District addresses shown there traverse to the bottom-level child nodes, namely child node 10, child node 11 and child node 9, respectively, so it can be determined that the three pieces of address information are sensitive information.
Specifically, referring to fig. 5, after the message information returned by the server to the client is intercepted, it is queried whether a character string is address-category sensitive information. For example, suppose the queried address is "Qingyuan Road No. X, Laiguangying Street, Chaoyang District, Beijing City, China". First, it is checked whether the first character string "China" is in the root node No. 0. If so, matching continues with the next child node: it is determined whether the second character string "Beijing City" is in child node No. 1. If so, the search continues downward until a bottom-level child node, for example child node No. 10, is reached, and it is concluded that the information is sensitive information. If no bottom-level child node is reached, the information is non-sensitive; for example, "Xicheng District, Beijing City, China" is non-sensitive information. The same principle applies when searching for information such as identity card numbers.
Compared with existing sensitive information detection methods, the Trie can be used for detecting sensitive information such as mobile phone numbers, identity card numbers, names and addresses, and is therefore general-purpose. In addition, detection only requires searching node by node in sequence starting from the root node, so the search is simple and efficient.
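Operations S620 to S640 can be sketched as below, assuming the message has already been segmented into character strings (operation S610) and that each prefix tree is a nested dictionary in which an empty dictionary marks a bottom-level child node. Trying every start position in the message is an assumption of this sketch; the disclosure does not specify how the match is aligned within the message.

```python
from typing import Dict, List

Tree = Dict[str, dict]


def insert(tree: Tree, segments: List[str]) -> None:
    node = tree
    for seg in segments:
        node = node.setdefault(seg, {})


def reaches_bottom(tree: Tree, segments: List[str]) -> bool:
    """True if some run of segments walks the tree down to a bottom-level node."""
    for start in range(len(segments)):
        node = tree
        for seg in segments[start:]:
            if seg not in node:
                break                   # mismatch: try the next start position
            node = node[seg]
            if not node:                # no children left: bottom-level node
                return True
    return False


def message_is_sensitive(trees: List[Tree], segments: List[str]) -> bool:
    # S630: all trees traversed without a hit -> no sensitive information.
    # S640: any tree traversed to a bottom-level node -> sensitive information.
    return any(reaches_bottom(tree, segments) for tree in trees)


address_tree: Tree = {}
insert(address_tree, ["China", "Beijing City", "Chaoyang District",
                      "Laiguangying Street", "Qingyuan Road No. X"])

print(message_is_sensitive([address_tree],
                           ["China", "Beijing City", "Xicheng District"]))
```

A message that only shares a prefix with a stored address, such as the Xicheng District example, never reaches a bottom-level node and is therefore reported as non-sensitive.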
In some embodiments, the prefix tree has categories.
Correspondingly, the method further comprises: if the information sent by the server contains sensitive information, outputting the category to which the prefix tree matching the sensitive information belongs. The category of the sensitive information helps to determine how the sensitive information should be processed. For example, leakage of a mobile phone number does not pose a large security risk to the user's property, so replacement processing is sufficient. By contrast, leakage of an identity card number or a bank card number poses a great security risk to the user's property, so in addition to replacement processing, modification suggestions need to be provided to the service developer to further reduce the risk of sensitive information leakage.
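Category output and category-dependent handling can be sketched as follows. The category keys, the action names and the placeholder numbers are all hypothetical; only the idea that a contact number is replaced while a certificate number is replaced and also reported to the developer follows the text above.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, dict]

# Hypothetical handling policy per category.
ACTIONS: Dict[str, Tuple[str, ...]] = {
    "contact": ("replace",),
    "certificate_number": ("replace", "advise_developer"),
}


def reaches_leaf(tree: Tree, segments: List[str]) -> bool:
    """True if the segments lead from the tree root to a bottom-level node."""
    node = tree
    for seg in segments:
        if seg not in node:
            return False
        node = node[seg]
    return bool(segments) and not node


def detect_categories(category_trees: Dict[str, Tree], segments: List[str]) -> List[str]:
    """Return the categories of all prefix trees matched by the segments."""
    return [cat for cat, tree in category_trees.items() if reaches_leaf(tree, segments)]


def plan_actions(categories: List[str]) -> List[str]:
    actions: List[str] = []
    for cat in categories:
        for action in ACTIONS.get(cat, ("replace",)):
            if action not in actions:
                actions.append(action)
    return actions


# Placeholder values only; not real numbers.
category_trees: Dict[str, Tree] = {
    "contact": {"138": {"0000": {"0000": {}}}},
    "certificate_number": {"110105": {"19900101": {"000X": {}}}},
}

found = detect_categories(category_trees, ["110105", "19900101", "000X"])
print(found, plan_actions(found))
```

Keeping the policy in a table separate from the trees means the handling of a category can change without rebuilding the sensitive information database.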
In a specific embodiment, first, the messages returned to the front end by the server while the application system is running are intercepted.
Then, the intercepted messages are collected and stored.
Then, the collected messages are extracted and filtered so that only the information returned by the server is retained and redundant character strings are removed.
Then, the retained valid message information is received.
Then, the Trie tree algorithm is used to traverse the data in the sensitive information database with the valid message information, and the sensitive information is matched.
Then, the matched sensitive information is received.
Then, the detected sensitive information is output.
In addition, sensitive information can be classified, graded and displayed.
The method for detecting sensitive information provided by this embodiment aims to accurately and comprehensively detect the information returned by the server, providing a sound basis for preventing leakage of sensitive information. The core idea is to use the Trie tree algorithm to effectively detect the information returned by the server, filter out the sensitive information, and classify it. Furthermore, the threat posed by leakage of the server's sensitive information is evaluated, providing a solid basis for subsequent measures to prevent sensitive information leakage.
Another aspect of the present disclosure also provides a device for detecting sensitive information in a message.
Fig. 7 is a block diagram schematically illustrating an apparatus for detecting sensitive information in a message according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 may include: an information acquisition module 710, an information traversal module 720, and a message output module 730.
The information obtaining module 710 is configured to obtain, in response to an obtained message, the information sent by the server in the message.
The information traversing module 720 is configured to traverse a sensitive information database using the information sent by the server, where the sensitive information database includes a tree structure for sensitive information.
The message output module 730 is configured to output, according to a preset policy, information associated with the detection result if it is determined that the information sent by the server includes sensitive information.
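The cooperation of the three modules can be sketched as one class with one method per module. The message layout (a dictionary with a "body" field), the comma-based segmenter and the shape of the output record are assumptions for illustration and are not specified by the disclosure.

```python
from typing import Callable, Dict, List, Optional

Tree = Dict[str, dict]      # nested prefix tree; {} marks a bottom-level node


class SensitiveInfoDetector:
    def __init__(self, trees: List[Tree], segmenter: Callable[[str], List[str]]) -> None:
        self.trees = trees
        self.segmenter = segmenter

    def acquire(self, message: Dict[str, str]) -> str:
        """Role of module 710: keep only the information sent by the server."""
        return message.get("body", "")

    def traverse(self, info: str) -> bool:
        """Role of module 720: walk each prefix tree with the segmented information."""
        segments = self.segmenter(info)
        for tree in self.trees:
            node = tree
            for seg in segments:
                if seg not in node:
                    break
                node = node[seg]
                if not node:            # bottom-level node reached
                    return True
        return False

    def output(self, message: Dict[str, str]) -> Optional[Dict[str, object]]:
        """Role of module 730: report a hit (stand-in for the preset policy)."""
        info = self.acquire(message)
        if self.traverse(info):
            return {"sensitive": True, "info": info}
        return None


detector = SensitiveInfoDetector(
    trees=[{"China": {"Beijing City": {"Chaoyang District": {}}}}],
    segmenter=lambda text: text.split(", "),
)
print(detector.output({"headers": "...", "body": "China, Beijing City, Chaoyang District"}))
```

Passing the segmenter in as a callable keeps the traversal independent of whichever dictionary-based segmentation is used.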
Another aspect of the present disclosure also provides a system for detecting sensitive information in a message.
Fig. 8 schematically shows a block diagram of a system for detecting sensitive information in a message according to an embodiment of the present disclosure.
As shown in fig. 8, the system for detecting sensitive information in a message may include: a server information acquisition module, a server-based sensitive information detection module and a detection result output module.
The server information acquisition module is mainly used for acquiring and collecting the information returned by the server. The server-based sensitive information detection module mainly examines the server return information acquired by the server information acquisition module and detects whether it contains sensitive information. The detection result output module classifies, evaluates and displays the detection results of the server-based sensitive information detection module.
Specifically, the server information acquisition module is mainly used for intercepting the messages returned by the server to the application system, acquiring the information returned by the server, storing it, and transmitting it to the server-based sensitive information detection module. The server information acquisition module may include: a server message intercepting unit, a server message collecting unit and a server message extracting unit.
The server message intercepting unit intercepts the messages that the server returns to the front end of the application system, stores them, and provides them to the server message collecting unit.
The server message collecting unit acquires the messages intercepted by the server message intercepting unit and collects the messages returned to the application system by the application system server.
The server message extraction unit extracts and classifies the messages collected in the server message collection unit.
The server-based sensitive information detection module is specifically used for receiving the information that the server returns to the application system, as collected by the server information acquisition module, and for filtering and detecting that information.
Specifically, the server-based sensitive information detection module may include a server information receiving unit, a server information filtering unit, a server information detection unit, and a sensitive information base storage unit.
The server information receiving unit receives the message information, collected by the server information acquisition module, that the server returns to the application system.
The server information filtering unit preliminarily filters the received message information, removes general non-server information, and transmits the remaining information to the server information detection unit.
The server information detection unit matches the information returned by the server using the Trie tree algorithm, and identifies database version information, application system information, middleware information, internal ports, IP address information and client sensitive information.
The sensitive information base storage unit is used for storing a preset sensitive information base, which comprises database version information, application system information, middleware information, internal ports, IP address information and client sensitive information. This information is provided to the server information detection unit for matching with the Trie tree algorithm.
The detection result output module can comprise a sensitive information classification unit, a sensitive information rating unit and a detection result summarizing and outputting unit.
The sensitive information classification unit is specifically used for classifying and grading the detected server sensitive information.
The sensitive information rating unit is used for rating the detected server sensitive information according to its threat degree, and thereby evaluating the threat posed by the sensitive information leaked by the server.
The detection result summarizing and outputting unit is used for displaying the detection result in graphical form, so that the result can be presented conveniently and intuitively.
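The classification, rating and summarizing units can be sketched as a single function that groups findings by category and derives an overall threat level. The numeric levels in `THREAT_LEVEL` are invented for illustration; the disclosure does not define a scale.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical threat levels per category (higher means more severe).
THREAT_LEVEL: Dict[str, int] = {
    "contact": 1,
    "address": 2,
    "internal_port": 2,
    "database_version": 2,
    "certificate_number": 3,
}


def summarize(findings: List[Tuple[str, str]]) -> Dict[str, object]:
    """Summarize (category, matched text) pairs for display.

    Returns the number of findings per category and the highest threat
    level among them; unknown categories default to level 1.
    """
    counts = Counter(category for category, _ in findings)
    overall = max((THREAT_LEVEL.get(cat, 1) for cat in counts), default=0)
    return {"counts": dict(counts), "overall_threat": overall}


report = summarize([
    ("contact", "<phone placeholder>"),
    ("certificate_number", "<id placeholder>"),
    ("contact", "<phone placeholder>"),
])
print(report)
```

The per-category counts are the kind of data a graphical summary could plot, while the overall level gives a single figure for the leakage threat.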
The system for detecting sensitive information in a message provided by the embodiment of the disclosure addresses the problem that existing detection rules for server sensitive information detect mobile phone numbers and identity card numbers through regular expressions and cannot detect sensitive information such as names and addresses. With the present scheme, sensitive information can be accurately detected simply by constructing the corresponding Trie tree. By this technical scheme, the information returned by the server to the front end of the application system is detected, an effective information basis is provided for subsequent mitigation measures, and comprehensive identification of the sensitive information returned by the server across the whole application system is ensured. Sensitive information is effectively identified through the Trie tree algorithm, so that the sensitive information transmitted by the server in the system is comprehensively identified and objectively evaluated, thereby identifying the information security risk in the system and guiding the elimination of that risk.
It should be noted that the implementation, solved technical problems, implemented functions, and achieved technical effects of each module/unit and the like in the apparatus part embodiment are respectively the same as or similar to the implementation, solved technical problems, implemented functions, and achieved technical effects of each corresponding step in the method part embodiment, and are not described in detail herein.
Any of the modules, units, or at least part of the functionality of any of them according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, units according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by any other reasonable means of hardware or firmware by integrating or packaging the circuits, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, one or more of the modules, units according to embodiments of the present disclosure may be implemented at least partly as computer program modules, which, when executed, may perform the respective functions.
For example, any plurality of the information obtaining module 710, the information traversing module 720 and the message outputting module 730 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the information obtaining module 710, the information traversing module 720, and the message outputting module 730 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or any suitable combination of any of them. Alternatively, at least one of the information obtaining module 710, the information traversing module 720, and the message output module 730 may be at least partially implemented as a computer program module that, when executed, may perform a corresponding function.
Fig. 9 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 9, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are communicatively connected to each other by a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the program may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication portion 909 including a network interface card such as a LAN card, a modem, or the like. The communication portion 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage portion 908 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program that contains program code for performing the method provided by the embodiments of the present disclosure. When the computer program product runs on an electronic device, the program code causes the electronic device to implement the method for detecting sensitive information in a message provided by the embodiments of the present disclosure.
The computer program, when executed by the processor 901, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The computer program may include program code that may be transmitted over any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming language includes, but is not limited to, programming languages such as Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or integrated in various ways, even if such combinations or integrations are not expressly recited in the present disclosure. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (11)

1. A method for detecting sensitive information in a message comprises the following steps:
responding to the obtained message, and obtaining information sent by a server in the message;
traversing a sensitive information database by using the information sent by the server, wherein the sensitive information database comprises a tree structure aiming at sensitive information; and
if the information sent by the server contains sensitive information, outputting detection result correlation information according to a preset strategy.
2. The method of claim 1, further comprising:
constructing the sensitive information database based on the acquired sensitive information, comprising:
acquiring sensitive information;
for each piece of sensitive information, generating a prefix tree corresponding to the sensitive information, wherein each prefix tree comprises: a root node and M child nodes, wherein the root node and each child node respectively indicate a character string, and the character string comprises at least one of numbers, letters, Chinese characters or words; and
constructing a set comprising N of said prefix trees, wherein M and N are integers greater than 0.
3. The method of claim 2, wherein for each sensitive information, generating a prefix tree corresponding to the sensitive information comprises:
creating a root node, wherein the root node corresponds to a first character string of the sensitive information; and
sequentially inserting child nodes corresponding to the character strings after the newly created node according to the front-to-back order of the character strings in the sensitive information, until the child node corresponding to the last character string in the sensitive information completes the insertion operation.
4. The method of claim 2, wherein the sensitive information has categories comprising: at least one of a database version category, an application system information category, a middleware category, an internal port category, an address category, a certificate number category or a contact manner category;
the method further comprises the following steps: after said generating the prefix tree corresponding to the sensitive information,
combining prefix trees with sensitive information of the same category to obtain a category prefix tree; and
the constructing a set including N of the prefix trees includes: constructing a set comprising Q of the category prefix trees, wherein Q is a positive integer less than or equal to N.
5. The method of claim 2, wherein said traversing a sensitive information database with said information sent by a server comprises:
performing word segmentation on the information sent by the server to obtain at least one character string;
for each prefix tree in the set, traversing the prefix tree by using the at least one character string until all prefix trees are traversed or the bottom-layer child nodes of the prefix tree are traversed;
if all prefix trees are traversed, determining that the message does not contain sensitive information; and
if the traversal reaches the bottom-layer child node of a prefix tree, determining that the message comprises sensitive information.
6. The method of claim 5, wherein the prefix tree has a category;
the method further comprises the following steps:
if the information sent by the server contains sensitive information, outputting the category to which the prefix tree matched with the sensitive information belongs.
7. The method according to any one of claims 1 to 6, wherein the outputting the detection result association information according to the preset strategy comprises:
determining a sensitivity level and/or a threat level of the sensitive information; and
outputting the sensitivity level and/or the threat degree of the sensitive information, and outputting the message based on the sensitivity level and/or the threat degree of the sensitive information.
8. The method according to any one of claims 1 to 6, wherein the obtaining of the information sent by the server in the message comprises:
filtering general non-server information from the message.
9. An apparatus for detecting sensitive information in a message, comprising:
an information acquisition module configured to, in response to acquiring a message, acquire the information sent by a server in the message;
an information traversing module configured to traverse a sensitive information database with the information sent by the server, wherein the sensitive information database comprises a tree structure for sensitive information; and
a message output module configured to output detection result association information according to a preset strategy if the information sent by the server contains sensitive information.
10. An electronic device, comprising:
one or more processors; and
a storage device storing executable instructions which, when executed by the one or more processors, implement the method of any one of claims 1 to 8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, implement a method according to any one of claims 1 to 8.
CN202111291135.4A 2021-11-02 2021-11-02 Method and device for detecting sensitive information in message and electronic equipment Pending CN114006765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111291135.4A CN114006765A (en) 2021-11-02 2021-11-02 Method and device for detecting sensitive information in message and electronic equipment

Publications (1)

Publication Number Publication Date
CN114006765A true CN114006765A (en) 2022-02-01

Family

ID=79926705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111291135.4A Pending CN114006765A (en) 2021-11-02 2021-11-02 Method and device for detecting sensitive information in message and electronic equipment

Country Status (1)

Country Link
CN (1) CN114006765A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674247A (en) * 2019-09-23 2020-01-10 广州虎牙科技有限公司 Barrage information intercepting method and device, storage medium and equipment
CN111814192A (en) * 2020-08-28 2020-10-23 支付宝(杭州)信息技术有限公司 Training sample generation method and device and sensitive information detection method and device
CN112560090A (en) * 2020-12-15 2021-03-26 建信金融科技有限责任公司 Data detection method and device
CN113536325A (en) * 2021-09-14 2021-10-22 杭州振牛信息科技有限公司 Digital information risk monitoring method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115955325A (en) * 2022-10-26 2023-04-11 贝壳找房(北京)科技有限公司 Information management and control method and system and electronic equipment
CN115955325B (en) * 2022-10-26 2024-02-02 贝壳找房(北京)科技有限公司 Information management and control method and system and electronic equipment

Similar Documents

Publication Publication Date Title
US20200389495A1 (en) Secure policy-controlled processing and auditing on regulated data sets
Zheng et al. Xblock-eth: Extracting and exploring blockchain data from ethereum
US20200412767A1 (en) Hybrid system for the protection and secure data transportation of convergent operational technology and informational technology networks
CN109816397B (en) Fraud discrimination method, device and storage medium
CN103559235B (en) A kind of online social networks malicious web pages detection recognition methods
CN102171702B (en) The detection of confidential information
CN109274632B (en) Website identification method and device
US7693767B2 (en) Method for generating predictive models for a business problem via supervised learning
AU2022204452B2 (en) Verification of electronic identity components
CN109690547A (en) For detecting the system and method cheated online
CN103685307A (en) Method, system, client and server for detecting phishing fraud webpage based on feature library
CN111311136A (en) Wind control decision method, computer equipment and storage medium
CN110177114A (en) The recognition methods of network security threats index, unit and computer readable storage medium
US20210397669A1 (en) Clustering web page addresses for website analysis
CN111753171A (en) Malicious website identification method and device
CN113158251B (en) Application privacy disclosure detection method, system, terminal and medium
CN108023868A (en) Malice resource address detection method and device
CN108270754B (en) Detection method and device for phishing website
CN114006765A (en) Method and device for detecting sensitive information in message and electronic equipment
CN111125118A (en) Associated data query method, device, equipment and medium
CN111049837A (en) Malicious website identification and interception technology based on communication operator network transport layer
US20200402061A1 (en) Cryptocurrency transaction pattern based threat intelligence
CN107103243A (en) The detection method and device of leak
KR20130032660A (en) System and method for searching leakage of individual information
US9904662B2 (en) Real-time agreement analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination