CN114938285A - Data security identification method and storage medium - Google Patents

Publication number
CN114938285A
Authority
CN
China
Prior art keywords
data
credible
model
content data
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210297085.9A
Other languages
Chinese (zh)
Other versions
CN114938285B (en
Inventor
韩腾飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202210297085.9A
Publication of CN114938285A
Application granted
Publication of CN114938285B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Operations Research (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Storage Device Security (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The invention discloses a method, a system, and a storage medium for security identification of data. The method comprises: acquiring data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the content data; and identifying risk content data in the content data based on the credible result. The invention solves the technical problem of low efficiency in security identification of data and achieves the technical effect of improving that efficiency.

Description

Data security identification method and storage medium
Technical Field
The invention relates to the field of cloud security and data processing, in particular to a data security identification method and a storage medium.
Background
At present, when security identification is performed on massive data in order to find risk accounts or risk behaviors, a full scan is usually performed directly. Most of that scanning is ineffective, so security identification of data suffers from the technical problem of low efficiency.
No effective solution to this problem has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a data security identification method and a storage medium, which at least solve the technical problem of low efficiency in security identification of data.
According to an aspect of an embodiment of the present invention, a method for security identification of data is provided, including: acquiring data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the content data; and identifying risk content data in the content data based on the credible result.
According to an aspect of the embodiments of the present invention, another method for security identification of data is provided from a system side, including: acquiring data to be detected in a target scene by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter is the data to be detected, the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the content data; identifying risk content data in the content data based on the credible result; and outputting the risk content data by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter is the risk content data.
According to an aspect of the embodiments of the present invention, another method for security identification of data is provided from a text scene side, including: acquiring data to be detected from an entertainment platform, wherein the data to be detected comprises media content data to be subjected to security identification and parameters, and the parameters are associated with historical data of the entertainment platform; performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data of the entertainment platform, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the media content data; identifying risk content data in the media content data based on the credible result; and outputting the risk content data to the entertainment platform.
According to an aspect of the embodiments of the present invention, another method for security identification of data is provided from a human-computer interaction side, including: in response to a data input instruction acting on an operation interface, displaying data to be detected in a target scene on the operation interface, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; and in response to a security identification instruction acting on the operation interface, displaying risk content data of the content data on the operation interface, wherein the risk content data is identified from the content data based on a credible result, the credible result is obtained by performing credible identification on the data to be detected based on a credible model and represents the degree of security of the content data, the credible model is obtained by fitting based on the historical data, and the feature dimensions of the historical data comprise the feature dimensions of the parameters.
According to another aspect of the embodiments of the present invention, an apparatus for security identification of data is provided, including: a first acquisition unit configured to acquire data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; a first identification unit configured to perform credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the content data; and a second identification unit configured to identify risk content data in the content data based on the credible result.
According to another aspect of the embodiments of the present invention, another apparatus for security identification of data is provided from a system side, including: a second acquisition unit configured to acquire data to be detected in a target scene by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter is the data to be detected, the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; a third identification unit configured to perform credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the content data; a fourth identification unit configured to identify risk content data in the content data based on the credible result; and a first output unit configured to output the risk content data by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter is the risk content data.
According to another aspect of the embodiments of the present invention, another apparatus for security identification of data is provided from a text scene side, including: a third acquisition unit configured to acquire data to be detected from an entertainment platform, wherein the data to be detected comprises media content data to be subjected to security identification and parameters, and the parameters are associated with historical data of the entertainment platform; a fifth identification unit configured to perform credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data of the entertainment platform, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the media content data; a sixth identification unit configured to identify risk content data in the media content data based on the credible result; and a second output unit configured to output the risk content data to the entertainment platform.
According to another aspect of the embodiments of the present invention, another apparatus for security identification of data is provided from a human-computer interaction side, including: a first response unit configured to, in response to a data input instruction acting on an operation interface, display data to be detected in a target scene on the operation interface, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; and a second response unit configured to, in response to a security identification instruction acting on the operation interface, display risk content data of the content data on the operation interface, wherein the risk content data is identified from the content data based on a credible result, the credible result is obtained by performing credible identification on the data to be detected based on a credible model and represents the degree of security of the content data, the credible model is obtained by fitting based on the historical data, and the feature dimensions of the historical data comprise the feature dimensions of the parameters.
The embodiment of the invention also provides a computer readable storage medium. The computer readable storage medium includes a stored program, wherein the program, when executed by a processor, controls an apparatus in which the computer readable storage medium is located to perform the method for secure identification of data according to an embodiment of the present invention.
The embodiment of the invention also provides a processor. The processor is used for running a program, wherein the program executes the data security identification method of the embodiment of the invention when running.
According to another aspect of the embodiments of the present invention, a system for security identification of data is provided, including: a processor; and a memory coupled to the processor and configured to provide the processor with instructions for the following processing steps: acquiring data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the content data; and identifying risk content data in the content data based on the credible result.
In the embodiments of the invention, data to be detected in a target scene is acquired, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; credible identification is performed on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the content data; and risk content data in the content data is identified based on the credible result. In other words, in the present application a parameter field is added to the content data. The parameter field is similar to a label and may represent the source, type, size, format, and the like of the data. The credible result for the content data and the parameters is determined by the corresponding credible model, and security identification is then performed on the content data accordingly. This reduces the computation cost of back-end identification, thereby achieving the technical effect of improving the efficiency of security identification of data and solving the technical problem of low efficiency in security identification of data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing a data processing method for a service grid-based application according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for secure identification of data according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method for secure identification of data provided from the system side according to an embodiment of the present invention;
FIG. 4 is a flow chart of another method for secure identification of data provided from the text scene side according to an embodiment of the present invention;
FIG. 5 is a flowchart of another method for secure identification of data provided from a human-computer interaction side according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of one principal feature dimension according to an embodiment of the present disclosure;
FIG. 7A is a schematic illustration of an application flow of a trust model in accordance with an embodiment of the present disclosure;
FIG. 7B is a schematic diagram of a services grid utilizing a trust model for secure identification of data in accordance with an embodiment of the present invention;
FIG. 8 is a schematic diagram of a device for securely identifying data according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a secure identification apparatus for data provided from the system side according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a device for securely identifying data provided from a text scene side according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a device for securely identifying data provided from a human-computer interaction side according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:
cloud security: an important branch of the cloud computing field that has been widely applied in the antivirus field; cloud security monitors abnormal software behavior in the network through a large number of networked clients, obtains the latest information on Trojan horses and malicious programs on the Internet, pushes that information to a server for automatic analysis and processing, and distributes solutions for viruses and Trojan horses to each client;
resource sharing yields a high resource utilization rate but introduces new security problems, namely scene security problems: on the one hand, isolation among user resources must be guaranteed; on the other hand, security protection strategies are needed for virtual objects such as virtual machines, virtual switches, and virtual storage;
credible model: usually a complex series-parallel structure, used to identify requests whose probability of risk behavior approaches 0 or is very low; it may also be used to estimate the probability that a credible product completes a specified credible function while executing a task, and is a model for measuring the effectiveness of credible measurement.
Example 1
According to an embodiment of the present invention, an embodiment of a method for security identification of data is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 is an exemplary block diagram of system interactions for implementing a method for security identification of data according to an embodiment of the present disclosure. As shown in FIG. 1, a computer terminal 101 (or mobile device) may be connected, via a data network connection or electronically, to one or more servers (e.g., a security server, a resource server, a game server, etc.). In an alternative embodiment, the computer terminal 101 (or mobile device) may be any mobile computing device or the like. The data network connection may be a local area network connection, a wide area network connection, an Internet connection, or another type of data network connection. The computer terminal 101 (or mobile device) may connect to a network service executed by a server (e.g., a security server) or a group of servers. The web server 102 provides web-based user services such as social networking, cloud resources, email, online payment, or other online applications. The memory 103 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 103 may further include memory located remotely from the processor, which may be connected to the computer terminal 101 through a network.
In the above operating environment, the present application provides the method for security identification of data shown in FIG. 2.
Fig. 2 is a flowchart of a method for securely identifying data according to an embodiment of the present invention. As shown in fig. 2, the method may include the steps of:
Step S202: acquire data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene.
In the technical solution provided by step S202 of the present invention, the target scene may be a customer scene, for example, a chat scene, a scene in which a customer posts on a forum, or a scene in which a customer publishes media information on an entertainment platform, which is not specifically limited herein; the content data may contain risk violation content.
In this embodiment, data to be detected in the target scene may be acquired. For example, when content security identification is performed, data on risk accounts, risk behaviors, or risk violation content in a customer scene may be acquired.
In this embodiment, the parameters may be associated with historical data in the target scene. For example, a parameter may indicate the source of the content data, or its data type, size, format, and the like, which is not specifically limited herein.
In an optional embodiment, the relationship between a parameter and the historical data in the target scene may be determined from the recognition results of the credible model on that historical data. For example, the target scene may be a TXT text scene and the historical data may be historical TXT text data; if none of the historical risk-detection data for TXT text shows that a risk occurred, the data type (TXT text type) may be used as a parameter (filter tag) for splitting the data.
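The filter-tag selection described above can be illustrated with a minimal, non-limiting sketch. It assumes that each historical record reduces to a pair of a candidate tag (for example, a data type) and a flag recording whether a risk occurred; the record shape and the name `risk_free_tags` are hypothetical and do not appear in the original disclosure.

```python
def risk_free_tags(history):
    """Return the tags whose historical records never showed a risk.

    `history` is an iterable of (tag, was_risky) pairs. This record shape
    is an assumption made for illustration only.
    """
    seen = set()
    risky = set()
    for tag, was_risky in history:
        seen.add(tag)
        if was_risky:
            risky.add(tag)
    # A tag qualifies as a filter tag only if no record with it was risky.
    return seen - risky


# Made-up historical records: TXT text never showed a risk, images did once.
history = [
    ("txt", False),
    ("txt", False),
    ("image", True),
    ("image", False),
    ("video", False),
]
filter_tags = risk_free_tags(history)
print(sorted(filter_tags))
```

Under this assumption, data carrying one of the returned tags could be separated from the rest before any heavier identification is applied.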
In this embodiment, a parameter may be a unique account identifier (ID), a group ID, a chat room ID, a device ID, and the like, which is not specifically limited herein.
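As a non-limiting illustration of data to be detected that carries both content data and parameters, the following sketch attaches a parameter field to the content. The class name `DetectionRecord` and the key names in `params` are hypothetical choices made for this example, not names taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DetectionRecord:
    """Content data plus the parameter field attached to it (hypothetical shape)."""

    content: str
    # Parameters act like labels, e.g. account ID, chat room ID, data type.
    params: Dict[str, str] = field(default_factory=dict)

    def param(self, name: str) -> Optional[str]:
        """Return a parameter value, or None if the parameter is absent."""
        return self.params.get(name)


record = DetectionRecord(
    content="hello everyone",
    params={"account_id": "a-001", "chat_room_id": "room-7", "data_type": "txt"},
)
print(record.param("account_id"), record.param("device_id"))
```

Keeping the parameters in a separate mapping leaves the content data untouched, so the same record can be passed unchanged to a later identification stage.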
Step S204: perform credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result represents the degree of security of the content data.
In the technical solution provided by step S204 of the present invention, the historical data may be historical scene behavior data, historical penalty record data, and the like, which is not specifically limited herein.
In this embodiment, credible identification may be performed on the data to be detected based on the credible model to obtain a credible result. For example, if the current customer (account) has been active in a chat scene for 20 days and has sent ten thousand messages, credible identification may be performed, based on the credible model, on the risk account or on the content of the ten thousand risk violation messages sent by the risk account, and the credible result is output.
In this embodiment, the credible model may be obtained by fitting based on the historical data in the target scene, where fitting may refer to matching a set of observed numerical statistics to a corresponding set of numerical values.
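The notion of fitting can be illustrated, in a non-limiting way, with an ordinary least-squares line fit. This is a generic stand-in technique chosen for illustration; the original text does not state which fitting method is used. The function name `fit_line` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    A generic stand-in for "fitting"; not the method of the disclosure.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations: days an account was active vs. an observed score.
active_days = [1, 5, 10, 20]
observed_scores = [5, 25, 50, 100]
slope, intercept = fit_line(active_days, observed_scores)
print(slope, intercept)
```

The fitted parameters summarize the historical observations, which is the sense in which a model is "obtained by fitting" here.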
For example, the historical data in the target scene is fitted, and the fitted credible model can be represented by the following formula:
[Formula image BDA0003563879140000071 is not reproduced in this text.]
where n represents the number of features corresponding to the feature dimensions.
In this embodiment, the feature dimensions of the historical data may include the feature dimensions of the parameters, such as an account dimension, a phone number dimension, and a device dimension. When selecting feature dimensions, the following principles may be considered: because various auditing mechanisms exist in a customer scene, an account that sends risk violation content usually does not remain active for long; an account that sends violation content usually sends violation messages in only one customer scene; and an account that sends violation content typically operates from a different network environment or a different device.
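The selection principles above suggest per-account features such as the number of active days, the number of scenes used, and the number of devices used. The following non-limiting sketch derives such counts from historical events; the `Event` fields and the feature names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    """One historical behavior record (hypothetical shape)."""

    account_id: str
    day: str        # calendar day of the event, e.g. "2022-03-01"
    scene: str      # customer scene, e.g. "chat" or "forum"
    device_id: str


def account_features(events: Iterable[Event], account_id: str) -> Dict[str, int]:
    """Count distinct active days, scenes, and devices for one account."""
    days, scenes, devices = set(), set(), set()
    for event in events:
        if event.account_id != account_id:
            continue
        days.add(event.day)
        scenes.add(event.scene)
        devices.add(event.device_id)
    return {
        "active_days": len(days),
        "scene_count": len(scenes),
        "device_count": len(devices),
    }


events = [
    Event("a-001", "2022-03-01", "chat", "dev-1"),
    Event("a-001", "2022-03-02", "forum", "dev-1"),
    Event("a-001", "2022-03-02", "chat", "dev-1"),
    Event("b-002", "2022-03-02", "chat", "dev-9"),
]
features = account_features(events, "a-001")
print(features)
```

How such counts would be weighted in the credible model is not specified in this text, so the sketch stops at feature extraction.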
In step S206, risk content data in the content data is identified based on the credible result.
In the technical solution provided by step S206 in the present invention, the risk content data in the content data may be identified based on the credible result, for example, the credible result may be f (credible coefficient), and the risk content data in the content data may be identified based on f (credible coefficient).
In this embodiment, layering may be performed according to the score (segment) of f (credible coefficient). For example, f (credible coefficient) greater than or equal to 80 points indicates that the layering result is highly credible; f (credible coefficient) greater than or equal to 20 points and less than 80 points indicates that the layering result is possible risk; and f (credible coefficient) less than 20 points indicates that the layering result is high risk.
In this embodiment, the credible result may be routed to different computing links based on the layering result. For example, a credible result whose layering result is highly credible may receive no further processing; a credible result whose layering result is possible risk may be input into a deep learning model for further analysis; and a credible result whose layering result is high risk may be routed to a computing link for manual review or may be rejected directly.
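As a non-limiting illustration, the layering and routing described above may be sketched as follows. The names `Tier`, `tier_for`, `ROUTES`, and `route` and the route labels are hypothetical and are not part of the disclosure; only the 80-point and 20-point boundaries are taken from the example above.

```python
from enum import Enum


class Tier(Enum):
    HIGHLY_CREDIBLE = "highly_credible"
    POSSIBLE_RISK = "possible_risk"
    HIGH_RISK = "high_risk"


def tier_for(score: float, credible_min: float = 80.0, risk_min: float = 20.0) -> Tier:
    """Map an f(credible coefficient) score to a layering result."""
    if score >= credible_min:
        return Tier.HIGHLY_CREDIBLE
    if score >= risk_min:
        return Tier.POSSIBLE_RISK
    return Tier.HIGH_RISK


# Hypothetical labels for the computing links named in the description.
ROUTES = {
    Tier.HIGHLY_CREDIBLE: "no_further_processing",
    Tier.POSSIBLE_RISK: "deep_learning_model",
    Tier.HIGH_RISK: "manual_review_or_reject",
}


def route(score: float) -> str:
    """Return the computing link for a given score."""
    return ROUTES[tier_for(score)]
```

Keeping the boundaries as parameters rather than constants leaves room for the per-scene thresholds that the description allows.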
Through steps S202 to S206 of the present application, data to be detected in a target scene is obtained, where the data to be detected includes content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; credible identification is performed on the data to be detected based on a credible model to obtain a credible result, where the credible model is obtained by fitting based on historical data, the feature dimensions of the historical data include the feature dimensions of the parameters, and the credible result is used for representing the security degree of the content data; and risk content data in the content data is identified based on the credible result. That is, in the present application, parameter fields are added alongside the content data, the credible result of the content data and the parameters is determined through the corresponding credible model, and the content data is then subjected to security identification. The computation cost of back-end identification can thereby be reduced, the technical effect of improving the efficiency of security identification of data is achieved, and the technical problem of low efficiency of security identification of data is solved.
The above-described method of this embodiment is further described below.
As an optional implementation manner, in step S204, performing credible identification on the data to be detected based on a credible model to obtain a credible result includes: performing credible identification on positive correlation features and negative correlation features in the data to be detected based on the credible model to obtain the credible result, wherein the value of a positive correlation feature is in direct proportion to the security degree represented by the credible result, and the value of a negative correlation feature is in inverse proportion to the security degree represented by the credible result.
In this embodiment, the positive correlation features and the negative correlation features in the data to be detected may be subjected to credible identification based on the credible model to obtain a credible result. For example, in a customer scene, the more active days there are, the lower the probability of generating content risk, so the number of active days in the customer scene may be a positive correlation feature; in the customer scene, the higher the historical risk concentration, the higher the probability that content risk is currently generated, so the historical risk concentration in the customer scene may be a negative correlation feature.
For example, if an account has historically sent 100 messages and 90 of them are violations, the historical risk content concentration is 90%, which can indirectly indicate the probability that the content currently sent by the user is risk content.
As an optional implementation manner, performing credible identification on the positive correlation features and the negative correlation features in the data to be detected based on the credible model to obtain a credible result includes: performing credible identification on the logarithm of each positive correlation feature and the logarithm of the reciprocal of each negative correlation feature based on the credible model to obtain the credible result.
In this embodiment, the logarithm of each positive correlation feature and the logarithm of the reciprocal of each negative correlation feature may be subjected to credible identification based on the credible model to obtain a credible result. For example, the fitted credible model may be expressed as follows:
[Formula image BDA0003563879140000091 of the original publication; not reproduced in this text]
wherein n is used for representing the number of the features corresponding to the feature dimension.
In this embodiment, the logarithms of all positive correlation features and the logarithms of the reciprocals of all negative correlation features are summed to obtain a credible result, which may be f (credible coefficient).
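As a non-limiting illustration, the summation described above (logarithms of the positive correlation features plus logarithms of the reciprocals of the negative correlation features) may be sketched as follows. The function name `credible_coefficient`, the use of the natural logarithm, and the small `eps` clamp that avoids taking the logarithm of zero are assumptions; the text does not specify the logarithm base or how zero-valued features are handled.

```python
import math
from typing import Mapping


def credible_coefficient(
    positive: Mapping[str, float],
    negative: Mapping[str, float],
    eps: float = 1e-9,  # assumption: clamp so that log(0) and 1/0 never occur
) -> float:
    """Sum log(x) over positive features and log(1/x) over negative features."""
    score = 0.0
    for value in positive.values():
        score += math.log(max(value, eps))
    for value in negative.values():
        score += math.log(1.0 / max(value, eps))
    return score


# Example: 20 active days (positive) and a 90% historical risk concentration (negative).
example = credible_coefficient({"active_days": 20}, {"risk_concentration": 0.9})
```

The raw sum is not automatically on the 0 to 100 point scale used for layering; any rescaling to points would be a further assumption and is therefore omitted here.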
As an optional implementation manner, performing credible identification on the positive correlation features and the negative correlation features in the data to be detected based on the credible model to obtain a credible result includes one of the following: adjusting the positive correlation features, and performing credible identification on the adjusted positive correlation features and the negative correlation features based on the credible model to obtain the credible result; adjusting the negative correlation features, and performing credible identification on the positive correlation features and the adjusted negative correlation features based on the credible model to obtain the credible result; or adjusting both the positive correlation features and the negative correlation features, and performing credible identification on the adjusted positive correlation features and the adjusted negative correlation features based on the credible model to obtain the credible result.
In this embodiment, the positive correlation feature or the negative correlation feature may be adjusted, for example, the credible result output by the credible model may be adjusted by frequently adding, deleting, and modifying the positive correlation feature or the negative correlation feature.
In this embodiment, the positive correlation features may be adjusted, and the adjusted positive correlation features and the negative correlation features may be subjected to credible identification based on the credible model to obtain a credible result. For example, positive correlation features may be added or deleted, and the resulting positive correlation features, together with the negative correlation features, may be identified based on the credible model to obtain a credible result.
In this embodiment, the negative correlation features may be adjusted, and the adjusted negative correlation features and the positive correlation features may be subjected to credible identification based on the credible model to obtain a credible result. For example, negative correlation features may be added or deleted, and the resulting negative correlation features, together with the positive correlation features, may be identified based on the credible model to obtain a credible result.
In this embodiment, both the positive correlation features and the negative correlation features may be adjusted, and the adjusted positive correlation features and the adjusted negative correlation features may be subjected to credible identification based on the credible model to obtain a credible result. For example, positive correlation features and negative correlation features may be added or deleted, and the resulting positive and negative correlation features may be identified based on the credible model to obtain a credible result.
As an optional implementation manner, the method further includes: determining, in the historical data, feature data corresponding to the feature dimensions of the parameters; and fitting the feature data to obtain the credible model.
In this embodiment, when the features of the credible model are constructed, the feature description dimension may be selected as at least one of an account dimension, a mobile phone number dimension, and a device dimension. The feature data corresponding to the account dimension may be the number of active days in the current customer scene, the number of violating contents in the current customer scene, the violating content concentration in the current customer scene, the number of associated device models, the number of associated mobile phones, and the like, and is not specifically limited herein. The feature data corresponding to the mobile phone number dimension may be the number of historical active days, the number of associated devices, whether there is an associated risk, the number of associated risk scene customers, the number of active customers, the historical payment amount, and the like, and is not specifically limited herein. The feature data corresponding to the device dimension may be the number of associated accounts, the number of associated mobile phones, the number of active days, the number of associated risk contents, the associated risk content concentration, and the like, and is not specifically limited herein.
In this embodiment, feature data corresponding to the feature dimensions of the parameters may be determined in the historical data, and the feature data may be fitted to obtain the credible model. For example, the following may be determined in the historical data: the feature data corresponding to the account dimension, such as the number of active days in the current customer scene, the number of violating contents in the current customer scene, the violating content concentration in the current customer scene, the number of associated device models, and the number of associated mobile phones; and/or the feature data corresponding to the mobile phone number dimension, such as the number of historical active days, the number of associated devices, whether there is an associated risk, the number of associated risk scene customers, the number of active customers, and the historical payment amount; and/or the feature data corresponding to the device dimension, such as the number of associated accounts, the number of associated mobile phones, the number of active days, the number of associated risk contents, and the associated risk content concentration. The feature data is then fitted to obtain the credible model.
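As a non-limiting illustration, deriving some of the account-dimension feature data from historical records may be sketched as follows. The record layout (one dictionary per historical message with the hypothetical keys `day` and `violation`) and the function name `account_features` are assumptions; returning a concentration of 0.0 for an account with no history is also an assumption.

```python
from typing import Dict, List


def account_features(records: List[dict]) -> Dict[str, float]:
    """Compute account-dimension features from per-message history records.

    Each record is assumed to carry a 'day' label (any hashable value) and a
    boolean 'violation' flag.
    """
    total = len(records)
    violations = sum(1 for r in records if r["violation"])
    return {
        "active_days": float(len({r["day"] for r in records})),
        "violation_count": float(violations),
        # Share of historical messages that were violations.
        "violation_concentration": violations / total if total else 0.0,
    }


history = [
    {"day": "2022-03-01", "violation": False},
    {"day": "2022-03-01", "violation": True},
    {"day": "2022-03-02", "violation": False},
    {"day": "2022-03-03", "violation": False},
]
features = account_features(history)
```

The mobile phone number dimension and the device dimension could be handled by analogous functions over their own records.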
As an optional implementation manner, fitting processing is performed on the feature data to obtain a credible model, and the method includes: determining a weight corresponding to the feature data, wherein the weight is used for representing the contribution degree of the feature data to the credible model; and fitting the characteristic data based on the weight to obtain a credible model.
In this embodiment, when the feature data is fitted to obtain the credible model, the weight corresponding to the feature data may be determined, and the feature data may be fitted based on the weight to obtain the credible model. For example, because features in different dimensions influence the risk result differently, the feature values in different dimensions may be weighted: the number of active days corresponding to the device dimension may be multiplied by a coefficient of 1 and used as an input parameter of the credible model; the associated risk content concentration corresponding to the device dimension indicates a high degree of risk, so its calculated value may be multiplied by a coefficient of 100 and used as an input parameter of the credible model.
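As a non-limiting illustration, the weighting described above may be sketched as follows. The coefficients 1 and 100 are taken from the example in the text; the names `DEVICE_WEIGHTS` and `weighted_features`, the feature keys, and the default weight of 1.0 for unlisted features are assumptions.

```python
from typing import Dict, Mapping

# Coefficients from the example above; the key names are hypothetical.
DEVICE_WEIGHTS = {
    "active_days": 1.0,
    "associated_risk_concentration": 100.0,
}


def weighted_features(
    features: Mapping[str, float], weights: Mapping[str, float]
) -> Dict[str, float]:
    """Multiply each feature by its weight; unlisted features keep weight 1.0."""
    return {name: value * weights.get(name, 1.0) for name, value in features.items()}


model_inputs = weighted_features(
    {"active_days": 20, "associated_risk_concentration": 0.25},
    DEVICE_WEIGHTS,
)
```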
As an optional implementation manner, in step S206, identifying risk content data in the content data based on the credible result includes: in response to the credible result being within a first threshold range, identifying the risk content data from the content data based on an identification model, wherein the identification model includes a processor cluster and/or is trained based on deep learning.
In this embodiment, the risk content data may be identified from the content data based on the identification model in response to the credible result being within the first threshold range. For example, when it is detected that the credible result is within the first threshold range, a signal representing this information may be generated, and in response to the signal, the risk content data may be identified from the content data based on the identification model.
In this embodiment, layering may be performed according to the credible result score (segment). For example, a credible result score greater than or equal to 80 points indicates that the layering result is highly credible; a credible result score greater than or equal to 20 points and less than 80 points indicates that the layering result is possible risk; and a credible result score less than 20 points indicates that the layering result is high risk.
In this embodiment, the first threshold range may be the possible-risk threshold range, that is, the credible result score is greater than or equal to 20 points and less than 80 points; the identification model may be a Graphics Processing Unit (GPU) cluster identification model or a deep learning cluster model.
As an optional implementation manner, the method further includes one of the following: in response to the credible result being within a second threshold range, inhibiting input of the content data to the identification model, wherein the security degree represented by the second threshold range is higher than the security degree represented by the first threshold range; or, in response to the credible result being within a third threshold range, inputting the content data to an auditing platform or discarding the content data, wherein the security degree represented by the third threshold range is lower than the security degree represented by the first threshold range, and the content data is audited by the auditing platform in response to an auditing operation instruction.
In this embodiment, input of the content data to the identification model may be inhibited in response to the credible result being within the second threshold range. For example, when it is detected that the credible result is within the second threshold range, a signal representing this information is generated, and in response to the signal, input of the content data to the identification model is inhibited.
In this embodiment, the second threshold range may be the highly credible threshold range, for which the content data may be pre-filtered directly.
In this embodiment, the content data may be input to the auditing platform or discarded in response to the credible result being within the third threshold range. For example, when it is detected that the credible result is within the third threshold range, a signal representing this information is generated, and in response to the signal, the content data is input to the auditing platform or discarded.
In this embodiment, the third threshold range may be the high-risk threshold range, and content data with a high-risk credible result may be processed by the auditing platform or directly discarded.
As an optional implementation manner, the method further includes: determining a first threshold range, a second threshold range, and a third threshold range corresponding to the target scene.
In this embodiment, a first threshold range, a second threshold range, and a third threshold range corresponding to the target scene may be determined. For example, the relationship between the credible model score and the risk may differ between scenes, so the first threshold range, the second threshold range, and the third threshold range may be determined separately for each target scene.
For example, the second threshold range of a text scene may be greater than or equal to 80 points, and when the credible result score of the credible model is greater than or equal to 80 points, the current request may be considered risk-free (satisfying a risk occurrence probability of one in ten thousand); the second threshold range of a picture scene may be greater than or equal to 85 points, and when the credible result score of the credible model is greater than or equal to 85 points, the current request may be considered risk-free (satisfying a risk occurrence probability of one in ten thousand).
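As a non-limiting illustration, scene-specific threshold ranges may be sketched as a small lookup table. The 80-point and 85-point lower bounds of the second threshold range are taken from the example above; the 20-point boundary between the first and third threshold ranges is carried over from the earlier layering example and is an assumption for the picture scene. The names `SceneThresholds`, `SCENE_THRESHOLDS`, and `threshold_range` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneThresholds:
    credible_min: float            # lower bound of the second (highly credible) range
    high_risk_below: float = 20.0  # assumption: scores below this fall in the third range


SCENE_THRESHOLDS = {
    "text": SceneThresholds(credible_min=80.0),
    "picture": SceneThresholds(credible_min=85.0),
}


def threshold_range(scene: str, score: float) -> str:
    """Return 'second', 'first', or 'third' for a score in the given scene."""
    limits = SCENE_THRESHOLDS[scene]
    if score >= limits.credible_min:
        return "second"  # highly credible: pre-filtered
    if score >= limits.high_risk_below:
        return "first"   # possible risk: sent to the identification model
    return "third"       # high risk: auditing platform or discard
```

With these values, a score of 82 points falls in the second threshold range for a text scene but in the first threshold range for a picture scene.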
As an optional implementation manner, the method further includes: adjusting the credible model based on the credible result and/or information associated with the credible result.
In this embodiment, a record of each request of the customer is maintained, and the credible model may be adjusted based on the credible result and/or the information associated with the credible result.
In this embodiment, the information associated with the credible result may be, for example, the number of times a mobile phone number has requested content risk judgment, and the like.
As an optional implementation manner, the feature dimensions include at least one of: an account dimension, a communication dimension, and a device dimension.
In this embodiment, the feature dimensions may be selected according to the following principles: first, because various auditing mechanisms exist in a customer scene, an account that sends risk violation content is usually not active for a long time; second, an account that sends violation messages usually does not send them in only one customer scene; finally, due to the migration of black-market operations, an account that sends violation messages is usually found in a different network environment or on a different operating device.
In this embodiment, model calculation features may be constructed based on the mandatory parameters (e.g., account ID, group ID, chat room ID, etc.) and optional parameters (e.g., device model, device ID, etc.) passed in by the customer, together with the historical behavior data in the customer scene, and the feature dimensions may be: the account dimension, the communication dimension (mobile phone number dimension), and the device dimension.
In this embodiment, optionally, because features in different dimensions influence the risk result differently, the feature data in different dimensions needs to be weighted.
In this embodiment, a principle of selecting a feature dimension corresponding to a target scene may be determined according to the target scene, so as to determine a main feature dimension and feature data corresponding to the main feature dimension, and perform weighting processing on the feature data.
In the embodiment of the invention, parameter fields are added alongside the content data, the credible result of the content data and the parameters is determined through the corresponding credible model, and the content data is then subjected to security identification, so that the computation cost of back-end identification can be reduced, the technical effect of improving the efficiency of security identification of data is achieved, and the technical problem of low efficiency of security identification of data is solved.
According to the embodiment of the invention, a method for security identification of data is also provided from the system side.
Fig. 3 is a flowchart of another method for securely identifying data provided from the system side according to an embodiment of the present invention.
As shown in fig. 3, the method may include the steps of:
Step S302, acquiring data to be detected in a target scene by calling a first interface, wherein the first interface includes a first parameter, a parameter value of the first parameter is the data to be detected, the data to be detected includes content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene.
In the technical solution provided by step S302 of the present invention, the first interface may be a remote communication interface disposed on the system, or may also be a virtual button on the visual screen, which is not limited herein.
In this embodiment, the first interface may be invoked to obtain data to be detected in a target scene, for example, the first interface may be invoked to obtain risk violation content data of a risk account or a risk behavior in a customer scene.
Step S304, performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on historical data, the feature dimensions of the historical data include the feature dimensions of the parameters, and the credible result is used for representing the security degree of the content data.
In the technical solution provided in step S304 of the present invention, the data to be detected may be subjected to credible identification based on the credible model to obtain a credible result. For example, if the current customer (account) has been active in a chat scene for 20 days and has sent ten thousand messages, the account, or the content of the ten thousand messages sent by the account, may be subjected to credible identification based on the credible model, and the credible result is output.
Step S306, identifying risk content data in the content data based on the credible result.
In the technical solution provided by step S306 of the present invention, the risk content data in the content data may be identified based on the credibility result, for example, the credibility result may be f (credibility coefficient), and the risk content data in the content data may be identified based on f (credibility coefficient).
Step S308, outputting the risk content data by calling a second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the risk content data.
In the technical solution provided by step S308 of the present invention, the second interface may be a remote communication interface disposed on the system, or may be a virtual button on a visual screen, which is not specifically limited herein.
In this embodiment, the risk content data may be output by calling the second interface, for example, the trusted results output by the trusted model are layered and distributed to different calculation links, so as to improve accuracy of risk identification and reduce calculation cost.
In the embodiment of the disclosure, the data to be detected in the target scene is acquired by calling the first interface, the data to be detected is subjected to trusted identification based on the trusted model to obtain the trusted result, the risk content data in the content data is identified based on the trusted result, and the risk content data is output by calling the second interface, so that the purpose of performing safe identification on the data on the system side is achieved, the technical effect of improving the efficiency of performing safe identification on the data is achieved, and the technical problem of low efficiency of performing safe identification on the data is solved.
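As a non-limiting illustration, the order of steps S302 to S308 on the system side may be sketched with plain callables standing in for the two interfaces. The function name `secure_identify`, the dictionary keys `content` and `params`, and all four callables are hypothetical; the disclosure does not define their signatures.

```python
from typing import Callable, Dict, List


def secure_identify(
    first_interface: Callable[[], Dict],
    credible_model: Callable[[Dict], float],
    identify: Callable[[List[str], float], List[str]],
    second_interface: Callable[[List[str]], None],
) -> List[str]:
    """Run the four steps in order and return the risk content data."""
    data = first_interface()                   # S302: obtain data to be detected
    score = credible_model(data["params"])     # S304: credible identification
    risky = identify(data["content"], score)   # S306: identify risk content data
    second_interface(risky)                    # S308: output risk content data
    return risky


# Usage with stub callables (all values are made up for illustration).
emitted: List[List[str]] = []
result = secure_identify(
    first_interface=lambda: {"content": ["a", "b"], "params": {"active_days": 3}},
    credible_model=lambda params: 50.0,
    identify=lambda content, score: [c for c in content if c == "b"],
    second_interface=emitted.append,
)
```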
Fig. 4 is a flowchart of another method for securely recognizing data provided from a text scene side according to an embodiment of the present invention. As shown in fig. 4, the method may include the steps of:
Step S402, acquiring data to be detected from the entertainment platform, wherein the data to be detected includes media content data to be subjected to security identification and parameters, and the parameters are associated with historical data of the entertainment platform.
In the technical solution provided by step S402 of the present invention, the media content data may be a text, an image (picture, video), a voice, etc., and is not limited herein.
In this embodiment, the data to be detected from the entertainment platform may be obtained, for example, obtaining text, images (pictures, videos), voice, etc. from the entertainment platform.
Step S404, performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on historical data of the entertainment platform, the feature dimensions of the historical data include the feature dimensions of the parameters, and the credible result is used for representing the security degree of the media content data.
In the technical solution provided in step S404 of the present invention, the data to be detected may be subjected to trusted identification based on the trusted model to obtain a trusted result, for example, text, image (picture, video), voice, etc. of the entertainment platform are subjected to trusted identification based on the trusted model to obtain the trusted result.
Step S406, identifying risk content data in the media content data based on the credible result.
In the technical solution provided by step S406 of the present invention, the risk content data in the media content data can be identified based on the trusted result, for example, the risk content data in the text, the image (picture, video), the voice, etc. of the entertainment platform can be identified based on the trusted result.
Step S408, outputting the risk content data to the entertainment platform.
In the technical solution provided in step S408 of the present invention, the risk content data can be output to the entertainment platform, for example, a risk text, an image (picture, video), a voice, etc. can be output to the entertainment platform.
In the embodiment of the disclosure, the data to be detected from the entertainment platform is acquired, the data to be detected is subjected to credible identification based on the credible model to obtain the credible result, the risk content data in the media content data is identified based on the credible result, and finally the risk content data is output to the entertainment platform, so that the purpose of safely identifying the data of the entertainment platform on the text scene side is achieved, the technical effect of improving the efficiency of safely identifying the data is achieved, and the technical problem of low efficiency of safely identifying the data is solved.
Fig. 5 is a flowchart of another method for securely recognizing data provided from a human-computer interaction side according to an embodiment of the present invention. As shown in fig. 5, the method may include the steps of:
Step S502, in response to a data input instruction acting on an operation interface, displaying data to be detected in a target scene on the operation interface, wherein the data to be detected includes content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene.
In the technical solution provided by step S502 of the present invention, the data to be detected in the target scene may be displayed on the operation interface in response to the data input instruction acting on the operation interface, for example, when the data input instruction acting on the operation interface is detected, a signal for representing the information is generated, and the data to be detected in the target scene is displayed on the operation interface in response to the signal.
Step S504, responding to a safety identification instruction acting on the operation interface, and displaying risk content data of the content data on the operation interface, wherein the risk content data is identified from the content data based on a credible result, the credible result is obtained by carrying out credible identification on data to be detected based on a credible model and is used for representing the safety degree of the content data, the credible model is obtained by fitting based on historical data, and the characteristic dimension of the historical data comprises the characteristic dimension of a parameter.
In the technical solution provided by step S504 above, the risky content data of the content data may be displayed on the operation interface in response to the security identification instruction acting on the operation interface, for example, when the security identification instruction acting on the operation interface is detected, a signal representing the information is generated, and in response to the signal, the risky content data of the content data is displayed on the operation interface.
In the embodiment of the disclosure, the data to be detected in the target scene is displayed on the operation interface by responding to the data input instruction acting on the operation interface, and the risk content data of the content data is displayed on the operation interface by responding to the safety identification instruction acting on the operation interface, so that the purpose of carrying out safety identification on the data on the operation interface according to the instruction on the human-computer interaction side is achieved, the technical effect of improving the efficiency of carrying out safety identification on the data is achieved, and the technical problem of low efficiency of carrying out safety identification on the data is solved.
Example 2
Preferred embodiments of the above-described method of this embodiment are further described below.
In the related art, when security identification is performed on data, risk accounts or risk behaviors are searched for in massive data, usually by directly performing a full scan. Most of that scanning is wasted, so there is a technical problem of low efficiency of security identification of data.
Taking content security image scanning as an example, the underlying computation mainly consumes GPU resources. The final risk ratio is generally about 1%, so 99% of the GPU computation can be regarded as invalid computation. By establishing a credible model, content-generating subjects whose probability of risk behavior is 0 or very low are marked in advance based on their historical behavior, and the content generated by these subjects is pre-filtered directly, being either not computed at all or only randomly sampled for computation, so that the back-end identification computation cost can be greatly reduced.
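As a non-limiting illustration, the saving from pre-filtering may be estimated with simple arithmetic. The function name `expected_gpu_scans` and the example values (a 70% share of content from pre-marked subjects and a 5% random sampling rate) are hypothetical and are not figures from the disclosure.

```python
def expected_gpu_scans(
    total_items: float, credible_fraction: float, sample_rate: float = 0.0
) -> float:
    """Items still sent to GPU identification after pre-filtering.

    credible_fraction: share of items from subjects marked as highly credible.
    sample_rate: share of those pre-filtered items that is still randomly scanned.
    """
    if not (0.0 <= credible_fraction <= 1.0 and 0.0 <= sample_rate <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    not_filtered = total_items * (1.0 - credible_fraction)
    sampled = total_items * credible_fraction * sample_rate
    return not_filtered + sampled


# Hypothetical example: 1,000,000 images, 70% pre-filtered, 5% of those sampled.
scans = expected_gpu_scans(1_000_000, 0.7, 0.05)
```

Under these made-up numbers roughly 335,000 of the 1,000,000 images would still be scanned; the actual share depends on how many subjects the credible model marks as highly credible in a given scene.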
In the embodiment of the disclosure, a parameter field is added to content data, and a trusted result of the content data and parameters is determined through a corresponding trusted model, so that the content data is safely identified, and the calculation cost of back-end identification can be reduced, thereby achieving the technical effect of improving the efficiency of safely identifying the data, and further solving the technical problem of low efficiency of safely identifying the data.
The Application Programming Interface (API) input parameters in the embodiment of the present disclosure are introduced below.
In the present application, the API input parameters are enriched by adding mandatory parameter fields and optional parameter fields. In a public cloud security commercialization scene, content security is mainly integrated with the customer's application scene system through API interfaces, and the main API input field is the content. In the present application, mandatory parameters and optional parameters are added. The mandatory parameters may be, for example, an account ID, a group ID, and a chat room ID, and are not specifically limited herein; the optional parameters may be, for example, a mobile phone number (mobile), a network address (Internet Protocol, abbreviated as IP), a device ID, and a device model, and are not specifically limited herein.
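As a non-limiting illustration, a request carrying the content together with the mandatory and optional parameters may be sketched as follows. The class name `ScanRequest`, the field names, and the choice to omit unset optional fields from the payload are assumptions; the disclosure lists the parameters only by example and does not define a wire format.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class ScanRequest:
    # Content to be identified, plus the mandatory parameters named in the text.
    content: str
    account_id: str
    group_id: str
    chat_room_id: str
    # Optional parameters named in the text.
    mobile: Optional[str] = None
    ip: Optional[str] = None
    device_id: Optional[str] = None
    model: Optional[str] = None  # device model


def to_payload(request: ScanRequest) -> Dict[str, str]:
    """Build the request body, leaving out optional fields that were not set."""
    return {key: value for key, value in asdict(request).items() if value is not None}


payload = to_payload(
    ScanRequest(content="hi", account_id="a1", group_id="g1",
                chat_room_id="c1", device_id="d9")
)
```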
The following describes a characterization of the trusted model.
FIG. 6 is a schematic diagram depicting the main feature dimensions according to an embodiment of the disclosure. As shown in fig. 6, the model's computation features are constructed from the mandatory/optional parameters passed in by the client, the historical behavior data in the client scene, and the historical application-scene behavior in the ecosystem. The main feature dimensions are as follows: the feature data of the account dimension may be the number of active days in the current client scene, the number of violating content items in the current client scene, the concentration of violating content in the current client scene, the number of associated device models, the number of associated mobile phone numbers, and the like, and is not specifically limited here; the feature data of the mobile phone number dimension may be the number of historical active days, the number of associated devices, whether an associated risk exists, the number of associated risk-scene clients, the number of active clients, the historical payment amount, and the like, and is not specifically limited here; the feature data of the device dimension may be the number of associated accounts, the number of associated mobile phone numbers, the number of active days, the number of associated risk content items, the concentration of associated risk content, and the like, and is not specifically limited here.
In the above embodiment, the historical scene behavior represents the behavior of the currently operating account/mobile phone number under the current client and the current scene (for example, 10,000 messages sent over 20 active days in the chat scene of the current client); the historical behavior data represents historical application-scene behavior in the ecosystem, for example, whether the currently operating mobile phone number has active transactions in the system, whether it has a history of publishing violating information, and the like. The ecosystem data may be historical application-scene behavior data, historical penalty record data, and the like.
In the above embodiment, the feature dimensions may be selected according to the following principles: first, an account that sends risky or violating content is subject to the various review mechanisms of the client scene and generally cannot remain active for long; second, an account that sends violating messages usually does not send them in only one client scene; finally, because an account that sends violating messages belongs to black-market traffic, it is typically operated from different devices and in different network environments.
In the above embodiment, a weighted model may be established over the feature dimensions. Because features of different dimensions influence the risk result to different degrees, the feature results of different dimensions need to be weighted.
For example, the number of active days multiplied by a coefficient of 1 may be used as an input to the trusted model, whereas the associated risk content concentration, which is highly correlated with risk, may be multiplied by a coefficient of 100 before being used as an input to the trusted model.
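The weighting step above can be sketched as follows. Only the coefficients 1 and 100 come from the description; the feature names, the `COEFFICIENTS` table, and the default coefficient of 1.0 for unlisted features are assumptions made for illustration.

```python
from typing import Dict

# Assumed coefficient table; only the values 1 and 100 come from the text
COEFFICIENTS: Dict[str, float] = {
    "active_days": 1.0,
    "associated_risk_content_concentration": 100.0,
}


def weight_features(features: Dict[str, float]) -> Dict[str, float]:
    """Multiply each raw feature by its coefficient before it enters the model.

    Features without a listed coefficient keep a weight of 1.0 (assumption).
    """
    return {
        name: value * COEFFICIENTS.get(name, 1.0)
        for name, value in features.items()
    }
```

With these coefficients, a concentration of 0.9 contributes about 90 to the model input while 20 active days contribute 20, which reflects the stronger risk correlation of the concentration feature.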
The construction of the trusted model is described below.
In a client scene, the higher the number of active days, the lower the probability of generating risky content, so the number of active days in the client scene may be a positively correlated feature; the higher the historical risk content concentration in the client scene, the higher the probability of generating risky content in that scene, so the historical risk content concentration in the client scene may be a negatively correlated feature.
It should be noted that, for example, if an account has sent 100 messages in total and 90 of them were violating, the historical risk content concentration is 90%, which indirectly indicates that the content currently sent by the account is very likely to be high-risk content.
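The concentration in this example is a simple ratio, which can be sketched as follows. The function name and the choice to return 0.0 for an account with no history are assumptions made for illustration.

```python
def risk_content_concentration(violating_count: int, total_count: int) -> float:
    """Share of an account's historical messages that were violating."""
    if total_count <= 0:
        # Assumption: an account with no history has zero concentration
        return 0.0
    return violating_count / total_count
```

For the account in the example, `risk_content_concentration(90, 100)` evaluates to 0.9, that is, 90%.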
The fitted confidence model can be expressed as follows:
[Formula image BDA0003563879140000171: expression of the fitted trusted model, not reproduced in this text]
wherein n is used for representing the number of the features corresponding to the feature dimension.
With the fitted trusted model, the positively correlated features are all positively correlated with the confidence coefficient (target score), and the higher the risk, the lower the confidence coefficient.
In the related art, a traditional machine learning model is used for model construction, but such a model cannot accommodate the frequent addition or deletion of positively and negatively correlated features that adversarial learning requires.
With a traditional machine learning model, adding a feature involves target feature mining, selection of new training samples, model training, model verification, model deployment, and effect verification. The process is complex, and an improper operation at any step introduces a stability risk.
In the present application, the model is constructed as a trusted model whose core expression is risk correlation, a requirement that a traditional binary/multi-class machine learning model cannot satisfy well.
For example, a binary/multi-class model needs a training label that is precisely defined for a specific scene (chat, forum, etc.), whereas in a commercial setting the scenes are complex (many clients and many forms of application scene), different clients do not define risk in exactly the same way, and it is difficult to define a general classification label for training the model.
The following describes the application of the trusted model.
FIG. 7A is a schematic diagram of a flow of an application of a trust model according to an embodiment of the disclosure. As shown in fig. 7A, when the trusted model is applied, the results may be layered based on the results produced by the trusted model, and the results are distributed to different calculation links according to different layered results, so as to balance the risk trusted result and the calculation cost as a whole.
For example, when the confidence coefficient is more than 80 points, the content is not diverted to the GPU cluster for recognition at all, which can save 64.4% of the computation cost (99.99% of such content is white, so "no risk" can be returned directly without the content flowing into GPU cluster analysis). Provided that risk recall does not fluctuate, the volume of false identifications is greatly reduced and the accuracy rate is improved by a factor of 2 to 5. It should be noted that "99.99% white" indicates that the probability of risk is low: for example, among content with f (confidence coefficient) greater than or equal to 80 points, 99.99% is normal content (that is, the probability of risk is one in ten thousand), so such content may be directly regarded as risk-free.
In the above embodiment, the API input is first filtered by the trusted model. If the trusted model returns a highly trusted (risk-free) result, the interface directly returns "no risk"; if the trusted model returns a result that is not highly trusted, the content itself flows into the content risk identification model (for example, to identify whether a picture contains pornographic or violent content).
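The pre-filtering flow described above can be sketched as follows. The function `prefilter`, the representation of each request as a (content, trust score) pair, and the stand-in `content_model` callable are assumptions made for illustration; the threshold of 80 points is the example value from the description.

```python
from typing import Callable, Iterable, List, Tuple

HIGH_TRUST_THRESHOLD = 80.0  # example value from the description


def prefilter(
    requests: Iterable[Tuple[str, float]],
    content_model: Callable[[str], bool],
    threshold: float = HIGH_TRUST_THRESHOLD,
) -> Tuple[List[str], int]:
    """Return (risky contents, number of content-model calls).

    Highly trusted requests are answered "no risk" without calling the
    content risk identification model; all others are passed to it.
    """
    risky: List[str] = []
    model_calls = 0
    for content, trust_score in requests:
        if trust_score >= threshold:
            continue  # highly trusted: return "no risk" directly
        model_calls += 1  # only these requests consume back-end computation
        if content_model(content):
            risky.append(content)
    return risky, model_calls
```

Counting the model calls makes the cost effect visible: every request at or above the threshold is one back-end recognition call that is not made.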
In the above embodiment, the GPU cluster/deep learning cluster may perform risk identification and determination based on the content (including text, image, voice, etc.) passed in by the client.
In the above embodiment, the results are summarized; for example, each request record of the client is retained and used as an input to the trusted model (e.g., for a given mobile phone number, how many content risk determinations have been requested).
In the above embodiments, the relevance of the scores for the credible models may vary from scenario to scenario.
For example, in a text scenario, a trusted model score greater than or equal to 80 points may be taken to mean that the current request is risk-free (satisfying a risk probability of one in ten thousand); in a picture scenario, because the form of the application scene differs, a trusted model score greater than or equal to 85 points is required before the current request is considered risk-free (satisfying a risk probability of one in ten thousand).
In the above embodiment, the score segments of the trusted model also differ within the same scenario at different stages of development. For example, the application of company A has just gone online, and a trusted model score greater than or equal to 60 may be taken to mean that the current request is risk-free (satisfying a risk probability of one in ten thousand); the application of company B is a national-level application that attracts heavy attention from black-market actors, so a trusted model score greater than or equal to 90 is required before the current request is considered risk-free (satisfying a risk probability of one in ten thousand).
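A minimal sketch of scene-specific and client-specific thresholds follows. The threshold values are the examples given above; the table names, the keys, and the rule that a client-specific value takes precedence over the scene default are assumptions made for illustration.

```python
from typing import Dict, Optional

# Example thresholds from the description; the key names are hypothetical
THRESHOLD_BY_SCENE: Dict[str, float] = {"text": 80.0, "picture": 85.0}
THRESHOLD_BY_CLIENT: Dict[str, float] = {"company_a": 60.0, "company_b": 90.0}


def no_risk_threshold(scene: str, client: Optional[str] = None) -> float:
    """Look up the score at or above which a request is treated as risk-free.

    Assumption: a client-specific threshold overrides the scene default.
    """
    if client is not None and client in THRESHOLD_BY_CLIENT:
        return THRESHOLD_BY_CLIENT[client]
    return THRESHOLD_BY_SCENE[scene]


def is_risk_free(score: float, scene: str, client: Optional[str] = None) -> bool:
    """Compare a trusted model score against the applicable threshold."""
    return score >= no_risk_threshold(scene, client)
```

Keeping the thresholds in data rather than in code allows them to be adjusted per scene or per client without touching the scoring logic.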
In the embodiment of the invention, the parameter field is added on the content data, the content data and the credible result of the parameter are determined through the corresponding credible model, and then the content data is safely identified, so that the calculation cost of the back-end identification can be reduced, the technical problem of low efficiency of the safety identification of the data is solved, and the technical effect of improving the efficiency of the safety identification of the data is achieved.
In the embodiment of the present disclosure, the above-mentioned trusted model may be applied to the service grid, and when the trusted model is used to perform security identification of data, the method for performing security identification of data using the trusted model in the embodiment of the present disclosure may be executed in the service grid in the form of an application service instance.
The embodiment of the disclosure provides a schematic diagram of a service grid for performing secure identification of data by using a trusted model.
Fig. 7B is a schematic diagram of a service grid for secure identification of data using a trust model according to an embodiment of the present invention, where the service grid 700 is mainly used to facilitate secure and reliable communication between multiple microservices, which are applications divided into multiple smaller services or instances and distributed to run on different clusters/machines.
As shown in fig. 7B, the microservice may include an application service instance a and an application service instance B, which form the functional application layer of the service grid 700. In one embodiment, application service instance A runs on machine/workload container group 714(POD) in the form of container/process 708 and application service instance B runs on machine/workload container group 717(POD) in the form of container/process 710.
In one embodiment, application service instance a may be a commodity inquiry service and application service instance B may be a commodity ordering service.
Alternatively, the application service instance a may be a secure identification service for chat messages in a customer scenario, and the application service instance B may be an output risk content data service.
As shown in FIG. 7B, application service instance A and grid agent (sidecar) 703 coexist in machine/workload container group 714, and application service instance B and grid agent 705 coexist in machine/workload container group 717. Grid agent 703 and grid agent 705 form the data plane layer (data plane) of service grid 700. Grid agent 703 and grid agent 705 are each in the form of container/process 704, container/process 704 may receive request 712 for commodity query services, grid agent 707 may be running, grid agent 703 and application service instance A may communicate in both directions, and grid agent 705 and application service instance B may communicate in both directions. In addition, there may be two-way communication between grid agent 703 and grid agent 705.
In one embodiment, all network traffic of application service instance A is routed through grid agent 703 to the appropriate destination, and all network traffic of application service instance B is routed through grid agent 705 to the appropriate destination. It should be noted that the network traffic mentioned herein includes, but is not limited to, Hypertext Transfer Protocol (HTTP), Representational State Transfer (REST), the high-performance, general-purpose open-source RPC framework gRPC, and the open-source in-memory data structure store Redis.
In one embodiment, the functionality of the data plane layer may be extended by writing custom filters for the proxy (Envoy) in service grid 700, which may be configured so that the service grid properly proxies service traffic and achieves service interworking and service governance. Grid agent 703 and grid agent 705 may be configured to perform at least one of the following functions: service discovery, health checking, routing, load balancing, authentication and authorization, and observability.
As shown in fig. 7B, service grid 700 also includes a control plane layer. The control plane layer may be a group of services running in a dedicated namespace, and these services are hosted by hosted control plane component 701 in machine/workload container group (machine/Pod) 702. As shown in fig. 7B, hosted control plane component 701 is in two-way communication with grid agent 703 and grid agent 705. Hosted control plane component 701 is configured to perform certain control and management functions. For example, hosted control plane component 701 receives telemetry data transmitted by grid agent 703 and grid agent 705 and may further aggregate that data. Through these services, hosted control plane component 701 may also provide user-oriented Application Programming Interfaces (APIs) to make network behavior easier to manipulate, provide configuration data to grid agent 703 and grid agent 705, and the like.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the related art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 3
According to an embodiment of the present invention, there is also provided a data security identification apparatus for implementing the data security identification method shown in fig. 2.
Fig. 8 is a schematic diagram of a device for securely recognizing data according to an embodiment of the present invention. As shown in fig. 8, the data security identification device 80 may include: a first acquisition unit 81, a first recognition unit 82, and a second recognition unit 83.
The first acquiring unit 81 is configured to acquire data to be detected in a target scene, where the data to be detected includes content data to be subjected to security identification and a parameter, and the parameter is associated with historical data in the target scene;
the first identification unit 82 is configured to perform trusted identification on data to be detected based on a trusted model to obtain a trusted result, where the trusted model is obtained by fitting based on historical data, the characteristic dimensions of the historical data include characteristic dimensions of parameters, and the trusted result is used to represent a security degree of content data;
a second identifying unit 83 for identifying the risky content data in the content data based on the trusted result.
Optionally, the first identification unit 82 includes: a first identification module, configured to perform credible identification on positive correlation features and negative correlation features in the data to be detected based on the credible model to obtain a credible result, wherein the value of a positive correlation feature is directly proportional to the security degree represented by the credible result, and the value of a negative correlation feature is inversely proportional to the security degree represented by the credible result.
Optionally, the first identification module comprises: and the first identification submodule is used for carrying out credible identification on the logarithm of the positive correlation characteristic and the logarithm of the reciprocal of the negative correlation characteristic based on the credible model to obtain a credible result.
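As a minimal sketch of scoring on the logarithm of the positive correlation features and the logarithm of the reciprocal of the negative correlation features, the following Python example sums the two kinds of terms. The function name `trust_score`, the optional weights, and the use of log(1 + x) to avoid taking the logarithm of zero are assumptions made for illustration and are not taken from the disclosure; the resulting raw score is not calibrated to the point scale used elsewhere in the description.

```python
import math
from typing import Optional, Sequence


def trust_score(
    positive: Sequence[float],
    negative: Sequence[float],
    pos_weights: Optional[Sequence[float]] = None,
    neg_weights: Optional[Sequence[float]] = None,
) -> float:
    """Sum weighted log terms over both feature groups.

    Positive features contribute log(1 + p); negative features contribute
    log(1 / (1 + q)). The "+ 1" is an assumed smoothing term.
    """
    if pos_weights is None:
        pos_weights = [1.0] * len(positive)
    if neg_weights is None:
        neg_weights = [1.0] * len(negative)
    if len(pos_weights) != len(positive) or len(neg_weights) != len(negative):
        raise ValueError("each weight list must match its feature list")
    pos_term = sum(w * math.log(1.0 + p) for w, p in zip(pos_weights, positive))
    neg_term = sum(
        w * math.log(1.0 / (1.0 + q)) for w, q in zip(neg_weights, negative)
    )
    return pos_term + neg_term
```

Under these assumptions the score rises as a positive correlation feature such as active days grows and falls as a negative correlation feature such as risk content concentration grows, which matches the direction of correlation stated for the first identification module.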
Optionally, the first identification module comprises one of: a second identification submodule, a third identification submodule and a fourth identification submodule.
The second identification submodule is used for carrying out credible identification on positive correlation characteristics and negative correlation characteristics in data to be detected based on the credible model through the following steps to obtain a credible result: adjusting the positive correlation characteristics; and performing credible identification on the adjusted positive correlation characteristics and negative correlation characteristics based on a credible model to obtain a credible result.
The third identification submodule is used for carrying out credible identification on positive correlation characteristics and negative correlation characteristics in the data to be detected based on the credible model through the following steps to obtain credible results: adjusting the negative correlation characteristic; and carrying out credible identification on the positive correlation characteristics and the adjusted negative correlation characteristics based on the credible model to obtain a credible result.
The fourth identification submodule is used for carrying out credible identification on positive correlation characteristics and negative correlation characteristics in the data to be detected based on the credible model through the following steps to obtain a credible result: adjusting the positive correlation characteristic and the negative correlation characteristic; and carrying out credible identification on the adjusted positive correlation characteristics and the adjusted negative correlation characteristics based on a credible model to obtain a credible result.
Optionally, the apparatus further comprises: a determining unit, configured to determine, in the history data, feature data corresponding to a feature dimension of the parameter; and the processing unit is used for fitting the characteristic data to obtain a credible model.
Optionally, the processing unit comprises: the determining module is used for determining the weight corresponding to the feature data, wherein the weight is used for representing the contribution degree of the feature data to the credible model; and the processing module is used for fitting the characteristic data based on the weight to obtain a credible model.
Optionally, the second identification unit 83 includes: a response module, configured to, in response to the credible result being within a first threshold range, identify risky content data from the content data based on an identification model, wherein the identification model includes a processor cluster and/or is obtained based on deep learning training.
Optionally, the response module comprises: the first response submodule is used for responding to the credible result being within a second threshold range, and forbidding inputting of the content data to the recognition model, wherein the safety degree represented by the second threshold range is higher than that represented by the first threshold range; and the second response submodule is used for responding to the credible result in a third threshold range, inputting the content data into the auditing platform or discarding the content data, wherein the safety degree represented by the third threshold range is lower than the safety degree represented by the first threshold range, and the content data is audited by the auditing platform in response to the auditing operation instruction.
Optionally, the response module further comprises: the determining submodule is used for determining a first threshold range, a second threshold range and a third threshold range corresponding to the target scene.
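The three threshold ranges handled by the response module can be sketched as a small routing function. The names `Route`, `SceneThresholds`, and `route`, and the representation of the three ranges by two cut-off scores, are assumptions made for illustration; concrete cut-off values would be determined per target scene.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SKIP_RECOGNITION = "second range: do not input to the recognition model"
    RECOGNITION_MODEL = "first range: input to the recognition model"
    AUDIT_OR_DISCARD = "third range: send to the auditing platform or discard"


@dataclass(frozen=True)
class SceneThresholds:
    """Assumed encoding of the three ranges for one target scene."""

    high: float  # scores >= high fall in the second (safest) range
    low: float   # scores < low fall in the third (least safe) range

    def __post_init__(self) -> None:
        if self.low >= self.high:
            raise ValueError("low must be smaller than high")


def route(score: float, thresholds: SceneThresholds) -> Route:
    """Map a credible result to the processing link for its range."""
    if score >= thresholds.high:
        return Route.SKIP_RECOGNITION
    if score < thresholds.low:
        return Route.AUDIT_OR_DISCARD
    return Route.RECOGNITION_MODEL
```

For instance, with the hypothetical setting `SceneThresholds(high=80, low=30)`, a score of 92 skips the recognition model, a score of 55 is sent to it, and a score of 10 goes to the auditing platform or is discarded.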
Optionally, the apparatus further comprises: and the adjusting unit is used for adjusting the credible model based on the credible result and/or the associated information of the credible result.
Optionally, the characteristic dimension comprises at least one of: account number dimensions, communication dimensions, device dimensions.
In the embodiment of the disclosure, data to be detected in a target scene is acquired through a first acquisition unit, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; the first identification unit is used for performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on historical data, the characteristic dimensions of the historical data comprise the characteristic dimensions of parameters, and the credible result is used for expressing the safety degree of the content data; the second identification unit identifies risk content data in the content data based on the credible result, namely, in the application, the parameter field is added to the content data, the credible result of the content data and the parameter is determined through the corresponding credible model, and then the content data is safely identified, so that the calculation cost of the rear-end identification can be reduced, the technical effect of improving the efficiency of safely identifying the data is achieved, and the technical problem of low efficiency of safely identifying the data is solved.
It should be noted here that the first acquiring unit 81, the first identifying unit 82, and the second identifying unit 83 correspond to steps S202 to S206 in embodiment 1, and the three units are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in the first embodiment. It should be noted that the above units as a part of the apparatus may operate in the computer terminal 10 provided in the first embodiment.
According to the embodiment of the present invention, there is also provided a data security identification device for implementing the data security identification method shown in fig. 3 from the system side.
Fig. 9 is a schematic diagram of a device for securely recognizing data provided from a system side according to an embodiment of the present invention. As shown in fig. 9, the data security identification device 90 may include: a second acquisition unit 91, a third recognition unit 92, a fourth recognition unit 93 and a first output unit 94.
The second obtaining unit 91 is configured to obtain data to be detected in a target scene by calling a first interface, where the first interface includes a first parameter, a parameter value of the first parameter is the data to be detected, the data to be detected includes content data to be subjected to security identification and a parameter, and the parameter is associated with history data in the target scene;
the third identification unit 92 is configured to perform trusted identification on the data to be detected based on a trusted model to obtain a trusted result, where the trusted model is obtained by fitting based on historical data, the characteristic dimensions of the historical data include characteristic dimensions of parameters, and the trusted result is used to represent the security degree of the content data;
a fourth identifying unit 93 configured to identify risky content data among the content data based on the credible result;
the first output unit 94 is configured to output the risk content data by invoking a second interface, where the second interface includes a second parameter, and a parameter value of the second parameter is the risk content data.
It should be noted here that the second acquiring unit 91, the third identifying unit 92, the fourth identifying unit 93 and the first outputting unit 94 correspond to steps S302 to S308 in embodiment 1, and the four units are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in the first embodiment. It should be noted that the above units as a part of the apparatus may operate in the computer terminal 10 provided in the first embodiment.
In the embodiment of the present disclosure, the second obtaining unit is used to call the first interface to obtain the data to be detected in the target scene; the third identification unit is used for carrying out credible identification on the data to be detected based on the credible model to obtain a credible result; a fourth identification unit that identifies risky content data among the content data based on the credible result; the first output unit outputs the risk content data by calling the second interface, so that the purpose of safely identifying the data on the system side is achieved, the technical effect of improving the efficiency of safely identifying the data is achieved, and the technical problem of low efficiency of safely identifying the data is solved.
According to the embodiment of the invention, a data security identification device for implementing the data security identification method shown in fig. 4 is also provided from the text scene side.
Fig. 10 is a schematic diagram of a device for securely recognizing data provided from a text scene side according to an embodiment of the present invention. As shown in fig. 10, the data security identification device 100 may include: a third acquisition unit 101, a fifth recognition unit 102, a sixth recognition unit 103 and a second output unit 104.
A third obtaining unit 101, configured to obtain data to be detected from the entertainment platform, where the data to be detected includes media content data to be subjected to security identification and a parameter, and the parameter is associated with historical data of the entertainment platform;
a fifth identification unit 102, configured to perform trusted identification on the data to be detected based on a trusted model to obtain a trusted result, where the trusted model is obtained by fitting historical data based on an entertainment platform, the characteristic dimensions of the historical data include characteristic dimensions of parameters, and the trusted result is used to represent a security degree of the media content data;
a sixth identifying unit 103, configured to identify risk content data in the media content data based on the credible result;
and a second output unit 104, configured to output the risk content data to the entertainment platform.
It should be noted here that the third acquiring unit 101, the fifth identifying unit 102, the sixth identifying unit 103, and the second outputting unit 104 correspond to steps S402 to S408 in embodiment 1, and the four units are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in the first embodiment. It should be noted that the above units as a part of the apparatus may operate in the computer terminal 10 provided in the first embodiment.
In the embodiment of the disclosure, the data to be detected from the entertainment platform is acquired, the data to be detected is subjected to credible identification based on the credible model to obtain the credible result, the risk content data in the media content data is identified based on the credible result, and finally the risk content data is output to the entertainment platform, so that the purpose of safely identifying the data of the entertainment platform on the text scene side is achieved, the technical effect of improving the efficiency of safely identifying the data is achieved, and the technical problem of low efficiency of safely identifying the data is solved.
According to the embodiment of the invention, a data security identification device for implementing the data security identification method shown in the figure 5 is also provided from the man-machine interaction side.
Fig. 11 is a schematic diagram of a device for securely recognizing data provided from a human-computer interaction side according to an embodiment of the present invention. As shown in fig. 11, the data security identification device 110 may include: a first response unit 111 and a second response unit 112.
The first response unit 111 is configured to respond to a data input instruction acting on the operation interface, and display data to be detected in a target scene on the operation interface, where the data to be detected includes content data to be subjected to security identification and a parameter, and the parameter is associated with historical data in the target scene;
the second response unit 112 is configured to, in response to the security identification instruction acting on the operation interface, display risk content data of the content data on the operation interface, where the risk content data is identified from the content data based on a trusted result, the trusted result is obtained by performing trusted identification on data to be detected based on a trusted model and is used to represent a security degree of the content data, the trusted model is obtained by fitting based on history data, and a feature dimension of the history data includes a feature dimension of a parameter.
It should be noted here that the first responding unit 111 and the second responding unit 112 correspond to steps S502 to S504 in embodiment 1, and the two units are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure of the first embodiment. It should be noted that the above units as part of the apparatus may operate in the computer terminal 10 provided in the first embodiment.
In the embodiment of the disclosure, the first response unit responds to a data input instruction acting on the operation interface, and displays the data to be detected in the target scene on the operation interface; the second response unit responds to the safety identification instruction acting on the operation interface and displays the risk content data of the content data on the operation interface, so that the aim of safely identifying the data on the operation interface according to the instruction on the man-machine interaction side is fulfilled, the technical effect of improving the efficiency of safely identifying the data is achieved, and the technical problem of low efficiency of safely identifying the data is solved.
Example 4
Embodiments of the present invention may provide a system for secure identification of data that may include a processor and a memory.
In this embodiment, the above system for secure identification of data may execute program code for the following steps of the method for secure identification of data according to the embodiment of the present invention: acquiring data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result is used to represent the security degree of the content data; and identifying risk content data in the content data based on the credible result.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for secure identification of data in the embodiments of the present invention, and the processor executes various functional applications and data processing by operating the software programs and modules stored in the memory, that is, implements the method for secure identification of data described above. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, which may be connected to a computer terminal (or mobile terminal) through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to safety identification and parameters, and the parameters are associated with historical data in the target scene; credible identification is carried out on data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on historical data, the characteristic dimensions of the historical data comprise the characteristic dimensions of parameters, and the credible result is used for expressing the safety degree of the content data; risk content data in the content data is identified based on the trustworthy results.
Optionally, the processor may further execute the program code of the following step: performing credible identification on positive correlation features and negative correlation features in the data to be detected based on the credible model to obtain the credible result, wherein the value of the positive correlation features is in direct proportion to the security degree represented by the credible result, and the value of the negative correlation features is in inverse proportion to the security degree represented by the credible result.
Optionally, the processor may further execute the program code of the following step: performing credible identification on the logarithm of the positive correlation features and the logarithm of the reciprocal of the negative correlation features based on the credible model to obtain the credible result.
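The log transform described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `EPS` guard against zero values, and the weighted sum used to combine the transformed terms are all assumptions that the text does not specify.

```python
import math

EPS = 1e-9  # hypothetical guard against log(0); not specified in the source


def log_features(positive, negative):
    """Return log(x) for positive features and log(1/x) for negative features."""
    pos_terms = [math.log(max(x, EPS)) for x in positive]
    neg_terms = [math.log(1.0 / max(x, EPS)) for x in negative]
    return pos_terms + neg_terms


def credible_result(positive, negative, weights, bias=0.0):
    """Combine the transformed terms with a weighted sum (an assumed model form)."""
    terms = log_features(positive, negative)
    if len(terms) != len(weights):
        raise ValueError("one weight is required per feature")
    return bias + sum(w * t for w, t in zip(weights, terms))
```

With non-negative weights, the resulting score rises as a positive correlation feature grows and falls as a negative correlation feature grows, which matches the direction of the relationship described in the text.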
Optionally, the processor may further execute program code for one of the following alternatives: adjusting the positive correlation features, and performing credible identification on the adjusted positive correlation features and the negative correlation features based on the credible model to obtain the credible result; or adjusting the negative correlation features, and performing credible identification on the positive correlation features and the adjusted negative correlation features based on the credible model to obtain the credible result; or adjusting both the positive correlation features and the negative correlation features, and performing credible identification on the adjusted positive correlation features and the adjusted negative correlation features based on the credible model to obtain the credible result.
Optionally, the processor may further execute the program code of the following steps: determining feature data corresponding to feature dimensions of the parameters in the historical data; and fitting the characteristic data to obtain a credible model.
Optionally, the processor may further execute the program code of the following steps: determining a weight corresponding to the feature data, wherein the weight is used for representing the contribution degree of the feature data to the credible model; and fitting the characteristic data based on the weight to obtain a credible model.
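The weighted fitting step can be illustrated with a small sketch. The text does not say what kind of fit is used or how the weights enter it; the example below assumes one possible reading, a weighted least-squares line over a single feature in which each historical record carries a weight. The function name and the data layout are hypothetical.

```python
def fit_weighted_line(xs, ys, weights):
    """Weighted least-squares fit of y = slope * x + intercept.

    xs, ys: feature values and target values taken from historical data.
    weights: assumed per-record contribution weights (non-negative).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("feature values carry no weighted variance")
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept
```

A record with weight zero has no influence on the fitted line, so down-weighting unreliable historical records is one way such a weight could express a contribution degree.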
Optionally, the processor may further execute the program code of the following step: in response to the credible result being within a first threshold range, identifying risk content data from the content data based on an identification model, wherein the identification model comprises a processor cluster and/or is trained based on deep learning.
Optionally, the processor may further execute program code for one of the following steps: in response to the credible result being within a second threshold range, prohibiting input of the content data to the identification model, wherein the security degree represented by the second threshold range is higher than the security degree represented by the first threshold range; or, in response to the credible result being within a third threshold range, inputting the content data to an auditing platform or discarding the content data, wherein the security degree represented by the third threshold range is lower than the security degree represented by the first threshold range, and the content data is audited by the auditing platform in response to an auditing operation instruction.
Optionally, the processor may further execute the program code of the following steps: a first threshold range, a second threshold range, and a third threshold range corresponding to the target scene are determined.
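The three threshold ranges amount to a routing decision on the credible result. The sketch below is illustrative only: the scene names, the numeric boundaries, and the convention that a larger score means a higher security degree are assumptions, since the text leaves the concrete ranges to be determined per target scene.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SKIP_MODEL = "skip_identification_model"    # second range: highest security
    IDENTIFICATION_MODEL = "identification"     # first range: needs the model
    AUDIT_OR_DISCARD = "audit_or_discard"       # third range: lowest security


@dataclass(frozen=True)
class SceneThresholds:
    low: float   # scores below this fall in the third range
    high: float  # scores at or above this fall in the second range


# Hypothetical per-scene boundaries; real values would be tuned per scene.
SCENE_THRESHOLDS = {
    "live_stream": SceneThresholds(low=0.3, high=0.8),
    "comments": SceneThresholds(low=0.2, high=0.9),
}


def route(score: float, scene: str) -> Route:
    """Pick the handling path for content based on its credible result."""
    t = SCENE_THRESHOLDS[scene]
    if score >= t.high:
        return Route.SKIP_MODEL
    if score >= t.low:
        return Route.IDENTIFICATION_MODEL
    return Route.AUDIT_OR_DISCARD
```

Only content in the middle range reaches the identification model, which is how a pre-filter of this kind could reduce the load on a costly model.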
Optionally, the processor may further execute the program code of the following step: adjusting the credible model based on the credible result and/or associated information of the credible result.
As an optional implementation manner, the processor may further call the information and the application program stored in the memory through the transmission device to perform the following steps: acquiring to-be-detected data under a target scene by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter is the to-be-detected data, the to-be-detected data comprises content data to be subjected to safety identification and a parameter, and the parameter is associated with historical data under the target scene; credible identification is carried out on data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on historical data, the characteristic dimensions of the historical data comprise the characteristic dimensions of parameters, and the credible result is used for expressing the safety degree of the content data; identifying risky content data in the content data based on the trusted result; and outputting the risk content data by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter is the risk content data.
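The interface-based variant above can be sketched as a pair of functions wrapped around the identification steps. All names here (`first_interface`, `second_interface`, `identify`, and the `content`, `parameters`, and `risk_content` keys) are hypothetical, and the scoring and identification functions are passed in as stand-ins because the text does not define them.

```python
from typing import Any, Callable, Dict, List


def first_interface(first_parameter: Dict[str, Any]) -> Dict[str, Any]:
    """Accept the data to be detected as the value of the first parameter."""
    missing = [k for k in ("content", "parameters") if k not in first_parameter]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return first_parameter


def second_interface(second_parameter: List[str]) -> Dict[str, List[str]]:
    """Return the risk content data as the value of the second parameter."""
    return {"risk_content": list(second_parameter)}


def identify(
    payload: Dict[str, Any],
    score: Callable[[Dict[str, Any]], float],
    find_risky: Callable[[List[str], float], List[str]],
) -> Dict[str, List[str]]:
    """Acquire data, compute the credible result, and output risk content."""
    data = first_interface(payload)
    result = score(data["parameters"])
    risky = find_risky(data["content"], result)
    return second_interface(risky)
```

For example, calling `identify` with a payload holding two content items and a stand-in `find_risky` that flags one of them returns a dictionary whose `risk_content` list contains only the flagged item.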
As an optional implementation manner, the processor may further call the information and the application program stored in the memory through the transmission device to perform the following steps: acquiring data to be detected from an entertainment platform, wherein the data to be detected comprises media content data to be subjected to security identification and parameters, and the parameters are associated with historical data of the entertainment platform; performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data of the entertainment platform, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result is used to represent the security degree of the media content data; identifying risk content data in the media content data based on the credible result; and outputting the risk content data to the entertainment platform.
As an optional implementation manner, the processor may further call the information and the application program stored in the memory through the transmission device to perform the following steps: responding to a data input instruction acting on an operation interface, and displaying to-be-detected data in a target scene on the operation interface, wherein the to-be-detected data comprises content data to be subjected to safety identification and parameters, and the parameters are associated with historical data in the target scene; and responding to a safety identification instruction acting on the operation interface, and displaying risk content data of the content data on the operation interface, wherein the risk content data are identified from the content data based on a credible result, the credible result is obtained by carrying out credible identification on data to be detected based on a credible model and is used for representing the safety degree of the content data, the credible model is obtained by fitting based on historical data, and the characteristic dimension of the historical data comprises the characteristic dimension of a parameter.
The embodiment of the invention provides a scheme for secure identification of data: acquiring data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to security identification and parameters, and the parameters are associated with historical data in the target scene; performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result is used to represent the security degree of the content data; and identifying risk content data in the content data based on the credible result. This solves the technical problem of low efficiency of security identification of data and achieves the technical effect of improving that efficiency.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for executing the method for secure identification of data provided in Embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to safety identification and parameters, and the parameters are associated with historical data in the target scene; performing credible identification on data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on historical data, the characteristic dimensions of the historical data comprise the characteristic dimensions of parameters, and the credible result is used for representing the safety degree of the content data; identifying risky content data in the content data based on the trustworthy results.
Optionally, the computer readable storage medium is further arranged to store program code for performing the following step: performing credible identification on positive correlation features and negative correlation features in the data to be detected based on the credible model to obtain the credible result, wherein the value of the positive correlation features is in direct proportion to the security degree represented by the credible result, and the value of the negative correlation features is in inverse proportion to the security degree represented by the credible result.
Optionally, the computer readable storage medium is further arranged to store program code for performing the following step: performing credible identification on the logarithm of the positive correlation features and the logarithm of the reciprocal of the negative correlation features based on the credible model to obtain the credible result.
Optionally, the computer readable storage medium is further arranged to store program code for performing one of the following alternatives: adjusting the positive correlation features, and performing credible identification on the adjusted positive correlation features and the negative correlation features based on the credible model to obtain the credible result; or adjusting the negative correlation features, and performing credible identification on the positive correlation features and the adjusted negative correlation features based on the credible model to obtain the credible result; or adjusting both the positive correlation features and the negative correlation features, and performing credible identification on the adjusted positive correlation features and the adjusted negative correlation features based on the credible model to obtain the credible result.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: determining feature data corresponding to feature dimensions of the parameters in the historical data; and fitting the characteristic data to obtain a credible model.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: determining a weight corresponding to the feature data, wherein the weight is used for representing the contribution degree of the feature data to the credible model; and fitting the characteristic data based on the weight to obtain a credible model.
Optionally, the computer readable storage medium is further arranged to store program code for performing the following step: in response to the credible result being within a first threshold range, identifying risk content data from the content data based on an identification model, wherein the identification model comprises a processor cluster and/or is trained based on deep learning.
Optionally, the computer readable storage medium is further arranged to store program code for performing one of the following steps: in response to the credible result being within a second threshold range, prohibiting input of the content data to the identification model, wherein the security degree represented by the second threshold range is higher than the security degree represented by the first threshold range; or, in response to the credible result being within a third threshold range, inputting the content data to an auditing platform or discarding the content data, wherein the security degree represented by the third threshold range is lower than the security degree represented by the first threshold range, and the content data is audited by the auditing platform in response to an auditing operation instruction.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: a first threshold range, a second threshold range, and a third threshold range corresponding to the target scene are determined.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: and adjusting the credible model based on the credible result and/or the associated information of the credible result.
As an optional implementation manner, in this embodiment, the computer-readable storage medium is further configured to store program codes for performing the following steps: acquiring to-be-detected data under a target scene by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter is the to-be-detected data, the to-be-detected data comprises content data to be subjected to safety identification and a parameter, and the parameter is associated with historical data under the target scene; credible identification is carried out on data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on historical data, the characteristic dimensions of the historical data comprise the characteristic dimensions of parameters, and the credible result is used for expressing the safety degree of the content data; identifying risky content data in the content data based on the trustworthy results; and outputting the risk content data by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter is the risk content data.
As an optional implementation manner, in this embodiment, the computer-readable storage medium is further configured to store program code for performing the following steps: acquiring data to be detected from an entertainment platform, wherein the data to be detected comprises media content data to be subjected to security identification and parameters, and the parameters are associated with historical data of the entertainment platform; performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data of the entertainment platform, the feature dimensions of the historical data comprise the feature dimensions of the parameters, and the credible result is used to represent the security degree of the media content data; identifying risk content data in the media content data based on the credible result; and outputting the risk content data to the entertainment platform.
As an optional implementation manner, in this embodiment, the computer-readable storage medium is further configured to store program codes for performing the following steps: responding to a data input instruction acting on an operation interface, and displaying to-be-detected data in a target scene on the operation interface, wherein the to-be-detected data comprises content data to be subjected to safety identification and parameters, and the parameters are associated with historical data in the target scene; and responding to a safety identification instruction acting on the operation interface, and displaying risk content data of the content data on the operation interface, wherein the risk content data are identified from the content data based on a credible result, the credible result is obtained by carrying out credible identification on data to be detected based on a credible model and is used for representing the safety degree of the content data, the credible model is obtained by fitting based on historical data, and the characteristic dimension of the historical data comprises the characteristic dimension of a parameter.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described in detail in a certain embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (14)

1. A method for secure identification of data, comprising:
acquiring data to be detected in a target scene, wherein the data to be detected comprises content data to be subjected to safety identification and parameters, and the parameters are associated with historical data in the target scene;
performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the characteristic dimensions of the historical data comprise the characteristic dimensions of the parameters, and the credible result is used for expressing the safety degree of the content data;
identifying risk content data in the content data based on the credible result.
2. The method of claim 1, wherein performing credible identification on the data to be detected based on a credible model to obtain a credible result comprises:
performing credible identification on positive correlation features and negative correlation features in the data to be detected based on the credible model to obtain the credible result, wherein the value of the positive correlation features is in direct proportion to the security degree represented by the credible result, and the value of the negative correlation features is in inverse proportion to the security degree represented by the credible result.
3. The method according to claim 2, wherein performing credible identification on the positive correlation features and the negative correlation features in the data to be detected based on the credible model to obtain the credible result comprises:
performing credible identification on the logarithm of the positive correlation features and the logarithm of the reciprocal of the negative correlation features based on the credible model to obtain the credible result.
4. The method according to claim 2, wherein performing credible identification on the positive correlation features and the negative correlation features in the data to be detected based on the credible model to obtain the credible result comprises one of the following:
adjusting the positive correlation features, and performing credible identification on the adjusted positive correlation features and the negative correlation features based on the credible model to obtain the credible result;
adjusting the negative correlation features, and performing credible identification on the positive correlation features and the adjusted negative correlation features based on the credible model to obtain the credible result;
adjusting the positive correlation features and the negative correlation features, and performing credible identification on the adjusted positive correlation features and the adjusted negative correlation features based on the credible model to obtain the credible result.
5. The method of claim 1, further comprising:
determining feature data corresponding to feature dimensions of the parameters in the historical data;
and fitting the characteristic data to obtain the credible model.
6. The method of claim 5, wherein fitting the feature data to obtain the credible model comprises:
determining a weight corresponding to the feature data, wherein the weight is used for representing the contribution degree of the feature data to the credible model;
and fitting the characteristic data based on the weight to obtain the credible model.
7. The method of claim 1, wherein identifying risk content data in the content data based on the credible result comprises:
in response to the credible result being within a first threshold range, identifying the risk content data from the content data based on an identification model, wherein the identification model comprises a processor cluster and/or is trained based on deep learning.
8. The method of claim 7, further comprising one of:
in response to the credible result being within a second threshold range, prohibiting input of the content data to the identification model, wherein the security degree represented by the second threshold range is higher than the security degree represented by the first threshold range;
and in response to the credible result being within a third threshold range, inputting the content data to an auditing platform or discarding the content data, wherein the third threshold range represents a lower security level than the first threshold range, and the content data is audited by the auditing platform in response to an auditing operation instruction.
9. The method of claim 8, further comprising:
determining the first threshold range, the second threshold range, and the third threshold range corresponding to the target scene.
10. The method according to any one of claims 1 to 9, further comprising:
adjusting the credible model based on the credible result and/or associated information of the credible result.
11. A method for secure identification of data, comprising:
acquiring data to be detected in a target scene by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter is the data to be detected, the data to be detected comprises content data to be subjected to safety identification and a parameter, and the parameter is associated with historical data in the target scene;
performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the characteristic dimensions of the historical data comprise the characteristic dimensions of the parameters, and the credible result is used for expressing the safety degree of the content data;
identifying risk content data in the content data based on the credible result;
and outputting the risk content data by calling a second interface, wherein the second interface comprises a second parameter, and a parameter value of the second parameter is the risk content data.
12. A method for secure identification of data, comprising:
acquiring data to be detected from an entertainment platform, wherein the data to be detected comprises media content data to be subjected to safety identification and parameters, and the parameters are associated with historical data of the entertainment platform;
performing credible identification on the data to be detected based on a credible model to obtain a credible result, wherein the credible model is obtained by fitting based on the historical data, the characteristic dimensions of the historical data comprise the characteristic dimensions of the parameters, and the credible result is used for expressing the safety degree of the media content data;
identifying risk content data in the media content data based on the credible result;
and outputting the risk content data to the entertainment platform.
13. A method for secure identification of data, comprising:
responding to a data input instruction acting on an operation interface, and displaying to-be-detected data under a target scene on the operation interface, wherein the to-be-detected data comprises content data to be subjected to safety identification and parameters, and the parameters are associated with historical data under the target scene;
and responding to a safety identification instruction acting on the operation interface, and displaying risk content data of the content data on the operation interface, wherein the risk content data are identified from the content data based on a credible result, the credible result is obtained by carrying out credible identification on the data to be detected based on a credible model and is used for representing the safety degree of the content data, the credible model is obtained by fitting based on the historical data, and the characteristic dimension of the historical data comprises the characteristic dimension of the parameter.
14. A computer-readable storage medium, comprising a stored program, wherein the program, when executed by a processor, controls an apparatus in which the computer-readable storage medium is located to perform the method of any of claims 1-13.
CN202210297085.9A 2022-03-24 2022-03-24 Data security identification method and storage medium Active CN114938285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210297085.9A CN114938285B (en) 2022-03-24 2022-03-24 Data security identification method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210297085.9A CN114938285B (en) 2022-03-24 2022-03-24 Data security identification method and storage medium

Publications (2)

Publication Number Publication Date
CN114938285A (en) 2022-08-23
CN114938285B (en) 2024-10-22

Family

ID=82861512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210297085.9A Active CN114938285B (en) 2022-03-24 2022-03-24 Data security identification method and storage medium

Country Status (1)

Country Link
CN (1) CN114938285B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180248918A1 (en) * 2015-11-02 2018-08-30 Alibaba Group Holding Limited Service processing method and apparatus
CN108665366A (en) * 2018-04-27 2018-10-16 平安科技(深圳)有限公司 Method for determining user risk level, terminal device and computer-readable storage medium
CN109741065A (en) * 2019-01-28 2019-05-10 广州虎牙信息科技有限公司 Payment risk identification method, device, equipment and storage medium
CN110399925A (en) * 2019-07-26 2019-11-01 腾讯科技(武汉)有限公司 Account risk identification method, device and storage medium
CN111582722A (en) * 2020-05-09 2020-08-25 拉扎斯网络科技(上海)有限公司 Risk identification method and device, electronic equipment and readable storage medium
CN112488719A (en) * 2020-11-17 2021-03-12 中信银行股份有限公司 Account risk identification method and device
CN112926699A (en) * 2021-04-25 2021-06-08 恒生电子股份有限公司 Abnormal object identification method, device, equipment and storage medium
CN113592293A (en) * 2021-07-29 2021-11-02 上海掌门科技有限公司 Risk identification processing method, electronic device and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
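Liu Jianshe: "Risk identification and control of blockchain mobile payment projects", 财会通讯 (Communication of Finance and Accounting), no. 17, 17 June 2019 (2019-06-17) *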

Also Published As

Publication number Publication date
CN114938285B (en) 2024-10-22

Similar Documents

Publication Publication Date Title
EP3471007B1 (en) Methods and apparatus for analyzing sequences of application programming interface traffic to identify potential malicious actions
CN110399925B (en) Account risk identification method, device and storage medium
US10990896B2 (en) Systems and methods for incorporating long-term patterns in online fraud detection
US10929511B2 (en) Systems and methods for protecting sensitive information
US20190158526A1 (en) Computerized system and method for automatically determining malicious ip clusters using network activity data
CN103198123B System and method for filtering spam email messages based on user reputation
CN106713332A (en) Network data processing method, device and system
US10255423B2 (en) Systems and methods for providing image-based security measures
CN100362805C (en) Multifunctional management system for detecting erotic images and unhealthy information in network
CN110457601B (en) Social account identification method and device, storage medium and electronic device
CN109213857A Fraud identification method and device
JP2019101672A (en) Cyber attack information processing program, cyber attack information processing method and information processing device
Luntovskyy et al. Cryptographic technology blockchain and its applications
CN104683376A (en) Novel cloud computing distributed data encryption method and system
CN114422211B (en) HTTP malicious traffic detection method and device based on graph attention network
CN113111359A (en) Big data resource sharing method and resource sharing system based on information security
Prasath et al. A meta‐heuristic Bayesian network classification for intrusion detection
CN114422271B (en) Data processing method, device, equipment and readable storage medium
Salau et al. Data cooperatives for neighborhood watch
CN114329450A (en) Data security processing method, device, equipment and storage medium
CN116866076A (en) Network honey pot identification method, device, equipment and storage medium
Alzubi et al. EdgeFNF: Toward Real-time Fake News Detection on Mobile Edge Computing
Cresci Harnessing the social sensing revolution: challenges and opportunities
Moradi et al. Rogue people: on adversarial crowdsourcing in the context of cyber security
CN117093627A (en) Information mining method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant