CN116933316A

CN116933316A - Method and device for analyzing consistency of intelligent terminal application sensitive behavior and privacy policy

Info

Publication number: CN116933316A
Application number: CN202310920155.6A
Authority: CN
Inventors: 杨智; 杨保山; 陈性元; 孙浩东; 袁占慧; 靖志昊; 徐航
Original assignee: Information Engineering University of PLA Strategic Support Force
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2023-07-24
Filing date: 2023-07-24
Publication date: 2023-10-24

Abstract

The application discloses a method and a device for analyzing the consistency between the sensitive behaviors of an intelligent terminal application and its privacy policy. The method avoids the situation in which the target privacy policy of a target application is described in a non-standard way and contradicts the sensitive behavior the target application actually performs, thereby ensuring the security of user information. In addition, from the inconsistency condition satisfied by the target sensitive behavior data, the application market can learn why the target privacy policy is inconsistent with the target sensitive behavior and can identify the specific way in which the description of the target privacy policy is non-standard.

Description

Method and device for analyzing consistency of intelligent terminal application sensitive behavior and privacy policy

Technical Field

The application relates to the technical field of Internet, in particular to a method and a device for analyzing consistency of intelligent terminal application sensitive behaviors and privacy policies.

Background

With the development of network informatization, applications on intelligent terminals have become part of many aspects of people's lives and bring great convenience. However, as application markets keep expanding, the applications they carry grow more complex, and the sensitive behaviors of some applications may leak user information, making user information less and less secure. To protect the security of user information, a developer needs to attach a privacy policy when uploading an application to the application market. The privacy policy declares which sensitive behaviors the application will perform, and the application can perform those sensitive behaviors after the user agrees.

However, the privacy policies of some applications are not described in a standard way and contradict the sensitive behaviors the applications actually perform, so the security of user information cannot be guaranteed.

Disclosure of Invention

In view of the above, the application provides a method and a device for analyzing the consistency between the sensitive behavior of an intelligent terminal application and its privacy policy, to solve the problem that the security of user information cannot be ensured when the description of an application's privacy policy is non-standard and contradicts the sensitive behavior the application actually performs.

In order to achieve the above object, the following solutions have been proposed:

An intelligent terminal application sensitive behavior and privacy policy consistency analysis method comprises the following steps:

acquiring a target privacy policy of a target application to be analyzed and target sensitive behavior data actually generated by the target application;

for each target sensitive behavior data, judging whether any one of the following set inconsistency conditions is satisfied:

the target privacy policy does not have statement of target sensitive behavior corresponding to the target sensitive behavior data;

the declarations of the target sensitive behavior in the target privacy policy all indicate that the target sensitive behavior is not performed;

the target privacy policy has a statement that the target sensitive behavior is performed and a statement that the target sensitive behavior is not performed;

if yes, outputting the conclusion that the target privacy policy is inconsistent with the target sensitive behavior and the inconsistent condition met by the target sensitive behavior data.
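The three inconsistency conditions above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the declarations relevant to one target sensitive behavior have already been located and reduced to booleans (True meaning the policy states the behavior is performed, False meaning it states the behavior is not performed). The names `Inconsistency` and `check_inconsistency` are hypothetical and do not come from the application.

```python
from enum import Enum
from typing import List, Optional


class Inconsistency(Enum):
    NO_DECLARATION = "policy has no declaration of the behavior"
    ALL_NEGATIVE = "every declaration says the behavior is not performed"
    CONTRADICTORY = "declarations both affirm and deny the behavior"


def check_inconsistency(declarations: List[bool]) -> Optional[Inconsistency]:
    """Return the inconsistency condition met, or None if none is met.

    `declarations` holds one boolean per policy declaration about the
    observed behavior: True means "performed", False means "not performed".
    """
    if not declarations:
        return Inconsistency.NO_DECLARATION
    if not any(declarations):
        return Inconsistency.ALL_NEGATIVE
    if not all(declarations):
        return Inconsistency.CONTRADICTORY
    return None
```

Returning the condition itself, rather than a bare yes/no, mirrors the requirement that both the conclusion and the satisfied condition be output.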

Preferably, the method further comprises:

for each target sensitive behavior data, judging whether any one of the following set consistency conditions is satisfied:

the declarations of the target sensitive behavior in the target privacy policy all indicate that the target sensitive behavior is performed;

a declaration of sensitive behavior to be performed in the target privacy policy can cover the target sensitive behavior;

if yes, outputting the conclusion that the target privacy policy is consistent with the target sensitive behavior and the consistent condition satisfied by the target sensitive behavior data.

Preferably, after the obtaining the target privacy policy of the target application to be analyzed and the target sensitive behavior data actually generated by the target application, the method further includes:

extracting declarations of sensitive behaviors in the target privacy policy;

the process for judging whether the target sensitive behavior data meets the consistent condition and the inconsistent condition comprises the following steps:

and judging whether the target sensitive behavior data meets the consistent condition and the inconsistent condition or not based on the statement of each sensitive behavior in the target privacy policy.

Preferably, extracting the statement of each sensitive behavior in the target privacy policy includes:

inputting the target privacy policy into a preset sensitive behavior statement extraction model to obtain statement of each sensitive behavior in the extracted target privacy policy, wherein the sensitive behavior statement extraction model is a model obtained by training each first training privacy policy as a training sample and taking statement of each first training sensitive behavior in each first training privacy policy extracted in advance as a sample label.

Preferably, the extracted statement of a sensitive behavior in the privacy policy specifically includes a first combination of the data type, the entity and the action corresponding to the sensitive behavior, wherein the data type corresponding to the sensitive behavior is the type of sensitive information involved in the sensitive behavior, the entity corresponding to the sensitive behavior is the entity to which the involved sensitive information flows, and the action corresponding to the sensitive behavior indicates either that the sensitive information corresponding to the sensitive behavior flows to the entity corresponding to the sensitive behavior or that it does not flow to that entity.

Preferably, the action corresponding to the sensitive behavior is a preset first identifier or a preset second identifier, when the action corresponding to the sensitive behavior is the first identifier, the sensitive information corresponding to the sensitive behavior is indicated to flow to the entity corresponding to the sensitive behavior, and when the action corresponding to the sensitive behavior is the second identifier, the sensitive information corresponding to the sensitive behavior is indicated to not flow to the entity corresponding to the sensitive behavior.
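As a minimal sketch of how such a first combination might be represented, the following Python fragment models the triple of data type, entity and action. The class names `Action` and `FirstCombination` and the example values are assumptions made for illustration; the application does not prescribe a concrete representation or concrete identifier values.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # First identifier: the sensitive information flows to the entity.
    FLOWS = "first identifier"
    # Second identifier: the sensitive information does not flow to the entity.
    DOES_NOT_FLOW = "second identifier"


@dataclass(frozen=True)
class FirstCombination:
    data_type: str  # type of sensitive information, e.g. "location information"
    entity: str     # entity the information flows (or does not flow) to
    action: Action


# Hypothetical policy sentence:
# "We do not share your location information with third parties."
decl = FirstCombination("location information", "third party", Action.DOES_NOT_FLOW)
```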

Preferably, for each of the first training privacy policies, the process of extracting a first combination corresponding to each of the first training sensitive behaviors in the first training privacy policies includes:

Extracting key information about the first training sensitive behavior in the first training privacy policy;

extracting the data type matched with the key information from a preset data type annotation library;

extracting an entity matched with the key information from a preset entity annotation library;

extracting the actions matched with the key information from a preset action annotation library;

and constructing a first combination corresponding to the first training sensitive behavior based on the extracted data type, entity and action matched with the first training sensitive behavior.
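The matching steps above can be sketched as follows. This is only an illustration under strong assumptions: the three annotation libraries are modelled as small in-memory collections of English phrases, matching is plain substring search that prefers the longest hit, and the first and second identifiers are encoded as 1 and 0. None of these choices is specified by the application.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical annotation libraries; real ones would be built from training policies.
DATA_TYPE_LIBRARY: Set[str] = {"location information", "device identifier"}
ENTITY_LIBRARY: Set[str] = {"third party", "advertiser"}
# Action phrases map to the first identifier (1, flows) or the second (0, does not flow).
ACTION_LIBRARY: Dict[str, int] = {
    "do not share": 0,
    "share": 1,
    "do not collect": 0,
    "collect": 1,
}


def longest_match(text: str, phrases: Iterable[str]) -> Optional[str]:
    """Return the longest library phrase contained in `text`, if any."""
    hits = [p for p in phrases if p in text]
    return max(hits, key=len) if hits else None


def build_first_combination(key_info: str) -> Optional[Tuple[str, str, int]]:
    """Build (data type, entity, action) from one piece of key information."""
    text = key_info.lower()
    data_type = longest_match(text, DATA_TYPE_LIBRARY)
    entity = longest_match(text, ENTITY_LIBRARY)
    action_phrase = longest_match(text, ACTION_LIBRARY)
    if data_type is None or entity is None or action_phrase is None:
        return None  # at least one element could not be matched
    return (data_type, entity, ACTION_LIBRARY[action_phrase])
```

Preferring the longest phrase keeps "do not share" from being read as "share"; a production system would need a more robust treatment of negation.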

Preferably, the construction process of the data type annotation library, the entity annotation library and the action annotation library comprises the following steps:

acquiring preset second training privacy policies;

acquiring each statement in each second training privacy policy;

performing word segmentation on each sentence to obtain a word segmentation result of each sentence;

searching words with data type attributes, words with entity attributes and words with action attributes in word segmentation results of each sentence;

constructing the data type annotation library based on each searched word with the data type attribute;

constructing the entity annotation library based on each searched word with entity attribute;

and constructing the action annotation library based on each searched word with the action attribute.
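A rough sketch of this library construction is given below. It assumes English text, a naive regular-expression tokenizer in place of a real word segmenter, and a hypothetical `ATTRIBUTE_LEXICON` that says which words carry a data type, entity or action attribute; the application does not state how these attributes are assigned.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical word-to-attribute lexicon (an assumption of this sketch).
ATTRIBUTE_LEXICON: Dict[str, str] = {
    "location": "data_type",
    "contacts": "data_type",
    "advertisers": "entity",
    "we": "entity",
    "collect": "action",
    "share": "action",
}


def segment(sentence: str) -> List[str]:
    """Very rough word segmentation: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def build_libraries(policies: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect words by attribute across all sentences of all policies."""
    libraries: Dict[str, Set[str]] = {
        "data_type": set(),
        "entity": set(),
        "action": set(),
    }
    for policy in policies:
        # Split the policy into sentences, then segment each sentence.
        for sentence in re.split(r"[.!?]\s*", policy):
            for word in segment(sentence):
                attribute = ATTRIBUTE_LEXICON.get(word)
                if attribute is not None:
                    libraries[attribute].add(word)
    return libraries
```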

Preferably, the process of acquiring each target sensitive behavior data actually generated by the target application includes:

acquiring, by means of a preset static analysis algorithm or dynamic analysis algorithm, second combinations each consisting of the data type of sensitive information actually generated in the target application and the entity to which that sensitive information actually flows, and taking the second combinations as the target sensitive behavior data actually generated by the target application.

Preferably, the process of acquiring, by means of a preset static analysis algorithm, the second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information actually flows includes:

acquiring each API call in the target application;

obtaining, based on each API call, the second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information actually flows.

Preferably, the obtaining, based on each API call, of the second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information actually flows includes:

determining a preset sensitive information source list and a preset sensitive information sink list, wherein the sensitive information source list comprises the preset data types of all sensitive information, and the sensitive information sink list comprises the preset entities to which sensitive information flows;

iteratively executing the following step until every API call has been matched against the data types and entities: for any API call, determining the data type it matches in the sensitive information source list and the entity it matches in the sensitive information sink list, and, when a matching data type and entity are determined, taking the combination of the matched data type and entity as a second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information actually flows.
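The source and sink matching can be sketched as below. The representation is an assumption: each API call is modelled as a record with the API that reads the data and a destination, the source list maps API names to data types, and the sink list maps destinations to entities. The hosts and entity labels are invented for illustration, and the sketch keeps a pair only when both lookups succeed.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class ApiCall(NamedTuple):
    api: str          # API that reads the data
    destination: str  # where the data ends up (hypothetical field)


# Hypothetical source list: API -> data type of the sensitive information.
SOURCE_LIST: Dict[str, str] = {
    "LocationManager.getLastKnownLocation": "location information",
    "TelephonyManager.getDeviceId": "device identifier",
}
# Hypothetical sink list: destination -> entity the information flows to.
SINK_LIST: Dict[str, str] = {
    "ads.example.com": "advertiser",
    "stats.example.org": "analytics provider",
}


def second_combinations(calls: Iterable[ApiCall]) -> Set[Tuple[str, str]]:
    """Match every API call against both lists; keep fully matched pairs."""
    found: Set[Tuple[str, str]] = set()
    for call in calls:  # iterate until every call has been examined
        data_type = SOURCE_LIST.get(call.api)
        entity = SINK_LIST.get(call.destination)
        if data_type is not None and entity is not None:
            found.add((data_type, entity))
    return found
```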

Preferably, the absence of a statement of the target sensitive behavior in the target privacy policy specifically means: among the first combinations corresponding to the target privacy policy, there is no first combination associated with the second combination;

the statements of the target sensitive behavior in the target privacy policy all indicating that the target sensitive behavior is not performed specifically means: for every first combination corresponding to the target privacy policy that is associated with the second combination, the action in the first combination is the second identifier;

the target privacy policy having both a statement that the target sensitive behavior is performed and a statement that the target sensitive behavior is not performed specifically means: among the first combinations corresponding to the target privacy policy that are associated with the second combination, there exist both a first combination whose action is the first identifier and a first combination whose action is the second identifier;

the statements of the target sensitive behavior in the target privacy policy all indicating that the target sensitive behavior is performed specifically means: for every first combination corresponding to the target privacy policy that is associated with the second combination, the data type in the first combination is consistent with the data type in the second combination, the entity in the first combination is consistent with the entity in the second combination, and the action in the first combination is the first identifier;

a statement of sensitive behavior to be performed in the target privacy policy being able to cover the target sensitive behavior specifically means: for every first combination corresponding to the target privacy policy that is associated with the second combination, the action in the first combination is the first identifier, and one of the following holds: the data type in the second combination belongs to the data type in the first combination and the entity in the second combination belongs to the entity in the first combination; or the data type in the second combination belongs to the data type in the first combination and the entity in the second combination is consistent with the entity in the first combination; or the data type in the second combination is consistent with the data type in the first combination and the entity in the second combination belongs to the entity in the first combination.
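The five definitions above can be combined into one classification routine, sketched here in Python. The sketch makes several assumptions: the first and second identifiers are encoded as 1 and 0, the semantic similarity is passed in as a function, and any all-positive case that is not an exact match on every associated first combination is reported as coverage (the text does not say how a mixture of exact and covering declarations should be labelled). `toy_sim` is a hypothetical stand-in for the real similarity calculation.

```python
from typing import Callable, List, Tuple

FIRST_ID, SECOND_ID = 1, 0  # action: flows / does not flow (assumed encoding)

First = Tuple[str, str, int]  # (data type, entity, action)
Second = Tuple[str, str]      # (data type, entity)
Sim = Callable[[str, str], float]


def classify(second: Second, firsts: List[First], sim: Sim) -> str:
    """Name the consistency or inconsistency condition met by `second`."""
    scored = [(sim(f[0], second[0]), sim(f[1], second[1]), f[2]) for f in firsts]
    # A first combination is associated when both similarities exceed 0.
    related = [s for s in scored if s[0] > 0 and s[1] > 0]
    if not related:
        return "inconsistent: no declaration"
    actions = {action for _, _, action in related}
    if actions == {SECOND_ID}:
        return "inconsistent: all declarations negative"
    if actions == {FIRST_ID, SECOND_ID}:
        return "inconsistent: contradictory declarations"
    if all(d == 1 and e == 1 for d, e, _ in related):
        return "consistent: exact declaration"
    # Assumption: any remaining all-positive case is treated as coverage.
    return "consistent: covering declaration"


def toy_sim(a: str, b: str) -> float:
    """Hypothetical stand-in for the semantic similarity calculation."""
    if a == b:
        return 1.0
    return 0.5 if {a, b} == {"third party", "advertiser"} else 0.0


policy = [("location information", "third party", FIRST_ID)]
result = classify(("location information", "advertiser"), policy, toy_sim)
```

Here the policy declares sharing with a "third party" and the observed flow goes to an "advertiser", so the routine reports a covering declaration.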

Preferably, the determining of the first combination corresponding to the target privacy policy associated with the second combination includes:

for each of the first combinations corresponding to the target privacy policies:

calculating semantic similarity between the data types in the first combination and the data types in the second combination;

calculating semantic similarity between the entities in the first combination and the entities in the second combination;

when the semantic similarity between the data type in the first combination and the data type in the second combination is greater than 0 and the semantic similarity between the entity in the first combination and the entity in the second combination is greater than 0, determining the first combination as a combination associated with the second combination.

Preferably, the process of determining whether the data type in the first combination is consistent with the data type in the second combination, and whether the entity in the first combination is consistent with the entity in the second combination, includes:

judging whether the semantic similarity between the data type in the first combination and the data type in the second combination and the semantic similarity between the entity in the first combination and the entity in the second combination are equal to 1 or not;

If so, determining that the data type in the first combination is consistent with the data type in the second combination, and that the entity in the first combination is consistent with the entity in the second combination.

Preferably, the process of determining whether the data type in the second combination belongs to the data type in the first combination, and whether the entity in the second combination belongs to the entity in the first combination, includes:

judging whether the semantic similarity between the data type in the second combination and the data type in the first combination is between 0 and 1;

if yes, determining that the data type in the second combination belongs to the data type in the first combination;

judging whether the semantic similarity between the entity in the second combination and the entity in the first combination is between 0 and 1;

if yes, determining that the entity in the second combination belongs to the entity in the first combination.
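The similarity thresholds used for "associated", "consistent" and "belongs" can be summarized in a short sketch. The function names are hypothetical, and "between 0 and 1" is read here as strictly between the two values, which is an interpretation rather than something the text states explicitly.

```python
def relation(similarity: float) -> str:
    """Map a semantic similarity in [0, 1] to the relation it expresses."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if similarity == 1.0:
        return "consistent"  # the two terms are treated as the same
    if similarity > 0.0:
        return "belongs"     # one term is subsumed by the other
    return "unrelated"


def is_associated(data_type_sim: float, entity_sim: float) -> bool:
    """A first combination is associated when both similarities exceed 0."""
    return data_type_sim > 0.0 and entity_sim > 0.0
```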

Preferably, the calculating the semantic similarity between the data type in the first combination and the data type in the second combination includes:

determining a first subordinate relation between the data type in the first combination and the data type in the second combination;

based on a preset source node t and the first subordinate relation, taking the data type in the first combination and the data type in the second combination as nodes, and constructing a first directed graph, wherein the nodes corresponding to the data type in the first combination and the data type in the second combination belong to the source node t;

calculating the semantic similarity sim(u1, v1) between the data type in the first combination and the data type in the second combination based on the following formula:

wherein Δ(u1, t) is a pre-calculated distance between the source node t and the node u1 corresponding to the data type in the first combination in the first directed graph, and Δ(v1, t) is a pre-calculated distance between the source node t and the node v1 corresponding to the data type in the second combination in the first directed graph;

the calculating semantic similarity between the entities in the first combination and the entities in the second combination comprises:

determining a second subordinate relation between the entity in the first combination and the entity in the second combination;

based on the source node t and the second subordinate relation, taking the entity in the first combination and the entity in the second combination as nodes, and constructing a second directed graph, wherein the nodes corresponding to the entity in the first combination and the entity in the second combination belong to the source node t;

calculating the semantic similarity sim(u2, v2) between the entity in the first combination and the entity in the second combination based on the following formula:

wherein Δ(u2, t) is a pre-calculated distance between the source node t and the node u2 corresponding to the entity in the first combination in the second directed graph, and Δ(v2, t) is a pre-calculated distance between the source node t and the node v2 corresponding to the entity in the second combination in the second directed graph.
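The distances Δ(·, t) that feed the similarity calculation can be obtained with a breadth-first search over the directed graph. The sketch below covers only that distance step, since the similarity formula itself is not reproduced in this text. It assumes that each edge points from a term to the more general term it belongs to, that distance means the number of edges on the shortest path, and that the example hierarchy is invented for illustration.

```python
from collections import deque
from typing import Dict, List, Optional


def distance_to_source(graph: Dict[str, List[str]], node: str, source: str) -> Optional[int]:
    """Shortest number of edges from `node` up to `source`, or None if unreachable."""
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == source:
            return dist
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None


# Hypothetical entity hierarchy: each term points to the term it belongs to,
# and every chain ends at the source node "t".
ENTITY_GRAPH: Dict[str, List[str]] = {
    "third party A": ["third party"],
    "third party": ["t"],
    "first party": ["t"],
}
```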

An intelligent terminal application sensitive behavior and privacy policy consistency analysis device, comprising:

the policy and behavior data acquisition unit is used for acquiring the target privacy policy of the target application to be analyzed and the target sensitive behavior data actually generated by the target application;

a first judging unit, configured to judge, for each of the target sensitive behavior data, whether any one of the set inconsistency conditions is satisfied, and if yes, to trigger the first conclusion and condition output unit;

and the first conclusion and condition output unit is used for outputting the conclusion that the target privacy policy is inconsistent with the target sensitive behavior and the inconsistent condition met by the target sensitive behavior data.

As can be seen from the above technical scheme, the method for analyzing the consistency of intelligent terminal application sensitive behavior and privacy policy provided by the embodiment of the application judges, for each target sensitive behavior data actually generated by the target application, whether any set inconsistency condition is satisfied. When the target sensitive behavior data satisfies any inconsistency condition, the method outputs the conclusion that the target privacy policy of the target application is inconsistent with the target sensitive behavior corresponding to the target sensitive behavior data, together with the inconsistency condition satisfied by the target sensitive behavior data. This avoids the situation in which the description of the target privacy policy of the target application is non-standard and contradicts the target sensitive behavior actually performed, and ensures the security of user information.

Further, the inconsistency conditions are, respectively, that the target privacy policy has no declaration of the target sensitive behavior, that the declarations of the target sensitive behavior in the target privacy policy all indicate that the target sensitive behavior is not performed, and that the target privacy policy has both a declaration that the target sensitive behavior is performed and a declaration that it is not performed. When the target sensitive behavior data satisfies any inconsistency condition, the method outputs not only the conclusion that the target privacy policy is inconsistent with the target sensitive behavior corresponding to the target sensitive behavior data, but also the inconsistency condition satisfied by the target sensitive behavior data. The application market can therefore learn the reason for the inconsistency from the satisfied condition, identify the specific way in which the description of the target privacy policy is non-standard, and handle the application accordingly.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a method for analyzing consistency of intelligent terminal application sensitive behavior and privacy policy according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a state matrix according to an embodiment of the present application;

Fig. 3 is a process diagram of a method for analyzing consistency of intelligent terminal application sensitive behavior and privacy policy;

Fig. 4 is a schematic structural diagram of a device for analyzing consistency of intelligent terminal application sensitive behavior and privacy policy according to an embodiment of the present application;

Fig. 5 is a block diagram of a hardware structure of a device for analyzing consistency of intelligent terminal application sensitive behavior and privacy policy according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The scheme for analyzing the consistency of the application sensitive behavior and the privacy policy of the intelligent terminal can be suitable for various applications in the intelligent terminal, such as background applications, website applications and the like, and the intelligent terminal can be a mobile phone, a tablet and the like.

The scheme of the application can be realized based on the terminal with the data processing capability, and the terminal can be a computer, a server, a cloud end and the like.

The embodiment of the application provides a scheme for analyzing the consistency of intelligent terminal application sensitive behavior and privacy policy. The method of the application is described below with reference to Fig. 1. As shown in Fig. 1, the method may include the following steps:

S100, acquiring the target privacy policy of the target application to be analyzed and each target sensitive behavior data actually generated by the target application.

Specifically, to avoid the situation in which the description of an application's privacy policy is non-standard and contradicts the sensitive behavior the application actually performs, the privacy policy and the sensitive behavior data actually generated by the application can be analyzed to determine whether the privacy policy is inconsistent with the sensitive behaviors corresponding to that data. To this end, the target privacy policy of the target application to be analyzed and the target sensitive behavior data actually generated by the target application are acquired. The link to the target privacy policy can be obtained from the application detail interface of the target application in the application market, and the privacy policy document and the third-party information sharing list at that link are then extracted to obtain the target privacy policy.

S110, judging whether any set inconsistent condition is met for each target sensitive behavior data.

The set inconsistency conditions include: the target privacy policy has no statement of the target sensitive behavior corresponding to the target sensitive behavior data; the statements of the target sensitive behavior in the target privacy policy all indicate that the target sensitive behavior is not performed; the target privacy policy has both a statement indicating that the target sensitive behavior is performed and a statement indicating that the target sensitive behavior is not performed.

If yes, the following step S120 is executed.

Specifically, the target privacy policy can be inconsistent with the target sensitive behavior corresponding to the target sensitive behavior data in several ways. First, if the target sensitive behavior is that third party A obtains the location information of the user, and the target privacy policy contains no statement about third party A obtaining the location information of the user, then the description of the target privacy policy is non-standard and the target privacy policy is inconsistent with the target sensitive behavior.

Second, if the target sensitive behavior is that third party A obtains the location information of the user, and the statements in the target privacy policy about third party A obtaining the location information of the user all indicate that third party A cannot obtain it, then the target privacy policy contradicts the target sensitive behavior actually performed and is inconsistent with the target sensitive behavior.

Third, if the target sensitive behavior is that third party A obtains the location information of the user, and the target privacy policy has both a statement that third party A can obtain the location information of the user and a statement that third party A cannot obtain it, then the description of the target privacy policy is non-standard and the target privacy policy is inconsistent with the target sensitive behavior.

Therefore, the embodiment of the application sets the three inconsistency conditions above for these different situations, and when the target sensitive behavior data satisfies any inconsistency condition, the target privacy policy is considered inconsistent with the target sensitive behavior.

S120, outputting a conclusion that the target privacy policy is inconsistent with the target sensitive behavior and inconsistent conditions met by the target sensitive behavior data.

Specifically, when the target sensitive behavior data satisfies any inconsistency condition, the conclusion that the target privacy policy is inconsistent with the target sensitive behavior is output so that the application market knows of the inconsistency. The inconsistency condition satisfied by the target sensitive behavior data is output as well, so that the application market can learn the reason for the inconsistency, identify the specific way in which the description of the target privacy policy is non-standard, and handle the application accordingly.

The method for analyzing the consistency of intelligent terminal application sensitive behavior and privacy policy provided by the embodiment of the application judges, for each target sensitive behavior data actually generated by the target application, whether any set inconsistency condition is satisfied. When the target sensitive behavior data satisfies any inconsistency condition, the method outputs the conclusion that the target privacy policy of the target application is inconsistent with the target sensitive behavior corresponding to the target sensitive behavior data, together with the inconsistency condition satisfied by the target sensitive behavior data. This avoids the situation in which the description of the target privacy policy of the target application is non-standard and contradicts the target sensitive behavior actually performed, and ensures the security of user information.

Optionally, since the target privacy policy can be consistent with the target sensitive behavior in different ways, consistency conditions may be preset for the different situations, and it may be judged whether the target sensitive behavior data satisfies any of them. On this basis, it is also judged, for each target sensitive behavior data, whether any set consistency condition is satisfied; if so, the conclusion that the target privacy policy is consistent with the target sensitive behavior and the consistency condition satisfied by the target sensitive behavior data are output.

Specifically, when the target sensitive behavior data meets any consistent condition, outputting a conclusion that the target privacy policy is consistent with the target sensitive behavior so as to enable the application market to know that the target privacy policy is consistent with the target sensitive behavior, and outputting the consistent condition met by the target sensitive behavior data so as to enable the application market to know the reason that the target privacy policy is consistent with the target sensitive behavior according to the consistent condition met by the target sensitive behavior data.

The set consistency conditions may include:

The statements of the target sensitive behavior in the target privacy policy all indicate that the target sensitive behavior is performed.

Specifically, there may be multiple statements of the target sensitive behavior in the target privacy policy, where the target privacy policy is consistent with the target sensitive behavior when the statements of the target sensitive behavior in the target privacy policy all indicate that the target sensitive behavior is performed. For example, if the target sensitive behavior corresponding to the target sensitive behavior data is that the third party a obtains the location information of the user, and in the target privacy policy, the statement about the target sensitive behavior indicates that the third party a may obtain the location information of the user, at this time, the target privacy policy is consistent with the target sensitive behavior.

The statement of the sensitive behavior to be performed in the target privacy policy can cover the target sensitive behavior.

Specifically, when the statement of the sensitive behavior to be performed in the target privacy policy can cover the target sensitive behavior, the target privacy policy is consistent with the target sensitive behavior. For example, if the target sensitive behavior corresponding to the target sensitive behavior data is that third party A obtains the location information of the user, and the statement of the sensitive behavior to be performed in the target privacy policy is that third parties may obtain the location information of the user, then, because third party A is a third party, the target privacy policy is consistent with the target sensitive behavior.

In the embodiment of the application, when the target sensitive behavior data meets any set consistent condition, a conclusion that the target privacy policy is consistent with the target sensitive behavior is output, and the consistent condition met by the target sensitive behavior data is output, so that an application market can not only know the conclusion that the target privacy policy is consistent with the target sensitive behavior, but also know the reason that the target privacy policy is consistent with the target sensitive behavior according to the consistent condition met by the target sensitive behavior data.

Optionally, in order to determine whether the target sensitive behavior data meets the above consistent condition and the inconsistent condition, after the target privacy policy of the target application to be analyzed and the target sensitive behavior data actually generated by the target application are obtained, a statement of each sensitive behavior in the target privacy policy may be extracted, and based on the statement of each sensitive behavior in the target privacy policy, whether the target sensitive behavior data meets the above consistent condition and the inconsistent condition may be determined. Based on this, the process of extracting declarations of sensitive behaviors in the target privacy policy may include:

Inputting the target privacy policy into a preset sensitive behavior statement extraction model to obtain statements of sensitive behaviors in the extracted target privacy policy. The sensitive behavior statement extraction model is a model obtained by taking each first training privacy policy as a training sample and taking statement of each first training sensitive behavior in each first training privacy policy extracted in advance as a sample label.

Specifically, in the embodiment of the application, the sensitive behavior statement extraction model is trained in advance, with each first training privacy policy as a training sample and the pre-extracted statement of each first training sensitive behavior in each first training privacy policy as a sample label. Inputting the target privacy policy into the sensitive behavior statement extraction model yields the statement of each sensitive behavior in the target privacy policy as extracted by the model.

Optionally, a statement of a sensitive behavior in a privacy policy mainly includes three key contents: the type of sensitive information involved in the sensitive behavior, the entity to which that sensitive information flows, and the operation that the entity performs on the sensitive information. For example, if the privacy policy of an application contains the statement "your location information is collected when you use the shake function", then the type of sensitive information involved in the statement is location information, the entity to which the location information flows is the application, and the operation that the entity performs on the sensitive information is collection. Based on this, the statement of each sensitive behavior in the extracted privacy policy may be:

A first combination of the data type, entity, and action corresponding to the sensitive behavior. The data type corresponding to the sensitive behavior is the type of sensitive information involved in the sensitive behavior, the entity corresponding to the sensitive behavior is the entity to which that sensitive information flows, and the action corresponding to the sensitive behavior indicates either that the sensitive information corresponding to the sensitive behavior flows to the entity corresponding to the sensitive behavior or that it does not. Here, the statements in the extracted privacy policy cover both the statements of the sensitive behaviors extracted from the target privacy policy and the statements of the first training sensitive behaviors extracted from the first training privacy policies.

Based on this, in the embodiment of the present application, a process of extracting a first combination corresponding to a first training sensitive behavior in each first training privacy policy is described, where the process may include:

Step S1, extracting key information about the first training sensitive behavior in the first training privacy policy.

Specifically, in order to extract the data type, entity and action in the first combination corresponding to the first training sensitive behavior in the first training privacy policy, first, key information about the first training sensitive behavior in the first training privacy policy is extracted, so that the data type, entity and action are extracted from the key information.

Step S2, extracting the data type matched with the key information from a preset data type annotation library.

Specifically, the application presets the data type annotation library, and the data type annotation library comprises various data types, so that the data types matched with the key information are extracted from the data type annotation library.

Step S3, extracting the entity matched with the key information from a preset entity annotation library.

Specifically, the entity annotation library is preset, and various entities are included in the entity annotation library, so that the entity matched with the key information is extracted from the entity annotation library.

Step S4, extracting the action matched with the key information from a preset action annotation library.

Specifically, the application presets the action annotation library, and the action annotation library comprises various actions, so that the actions matched with the key information are extracted from the action annotation library.

Step S5, constructing a first combination corresponding to the first training sensitive behavior based on the extracted data type, entity and action matched with the first training sensitive behavior.

Specifically, after the data type, entity and action matched with the first training sensitive behavior are extracted, the extracted data type, entity and action matched with the first training sensitive behavior are combined, and then a first combination corresponding to the first training sensitive behavior can be obtained.
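As an illustrative, non-limiting sketch, steps S2 to S5 can be pictured as looking up the key information in the three annotation libraries and combining the hits. The library contents, the class name FirstCombination, the helper names, and the use of plain substring matching are all assumptions made for illustration; in the embodiment this extraction is learned by the sensitive behavior statement extraction model rather than hand-coded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical, tiny annotation libraries (assumed contents, for illustration only).
DATA_TYPE_LIBRARY = ["location information", "contact information", "device information"]
ENTITY_LIBRARY = ["third party", "advertiser", "we"]
ACTION_LIBRARY = ["collect", "share", "obtain"]


@dataclass(frozen=True)
class FirstCombination:
    data_type: str
    entity: str
    action: str


def first_match(key_info: str, library: Sequence[str]) -> Optional[str]:
    """Return the first library entry that occurs in the key information, if any."""
    text = key_info.lower()
    for term in library:
        if term in text:  # naive substring match; a real matcher would respect word boundaries
            return term
    return None


def build_first_combination(key_info: str) -> Optional[FirstCombination]:
    data_type = first_match(key_info, DATA_TYPE_LIBRARY)  # step S2
    entity = first_match(key_info, ENTITY_LIBRARY)        # step S3
    action = first_match(key_info, ACTION_LIBRARY)        # step S4
    if data_type is None or entity is None or action is None:
        return None  # the key information does not describe a complete sensitive behavior
    return FirstCombination(data_type, entity, action)    # step S5


combo = build_first_combination(
    "We collect your location information when you use the shake function"
)
print(combo)
```

Key information that lacks any of the three elements yields no combination in this sketch.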

Based on this, the sensitive behavior statement extraction model may be specifically configured to: extract key information about the sensitive behaviors in the target privacy policy; extract, from the data type annotation library, the data type matched with the key information; extract, from the entity annotation library, the entity matched with the sensitive behavior in the target privacy policy; extract, from the action annotation library, the action matched with the sensitive behavior in the target privacy policy; and combine the extracted data type, entity and action into an internal state representation of the first combination corresponding to the sensitive behavior in the target privacy policy. Optionally, the sensitive behavior statement extraction model may be a Bi-GRU-CRF model.

In the embodiment of the present application, the construction process of the data type annotation library, the entity annotation library and the action annotation library is described separately, where the process may include:

Step S10, obtaining preset second training privacy policies.

Specifically, the embodiment of the application presets each second training privacy policy so as to construct a data type annotation library, an entity annotation library and an action annotation library based on each second training privacy policy.

Step S11, each sentence in each second training privacy policy is acquired.

Specifically, each second training privacy policy may be segmented at the set punctuation marks to obtain each sentence in each second training privacy policy. Optionally, the set punctuation marks may be ".", ";", "?", "!" and the like.

Step S12, performing word segmentation on each sentence to obtain a word segmentation result of each sentence.

Specifically, after each sentence in each second training privacy policy is obtained, word segmentation is performed on each sentence to obtain a word segmentation result of each sentence, so that a data type annotation library, an entity annotation library and an action annotation library can be constructed based on the word segmentation result of each sentence.

Step S13, searching words with data type attributes, words with entity attributes and words with action attributes in word segmentation results of each sentence.

Specifically, in the word segmentation result of each sentence, the words with the data type attribute, the words with the entity attribute and the words with the action attribute are searched, so that a data type annotation library, an entity annotation library and an action annotation library can be constructed based on the searched words.

Step S14, constructing a data type annotation library based on the searched words with the data type attribute.

Specifically, the searched words with the data type attribute may be combined into the data type annotation library. Optionally, the word frequency of each searched word with the data type attribute may be counted, and the words with the data type attribute that have a high word frequency are combined into the data type annotation library.

Step S15, constructing an entity annotation library based on the searched words with the entity attribute.

Specifically, each searched word with the entity attribute can be combined into an entity annotation library. Optionally, word frequencies of the searched words with the entity attributes can be counted, and words with high word frequencies and the entity attributes are combined into an entity annotation library.

Step S16, constructing an action annotation library based on the searched words with the action attribute.

Specifically, each searched word with the action attribute can be combined into an action annotation library. Alternatively, word frequencies of the searched words with the action attribute can be counted, and words with high word frequencies and the action attribute are combined into an action annotation library.
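As an illustrative sketch of steps S11 to S16, the library construction can be written as sentence splitting, word segmentation, attribute lookup, and frequency filtering. The attribute lexicon ATTRIBUTE_LEXICON, the threshold min_count, the English sample policies, and the regular-expression tokenizer are assumptions for illustration; an actual implementation would use a proper word segmenter and its own way of deciding which words carry which attribute.

```python
import re
from collections import Counter

# Hypothetical lexicon stating which words carry which attribute (assumed for illustration).
ATTRIBUTE_LEXICON = {
    "location": "data_type", "contacts": "data_type",
    "advertisers": "entity", "partners": "entity",
    "collect": "action", "share": "action",
}


def split_sentences(policy):
    """Step S11: split a policy at the set punctuation marks."""
    return [s.strip() for s in re.split(r"[.;?!]", policy) if s.strip()]


def segment(sentence):
    """Step S12: simple tokenization standing in for real word segmentation."""
    return re.findall(r"[a-z]+", sentence.lower())


def build_libraries(policies, min_count=2):
    counts = {"data_type": Counter(), "entity": Counter(), "action": Counter()}
    for policy in policies:
        for sentence in split_sentences(policy):
            for word in segment(sentence):
                attribute = ATTRIBUTE_LEXICON.get(word)  # step S13
                if attribute is not None:
                    counts[attribute][word] += 1
    # Steps S14 to S16: keep the words whose frequency reaches the threshold.
    return {
        attribute: sorted(w for w, c in counter.items() if c >= min_count)
        for attribute, counter in counts.items()
    }


policies = [
    "We collect your location. We share location with advertisers!",
    "We collect contacts; partners may receive your location.",
]
libraries = build_libraries(policies)
print(libraries)
```

Setting min_count to 1 corresponds to the variant in which every searched word enters its library regardless of word frequency.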

Optionally, in actual analysis, the specific operation that the entity performs on the sensitive information is not important; it is only necessary to determine whether the entity performs an operation on the sensitive information at all. Based on this, the action corresponding to the sensitive behavior may be a preset first identifier or a preset second identifier. When the action corresponding to the sensitive behavior is the first identifier, the sensitive information corresponding to the sensitive behavior flows to the entity corresponding to the sensitive behavior; when the action is the second identifier, the sensitive information does not flow to that entity. Optionally, the first identifier may be 1 and the second identifier may be 0.

Optionally, when the statement of each sensitive behavior in the extracted target privacy policy is a first combination of a data type, an entity and an action corresponding to the sensitive behavior, the obtained target sensitive behavior data may be a second combination of a data type of sensitive information actually generated in the target application and an entity to which the sensitive information actually generated flows, based on which the process of obtaining each target sensitive behavior data actually generated by the target application may include:

Acquiring, by using a preset static analysis algorithm or dynamic analysis algorithm, second combinations of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information flows, and taking the second combinations as the target sensitive behavior data actually generated by the target application.

Specifically, the static analysis algorithm infers the sensitive information flows existing in an application program by statically analyzing its structure, variables, function calls and the like without actually executing the program, and thereby acquires the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information flows. The dynamic analysis algorithm detects the transfer of sensitive information by actually executing the program and monitoring its behavior at run time, and thereby acquires the same two items. The embodiment of the application presets a static analysis algorithm and a dynamic analysis algorithm; either may be used to acquire the second combinations of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information flows, and the second combinations are taken as the target sensitive behavior data actually generated by the target application.

Alternatively, the process of acquiring the second combination of the data type of the actually generated sensitive information and the entity of the actually generated sensitive information flow direction in the target application by using the static analysis algorithm may include:

Step S01, acquiring each API call in the target application.

Specifically, an API call in an application refers to the application interacting with other software components, services or platforms by calling an application programming interface (API). Each API call in the target application is acquired so that static analysis can be performed on it to obtain the second combinations of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information flows.

Step S02, obtaining a second combination of the data type of the actually generated sensitive information in the target application and the entity of the actually generated sensitive information flow direction based on each API call.

Specifically, static analysis may be performed on each API call to obtain a second combination of the data type of the sensitive information actually generated in the target application and the entity of the sensitive information flow actually generated.

Optionally, the process of acquiring the second combination of the data type of the actually generated sensitive information and the entity of the actually generated sensitive information flow direction in the target application based on each API call in the step S02 may include:

Step S021, determining a preset sensitive information source list and a sensitive information sink list, wherein the sensitive information source list comprises preset data types of all sensitive information, and the sensitive information sink list comprises preset entities of all sensitive information flows.

Specifically, the embodiment of the application presets a sensitive information source list comprising preset data types of each sensitive information and a sensitive information sink list comprising preset entities of each sensitive information flow direction, so as to determine the data types of the sensitive information actually generated in the target application in the sensitive information source list, and determine the entities of the sensitive information flow direction actually generated in the target application in the sensitive information sink list.

Step S022, iteratively executing the following steps until the matched data type and entity of all API calls have been determined: determining, for any API call, the data type matched in the sensitive information source list and the entity matched in the sensitive information sink list; and, when this pair of matched data type and entity has not been determined before, determining the combination of the matched data type and entity as a second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information flows.

Specifically, one application may contain multiple sensitive paths from sensitive information of the same data type to the same entity. When analyzing whether the privacy policy is consistent with the sensitive behavior, it is only necessary to determine whether the sensitive information does flow to the entity, regardless of the number of sensitive paths. Therefore, only when the pair of data type and entity matched by an API call has not been determined before is the combination of the matched data type and entity determined as a second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information flows.

Optionally, a state matrix over the data types and the entities may be set, and whether the data type and entity matched by an API call have already been determined may be judged through the state matrix. Referring to fig. 2, the rows of the state matrix represent data types and the columns represent entities; the data types are A, B and C, and the entities are D, E and F. Before the data type and entity matched by any API call are determined, all elements in the state matrix are set to 0. If the data type matched by a certain API call is determined to be A and the matched entity is D, the element corresponding to A and D is 0, which indicates that the pair A and D has not been determined; A and D are therefore combined into a second combination, and the element corresponding to A and D is set to 1. When the data type matched by another API call is again determined to be A and the matched entity is D, the element corresponding to A and D is already 1, so A and D are not combined into a second combination again.
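The state matrix example of fig. 2 can be sketched as follows. This is a minimal illustration only: the function name collect_second_combinations and the representation of API-call matches as (data type, entity) pairs are assumptions, while the labels A to F follow the example in the text.

```python
DATA_TYPES = ["A", "B", "C"]  # rows of the state matrix
ENTITIES = ["D", "E", "F"]    # columns of the state matrix


def collect_second_combinations(matches):
    """matches: (data type, entity) pairs matched by successive API calls."""
    state = [[0] * len(ENTITIES) for _ in DATA_TYPES]  # all elements start at 0
    combinations = []
    for data_type, entity in matches:
        row, col = DATA_TYPES.index(data_type), ENTITIES.index(entity)
        if state[row][col] == 0:      # this pair has not been determined before
            state[row][col] = 1
            combinations.append((data_type, entity))
        # otherwise the pair was already recorded and is not combined again
    return combinations, state


combos, state = collect_second_combinations([("A", "D"), ("B", "F"), ("A", "D")])
print(combos)  # [('A', 'D'), ('B', 'F')]
print(state)   # [[1, 0, 0], [0, 0, 1], [0, 0, 0]]
```

The second occurrence of the pair (A, D) leaves both the result list and the matrix unchanged, which is the de-duplication effect described above.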

In the embodiment of the application, the combination of the matched data type and entity is determined as a second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information flows only when the data type and entity matched by the API call have not been determined before, which avoids the long time that would otherwise be spent acquiring the second combinations because the same second combination is formed repeatedly.

Optionally, at the code layer, the same type of sensitive information may be acquired by multiple API calls. For example, the API calls for acquiring location information include android.location.Location.getLatitude(), which acquires the latitude of the user's current location, and the information obtained by all such API calls belongs to location information. In contrast, the description granularity of sensitive information in a privacy policy is coarse, for example location information, device information and address book information, and the specific location information obtained is not described. Therefore, in order to match the wording of the privacy policy and save time when judging whether the target sensitive behavior data meets the consistent conditions and the inconsistent conditions, the sensitive information source list sources may be preset to include location information, calendar information, network information, contact information, account information, database information, mobile phone information, file information, popup window information and the like, and the sensitive information sink list sinks may be preset to include telephone connection, network connection, log, file and the like.

In the embodiment of the application, the sensitive information source list can comprise position information, calendar information, network information, contact person information, account information, database information, mobile phone information, file information, popup window information and the like, and is matched with the description granularity of the sensitive information in the privacy policy, so that the time for judging whether the target sensitive behavior data meets the consistent condition and the inconsistent condition is saved.

Optionally, step S022 may be implemented using a modified IFDS (Interprocedural Finite Distributive Subset) algorithm. An IFDS problem is expressed as a five-tuple $(G^{\#}, D, F, M, \sqcap)$. The directed graph $G^{\#} = (N^{\#}, E^{\#})$, referred to herein as a supergraph, is a collection of the interprocedural control flow graphs (Interprocedural Control Flow Graph, ICFG) of the program: each $G_i \in G^{\#}$ represents an ICFG of the program, and $G_{main}$ represents the graph of the entry function main. Each $G_i$ has a unique start point $s_p$ and a unique end point $e_p$, and the interprocedural call node and return node are denoted $Call_p$ and $Ret_p$, respectively. $D$ represents a finite set of data flow facts; $F \subseteq 2^{D} \to 2^{D}$ represents a set of distributive data flow functions; $M: E^{\#} \to F$ represents the mapping from the edges of the supergraph $G^{\#}$ to data flow functions; and $\sqcap$ represents the meet operator, which is either intersection or union. The modified IFDS algorithm is given as a pseudocode listing:

[Algorithm listing: modified IFDS algorithm; the listing image is not reproduced in this text.]

The algorithm first performs category mapping on the input SourceList and SinkList according to the sensitive information source list sources and the sensitive information sink list sinks, establishing the mapping libraries SourceMap and SinkMap; analysis starts only if the current API call is in a mapping library. In addition, a state matrix state is added: if a sensitive path from sourceCategory to SinkCategory is extracted, the value at the corresponding position in the matrix is set to 1, and otherwise it is set to 0; the corresponding source in SourceList is deleted, and in the next loop iteration, if the current sensitive path already exists, it is skipped directly. Here, sourceCategory is the data type in the second combination and SinkCategory is the entity in the second combination.
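The category mapping and the skipping of already-extracted paths can be sketched as below. This is not the algorithm listing of the application: the taint propagation performed by IFDS is replaced by a caller-supplied predicate reaches, the contents of SOURCE_MAP and SINK_MAP are illustrative assumptions, and the deletion of sources from SourceList is approximated by the skip on the state entry.

```python
# Hypothetical API-to-category maps playing the role of SourceMap and SinkMap.
SOURCE_MAP = {
    "getLatitude": "location information",
    "getLongitude": "location information",
    "getDeviceId": "mobile phone information",
}
SINK_MAP = {
    "openConnection": "network connection",
    "Log.d": "log",
}


def extract_second_combinations(source_list, sink_list, reaches):
    """reaches(source_api, sink_api) -> bool stands in for the IFDS taint propagation."""
    state = {}      # (sourceCategory, SinkCategory) -> 1 once a sensitive path is extracted
    analysed = 0    # number of source/sink pairs that were actually analysed
    for source in source_list:
        if source not in SOURCE_MAP:
            continue                     # API call is not in the mapping library
        for sink in sink_list:
            if sink not in SINK_MAP:
                continue
            key = (SOURCE_MAP[source], SINK_MAP[sink])
            if state.get(key, 0) == 1:
                continue                 # a path between these categories already exists
            analysed += 1
            state[key] = 1 if reaches(source, sink) else 0
    return {key for key, value in state.items() if value == 1}, analysed


# Assumed ground-truth flows for the demonstration.
flows = {("getLatitude", "openConnection"),
         ("getLongitude", "openConnection"),
         ("getDeviceId", "Log.d")}
result, analysed = extract_second_combinations(
    ["getLatitude", "getLongitude", "getDeviceId", "unknownApi"],
    ["openConnection", "Log.d"],
    lambda source, sink: (source, sink) in flows,
)
print(result, analysed)
```

In this run the pair getLongitude and openConnection is never analysed, because a path from location information to network connection was already extracted through getLatitude.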

Optionally, when the statement of each sensitive behavior in the extracted target privacy policy is a first combination of the data type, entity and action corresponding to the sensitive behavior, and the obtained target sensitive behavior data is a second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information flows, whether the second combination meets the inconsistent conditions and the consistent conditions is judged based on the first combinations corresponding to the target privacy policy. Before this judgment, the first combinations corresponding to the target privacy policy that are associated with the second combination are determined, so that the judgment is made based on the first combinations associated with the second combination. The process of determining the first combinations corresponding to the target privacy policy that are associated with the second combination may include:

For each first combination corresponding to the target privacy policy:

Step S001, calculating semantic similarity between the data type in the first combination and the data type in the second combination.

Step S002, calculating semantic similarity between the entity in the first combination and the entity in the second combination.

Specifically, whether the first combination is associated with the second combination may be determined from the semantic similarity between their data types and the semantic similarity between their entities; therefore, both semantic similarities are calculated.

Step S003, determining the first combination as a combination associated with the second combination when the semantic similarity between the data type in the first combination and the data type in the second combination is greater than 0 and the semantic similarity between the entity in the first combination and the entity in the second combination is greater than 0.

Specifically, when the semantic similarity between the data type in the first combination and the data type in the second combination is greater than 0 and the semantic similarity between the entity in the first combination and the entity in the second combination is greater than 0, the first combination is indicated to be associated with the second combination.
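Steps S001 to S003 can be sketched as a simple filter. The similarity values in SIMILARITY are made up for illustration, and the function names are assumptions; how the semantic similarity is actually computed is outside this sketch. Tuples here are ordered (data type, entity, action), following the definition of the first combination.

```python
# Hypothetical similarity table (assumed values); pairs not listed have similarity 0.
SIMILARITY = {
    ("location information", "location information"): 1.0,
    ("personal information", "location information"): 0.5,
    ("third party A", "third party A"): 1.0,
    ("third party", "third party A"): 0.5,
}


def similarity(term_in_first, term_in_second):
    return SIMILARITY.get((term_in_first, term_in_second), 0.0)


def associated_first_combinations(first_combinations, second_combination):
    data_type2, entity2 = second_combination
    associated = []
    for data_type1, entity1, action in first_combinations:
        data_type_sim = similarity(data_type1, data_type2)  # step S001
        entity_sim = similarity(entity1, entity2)           # step S002
        if data_type_sim > 0 and entity_sim > 0:            # step S003
            associated.append((data_type1, entity1, action))
    return associated


first_combinations = [
    ("location information", "third party", 1),
    ("contact information", "third party", 1),
    ("location information", "advertiser", 0),
]
second_combination = ("location information", "third party A")
print(associated_first_combinations(first_combinations, second_combination))
# [('location information', 'third party', 1)]
```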

Based on this, the absence of a statement of the target sensitive behavior in the target privacy policy may specifically be: no first combination associated with the second combination exists among the first combinations corresponding to the target privacy policy.

Specifically, when no first combination associated with the second combination exists among the first combinations corresponding to the target privacy policy, that is, no first combination whose data type has a semantic similarity greater than 0 with the data type in the second combination and whose entity has a semantic similarity greater than 0 with the entity in the second combination, there is no statement of the target sensitive behavior in the target privacy policy.

The statement of the target sensitive behavior in the target privacy policy indicating that the target sensitive behavior is not performed may specifically be: for every first combination corresponding to the target privacy policy that is associated with the second combination, the action in the first combination is 0.

Specifically, when the action in every first combination corresponding to the target privacy policy that is associated with the second combination is 0, where 0 indicates that the sensitive information corresponding to the sensitive behavior does not flow to the entity corresponding to the sensitive behavior, the statement of the target sensitive behavior in the target privacy policy indicates that the target sensitive behavior is not performed. For example, the second combination is (A, location information), and the first combination associated therewith is (A, location information, 0).

The target privacy policy containing both a statement indicating that the target sensitive behavior is performed and a statement indicating that the target sensitive behavior is not performed may specifically be: among the first combinations corresponding to the target privacy policy that are associated with the second combination, there are both first combinations whose action is 1 and first combinations whose action is 0.

Specifically, in the first combination corresponding to the target privacy policy associated with the second combination, both the first combination including the action as 1 and the first combination including the action as 0 exist, and 1 indicates that the sensitive information corresponding to the sensitive behavior flows to the entity of the sensitive behavior, and 0 indicates that the sensitive information corresponding to the sensitive behavior does not flow to the entity of the sensitive behavior, which indicates that the target privacy policy has both the statement of performing the target sensitive behavior and the statement of not performing the target sensitive behavior. For example, the second combination is (third party a, location information), and the first combination associated therewith is (third party a, location information, 1) and (third party a, location information, 0).

The statement of the target sensitive behavior in the target privacy policy indicating that the target sensitive behavior is performed may specifically be: for a first combination corresponding to the target privacy policy that is associated with the second combination, the data type in the first combination is consistent with the data type in the second combination, the entity in the first combination is consistent with the entity in the second combination, and the action in the first combination is 1.

Specifically, in a first combination corresponding to the target privacy policy associated with the second combination, the data type in the first combination is consistent with the data type in the second combination, and the entity in the first combination is consistent with the entity in the second combination, and the action in the first combination is 1, which indicates that the statement of the target sensitive behavior in the target privacy policy indicates that the target sensitive behavior is performed. For example, the second combination is (third party a, location information) and the first combination associated therewith is (third party a, location information, 1).

The statement of the sensitive behavior to be performed in the target privacy policy being able to cover the target sensitive behavior may specifically be: for a first combination corresponding to the target privacy policy that is associated with the second combination, the action in the first combination is 1, and one of the following holds: the data type in the second combination belongs to the data type in the first combination and the entity in the second combination belongs to the entity in the first combination; or the data type in the second combination belongs to the data type in the first combination and the entity in the second combination is consistent with the entity in the first combination; or the data type in the second combination is consistent with the data type in the first combination and the entity in the second combination belongs to the entity in the first combination.

Specifically, for a first combination corresponding to the target privacy policy that is associated with the second combination, the statement of the sensitive behavior to be performed in the target privacy policy can cover the target sensitive behavior when the action in the first combination is 1 and any one of the following three cases holds: the data type in the second combination belongs to the data type in the first combination and the entity in the second combination belongs to the entity in the first combination; the data type in the second combination belongs to the data type in the first combination and the entity in the second combination is consistent with the entity in the first combination; or the data type in the second combination is consistent with the data type in the first combination and the entity in the second combination belongs to the entity in the first combination. For example, the second combination is (third party A, location information), and the first combination associated therewith is (third party, location information, 1), with third party A belonging to the third party.

Optionally, it may be determined whether the semantic similarity between the data type in the first combination and the data type in the second combination and the semantic similarity between the entity in the first combination and the entity in the second combination are both equal to 1, and if so, it is determined that the data type in the first combination is consistent with the data type in the second combination, and the entity in the first combination is consistent with the entity in the second combination. It may be determined whether the semantic similarity between the data type in the second combination and the data type in the first combination is between 0 and 1, and if so, it is determined that the data type in the second combination belongs to the data type in the first combination. It may be determined whether the semantic similarity between the entity in the second combination and the entity in the first combination is between 0 and 1, and if so, it is determined that the entity in the second combination belongs to the entity in the first combination.
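The threshold rule in the preceding paragraph can be illustrated with a short Python sketch. The function name `relation_from_similarity`, the string labels, and the floating-point tolerance are assumptions made here for illustration; in particular, reading "between 0 and 1" as the open interval (0, 1) and labeling every other value "unrelated" is an interpretation, not something the text states.

```python
import math


def relation_from_similarity(sim: float, tol: float = 1e-9) -> str:
    """Map a semantic similarity value to a relation label.

    sim equal to 1          -> "consistent"
    sim strictly in (0, 1)  -> "belongs" (second-combination item belongs to the first)
    anything else           -> "unrelated" (assumed label, not from the text)
    """
    if math.isclose(sim, 1.0, abs_tol=tol):
        return "consistent"
    if 0.0 < sim < 1.0:
        return "belongs"
    return "unrelated"


# Hypothetical similarity values, chosen only to exercise each branch.
labels = [relation_from_similarity(s) for s in (1.0, 0.5, 0.0)]
```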

Solving for the semantic similarity between data types and the semantic similarity between entities can be converted into a single-source shortest path problem: the data types and the entities are used as nodes to construct directed graphs, and the semantic similarity between nodes is calculated based on the distance between the nodes in the directed graph. On this basis, in this embodiment of the present application, the process of calculating the semantic similarity between the data type in the first combination and the data type in the second combination in step S001 and the process of calculating the semantic similarity between the entity in the first combination and the entity in the second combination in step S002 are described in turn as follows:

The process of calculating the semantic similarity between the data type in the first combination and the data type in the second combination in the above step S001 may include:

Step S0011, determining a first subordinate relation between the data type in the first combination and the data type in the second combination.

Specifically, to calculate the semantic similarity between the data type in the first combination and the data type in the second combination, a directed graph needs to be constructed with these data types as nodes, where an edge between two nodes in the directed graph indicates that a subordinate relation exists between the two nodes. The first subordinate relation between the data type in the first combination and the data type in the second combination therefore needs to be determined first, so that the directed graph can be constructed based on the first subordinate relation.

Step S0012, constructing a first directed graph based on a preset source node t and the first subordinate relation, with the data type in the first combination and the data type in the second combination as nodes, wherein the nodes corresponding to the data types in the first combination and the second combination belong to the source node t.

Specifically, the embodiment of the application presets a source node t. Based on the source node t and the first subordinate relation, a first directed graph is constructed with the data types in the first combination and the second combination as nodes. The nodes corresponding to these data types belong to the source node t, and each edge in the first directed graph indicates a subordinate relation between the nodes it connects.

Step S0013, calculating the semantic similarity sim(u1, v1) between the data type in the first combination and the data type in the second combination based on the following formula:

where Δ(u1, t) is the pre-computed distance between the source node t and the node u1 corresponding to the data type in the first combination in the first directed graph, and Δ(v1, t) is the pre-computed distance between the source node t and the node v1 corresponding to the data type in the second combination in the first directed graph.

Specifically, Δ(u1, t) and Δ(v1, t) may be calculated based on a shortest path algorithm for directed graphs.
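As a rough illustration of how the distances to the source node t could be pre-computed, the following Python sketch builds a directed graph from subordinate relations and runs a breadth-first search from t. Several things here are assumptions rather than details of the embodiment: the function name `distances_from_source`, the representation of subordinate relations as (child, parent) pairs, the use of unit-weight edges (so that breadth-first search yields shortest paths), and the sample data types. The sketch stops at the distances and does not attempt the similarity formula itself.

```python
from collections import deque


def distances_from_source(subordinate_pairs, source="t"):
    """Compute hop counts from the source node to every reachable node.

    subordinate_pairs: iterable of (child, parent) pairs, each meaning
    "child belongs to parent". Edges are assumed to have unit weight.
    Returns a dict mapping node -> distance from the source node.
    """
    # Index the edges by parent so the search can walk from t downwards.
    children = {}
    for child, parent in subordinate_pairs:
        children.setdefault(parent, []).append(child)

    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in dist:  # first visit is the shortest path in BFS
                dist[child] = dist[node] + 1
                queue.append(child)
    return dist


# Hypothetical data-type hierarchy, for illustration only.
pairs = [
    ("personal information", "t"),
    ("location information", "personal information"),
    ("GPS coordinates", "location information"),
]
dist = distances_from_source(pairs)
delta_u1 = dist["location information"]  # distance of a first-combination data type
delta_v1 = dist["GPS coordinates"]       # distance of a second-combination data type
```

Because all distances are measured from the same source node, a single traversal serves every node in the graph, which is what makes the single-source formulation convenient. If the edges carried weights, Dijkstra's algorithm would be the usual replacement for the breadth-first search.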

The process of calculating the semantic similarity between the entity in the first combination and the entity in the second combination in the above step S002 may include:

Step S0021, determining a second subordinate relation between the entity in the first combination and the entity in the second combination.

Specifically, to calculate the semantic similarity between the entity in the first combination and the entity in the second combination, a directed graph needs to be constructed with these entities as nodes, where an edge between two nodes in the directed graph indicates that a subordinate relation exists between the two nodes. The second subordinate relation between the entity in the first combination and the entity in the second combination therefore needs to be determined first, so that the directed graph can be constructed based on the second subordinate relation.

Step S0022, constructing a second directed graph based on the source node t and the second subordinate relation, with the entity in the first combination and the entity in the second combination as nodes, wherein the nodes corresponding to the entities in the first combination and the second combination belong to the source node t.

Specifically, based on the source node t and the second subordinate relation, a second directed graph is constructed with the entities in the first combination and the second combination as nodes. The nodes corresponding to these entities belong to the source node t, and each edge in the second directed graph indicates a subordinate relation between the nodes it connects.

Step S0023, calculating the semantic similarity sim(u2, v2) between the entity in the first combination and the entity in the second combination based on the following formula:

where Δ(u2, t) is the pre-computed distance between the source node t and the node u2 corresponding to the entity in the first combination in the second directed graph, and Δ(v2, t) is the pre-computed distance between the source node t and the node v2 corresponding to the entity in the second combination in the second directed graph.

Specifically, Δ(u2, t) and Δ(v2, t) may be calculated based on a shortest path algorithm for directed graphs.

Referring to fig. 3, fig. 3 is a schematic diagram of the process of the method for analyzing the consistency of intelligent terminal application sensitive behavior and privacy policy. The APK (Android Application Package) file, the privacy policy document and the third-party information sharing list of the application are acquired from an application store. Based on an improved IFDS algorithm, the data type and the entity matching each API call in the APK file are extracted and combined into a second combination, which serves as a sensitive behavior two-tuple. A Bi-GRU-CRF model is used to extract, from the privacy policy document and the third-party information sharing list, a first combination including the data type, the entity and the action corresponding to each sensitive behavior, which serves as a privacy policy triple. Consistency analysis is then performed on the two:

The consistent cases are divided into:

Explicit expression: among the privacy policy triples associated with the sensitive behavior two-tuple, there is a privacy policy triple whose data type is consistent with the data type in the sensitive behavior two-tuple, whose entity is consistent with the entity in the sensitive behavior two-tuple, and whose action is 1.

Fuzzy expression: for a privacy policy triple associated with the sensitive behavior two-tuple, the action in the privacy policy triple is 1, and one of the following holds: the data type in the sensitive behavior two-tuple belongs to the data type in the privacy policy triple and the entity in the sensitive behavior two-tuple belongs to the entity in the privacy policy triple; or the data type in the sensitive behavior two-tuple belongs to the data type in the privacy policy triple and the entity in the sensitive behavior two-tuple is consistent with the entity in the privacy policy triple; or the data type in the sensitive behavior two-tuple is consistent with the data type in the privacy policy triple and the entity in the sensitive behavior two-tuple belongs to the entity in the privacy policy triple.

The inconsistent cases are divided into:

Omitted expression: none of the privacy policy triples is associated with the sensitive behavior two-tuple.

Incorrect expression: for every privacy policy triple associated with the sensitive behavior two-tuple, the action in the privacy policy triple is 0.

Ambiguous expression: among the privacy policy triples associated with the sensitive behavior two-tuple, there are both privacy policy triples whose action is 1 and privacy policy triples whose action is 0.
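The five cases above can be summarized in a small Python sketch. It is a hedged illustration only: the function name `classify`, the string labels, and the input layout (one `(data_relation, entity_relation, action)` item per associated privacy policy triple) are assumptions introduced here, as are the order in which the cases are checked and the choice to report "explicit" whenever any associated triple matches on both data type and entity.

```python
def classify(associated_triples):
    """Classify one sensitive behavior tuple against its associated privacy policy triples.

    associated_triples: list of (data_relation, entity_relation, action), where each
    relation is "consistent" or "belongs" and action is 1 or 0 (assumed encoding).
    Returns "omitted", "incorrect", "ambiguous", "explicit" or "fuzzy".
    """
    if not associated_triples:
        return "omitted"      # no associated privacy policy triple exists
    actions = {action for _, _, action in associated_triples}
    if actions == {0}:
        return "incorrect"    # every associated triple has action 0
    if actions == {0, 1}:
        return "ambiguous"    # associated triples contradict each other
    # All associated triples have action 1 from here on.
    if any(d == "consistent" and e == "consistent" for d, e, _ in associated_triples):
        return "explicit"
    return "fuzzy"


# Hypothetical inputs exercising each case.
results = [
    classify([]),
    classify([("consistent", "consistent", 0)]),
    classify([("consistent", "belongs", 1), ("consistent", "consistent", 0)]),
    classify([("consistent", "consistent", 1)]),
    classify([("consistent", "belongs", 1)]),
]
```

Checking the inconsistent cases first mirrors the order in which the apparatus described below evaluates the inconsistency conditions before the consistency conditions, though nothing in this sketch depends on that order.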

The intelligent terminal application sensitive behavior and privacy policy consistency analysis device provided by the embodiment of the application is described below. The device described below and the intelligent terminal application sensitive behavior and privacy policy consistency analysis method described above may be cross-referenced with each other.

First, the intelligent terminal application sensitive behavior and privacy policy consistency analysis device is described with reference to fig. 4. As shown in fig. 4, the device may include:

a policy and behavior data obtaining unit 10, configured to obtain a target privacy policy of a target application to be analyzed and target sensitive behavior data actually generated by the target application;

a first judging unit 20, configured to judge, for each of the target sensitive behavior data, whether any of the following set inconsistency conditions is satisfied:

if so, the following first conclusion and condition output unit 30 is executed;

the first conclusion and condition output unit 30 is configured to output a conclusion that the target privacy policy is inconsistent with the target sensitive behavior and an inconsistent condition satisfied by the target sensitive behavior data.

Optionally, the intelligent terminal application sensitive behavior and privacy policy consistency analysis device may further include:

the second judging unit is used for judging whether any one of the following set consistent conditions is met for each target sensitive behavior data:

If yes, executing the following steps of the second conclusion and condition output unit;

and the second conclusion and condition output unit is used for outputting the conclusion that the target privacy policy is consistent with the target sensitive behavior and the consistent condition met by the target sensitive behavior data.

the statement extraction unit is used for extracting statements of sensitive behaviors in the target privacy policy;

based on this, the process of the first judging unit judging whether the target sensitive behavior data satisfies the inconsistency condition may include:

and judging whether the target sensitive behavior data meets inconsistent conditions or not based on statement of each sensitive behavior in the target privacy policy.

The process of the second judging unit judging whether the target sensitive behavior data meets the consistency condition may include:

and judging whether the target sensitive behavior data meets a consistent condition or not based on the statement of each sensitive behavior in the target privacy policy.

Optionally, the process of extracting the claims of each sensitive behavior in the target privacy policy by the claim extracting unit may include:

Optionally, the extracted statement of a sensitive behavior in the privacy policy specifically includes: a first combination consisting of the data type, the entity and the action corresponding to the sensitive behavior, wherein the data type corresponding to the sensitive behavior is the type of the sensitive information involved in the sensitive behavior, the entity corresponding to the sensitive behavior is the entity to which the involved sensitive information flows, and the action corresponding to the sensitive behavior indicates either that the sensitive information corresponding to the sensitive behavior flows to the entity corresponding to the sensitive behavior or that it does not flow to that entity.

Optionally, the action corresponding to the sensitive behavior is a preset first identifier or a preset second identifier, when the action corresponding to the sensitive behavior is the first identifier, the sensitive information corresponding to the sensitive behavior is indicated to flow to the entity corresponding to the sensitive behavior, and when the action corresponding to the sensitive behavior is the second identifier, the sensitive information corresponding to the sensitive behavior is indicated to not flow to the entity corresponding to the sensitive behavior.

the key information extraction unit is used for extracting key information about the first training sensitive behavior in the first training privacy policy;

the data type extracting unit is used for extracting the data type matched with the key information from a preset data type annotation library;

the entity extraction unit is used for extracting the entity matched with the key information from a preset entity annotation library;

the action extraction unit is used for extracting actions matched with the key information from a preset action annotation library;

the first combination construction unit is used for constructing a first combination corresponding to the first training sensitive behavior based on the extracted data type, entity and action matched with the first training sensitive behavior.

the second training privacy policy acquisition unit is used for acquiring preset second training privacy policies;

the sentence acquisition unit is used for acquiring each sentence in each second training privacy policy;

The word segmentation result acquisition unit is used for segmenting each sentence to obtain a word segmentation result of each sentence;

the word searching unit is used for searching words with data type attributes, words with entity attributes and words with action attributes in word segmentation results of each sentence;

the data type annotation library construction unit is used for constructing the data type annotation library based on the searched words with the data type attribute;

the entity annotation library construction unit is used for constructing the entity annotation library based on the searched words with the entity attributes;

and the action annotation library construction unit is used for constructing the action annotation library based on the searched words with the action attributes.
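The cooperation between the annotation library construction units and the extraction units listed above might look roughly like the following Python sketch. It rests on several assumptions that are not part of the embodiment: word segmentation and attribute tagging are taken as already done (each word arrives as a `(word, attribute)` pair), the function names `build_libraries` and `extract_first_combination` are hypothetical, key information is represented as a plain list of words, and the first matching word per library is used. How an action word is mapped to the identifier 1 or 0 is left out.

```python
def build_libraries(tagged_sentences):
    """Build the data type, entity and action annotation libraries.

    tagged_sentences: list of sentences, each a list of (word, attribute) pairs,
    where attribute is "data_type", "entity", "action" or None (assumed format).
    Returns a dict mapping each attribute name to a set of words.
    """
    libs = {"data_type": set(), "entity": set(), "action": set()}
    for sentence in tagged_sentences:
        for word, attribute in sentence:
            if attribute in libs:
                libs[attribute].add(word)
    return libs


def extract_first_combination(key_words, libs):
    """Match key-information words against each library; None where nothing matches."""
    def first_match(kind):
        return next((w for w in key_words if w in libs[kind]), None)

    return {
        "data_type": first_match("data_type"),
        "entity": first_match("entity"),
        "action": first_match("action"),
    }


# Hypothetical segmented and tagged training sentence.
tagged = [[
    ("we", None),
    ("share", "action"),
    ("location information", "data_type"),
    ("with", None),
    ("third party", "entity"),
]]
libs = build_libraries(tagged)
combo = extract_first_combination(["third party", "share", "location information"], libs)
```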

Optionally, the process of the policy and behavior data obtaining unit obtaining each target sensitive behavior data actually generated by the target application may include:

Optionally, the process by which the policy and behavior data obtaining unit uses a preset static analysis algorithm to obtain the second combination, consisting of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information actually flows, may include:

acquiring each API call in the target application;

Optionally, the process by which the policy and behavior data obtaining unit obtains, based on each API call, the second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information actually flows may include:

Optionally, the statement that the target sensitive behavior does not exist in the target privacy policy is specifically: the first combination associated with the second combination does not exist in each first combination corresponding to the target privacy policy;

an associated first combination determination unit for:

a third judging unit, configured to judge whether the semantic similarity between the data type in the first combination and the data type in the second combination and the semantic similarity between the entity in the first combination and the entity in the second combination are both equal to 1;

if yes, executing the following steps of the first determining unit;

and the first determining unit is used for determining that the data type in the first combination is consistent with the data type in the second combination, and the entity in the first combination is consistent with the entity in the second combination.

a fourth judging unit configured to judge whether a semantic similarity between the data type in the second combination and the data type in the first combination is between 0 and 1;

If yes, executing the following steps of the second determining unit;

and a second determining unit, configured to determine that the data type in the second combination belongs to the data type in the first combination.

a fifth judging unit, configured to judge whether a semantic similarity between the entity in the second combination and the entity in the first combination is between 0 and 1;

if yes, executing the following steps of a third determining unit;

and a third determining unit, configured to determine that the entity in the second combination belongs to the entity in the first combination.

Optionally, the process of calculating the semantic similarity between the data types in the first combination and the second combination by the associated first combination determining unit may include:

wherein Δ(u1, t) is the pre-computed distance between the source node t and the node u1 corresponding to the data type in the first combination in the first directed graph, and Δ(v1, t) is the pre-computed distance between the source node t and the node v1 corresponding to the data type in the second combination in the first directed graph.

Optionally, the process of calculating the semantic similarity between the entity in the first combination and the entity in the second combination by the associated first combination determining unit may include:

The intelligent terminal application sensitive behavior and privacy policy consistency analysis device provided by the embodiment of the application can be applied to intelligent terminal application sensitive behavior and privacy policy consistency analysis equipment. Fig. 5 is a block diagram showing a hardware structure of an intelligent terminal application sensitive behavior and privacy policy consistency analysis device, and referring to fig. 5, the hardware structure of the intelligent terminal application sensitive behavior and privacy policy consistency analysis device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;

in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete the communication with each other through the communication bus 4;

the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application, etc.;

the memory 3 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory, such as at least one magnetic disk memory;

wherein the memory stores a program, and the processor is operable to invoke the program stored in the memory, the program being configured to implement each processing flow in the intelligent terminal application sensitive behavior and privacy policy consistency analysis scheme.

The embodiment of the present application also provides a storage medium storing a program adapted to be executed by a processor, the program being configured to implement each processing flow in the intelligent terminal application sensitive behavior and privacy policy consistency analysis scheme.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the other embodiments; for the parts that are identical or similar, the embodiments may be referred to one another.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. The intelligent terminal application sensitive behavior and privacy policy consistency analysis method is characterized by comprising the following steps of:

2. The method as recited in claim 1, further comprising:

3. The method according to claim 2, wherein after the obtaining the target privacy policy of the target application to be analyzed and the target sensitive behavior data actually generated by the target application, the method further comprises:

Extracting declarations of sensitive behaviors in the target privacy policy;

4. The method of claim 3, wherein extracting claims of sensitive behavior in the target privacy policy comprises:

5. The method of claim 4, wherein the extracted statement of a sensitive behavior in the privacy policy specifically includes: a first combination consisting of the data type, the entity and the action corresponding to the sensitive behavior, wherein the data type corresponding to the sensitive behavior is the type of the sensitive information involved in the sensitive behavior, the entity corresponding to the sensitive behavior is the entity to which the involved sensitive information flows, and the action corresponding to the sensitive behavior indicates either that the sensitive information corresponding to the sensitive behavior flows to the entity corresponding to the sensitive behavior or that it does not flow to that entity.

6. The method of claim 5, wherein the action corresponding to the sensitive behavior is a preset first identifier or a preset second identifier, when the action corresponding to the sensitive behavior is the first identifier, the sensitive information corresponding to the sensitive behavior is indicated to flow to the entity corresponding to the sensitive behavior, and when the action corresponding to the sensitive behavior is the second identifier, the sensitive information corresponding to the sensitive behavior is indicated to not flow to the entity corresponding to the sensitive behavior.

7. The method of claim 5, wherein for each of the first training privacy policies, extracting a corresponding first combination of each of the first training sensitive behaviors in the first training privacy policies comprises:

8. The method of claim 7, wherein the process of constructing the data type annotation library, the entity annotation library, and the action annotation library comprises:

acquiring preset second training privacy policies;

acquiring each statement in each second training privacy policy;

9. The method of claim 6, wherein the process of obtaining each of the target sensitive behavior data actually generated by the target application comprises:

10. The method of claim 9, wherein using a preset static analysis algorithm to obtain the second combination, consisting of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information actually flows, comprises:

acquiring each API call in the target application;

11. The method of claim 10, wherein the obtaining, based on each API call, the second combination of the data type of the sensitive information actually generated in the target application and the entity to which that sensitive information actually flows comprises:

12. The method of claim 9, wherein,

the statement that the target sensitive behavior does not exist in the target privacy policy is specifically: the first combination associated with the second combination does not exist in each first combination corresponding to the target privacy policy;

13. The method of claim 12, wherein the determining of the first combination corresponding to the target privacy policy associated with the second combination comprises:

14. The method of claim 13, wherein determining whether the data type in the first combination is consistent with the data type in the second combination and whether the entity in the first combination is consistent with the entity in the second combination comprises:

15. The method of claim 13, wherein determining whether the data type in the second combination belongs to the data type in the first combination comprises:

16. The method of claim 13, wherein the calculating the semantic similarity between the data types in the first combination and the second combination comprises:

17. An intelligent terminal application sensitive behavior and privacy policy consistency analysis device, which is characterized by comprising: