CN116795707A - Software privacy compliance pre-detection method and related equipment thereof - Google Patents
Software privacy compliance pre-detection method and related equipment thereof
- Publication number
- CN116795707A (application number CN202310776690.9A)
- Authority
- CN
- China
- Prior art keywords
- risk
- privacy
- text
- policy
- card
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3608—Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3616—Software analysis for verifying properties of programs using software metrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiments of this application belong to the field of financial technology and are applied to the field of intelligent government affairs. They relate to a software privacy compliance pre-detection method and related equipment. The method comprises: receiving standard privacy regulations, a requirement card, and the privacy policy of an APP, and constructing a privacy compliance training model based on the standard privacy regulations; splitting the corresponding policy text from the privacy policy according to the requirement card; performing a consistency check between the requirement card and the policy text; and, when the consistency check fails, inputting the requirement card into the privacy compliance training model to obtain an output risk value. The privacy compliance training model may be stored in a blockchain. The application can discover, in advance, privacy risks in financial software to be released.
Description
Technical Field
This application relates to the field of financial technology, and in particular to a software privacy compliance pre-detection method and related equipment.
Background
With the implementation of data security laws and personal information protection laws in various countries, all sectors pay increasing attention to personal information and sensitive personal information, and privacy disclosure draws growing concern. An enterprise's APPs collect user information while providing services to users, and whether the enterprise promptly updates and discloses the personal information it has acquired, between "collection" and "provision", has become a focus of attention for regulators and society. Updating and disclosing acquired personal information centers on the content of the privacy policy published for the enterprise's APP. If the APP adds a function that collects personal information such as location or mobile phone number, but the privacy policy does not disclose the added function and the details of the collected information in time, the enterprise may face regulatory risk.
If the information an APP actually collects exceeds the information disclosed in its privacy policy, this constitutes out-of-scope collection of personal information; failure to rectify the issue in time may result in the APP being forcibly removed from app stores, harming customer experience and company reputation. Financial APPs in particular involve functions such as invoking payment pages, so they require more permissions than ordinary APPs and are subject to more legal requirements. A comprehensive privacy detection method that performs early-warning detection before an APP is formally released is therefore needed.
Disclosure of Invention
The embodiments of this application aim to provide a software privacy compliance pre-detection method and related equipment that can discover, in advance, privacy risks in financial software to be released.
To solve the above technical problems, an embodiment of this application provides a software privacy compliance pre-detection method, which adopts the following technical scheme:
A software privacy compliance pre-detection method comprises the following steps:
receiving standard privacy regulations, a requirement card, and the privacy policy of an APP, and constructing a privacy compliance training model based on the standard privacy regulations;
splitting the corresponding policy text from the privacy policy according to the requirement card;
performing a consistency check between the requirement card and the policy text;
and, if the consistency check fails, inputting the requirement card into the privacy compliance training model to obtain an output risk value.
Further, the step of receiving the standard privacy regulations, the requirement card and the privacy policy of the APP, and constructing the privacy compliance training model based on the standard privacy regulations includes:
acquiring a plurality of preset risk factors, each risk factor comprising at least one risk scene, and splitting the standard privacy regulations according to the risk scenes to obtain a target training text corresponding to each risk scene;
performing feature extraction on the target training texts based on the TF-IDF text feature extraction technique to obtain the risk parameters of the risk factors and of the risk scenes;
and constructing the privacy compliance training model based on the risk parameters.
Further, the step of inputting the requirement card into the privacy compliance training model to obtain the output risk value includes:
the privacy compliance training model taking the risk scene with which the requirement card is marked as a first target scene, and the risk factor to which the first target scene belongs as a first target factor;
the privacy compliance training model determining the risk parameter of the first target scene and the risk parameter of the first target factor, performing a calculation over these two risk parameters, and outputting the risk value.
Further, the step of constructing the privacy compliance training model based on the risk parameters includes:
taking each risk scene as the label of the corresponding policy text, and training the classifier of an initial privacy compliance training model on the policy texts to obtain the privacy compliance training model;
and the step of inputting the requirement card into the privacy compliance training model to obtain the output risk value comprises:
inputting the requirement card into the privacy compliance training model, the classifier of which outputs the risk scene of the requirement card as a second target scene, the risk factor of the second target scene being taken as a second target factor;
and the privacy compliance training model determining the risk parameter of the second target scene and the risk parameter of the second target factor, performing a calculation over them, and outputting the risk value.
Further, the step of performing a consistency check between the requirement card and the policy text includes:
matching the policy text against the standard privacy regulations;
if the matching fails, determining that the consistency check between the requirement card and the policy text fails;
if the matching succeeds, converting the requirement card and the policy text into vectors to obtain a requirement vector and a policy vector, and computing the cosine similarity between the requirement vector and the policy vector to obtain a cosine value;
if the cosine value is 1, determining that the consistency check between the requirement card and the policy text passes;
and if the cosine value is not 1, determining that the consistency check between the requirement card and the policy text fails.
Further, the step of converting the requirement card and the policy text into vectors to obtain a requirement vector and a policy vector includes:
performing word segmentation on the requirement card and the policy text in turn to obtain a requirement segmented text and a policy segmented text;
merging the requirement segmented text and the policy segmented text to obtain a merged segmented text;
checking, for each word of the merged segmented text, whether it appears in the requirement segmented text and in the policy segmented text;
and representing each present word by its term frequency and each absent word by 0, obtaining the requirement vector and the policy vector.
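The vector construction and cosine check described above can be sketched as follows (a minimal illustration; whitespace splitting stands in for the real word segmentation that Chinese-language texts would require):

```python
import math

def to_vectors(card_text, policy_text):
    card_words, policy_words = card_text.split(), policy_text.split()
    # Merged vocabulary of both segmented texts.
    vocab = sorted(set(card_words) | set(policy_words))
    # Present words are represented by their term frequency, absent ones by 0.
    card_vec = [card_words.count(w) for w in vocab]
    policy_vec = [policy_words.count(w) for w in vocab]
    return card_vec, policy_vec

def consistency_passes(card_text, policy_text):
    a, b = to_vectors(card_text, policy_text)
    dot = sum(x * y for x, y in zip(a, b))
    cos = dot / (math.sqrt(sum(x * x for x in a)) *
                 math.sqrt(sum(y * y for y in b)))
    # Only a cosine value of exactly 1 counts as "consistent".
    return abs(cos - 1.0) < 1e-9
```

With this construction, the cosine value reaches 1 only when the two texts have identical term-frequency profiles over the merged vocabulary, which is exactly the pass condition above.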
Further, the step of performing feature extraction on the target training texts based on the TF-IDF text feature extraction technique to obtain the risk parameters of the risk factors and of the risk scenes includes:
extracting the term frequency of each preset keyword from each target training text, and computing, for each target training text, the mean of these term frequencies as the risk parameter of the risk scene corresponding to that text;
and computing the IDF of each preset keyword of each target training text, and computing, for each target training text, the mean of these IDFs as the risk parameter of the risk factor corresponding to that text.
To solve the above technical problems, an embodiment of this application further provides a software privacy compliance pre-detection device, which adopts the following technical scheme:
A software privacy compliance pre-detection device comprises:
a receiving module, configured to receive the standard privacy regulations, the requirement card and the privacy policy of the APP, and construct a privacy compliance training model based on the standard privacy regulations;
a splitting module, configured to split the corresponding policy text from the privacy policy according to the requirement card;
a checking module, configured to perform a consistency check between the requirement card and the policy text; and
an output module, configured to input the requirement card into the privacy compliance training model when the consistency check fails, to obtain an output risk value.
To solve the above technical problems, an embodiment of this application further provides a computer device, which adopts the following technical scheme:
A computer device comprises a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, implement the steps of the software privacy compliance pre-detection method described above.
To solve the above technical problems, an embodiment of this application further provides a computer-readable storage medium, which adopts the following technical scheme:
A computer-readable storage medium has computer-readable instructions stored thereon which, when executed by a processor, implement the steps of the software privacy compliance pre-detection method described above.
Compared with the prior art, the embodiments of this application have the following main beneficial effects:
This application realizes pre-detection of software privacy compliance, performing early-warning detection before an APP is released, and before financial APPs are released in particular, so that privacy risks of an APP to be released can be discovered in advance. A privacy compliance training model is built from the standard privacy regulations and used to output a risk value for the requirement card, determining the privacy risk of the requirement card so that problems are found early and the risk of an APP being taken down for non-compliance is reduced. Through the consistency check between the requirement card and the policy text, it is determined whether the requirement card and the policy text agree, and thus whether the APP to be released is consistent with the software's privacy policy; the standard privacy regulations include those of the financial field. The privacy policy can therefore be updated promptly and accurately, avoiding situations in which software must be rectified or taken down after release.
Drawings
In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a software privacy compliance pre-detection method in accordance with the present application;
FIG. 3 is a schematic diagram of one embodiment of a software privacy compliance pre-detection apparatus in accordance with the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.
Reference numerals: 200. computer device; 201. memory; 202. processor; 203. network interface; 300. software privacy compliance pre-detection device; 301. receiving module; 302. splitting module; 303. checking module; 304. output module.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the software privacy compliance pre-detection method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the software privacy compliance pre-detection device is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a software privacy compliance pre-detection method in accordance with the present application is shown. The software privacy compliance pre-detection method comprises the following steps:
S1: receiving standard privacy regulations, a requirement card, and the privacy policy of an APP, and constructing a privacy compliance training model based on the standard privacy regulations;
S2: splitting the corresponding policy text from the privacy policy according to the requirement card, wherein requirement cards are associated one-to-one with preset risk scenes;
S3: performing a consistency check between the requirement card and the policy text;
S4: and if the consistency check fails, inputting the requirement card into the privacy compliance training model to obtain an output risk value.
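As a minimal sketch of steps S1 through S4 (all function names and the trivial stub logic are illustrative assumptions, not the patent's implementation):

```python
def build_compliance_model(regulation_risks):
    # S1: stand-in "model" -- a lookup table of per-risk-scene risk values
    # derived from the standard privacy regulations.
    return dict(regulation_risks)

def split_policy_text(privacy_policy, scene):
    # S2: pick the policy text associated with the card's risk scene.
    return privacy_policy.get(scene, "")

def consistency_check(card_text, policy_text):
    # S3: stand-in for the cosine-similarity check described later.
    return card_text == policy_text

def pre_detect(regulation_risks, card_scene, card_text, privacy_policy):
    model = build_compliance_model(regulation_risks)             # S1
    policy_text = split_policy_text(privacy_policy, card_scene)  # S2
    if consistency_check(card_text, policy_text):                # S3 passed
        return None  # no risk value needed
    return model.get(card_scene)                                 # S4: risk value

# A card whose policy text no longer matches fails S3 and gets a risk value.
risk = pre_detect([("outbound", 0.8)], "outbound",
                  "collect location and phone number",
                  {"outbound": "collect location"})
```

The point of the sketch is the control flow: the risk value is produced only on the failure branch of the consistency check, which is what makes the detection a pre-release early warning rather than a blanket scan.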
In this embodiment, the electronic device (e.g., the server/terminal device shown in fig. 1) on which the software privacy compliance pre-detection method operates may receive the privacy policies of the standard privacy regulations, the requirement cards and the APP through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G connections, wiFi connections, bluetooth connections, wiMAX connections, zigbee connections, UWB (ultra wideband) connections, and other now known or later developed wireless connection means.
In this embodiment, if the consistency check passes, a prompt indicating that the consistency check has passed is displayed on the front-end page. The method uses the privacy compliance training model and a consistency check model to perform up-front detection of the privacy policy and the requirements (the requirements are contained in the requirement card) of an APP to be released, intervening before the APP is released externally and detecting privacy policy risk items. The privacy compliance training model and the consistency check model are NLP (natural language processing) models. The standard privacy regulations of this application include laws and regulations relevant to APPs in the financial field, in particular those relating to functions such as invoking payment pages. The APPs of this application may include shopping APPs, insurance application APPs, financial loan APPs, and the APPs of large financial institutions (e.g., banks). This application provides an online up-front detection approach for APPs that can effectively reduce the risk of an APP being taken down.
This application realizes pre-detection of software privacy compliance, performing early-warning detection before an APP is released so that privacy risks of the APP to be released can be discovered in advance. A privacy compliance training model is built from the standard privacy regulations and used to output a risk value for the requirement card, determining the privacy risk of the requirement card so that problems are found early and the risk of an APP being taken down for non-compliance is reduced. Through the consistency check between the requirement card and the policy text, it is determined whether the requirement card and the policy text agree, and thus whether the information of the APP to be released, obtained from the requirement card, is consistent with the software's privacy policy. The privacy policy can therefore be updated promptly and accurately, avoiding situations in which software must be rectified or taken down after release.
As an optional aspect of this application, the step of receiving the standard privacy regulations, the requirement card and the privacy policy of the APP, and constructing the privacy compliance training model based on the standard privacy regulations includes:
acquiring a plurality of preset risk factors, each risk factor comprising at least one risk scene, and splitting the standard privacy regulations according to the risk scenes to obtain a target training text corresponding to each risk scene;
performing feature extraction on the target training texts based on the TF-IDF text feature extraction technique to obtain the risk parameters of the risk factors and of the risk scenes;
and constructing the privacy compliance training model based on the risk parameters.
In this embodiment, this application presets 4 classes of privacy compliance risk factors: network information reporting and filing, personal-protection impact assessment, partner agreements, and establishing an internal protection mechanism. A next level of risk scenes is preset for each risk factor: under network information reporting and filing, the risk scenes are personal information outbound transfer and entrusted data processing; under personal-protection impact assessment, automated decision-making and sensitive personal information processing; under partner agreements, data desensitization and data encryption; and under establishing an internal protection mechanism, security education and training, and data classification management.
A privacy compliance training model is established based on the regulations of the corresponding target country (e.g., China or the UK), namely that country's pre-stored data security and personal information protection regulations, for example the Personal Information Protection Law and the Data Security Law. The standard privacy regulations are split according to the risk scenes to obtain a target training text corresponding to each risk scene.
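The four risk factors and eight risk scenes described above can be held in a simple mapping (the data structure is an assumption for illustration; the English names are translations of the description):

```python
# Risk factor -> its next-level risk scenes, per the taxonomy above.
RISK_TAXONOMY = {
    "network information reporting and filing": [
        "personal information outbound transfer",
        "entrusted data processing",
    ],
    "personal-protection impact assessment": [
        "automated decision-making",
        "sensitive personal information processing",
    ],
    "partner agreements": [
        "data desensitization",
        "data encryption",
    ],
    "internal protection mechanism": [
        "security education and training",
        "data classification management",
    ],
}

def factor_of(scene):
    # Find the risk factor to which a given risk scene belongs.
    for factor, scenes in RISK_TAXONOMY.items():
        if scene in scenes:
            return factor
    return None
```

Such a lookup is what lets the model go from a requirement card's marked risk scene to the risk factor whose parameter also enters the risk-value calculation.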
The step of splitting the standard privacy regulations according to the risk scenes to obtain a target training text corresponding to each risk scene includes:
matching the function tag against the function catalog of the privacy regulations;
taking the text under each successfully matched function catalog entry as the initial training text of the corresponding risk scene;
and preprocessing the initial training text to obtain the target training text.
In this embodiment, each risk scene carries a function tag, for example "outbound transfer". Owing to the particular form of standard privacy regulations, their texts have catalogs whose titles are the topics of the associated text, i.e., the function catalog, such as "outbound transfer related provisions" or "transaction related provisions". By matching the function tag against the function catalog of the standard privacy regulations, the required text can be obtained quickly. The matching process is as follows: first, stop words are removed from the function catalog entry, for example "related", "this" and "that"; then the cosine similarity between the stop-word-free catalog entry and the function tag is computed, and if the cosine similarity is greater than or equal to a threshold, the matching is deemed successful. Since the function tag is preset and accurate, the stop-word removal operation need not be applied to it.
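The tag-to-catalog matching step can be sketched as below (the threshold value and stop-word list are assumptions; the real catalog titles are Chinese, so whitespace splitting is a simplification):

```python
import math

STOP_WORDS = {"related", "this", "that", "the", "etc"}  # illustrative list

def tf_vector(words):
    # Bag-of-words term-frequency vector as a dict.
    vec = {}
    for w in words:
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def catalog_matches(function_tag, catalog_title, threshold=0.5):
    # Remove stop words from the catalog title only (the tag is preset and
    # accurate), then compare by cosine similarity against the threshold.
    tag_words = function_tag.lower().split()
    title_words = [w for w in catalog_title.lower().split()
                   if w not in STOP_WORDS]
    return cosine(tf_vector(tag_words), tf_vector(title_words)) >= threshold
```

A threshold below 1 is what allows a short tag such as "outbound transfer" to match a longer catalog title that contains extra words.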
Specifically, the step of preprocessing the initial training text to obtain the target training text includes:
removing the punctuation marks and stop words of the initial training text, and identifying whether the language of the initial training text is a preset language requiring restoration;
if so, performing part-of-speech recognition on each word of the initial training text to obtain its part of speech;
and restoring the words of the initial training text to their original forms according to their parts of speech.
In this embodiment, the preprocessing operation is performed on the initial training text to obtain the target training text. The preprocessing includes removing punctuation marks and removing stop words such as "this" and "that". English belongs to the languages requiring restoration: for English text, all words are converted to lowercase, and lemmatization is performed by judging each word's part of speech and restoring the word to its original form.
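A minimal preprocessing sketch assuming English input (a real pipeline would use a POS tagger and lemmatizer; the suffix-stripping rule here is only a crude stand-in for POS-based restoration):

```python
import re
import string

STOP_WORDS = {"this", "that", "the", "a", "an", "of", "to"}  # illustrative

def preprocess(text):
    # Remove punctuation and lowercase (English is a "language requiring
    # restoration" in the terminology above).
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    words = [w for w in text.split() if w not in STOP_WORDS]
    # Crude stand-in for POS-based lemmatization: strip a plural suffix.
    return [re.sub(r"(es|s)$", "", w) if len(w) > 3 else w for w in words]
```

In practice the suffix rule would be replaced by a proper lemmatizer, since stripping a trailing "s" mangles words like "analysis"; the sketch only shows where lemmatization sits in the pipeline.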
As an optional aspect of this application, the step of splitting the corresponding policy text from the privacy policy according to the requirement card includes:
determining the function tag to which the requirement card belongs to obtain a target tag;
matching the target tag against the function catalog of the privacy policy;
and taking the text under the successfully matched function catalog entry of the privacy policy as the policy text corresponding to the requirement card.
In this embodiment, the step of determining the function tag to which the requirement card belongs to obtain the target tag includes: performing keyword detection on the requirement card, and if a target keyword is detected in the requirement card, taking the function tag corresponding to the target keyword as the target tag. Up-front detection of the privacy policy and the requirement card: to ensure that internal detection happens as early as possible before the privacy policy and the APP are released externally, a requirement card is generated when a requirement is raised, and its card code is recorded as Q; a normal business requirement is a one-sentence requirement. When filling in the requirement, the author must select the risk scene; the code of the requirement card associated with the selected risk scene is recorded as Q. The latest version of the requirement card is extracted from the requirement library, and the privacy policy of the same version as the requirement card is pulled from the privacy policy library; if multiple versions exist, the latest version is detected. The policy text corresponding to the requirement card is split from the privacy policy and its code is recorded as P.
As an optional aspect of this application, the step of performing feature extraction on the target training texts based on the TF-IDF text feature extraction technique to obtain the risk parameters of the risk factors and of the risk scenes includes:
extracting the term frequency of each preset keyword from each target training text, and computing, for each target training text, the mean of these term frequencies as the risk parameter of the risk scene corresponding to that text;
and computing the IDF of each preset keyword of each target training text, and computing, for each target training text, the mean of these IDFs as the risk parameter of the risk factor corresponding to that text.
The IDF is calculated by the following formula:
IDF_i = log(|D| / N_i), where |D| denotes the total number of target training texts, and N_i denotes the number of target training texts in which keyword i appears.
In this embodiment, the present application trains the initial privacy compliance training model using TF-IDF text feature extraction. First, the TF and IDF of the training text set must be obtained; these two parameters together represent the importance of a word in a target training text, where TF measures how frequently the word occurs within a single text and IDF measures how distinctive the word is across the whole set of target training texts. For example, in a financial scenario, the preset keywords mostly include: tax number, order, commodity, finance, and the like. Suppose that in a certain target training text the word frequency of "tax number" is 40, that of "order" is 30, that of "commodity" is 20, and that of "finance" is 10; the average of these word frequencies is 25, so the risk parameter of the risk scene corresponding to that target training text is set to 25. Further, if in a certain target training text the IDF of "tax number" is 2, the IDF of "order" is 4, the IDF of "commodity" is 3, and the IDF of "finance" is 7, then the risk parameter of the risk factor corresponding to that target training text is the average of the IDFs, namely 4.
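The two averages above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation; the token lists, keyword names, and function names are hypothetical.

```python
import math
from collections import Counter

def scene_risk_parameter(tokens, keywords):
    """Risk parameter of a risk scene: average word frequency (TF) of the
    preset keywords within one target training text."""
    counts = Counter(tokens)
    return sum(counts[k] for k in keywords) / len(keywords)

def idf(keyword, corpus):
    """IDF_i = log(|D| / N_i): |D| = total number of target training texts,
    N_i = number of texts in which keyword i appears."""
    n_i = sum(1 for text in corpus if keyword in text)
    return math.log(len(corpus) / n_i)

def factor_risk_parameter(keywords, corpus):
    """Risk parameter of a risk factor: average IDF of the preset keywords."""
    return sum(idf(k, corpus) for k in keywords) / len(keywords)

# Reproducing the worked example: word frequencies 40, 30, 20, 10 average to 25.
tokens = (["tax number"] * 40 + ["order"] * 30
          + ["commodity"] * 20 + ["finance"] * 10)
assert scene_risk_parameter(
    tokens, ["tax number", "order", "commodity", "finance"]) == 25
```

A real system would tokenize the regulation texts first; here the token lists are given directly for clarity.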
IDF means that if a word appears in only a very small number of target training texts, the word discriminates strongly between target training texts, so its feature value, i.e. its IDF value, is high: IDF_i = log(|D| / N_i), where |D| is the total number of target training texts and N_i is the number of target training texts in which keyword i appears; clearly, the smaller N_i is, the larger the IDF. The risk parameters k1, k2, k3, k4 of the 4 classes of risk factors are identified by IDF; in business practice, k1 (personal information) appears in very few texts, so this risk factor is the largest. TF is the word frequency: as in a BOW (bag-of-words) model, the feature value is represented by the number of occurrences, that is, the more often a word occurs in a text, the greater the weight of its feature value. The risk parameters of the 8 risk scenes, a1, a2, b1, b2, c1, c2, d1, d2, are determined by word frequency. Therefore, according to the TF and IDF results, the risk parameter corresponding to each risk factor and each risk scene is determined, and the risk factors and risk scenes are ranked from high to low. An example ordering is as follows:
There are 4 privacy compliance risk factors in total; from high to low they are: reporting to the cyberspace authority (k1), personal protection impact assessment (k2), agreements with partners (k3), and establishing an internal protection mechanism (k4). The risk scenes at the next level under each risk factor are likewise ranked from high to low. The risk scenes under the k1 risk factor are, in turn: personal information outbound transfer (a1) and data request processing (a2). The risk scenes under the k2 risk factor are, in turn: automated decision-making (b1) and processing sensitive personal information (b2). The risk scenes under the k3 risk factor are: data desensitization (c1) and data encryption (c2). The risk scenes under the k4 risk factor are: security education and training (d1) and data classification management (d2).
As an optional aspect of the application, the step of performing consistency check on the requirement card and the policy text includes:
matching the policy text with the standard privacy regulations;
if the matching fails, determining that the consistency check of the demand card and the policy text fails;
if the matching is successful, converting the demand card and the policy text into vectors respectively to obtain a demand vector and a policy vector, and performing cosine similarity calculation on the demand vector and the policy vector to obtain a cosine value;
If the cosine value is 1, determining that the consistency check of the demand card and the policy text passes;
and if the cosine value is not 1, determining that the consistency check of the demand card and the policy text is not passed.
In this embodiment, when the consistency check fails, a notification that the APP is not allowed to be released externally is sent to the relevant personnel. The consistency check process is completed by a preset consistency check model, which is an NLP (natural language processing) model. The check logic of the consistency check model is as follows: 1) The code of the policy text is P and the code of the demand card is Q; the values of the demand card Q and the policy text P default to 1. When Q = P = 1, the consistency check passes; when Q ≠ P, or Q = P ≠ 1, the consistency check fails. 2) Calculating the risk value of the policy text P: match P against the standard privacy regulations. If P matches successfully, the corresponding clauses (e.g. outbound clauses, payment clauses) are described in P, so the risk value of P is 1, P is risk-free, and the risk level of the policy text is marked as risk-free; then perform the cosine similarity calculation on the demand vector and the policy vector corresponding to Q and P. If the cosine value is 1, then P = Q = 1 and the consistency check passes; if the cosine value is not 1, the consistency check fails. If the demand card Q triggers a privacy risk, the risk value is calculated by the privacy compliance model; for example, if the demand card Q involves data outbound transfer, triggering the a1 outbound scene under the k1 risk factor, the risk level of Q is: Q = k1a1.
3) If the matching of P against the standard privacy regulations fails, i.e. the corresponding clause is not described in P, the consistency check is directly determined to fail. In this case the risk value of P is the risk value of the corresponding demand card Q; for example, if Q triggers the k1a1 risk scene, then P and Q share the same risk value, P = Q = k1a1.
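The two-step check logic above can be sketched as follows. The clause strings and function names are illustrative assumptions, not the patented model; step 1 is reduced to simple substring matching for clarity.

```python
def matches_standard(policy_text, required_clauses):
    """Step 1: the policy text P must describe every clause required by the
    standard privacy regulations (e.g. outbound clauses, payment clauses)."""
    return all(clause in policy_text for clause in required_clauses)

def consistency_check(policy_text, required_clauses, cosine_value):
    """Step 2: if P matches the standard, the demand vector and policy vector
    must be fully parallel (cosine value exactly 1) for the check to pass."""
    if not matches_standard(policy_text, required_clauses):
        return False  # a required clause is missing: check fails outright
    return cosine_value == 1.0
```

For instance, a policy text covering both clauses with a cosine value of 1 passes, while the same text with a cosine value of 0.87, or a text missing a clause, fails.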
The step of converting the demand card and the policy text into vectors respectively to obtain demand vectors and policy vectors comprises the following steps:
sequentially performing word segmentation operation on the demand card and the policy text to obtain a demand word segmentation text and a policy word segmentation text;
merging the required word segmentation text and the policy word segmentation text to obtain a merged word segmentation text;
comparing, for each word of the merged word-segmentation text, whether it exists in the demand word-segmentation text and in the policy word-segmentation text, respectively;
and representing each word that exists by its word frequency and each word that does not exist by 0, to obtain the demand vector and the policy vector.
In this embodiment, consistency is determined by vector comparison; by default the vector of the demand card Q is similar to the vector of the policy text P, with an initial value of 1. Since in practical applications the scene representations (i.e. the vectors) of policy texts differ in length, the vector of the demand card Q and the vector of the policy text P must be processed. The approach commonly used at present is to remove unimportant words from the demand card Q and the policy text P so that the two vectors have the same length; however, the keywords for this are mostly set by experience, so their accuracy cannot be guaranteed. The present application instead proceeds as follows: the demand word-segmentation text and the policy word-segmentation text are merged to obtain a merged word-segmentation text, and for each word of the merged word-segmentation text it is determined whether it exists in the original demand word-segmentation text and in the policy word-segmentation text; if it exists, it is represented by its word frequency, and if not, by 0. An example with a text about personal financial information is as follows:
Text 1 (policy): I / want / personal financial information
Text 2 (demand): I / want / personal financial information / outbound
Merged vector: I / want / personal financial information / outbound
P1 = (1, 1, 1, 0)
Q1 = (1, 1, 1, 1)
Here "/" is the separator produced by IK word segmentation in smart segmentation mode; the merged word-segmentation text is shown as the merged vector line, and the aligned policy vector and demand vector are P1 and Q1 respectively. Consistency is then determined from the cosine value of the two vectors: the closer the cosine value is to 0, the more orthogonal the two vectors are; the closer it is to 1, the more parallel the vectors are and the higher the similarity. Because the policy vector and the demand vector are computed via this merging scheme, the accuracy is high.
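The merged-vocabulary vectorization and cosine comparison can be sketched as follows, reproducing the P1/Q1 example; the English tokens stand in for the segmenter's output, and the function names are illustrative.

```python
import math
from collections import Counter

def to_vectors(tokens_p, tokens_q):
    """Merge both token lists into one vocabulary; each text is represented by
    the word frequency of each vocabulary word it contains, and 0 otherwise."""
    vocab = list(dict.fromkeys(tokens_p + tokens_q))  # order-preserving merge
    cp, cq = Counter(tokens_p), Counter(tokens_q)
    return [cp[w] for w in vocab], [cq[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

p1, q1 = to_vectors(
    ["I", "want", "personal financial information"],              # policy text P
    ["I", "want", "personal financial information", "outbound"],  # demand card Q
)
# p1 == [1, 1, 1, 0] and q1 == [1, 1, 1, 1]; cosine(p1, q1) ≈ 0.866 < 1,
# so this consistency check would not pass.
```

When both texts contain exactly the same words, the cosine value is exactly 1 and the check passes.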
As an optional aspect of the present application, the step of inputting the requirement card into the privacy compliance training model to obtain the output risk value includes:
the privacy compliance training model takes the risk scene marked on the demand card as the first target scene, and takes the risk factor to which the first target scene belongs as the first target factor;
the privacy compliance training model determines the risk parameter of the first target scene and the risk parameter of the first target factor, performs the calculation with these two risk parameters, and outputs the risk value.
In this embodiment, the demand card is marked with its risk scene, so no detection or judgment of the demand card is needed to determine the corresponding risk scene, which effectively improves efficiency. The risk value of the demand card is obtained from the privacy compliance training model. The expression of the risk value output by the privacy compliance training model is: risk value = risk parameter of the risk factor × risk parameter of the risk scene; that is, the TF-IDF feature value (risk value) of the privacy compliance training model is TF-IDF(i, j) = TF_i × IDF_j. The output risk values include k1a1, k1a2, k2b1, k2b2, k3c1, k3c2, k4d1, and k4d2, where the smaller the numbers in the output code, the higher the risk.
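A minimal lookup for the risk value expression might look as follows; the numeric parameters are purely illustrative placeholders (not values from the patent), chosen so that factor and scene parameters decrease in the documented ranking order.

```python
# Hypothetical risk parameters: factor parameters from average IDF (k1 > k2 > k3 > k4),
# scene parameters from average TF (a1 > a2 > b1 > ... > d2). Values are made up.
factor_params = {"k1": 7.0, "k2": 5.5, "k3": 4.0, "k4": 2.0}
scene_params = {"a1": 25.0, "a2": 18.0, "b1": 16.0, "b2": 12.0,
                "c1": 10.0, "c2": 8.0, "d1": 6.0, "d2": 4.0}

def risk_value(code):
    """risk value = risk parameter of the risk factor x risk parameter of the
    risk scene; a code such as "k1a1" names the factor ("k1") and scene ("a1")."""
    factor, scene = code[:2], code[2:]
    return factor_params[factor] * scene_params[scene]

# With these placeholder parameters, k1a1 yields the largest product, matching
# its position as the highest-risk code.
```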
Furthermore, the step of constructing the privacy compliance training model based on the risk parameters includes:
taking the risk scene as a label of a corresponding policy text, and training a classifier of an initial privacy compliance training model based on the policy text to obtain the privacy compliance training model;
the step of inputting the requirement card into the privacy compliance training model to obtain the output risk value comprises the following steps:
inputting the demand card into the privacy compliance training model, outputting a risk scene of the demand card by a classifier of the privacy compliance training model as a second target scene, and taking a risk factor of the second target scene as a second target factor;
And the privacy compliance training model determines and calculates risk parameters of the second target scene and risk parameters of the second target factor, and outputs the risk value.
In this embodiment, the demand card is not marked with a risk scene. The originally preset risk scenes are used as labels of the corresponding policy texts to train the classifier of the initial privacy compliance training model, yielding a privacy compliance training model with classification capability. Naive Bayes is chosen as the classifier: it is well suited to text samples, and its assumption that features are conditionally independent of one another greatly simplifies the probability calculation, saving memory and time. In practical application, the demand card is input into the privacy compliance training model, and the category of the demand card, i.e. the risk scene to which it belongs, is output as the second target scene. As described above, each risk factor comprises at least one risk scene, so each risk scene has a risk factor to which it belongs, from which the second target factor is determined.
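A minimal multinomial naive Bayes text classifier of the kind described can be sketched as below. The training tokens and scene labels are hypothetical toy data; a production system would train on real policy texts, likely with a mature library implementation.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with Laplace smoothing: assumes words are
    conditionally independent given the class (here, the risk scene)."""

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n, v = sum(self.class_counts.values()), len(self.vocab)
        best_label, best_lp = None, float("-inf")
        for label in self.class_counts:
            lp = math.log(self.class_counts[label] / n)  # class prior
            total = sum(self.word_counts[label].values())
            for w in tokens:
                # +1 Laplace smoothing so unseen words do not zero the product
                lp += math.log((self.word_counts[label][w] + 1) / (total + v))
            if lp > best_lp:
                best_label, best_lp = label, lp
        return best_label

# Hypothetical training data: policy-text tokens labeled with their risk scene.
clf = NaiveBayesTextClassifier().fit(
    [["personal", "information", "outbound"], ["data", "outbound", "request"],
     ["automated", "decision", "profiling"], ["automated", "decision", "rules"]],
    ["a1", "a1", "b1", "b1"],
)
```

A demand card segmented into tokens is then classified directly, e.g. `clf.predict(["outbound", "data"])` returns `"a1"` with this toy data, from which the k1 risk factor follows.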
The software development lifecycle sequentially comprises: requirement proposal, requirement design, code writing, vulnerability detection, and going online. The present application optimizes the requirement-proposal link. A demand library, a privacy policy library, a privacy compliance training model, and a consistency check model are preset. Requirements and privacy policies are loaded into the libraries; a demand card is generated from each requirement; the privacy compliance training model is trained on the standard privacy regulations; the policy text P and the demand card Q are checked for consistency, a risk value is output, and whether to release externally is determined. The application avoids compliance risk: by establishing the privacy compliance training model and quantifying laws and regulations, violation risks are avoided for the enterprise in advance. It improves detection efficiency: the consistency check model realizes automatic privacy detection, detects only the latest version of the privacy policy, supports real-time updates, and can detect multiple demand cards simultaneously. It front-loads detection: the method provided by the application is integrated into the software development lifecycle, can output risk values and rules (e.g. triggering the k1a1 data-outbound risk scene), discovers problems in advance, and reduces the risk of the APP being taken down for violations. The focus is on using the privacy policy detection model before each APP version goes online to perform a consistency check between the pre-launch requirements and the privacy policy text, so that the privacy policy is updated in real time and regulatory notification risk is avoided.
It is emphasized that the privacy compliance training model may also be stored in nodes of a blockchain in order to further ensure the privacy and security of the privacy compliance training model.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The method of the present application can be applied in the field of smart government affairs, thereby promoting the construction of smart cities.
Those skilled in the art will appreciate that implementing all or part of the processes of the above-described method embodiments may be accomplished by computer-readable instructions stored on a computer-readable storage medium; when executed, the instructions may include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a software privacy compliance pre-detection apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus is specifically applicable to various electronic devices.
As shown in fig. 3, the software privacy compliance pre-detection apparatus 300 according to the present embodiment includes: a receiving module 301, a splitting module 302, a checking module 303 and an output module 304. Wherein: the receiving module 301 is configured to receive a standard privacy rule, a privacy policy of a demand card and an APP, and construct a privacy compliance training model based on the standard privacy rule; the splitting module 302 is configured to split a corresponding policy text from the privacy policy according to the requirement card; the checking module 303 is configured to perform consistency checking on the requirement card and the policy text; and the output module 304 is configured to input the requirement card into the privacy compliance training model when the consistency check fails, so as to obtain an output risk value.
In this embodiment, by pre-detecting the privacy compliance of software, early-warning detection before APP release is realized, so that the privacy risk of an APP to be released can be found in advance. The privacy compliance training model is built from the standard privacy regulations and is used to output the risk value of the demand card and determine its privacy risk condition, so that problems are found in advance and the risk of the APP being taken down for violations is reduced. The consistency check on the demand card and the policy text determines whether the demand card and the policy text agree, and hence whether the APP to be released conforms, in the information obtained from the demand card, to the software's privacy policy; this facilitates real-time and accurate updating of the privacy policy and avoids the software being required to rectify, or being taken down, after release.
The receiving module 301 further includes an obtaining sub-module, an extracting sub-module, and a constructing sub-module, where the obtaining sub-module is configured to obtain a plurality of preset risk factors, each risk factor includes at least one risk scene, and split the standard privacy rule according to the risk scene to obtain a target training text corresponding to each risk scene; the extraction submodule is used for extracting features of the target training text based on a TF-IDF text feature extraction technology to obtain risk factors and risk parameters of the risk scene; the construction sub-module is used for constructing the privacy compliance training model based on the risk parameters.
The risk scene carries a functional label, and the acquisition sub-module comprises a matching unit, a determining unit and a preprocessing unit; the matching unit is used for matching the function tag with the function catalog of the privacy regulation; the determining unit is used for taking the text under the successfully matched function directory as an initial training text of the corresponding risk scene; the preprocessing unit is used for preprocessing the initial training text to obtain the target training text.
The preprocessing unit comprises a first recognition subunit, a second recognition subunit and a reduction subunit, wherein the first recognition subunit is used for removing punctuation marks and stop words of the initial training text and recognizing whether the language of the initial training text is a preset language to be reduced or not; the second recognition subunit is used for recognizing the parts of speech of each word of the initial training text to obtain the parts of speech when the language of the initial training text is the language to be restored; and the restoring subunit is used for restoring the words of the initial training text into the original form according to the part of speech.
The splitting module 302 includes a judging sub-module, a matching sub-module, and a policy text determining sub-module, where the judging sub-module is configured to judge a function tag to which the demand card belongs, and obtain a target tag; the matching submodule is used for matching the target tag with the function catalog of the privacy policy; and the policy text determination submodule is used for taking texts under the function directory of the privacy policy successfully matched as the policy texts corresponding to the requirement cards.
The extraction submodule comprises a first calculation submodule and a second calculation submodule. The first calculation submodule is used for respectively extracting word frequencies of preset keywords from each target training text, and respectively calculating average values of the word frequencies corresponding to the target training texts to serve as risk parameters of risk scenes corresponding to the target training texts; the second calculation submodule is used for calculating IDFs of preset keywords of each target training text respectively, and calculating average values of the IDFs corresponding to the target training texts respectively to serve as risk parameters of the risk factors corresponding to the target training texts.
In some optional implementations of this embodiment, the second computing submodule is further configured to: the IDF is calculated by the following formula:
IDF i =log(|D|/N i ) Wherein D represents the total number of target training texts, and Ni represents the number of target training texts in which keywords appear.
The checking module 303 includes a rule matching sub-module, a first check determination sub-module, a vector conversion sub-module, a second check determination sub-module, and a third check determination sub-module, where the rule matching sub-module is configured to match the policy text with the standard privacy regulations; the first check determination sub-module is configured to determine that the consistency check of the demand card and the policy text fails when the matching fails; the vector conversion sub-module is configured to convert the demand card and the policy text into vectors when the matching succeeds, obtaining a demand vector and a policy vector, and to perform the cosine similarity calculation on the demand vector and the policy vector to obtain a cosine value; the second check determination sub-module is configured to determine that the consistency check of the demand card and the policy text passes when the cosine value is 1; and the third check determination sub-module is configured to determine that the consistency check of the demand card and the policy text fails when the cosine value is not 1.
The vector conversion submodule comprises a word segmentation unit, a merging unit, a comparison unit and a characterization unit, wherein the word segmentation unit is used for sequentially carrying out word segmentation operation on the demand card and the policy text to obtain a demand word segmentation text and a policy word segmentation text; the merging unit is used for merging the required word segmentation text and the policy word segmentation text to obtain a merged word segmentation text; the comparison unit is used for respectively comparing whether the vocabulary of the required word segmentation text and the vocabulary of the policy word segmentation text exist in the merging word segmentation text; the characterization unit is used for characterizing the existing vocabulary through word frequency, and obtaining the demand vector and the policy vector through characterizing the non-existing vocabulary through 0.
The demand card is marked with a risk scene, and the output module 304 includes a first target scene determining submodule and a risk value calculating submodule, wherein the first target scene determining submodule is used for the privacy compliance training model to take the risk scene marked on the demand card as the first target scene and take the risk factor to which the first target scene belongs as the first target factor; the risk value calculating submodule is used for the privacy compliance training model to determine the risk parameter of the first target scene and the risk parameter of the first target factor, perform the calculation with these risk parameters, and output the risk value.
In some optional implementations of this embodiment, the constructing submodule is further configured to use the risk scene as a label of a corresponding policy text, train a classifier of an initial privacy compliance training model based on the policy text, and obtain the privacy compliance training model; the output module 304 further includes a classification sub-module and an output sub-module; the classifying sub-module is used for inputting the requirement card into the privacy compliance training model, the classifier of the privacy compliance training model outputs a risk scene of the requirement card as a second target scene, and a risk factor to which the second target scene belongs is used as a second target factor; the output submodule is used for determining and calculating risk parameters of the second target scene and risk parameters of the second target factor by the privacy compliance training model, and outputting the risk value.
According to the method, the pre-detection of the privacy compliance of the software is realized, the early warning detection is carried out before the APP is released, and the privacy risk of the APP to be released can be found in advance; the privacy compliance training model is built through standard privacy regulations and used for outputting risk values of the demand cards, determining privacy risk conditions of the demand cards, finding problems in advance and reducing APP illegal putting-down risks; through carrying out the consistency check to demand card and policy text to confirm whether agree with between demand card and the policy text, and then confirm whether the APP that waits to issue is about to be agreed with according to the privacy policy of information and software that the demand card obtained, thereby can be convenient for privacy policy's real-time and accurate update, avoid the software to be required to be rectified or the condition of putting down a frame after issuing.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 200 includes a memory 201, a processor 202, and a network interface 203 communicatively coupled to each other via a system bus. It should be noted that only a computer device 200 having components 201-203 is shown in the figures, but it should be understood that not all of the illustrated components are required to be implemented, and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculations and/or information processing in accordance with preset or stored instructions, the hardware of which includes, but is not limited to, microprocessors, application-specific integrated circuits (Application-Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 201 includes at least one type of readable storage medium including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random Access Memory (RAM), static Random Access Memory (SRAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the storage 201 may be an internal storage unit of the computer device 200, such as a hard disk or a memory of the computer device 200. In other embodiments, the memory 201 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the computer device 200. Of course, the memory 201 may also include both internal storage units of the computer device 200 and external storage devices. In this embodiment, the memory 201 is generally used to store an operating system installed on the computer device 200 and various application software, such as computer readable instructions of a software privacy compliance preamble detection method. In addition, the memory 201 may be used to temporarily store various types of data that have been output or are to be output.
The processor 202 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 202 is generally used to control the overall operation of the computer device 200. In this embodiment, the processor 202 is configured to execute computer readable instructions stored in the memory 201 or process data, such as computer readable instructions for executing the software privacy compliance pre-detection method.
The network interface 203 may comprise a wireless network interface or a wired network interface, which network interface 203 is typically used to establish communication connections between the computer device 200 and other electronic devices.
In this embodiment, a privacy compliance training model is built from the standard privacy regulations and used to output the risk value of a demand card, so that the privacy risk of the demand card can be determined, problems can be discovered in advance, and the risk of the APP being removed from app stores for violations is reduced.
The present application provides yet another embodiment, namely a computer-readable storage medium storing computer-readable instructions executable by at least one processor, so as to cause the at least one processor to perform the steps of the software privacy compliance pre-detection method described above.
In this embodiment, a privacy compliance training model is built from the standard privacy regulations and used to output the risk value of a demand card, so that the privacy risk of the demand card can be determined, problems can be discovered in advance, and the risk of the APP being removed from app stores for violations is reduced.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the method according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some, but not all, embodiments of the present application; the preferred embodiments of the present application are shown in the drawings, which do not limit the scope of the patent claims. This application may be embodied in many different forms; these embodiments are provided so that the present disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described above, or equivalents may be substituted for elements thereof. All equivalent structures made based on the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of the application.
Claims (10)
1. A software privacy compliance pre-detection method, characterized by comprising the following steps:
receiving standard privacy regulations, a demand card and a privacy policy of an APP, and constructing a privacy compliance training model based on the standard privacy regulations;
splitting corresponding policy text from the privacy policy according to the demand card;
performing a consistency check on the demand card and the policy text;
and if the consistency check is not passed, inputting the demand card into the privacy compliance training model to obtain an output risk value.
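For readability, the four steps of claim 1 can be sketched as a small pipeline. All helper bodies below are hypothetical stand-ins invented for illustration (the patent does not give an implementation); the later claims refine the consistency check and the model.

```python
# Sketch of the claim-1 flow. The helper names and bodies
# (split_policy_text, consistency_check, predict_risk) are hypothetical
# stand-ins for the steps the claim describes, not the patent's own API.

def split_policy_text(privacy_policy: dict, demand_card: str) -> str:
    # Step 2: pull the policy paragraph that corresponds to the demand card.
    return privacy_policy.get(demand_card, "")

def consistency_check(demand_card: str, policy_text: str) -> bool:
    # Step 3: placeholder check -- claim 5 replaces this with cosine similarity.
    return demand_card == policy_text

def predict_risk(demand_card: str) -> float:
    # Step 4: placeholder for the privacy compliance training model.
    return 0.5

def pre_detect(demand_card: str, privacy_policy: dict):
    policy_text = split_policy_text(privacy_policy, demand_card)
    if consistency_check(demand_card, policy_text):
        return None          # consistent: no risk value is produced
    return predict_risk(demand_card)
```

The risk value is only computed on the inconsistent path, which is what lets problems surface before the APP is published.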
2. The software privacy compliance pre-detection method according to claim 1, wherein the step of receiving standard privacy regulations, a demand card and a privacy policy of an APP, and constructing a privacy compliance training model based on the standard privacy regulations comprises:
acquiring a plurality of preset risk factors, wherein each risk factor comprises at least one risk scene, and splitting the standard privacy regulations according to the risk scenes to obtain a target training text corresponding to each risk scene;
performing feature extraction on the target training texts based on the TF-IDF text feature extraction technique to obtain the risk parameters of the risk factors and the risk scenes;
and constructing the privacy compliance training model based on the risk parameters.
3. The software privacy compliance pre-detection method according to claim 2, wherein the demand card is marked with a risk scene, and the step of inputting the demand card into the privacy compliance training model to obtain the output risk value comprises:
the privacy compliance training model takes the risk scene marked on the demand card as a first target scene and the risk factor to which the first target scene belongs as a first target factor;
the privacy compliance training model determines the risk parameter of the first target scene and the risk parameter of the first target factor, performs a calculation on these two risk parameters, and outputs the risk value.
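A minimal sketch of the claim-3 computation, assuming the two risk parameters are combined by multiplication; the patent only states that they are "calculated" into a risk value, and every table entry below is invented for illustration.

```python
# Hypothetical lookup tables: values and the scene/factor hierarchy are
# illustrative assumptions, not data from the patent.
SCENE_RISK = {"collect_location": 0.8, "collect_contacts": 0.6}   # per risk scene
FACTOR_RISK = {"personal_data": 1.2, "device_data": 0.9}          # per risk factor
SCENE_TO_FACTOR = {"collect_location": "personal_data",
                   "collect_contacts": "personal_data"}

def risk_value(scene: str) -> float:
    factor = SCENE_TO_FACTOR[scene]                 # first target factor
    # Combination rule (product) is an assumption; the claim leaves it open.
    return SCENE_RISK[scene] * FACTOR_RISK[factor]
```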
4. The software privacy compliance pre-detection method of claim 2, wherein the step of constructing the privacy compliance training model based on the risk parameters comprises:
taking the risk scene as a label of a corresponding policy text, and training a classifier of an initial privacy compliance training model based on the policy text to obtain the privacy compliance training model;
and the step of inputting the demand card into the privacy compliance training model to obtain the output risk value comprises the following steps:
inputting the demand card into the privacy compliance training model, wherein the classifier of the privacy compliance training model outputs the risk scene of the demand card as a second target scene, and the risk factor to which the second target scene belongs is taken as a second target factor;
and the privacy compliance training model determines the risk parameter of the second target scene and the risk parameter of the second target factor, performs a calculation on them, and outputs the risk value.
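Claim 4 replaces claim 3's pre-marked scene with a classifier that labels the demand card itself, trained on policy texts labeled by risk scene. The toy classifier below (pick the training text with the largest word overlap) is only a stand-in for whatever classifier is actually trained, and the training texts and labels are invented:

```python
from collections import Counter

# Hypothetical labeled training data: policy text -> risk-scene label.
TRAINING = {
    "we collect your gps location": "collect_location",
    "we read your address book":    "collect_contacts",
}

def classify_scene(demand_card: str) -> str:
    # Score each training text by how many words it shares with the card,
    # counted with multiplicity via Counter intersection.
    card_words = Counter(demand_card.lower().split())
    def overlap(text: str) -> int:
        return sum((card_words & Counter(text.split())).values())
    return TRAINING[max(TRAINING, key=overlap)]
```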
5. The software privacy compliance pre-detection method of claim 1, wherein the step of performing a consistency check on the demand card and the policy text comprises:
matching the policy text with the standard privacy regulations;
if the matching fails, determining that the consistency check of the demand card and the policy text fails;
if the matching is successful, converting the demand card and the policy text into vectors respectively to obtain a demand vector and a policy vector, and performing cosine similarity calculation on the demand vector and the policy vector to obtain a cosine value;
if the cosine value is 1, determining that the consistency check of the demand card and the policy text passes;
and if the cosine value is not 1, determining that the consistency check of the demand card and the policy text is not passed.
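Claim 5's similarity test can be written as follows. Because floating-point cosine values rarely equal exactly 1, the sketch compares against 1 with a small tolerance `eps`; that tolerance is an assumption on our part, since the claim itself compares against 1 directly.

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def consistency_passes(demand_vec, policy_vec, eps=1e-9):
    # Pass only when the vectors point in the same direction (cosine == 1).
    return abs(cosine(demand_vec, policy_vec) - 1.0) < eps
```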
6. The software privacy compliance pre-detection method of claim 5, wherein the step of converting the demand card and the policy text into vectors respectively to obtain a demand vector and a policy vector comprises:
sequentially performing word segmentation operation on the demand card and the policy text to obtain a demand word segmentation text and a policy word segmentation text;
merging the demand word segmentation text and the policy word segmentation text to obtain a merged word segmentation text;
comparing, for the demand word segmentation text and the policy word segmentation text respectively, whether each vocabulary item of the merged word segmentation text exists therein;
and representing each existing vocabulary item by its word frequency and each absent vocabulary item by 0, to obtain the demand vector and the policy vector.
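The vector construction of claim 6 is a plain bag-of-words representation over the merged vocabulary. In the sketch below, whitespace splitting stands in for a real word segmentation step:

```python
from collections import Counter

def to_vectors(demand_text: str, policy_text: str):
    # Step 1-2: segment both texts and merge the vocabularies.
    demand_words = demand_text.split()
    policy_words = policy_text.split()
    vocab = sorted(set(demand_words) | set(policy_words))   # merged segmentation text
    # Step 3-4: represent each text over the merged vocabulary,
    # using word frequency for present words and 0 for absent ones.
    d_freq, p_freq = Counter(demand_words), Counter(policy_words)
    demand_vec = [d_freq.get(w, 0) for w in vocab]
    policy_vec = [p_freq.get(w, 0) for w in vocab]
    return demand_vec, policy_vec
```

The two vectors share one coordinate system, which is what makes the claim-5 cosine comparison meaningful.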
7. The software privacy compliance pre-detection method according to claim 2, wherein the step of performing feature extraction on the target training texts based on the TF-IDF text feature extraction technique to obtain the risk parameters of the risk factors and the risk scenes comprises:
extracting the word frequencies of preset keywords from each target training text, and calculating, for each target training text, the average of those word frequencies as the risk parameter of the risk scene corresponding to that target training text;
and calculating the IDF of each preset keyword for each target training text, and calculating, for each target training text, the average of those IDFs as the risk parameter of the risk factor corresponding to that target training text.
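The two averages of claim 7 can be sketched as below. The patent names TF-IDF without fixing the formulas, so the exact variants here (relative term frequency, document-count IDF with +1 smoothing) are assumptions:

```python
import math

def scene_risk_parameter(text: str, keywords: list[str]) -> float:
    # Mean keyword TF within one target training text -> risk-scene parameter.
    words = text.split()
    tfs = [words.count(k) / len(words) for k in keywords]
    return sum(tfs) / len(tfs)

def factor_risk_parameter(corpus: list[str], keywords: list[str]) -> float:
    # Mean keyword IDF over the corpus of training texts -> risk-factor parameter.
    n = len(corpus)
    idfs = []
    for k in keywords:
        df = sum(1 for doc in corpus if k in doc.split())
        idfs.append(math.log(n / (1 + df)))   # +1 smoothing is an assumption
    return sum(idfs) / len(idfs)
```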
8. A software privacy compliance pre-detection device, comprising:
the receiving module is used for receiving the standard privacy regulations, the demand card and the privacy policy of the APP, and constructing a privacy compliance training model based on the standard privacy regulations;
the splitting module is used for splitting the corresponding policy text from the privacy policy according to the demand card;
the checking module is used for checking consistency of the demand card and the policy text; and
and the output module is used for inputting the demand card into the privacy compliance training model to obtain an output risk value when the consistency check fails.
9. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which when executed by the processor implement the steps of the software privacy compliance pre-detection method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the software privacy compliance pre-detection method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310776690.9A CN116795707A (en) | 2023-06-28 | 2023-06-28 | Software privacy compliance pre-detection method and related equipment thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116795707A true CN116795707A (en) | 2023-09-22 |
Family
ID=88041233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310776690.9A Pending CN116795707A (en) | 2023-06-28 | 2023-06-28 | Software privacy compliance pre-detection method and related equipment thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116795707A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||