Disclosure of Invention
The invention provides a user session resource data protection method applying AI decision and a corresponding software product, which jointly debug an AI neural network using big data of resource texts so as to ensure the performance of the AI neural network. In the application stage of the AI neural network, the Online session resource text is subjected to anonymous desensitization protection by means of big data anonymous desensitization technology, which improves the accuracy and rationality of that protection. In addition, the Online session resource text may relate to fields such as the metaverse and digital services, so the method and the software product are highly reusable and extensible. To achieve this technical purpose, the invention adopts the following technical scheme.
The first aspect is a user session resource data protection method applying AI decision, applied to a big data AI decision server, the method comprising:
obtaining an original Online conversation resource text, and carrying out text detail reconstruction on the original Online conversation resource text to obtain an Online conversation resource reconstructed text;
performing sensitive data vector mining on the Online session resource reconfiguration text to obtain a target sensitive data characterization vector;
obtaining a general sensitive text processing network, and adopting the general sensitive text processing network and the target sensitive data characterization vector to carry out joint debugging on a set sensitive text processing network to obtain an intermediate sensitive text processing network;
combining the network variables of the general sensitive text processing network, and performing an improvement operation on the network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network;
obtaining a general sensitive data characterization vector and a to-be-anonymous Online session resource text, and performing characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector;
and carrying out sensitive data anonymous protection on the to-be-anonymous Online session resource text by adopting the final sensitive text processing network and the sensitive data splicing vector to obtain the Online session resource desensitization text meeting the big data protection condition.
In some optional embodiments, the performing sensitive data anonymization protection on the Online session resource text to be anonymized by using the final sensitive text processing network and the sensitive data stitching vector to obtain an Online session resource desensitized text meeting a big data protection condition includes:
extracting text features of the to-be-anonymous Online conversation resource text by adopting the final sensitive text processing network to obtain a to-be-anonymous sensitive text vector of the to-be-anonymous Online conversation resource text;
performing a sensitive element generalization operation on the to-be-anonymous sensitive text vector by adopting the sensitive data splicing vector to obtain a sensitive text generalization vector;
and performing text recovery operation on the sensitive text generalization vector by adopting the final sensitive text processing network to obtain the Online session resource desensitization text meeting the big data protection condition.
In some optional embodiments, the performing text feature extraction on the to-be-anonymous Online session resource text by using the final sensitive text processing network to obtain a to-be-anonymous sensitive text vector of the to-be-anonymous Online session resource text includes:
adopting the final sensitive text processing network to perform sensitive data vector mining processing on the to-be-anonymous Online session resource text to obtain text description data of the to-be-anonymous Online session resource text;
performing region projection operation on the text description data by adopting the final sensitive text processing network to obtain a text region positioning tag of the text description data;
and generating a sensitive text vector to be anonymous of the to-be-anonymous Online conversation resource text through the text region positioning tag by adopting the final sensitive text processing network.
In some optional embodiments, the performing a token vector stitching operation on the generic sensitive data token vector and the target sensitive data token vector to obtain a sensitive data stitching vector includes:
summarizing the general sensitive data characterization vector to obtain a summarized sensitive data characterization vector;
and vector aggregation is carried out on the summarized sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector.
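The splicing operation above can be sketched as follows. This is only a minimal illustration under stated assumptions: "summarizing" is taken to be an element-wise mean and "vector aggregation" is taken to be concatenation, neither of which is fixed by the text, and the function names `summarize` and `splice` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def summarize(general_vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of the general characterization vectors (assumed summary)."""
    if not general_vectors:
        raise ValueError("at least one general vector is required")
    dim = len(general_vectors[0])
    if any(len(v) != dim for v in general_vectors):
        raise ValueError("all general vectors must share one dimension")
    count = len(general_vectors)
    return [sum(v[i] for v in general_vectors) / count for i in range(dim)]


def splice(general_vectors: Sequence[Vector], target_vector: Vector) -> Vector:
    """Concatenate the summarized general vector with the target vector (assumed aggregation)."""
    return summarize(general_vectors) + list(target_vector)


# Two general characterization vectors and one target characterization vector.
general = [[1.0, 2.0], [3.0, 4.0]]
target = [0.5, 0.5, 0.5]
spliced = splice(general, target)
```

Concatenation keeps the general and target information in separate coordinates; other aggregations, such as a weighted sum over vectors of equal dimension, would fit the same description.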
In some optional embodiments, the performing, in combination with the network variable of the general sensitive text processing network, an improvement operation on the network variable of the intermediate sensitive text processing network to obtain a final sensitive text processing network includes:
extracting at least one network unit to be improved from the intermediate sensitive text processing network;
extracting, according to the network unit to be improved, a corresponding improvement auxiliary unit from the general sensitive text processing network;
and performing an improvement operation on the unit configuration variables of the network unit to be improved in combination with the unit configuration variables of the improvement auxiliary unit to obtain the final sensitive text processing network.
In some optional embodiments, the performing an improvement operation on the unit configuration variables of the network unit to be improved in combination with the unit configuration variables of the improvement auxiliary unit to obtain the final sensitive text processing network includes:
determining a unit configuration variable weighting factor of the network unit to be improved and a unit configuration variable weighting factor of the improvement auxiliary unit;
and performing vector aggregation on the unit configuration variables of the network unit to be improved and the unit configuration variables of the improvement auxiliary unit through the two weighting factors to obtain the final sensitive text processing network.
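The weighted aggregation of unit configuration variables can be sketched as a normalized weighted average of matching parameters. The sketch assumes that each unit's configuration variables are a flat list of floats keyed by a unit name, and that the weighting factors are plain scalars; the names `blend_units`, `w_improve` and `w_aux` are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def blend_units(to_improve: Params, auxiliary: Params,
                w_improve: float, w_aux: float) -> Params:
    """Weighted average of matching unit parameters from two networks."""
    total = w_improve + w_aux
    if total <= 0:
        raise ValueError("weighting factors must sum to a positive value")
    blended: Params = {}
    for name, values in to_improve.items():
        aux_values = auxiliary[name]  # the corresponding auxiliary unit
        if len(values) != len(aux_values):
            raise ValueError(f"unit {name!r} has mismatched shapes")
        blended[name] = [(w_improve * a + w_aux * b) / total
                         for a, b in zip(values, aux_values)]
    return blended


# One unit of the intermediate network and its counterpart in the general network.
intermediate_units = {"unit_2": [1.0, 3.0]}
general_units = {"unit_2": [3.0, 1.0]}
final_units = blend_units(intermediate_units, general_units,
                          w_improve=0.75, w_aux=0.25)
```

A larger weight on the unit to be improved keeps the task-specific behaviour learned during joint debugging, while the auxiliary weight pulls the unit back toward the general network.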
In some optional embodiments, the adopting the general sensitive text processing network and the target sensitive data characterization vector performs joint debugging on the set sensitive text processing network to obtain an intermediate sensitive text processing network, and includes:
adopting the network variables of the general sensitive text processing network to roll back the network variables of the set sensitive text processing network to obtain a default sensitive text processing network;
and debugging the default sensitive text processing network by adopting the target sensitive data characterization vector to obtain the intermediate sensitive text processing network.
In some optional embodiments, the debugging the default sensitive text processing network using the target sensitive data characterization vector to obtain the intermediate sensitive text processing network includes:
acquiring an Online session resource debugging text;
adopting the target sensitive data characterization vector and the default sensitive text processing network to carry out sensitive data anonymous protection on the Online session resource debugging text to obtain a text anonymous protection prediction result;
determining debugging cost data between the text anonymous protection prediction result and a template Online session resource text;
and improving network variables of the default sensitive text processing network through the debugging cost data to obtain the intermediate sensitive text processing network.
In some optional embodiments, the performing sensitive data anonymization protection on the Online session resource debugging text by using the target sensitive data characterization vector and the default sensitive text processing network to obtain a text anonymization protection prediction result includes:
adopting the default sensitive text processing network to conduct text feature extraction on the Online session resource debugging text to obtain a sensitive data characterization vector sample;
performing sensitive element generalization operation on the sensitive data representation vector sample by adopting the target sensitive data representation vector to obtain a sensitive text generalization vector sample of the Online session resource debugging text;
and generating a text anonymous protection prediction result of the Online session resource debugging text through the sensitive text generalization vector sample by adopting the default sensitive text processing network.
In some optional embodiments, the generating, by using the default sensitive text processing network and the sensitive text generalization vector sample, a text anonymous protection prediction result of the Online session resource debug text includes:
performing text recovery operation on the sensitive text generalization vector sample by adopting the default sensitive text processing network to obtain an Online session resource recovery text;
performing content discrimination operation on the Online session resource debugging text to obtain a content discrimination result of the Online session resource debugging text;
and adopting the content discrimination result to carry out content significance adjustment on the Online session resource recovery text to obtain the text anonymous protection prediction result.
In some alternative embodiments, the method further comprises:
performing text detail analysis on the Online session resource desensitization text to obtain a text detail analysis result of the Online session resource desensitization text;
and carrying out text detail reconstruction on the Online session resource desensitization text according to the text detail analysis result to obtain the Online session resource desensitization reconstruction text.
In some optional embodiments, the reconstructing text details of the Online session resource desensitization text according to the text detail analysis result to obtain an Online session resource desensitization reconstruction text includes:
acquiring an AI text reconstruction network;
performing text reconstruction on the Online session resource desensitization text by adopting the AI text reconstruction network to obtain an Online session resource desensitization reconstruction text;
the text reconstruction of the Online session resource desensitization text by adopting an AI text reconstruction network comprises the following steps:
obtaining a reconstructed text sample and setting an AI text reconstruction network;
performing disturbance adding operation on the reconstructed text sample to obtain a disturbance text sample;
and adopting the disturbance text sample to debug the set AI text reconstruction network to obtain the AI text reconstruction network.
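The disturbance adding operation can be illustrated with a small sketch that builds (disturbed, clean) text pairs for debugging a reconstruction network. The kind of disturbance is not specified in the text; random character drops and adjacent-character swaps are assumed here, and the names `perturb` and `build_pairs` are hypothetical.

```python
import random
from typing import List, Tuple


def perturb(text: str, rate: float, rng: random.Random) -> str:
    """Randomly drop characters or swap adjacent characters (assumed disturbance)."""
    chars = list(text)
    out: List[str] = []
    i = 0
    while i < len(chars):
        r = rng.random()
        if r < rate / 2:
            i += 1  # drop the current character
            continue
        if r < rate and i + 1 < len(chars):
            out.extend([chars[i + 1], chars[i]])  # swap with the next character
            i += 2
            continue
        out.append(chars[i])  # keep the character unchanged
        i += 1
    return "".join(out)


def build_pairs(samples: List[str], rate: float = 0.1,
                seed: int = 0) -> List[Tuple[str, str]]:
    """Return (disturbed, clean) pairs; the clean text is the debugging target."""
    rng = random.Random(seed)
    return [(perturb(s, rate, rng), s) for s in samples]


pairs = build_pairs(["the order was shipped yesterday"], rate=0.2, seed=7)
```

The set AI text reconstruction network would then be debugged to map each disturbed text back to its clean counterpart.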
The second aspect is a big data AI decision server comprising a memory and a processor; the memory is coupled to the processor; the memory is used for storing computer program codes, and the computer program codes comprise computer instructions; wherein the computer instructions, when executed by the processor, cause the big data AI decision server to perform the method of the first aspect.
A third aspect is a software product for implementing a user session resource data protection method applying AI decisions, comprising a computer program/instruction, wherein the computer program/instruction, when executed, implements the method of performing the first aspect.
A fourth aspect is a computer readable storage medium having stored thereon a computer program which, when run, performs the method of the first aspect.
The embodiment of the invention can obtain an original Online session resource text and perform text detail reconstruction on it to obtain an Online session resource reconstructed text; perform sensitive data vector mining on the Online session resource reconstructed text to obtain a target sensitive data characterization vector; obtain a general sensitive text processing network, wherein the general sensitive text processing network is used for adjusting a sensitive data representation form into a general sensitive data representation form; adopt the general sensitive text processing network and the target sensitive data characterization vector to perform joint debugging on a set sensitive text processing network to obtain an intermediate sensitive text processing network, wherein the intermediate sensitive text processing network is used for further adjusting the sensitive data representation form according to big data protection conditions; perform an improvement operation on the network variables of the intermediate sensitive text processing network in combination with the network variables of the general sensitive text processing network to obtain a final sensitive text processing network; obtain a general sensitive data characterization vector and a to-be-anonymous Online session resource text, and perform a characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector; and perform sensitive data anonymous protection on the to-be-anonymous Online session resource text by adopting the final sensitive text processing network and the sensitive data splicing vector to obtain an Online session resource desensitization text meeting the big data protection condition. In this way, the accuracy and rationality of the anonymous desensitization protection can be ensured when the original Online session resource text is subjected to data anonymous desensitization protection.
Detailed Description
Hereinafter, the terms "first," "second," and "third," etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", or "a third", etc., may explicitly or implicitly include one or more such feature.
Fig. 1 shows a flow chart of a user session resource data protection method applying AI decision according to an embodiment of the present invention, where the user session resource data protection method applying AI decision may be implemented by a big data AI decision server, and the big data AI decision server may include a memory and a processor; the memory is coupled to the processor; the memory is used for storing computer program codes, and the computer program codes comprise computer instructions; wherein the processor, when executing the computer instructions, causes the big data AI decision server to perform steps 101-106.
Step 101: obtain an original Online session resource text, and perform text detail reconstruction on the original Online session resource text to obtain an Online session resource reconstructed text.
In the embodiment of the invention, the original Online session resource text comprises an Online session resource text that needs text detail optimization and reconstruction. For example, the original Online session resource text may be an Online session resource text in which a phrase exists. For another example, the original Online session resource text may be an Online session resource text containing misspelled words.
Under some possible design ideas, various schemes can be adopted to reconstruct text details of the original Online conversation resource text. For example, text detail reconstruction is performed on the original Online conversation resource text based on the set AI text reconstruction network.
The set AI text reconstruction network can be built on the basis of a convolutional neural network, a Bayesian network and an activation function. The set AI text reconstruction network can optimize text details of the original Online conversation resource text, so that a better Online conversation resource reconstruction text is obtained.
Step 102: perform sensitive data vector mining on the Online session resource reconstructed text to obtain a target sensitive data characterization vector.
Under some possible design ideas, after the Online conversation resource reconstruction text is obtained, sensitive data vector mining (feature extraction) can be performed on the Online conversation resource reconstruction text, so as to obtain a target sensitive data characterization vector.
Further, the target sensitive data characterization vector includes text features for documenting a characterization form of the sensitive data.
Under some possible design ideas, various schemes can be adopted to perform sensitive data vector mining on the Online session resource reconstruction text, so as to obtain a target sensitive data characterization vector.
For example, various networks combined with an AI algorithm can be adopted to perform sensitive data vector mining on the Online session resource reconstruction text, so as to obtain a target sensitive data characterization vector.
Step 103: obtain a general sensitive text processing network and the target sensitive data characterization vector, and adopt the general sensitive text processing network and the target sensitive data characterization vector to perform joint debugging on the set sensitive text processing network to obtain an intermediate sensitive text processing network.
In the embodiment of the invention, the general sensitive text processing network comprises a network which can adjust the sensitive data representation form of the Online session resource text to the general sensitive data representation form. The generic sensitive text processing network can be understood as a basic sensitive text processing network, so that the generic sensitive data representation can be adaptively understood as a basic sensitive data representation, which can be understood as a generic desensitization protection output form designed for data desensitization protection. On the basis, the migration debugging of the set sensitive text processing network can be realized based on the general sensitive text processing network and the target sensitive data characterization vector, so that the intermediate sensitive text processing network is obtained, and the intermediate sensitive text processing network is the sensitive text processing network after the set sensitive text processing network completes the migration debugging.
Those skilled in the art will appreciate that the generic sensitive text processing network may be a convolutional neural network, a deep learning network, or the like.
The Online session resource text in the embodiment of the invention can comprise different types of Online session resource text. For example, it can be an Online session resource text generated during metaverse service interaction, an Online session resource text generated during digital financial service interaction, or an Online session resource text generated during Online chat.
The sensitive data representation form can comprise an anonymous desensitization output form of sensitive data information in the Online session resource text.
Under some possible design ideas, the sensitive data representation form can be flexibly designed, such as processing based on K anonymity or processing based on a content shielding mode.
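As one concrete reading of "processing based on K anonymity", the following sketch checks whether a set of records is k-anonymous over chosen quasi-identifiers and shows a simple generalization step. The record fields (`age`, `city`), the bucket width, and the function names are assumptions made only for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def is_k_anonymous(records: Sequence[Record],
                   quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record: Record, width: int = 10) -> Record:
    """Replace an exact age with a range of the given width, e.g. 23 -> '20-29'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


records: List[Record] = [
    {"age": 23, "city": "A"},
    {"age": 27, "city": "A"},
    {"age": 31, "city": "B"},
    {"age": 38, "city": "B"},
]
generalized = [generalize_age(r) for r in records]
before = is_k_anonymous(records, ["age", "city"], k=2)
after = is_k_anonymous(generalized, ["age", "city"], k=2)
```

Exact ages make every record unique, while the bucketed ages put at least two records into each group, which is the property k-anonymity requires.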
Under some possible design considerations, the generic sensitive data representation may include sensitive data representations of a variety of Online session resource texts. For example, there are 5 sensitive data characterization forms, which are all different. For ease of description, these 5 sensitive data characterization forms may be considered as generic sensitive data characterization forms.
Under some possible design considerations, big data protection conditions may include the sensitive data representation form that the anonymous request system expects the Online session resource text to be output in. For example, suppose the initial sensitive data representation form of the to-be-anonymous Online session resource text shields both personal information and business interaction content, while the anonymous request system wants the text converted into a form in which only personal information is desensitized. In that case, the sensitive data representation form that desensitizes only personal information is the big data protection condition.
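The difference between the two representation forms in this example can be sketched with a rule-based masking function, where the protection condition is the list of categories to shield. This is only an illustration of the output forms, not of the neural approach described in this document; the regular expressions, category names and placeholder format are all assumptions.

```python
import re
from typing import Dict, Iterable, Pattern

# Hypothetical patterns: an 11-digit phone number as personal information,
# and an "ORDER-<digits>" identifier as business interaction content.
PATTERNS: Dict[str, Pattern[str]] = {
    "personal": re.compile(r"\b\d{11}\b"),
    "business": re.compile(r"\bORDER-\d+\b"),
}


def mask(text: str, categories: Iterable[str]) -> str:
    """Shield only the categories required by the protection condition."""
    for category in categories:
        text = PATTERNS[category].sub(f"[{category.upper()}]", text)
    return text


message = "Call 13800000000 about ORDER-42."
personal_only = mask(message, ["personal"])
both = mask(message, ["personal", "business"])
```

Here `personal_only` corresponds to the form requested by the anonymous request system, and `both` corresponds to the initial form that shields personal information and business interaction content.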
Under some possible design considerations, the generic sensitive data characterization vector includes text features that can be used to document a characterization form of sensitive data.
In view of the ability to treat a variety of different sensitive data characterization forms as a generic sensitive data characterization form, the generic sensitive data characterization vector may also include features of the variety of different sensitive data characterization forms. For example, the generic sensitive data representation comprises 5 different sensitive data representations, and the generic sensitive data representation vector may comprise 5 different sensitive data representation vectors.
Under some possible design ideas, before the universal sensitive data representation vector is obtained, a sensitive data output processing network can be adopted to conduct text feature extraction on Online session resource texts in various different sensitive data representation forms, so that text features in various different sensitive data representation forms are obtained.
For example, a sensitive data output processing network may be used to refine text features of multiple Online session resource texts, so as to obtain multiple sensitive data characterization vectors.
Those skilled in the art will recognize that the sensitive data output processing network may be a neural network constructed according to actual requirements.
Under some possible design considerations, the target sensitive data characterization vector includes text features for documenting a characterization form of the sensitive data.
Under some possible design ideas, the output modes of the target sensitive data characterization vector and the general sensitive data characterization vector can be various. For example, the target sensitive data characterization vector and the generic sensitive data characterization vector may be in the form of a linear array, or the like.
In some possible embodiments, an untrained sensitive text processing network may be debugged before the general sensitive text processing network is obtained, and the debugging yields the general sensitive text processing network. For example, the general sensitive data characterization vector and the Online session resource debugging text can be used to debug the untrained sensitive text processing network, thereby obtaining a general sensitive text processing network capable of general sensitive data representation form adjustment.
In view of the similarity in performance between the intermediate sensitive text processing network and the general sensitive text processing network, the intermediate sensitive text processing network may be adapted to some specific sensitive data representation form. Therefore, the general sensitive text processing network and the target sensitive data characterization vector can be adopted to carry out joint debugging (such as migration debugging) on the set sensitive text processing network, so as to improve the efficiency of debugging the set sensitive text processing network. Those skilled in the art will appreciate that joint debugging may combine supervised training and unsupervised training.
Wherein the debug sample set includes sample resources for debugging the neural network. For example, when the set sensitive text processing network is debugged, the target sensitive data characterization vector is adopted to debug the set sensitive text processing network, so that the debug sample set is the target sensitive data characterization vector.
In the actual implementation process, the number of the Online session resource texts is usually not large, and if the Online session resource texts are directly adopted to debug the intermediate sensitive text processing network to be debugged, the performance of the intermediate sensitive text processing network is low. Therefore, the universal sensitive text processing network with better performance can be adopted to jointly debug the intermediate sensitive text processing network to be debugged.
Exemplary, the step of "using a general sensitive text processing network and the target sensitive data characterization vector to perform joint debugging on a set sensitive text processing network to obtain an intermediate sensitive text processing network" may include: adopting network variables of a general sensitive text processing network to roll back the network variables of the set sensitive text processing network to obtain a default sensitive text processing network; and debugging the default sensitive text processing network by adopting a target sensitive data characterization vector to obtain the intermediate sensitive text processing network.
Here, the network variables may be understood as model parameters, the rollback processing may be understood as a model parameter initialization process, and the set sensitive text processing network may include a neural network that has not yet been debugged. For example, the set sensitive text processing network may be a convolutional neural network model that cannot yet perform desensitization and anonymization protection on the Online session resource text based on big data protection conditions.
Under some possible design considerations, in order for the generic sensitive text processing network to perform joint debugging on the set-up sensitive text processing network, the network structure of the generic sensitive text processing network and the network structure of the set-up sensitive text processing network are generally the same.
For example, if the network structure of the general sensitive text processing network includes three layers of functional units, the network structure of the set sensitive text processing network may also include three layers of functional units.
Under some possible design ideas, when the network variables of the general sensitive text processing network are adopted to carry out rollback processing on the network variables of the set sensitive text processing network, the network variables of the set sensitive text processing network can be set to the network variables of the general sensitive text processing network, so that the resulting default sensitive text processing network has the performance of general sensitive data representation form adjustment.
For example, if the network variables of the three layers of functional units in the general sensitive text processing network are p1, p2 and p3, respectively, the network variables of the three layers of functional units in the set sensitive text processing network may likewise be set to p1, p2 and p3.
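The rollback processing in this example amounts to initializing one network with another network's parameters. A minimal sketch follows, assuming that each network is represented as a dictionary from a layer name to a list of floats and that both networks share the same structure; the name `roll_back` and the layer names are hypothetical.

```python
import copy
from typing import Dict, List

Params = Dict[str, List[float]]


def roll_back(set_network: Params, general_network: Params) -> Params:
    """Initialize the set network with an independent copy of the general network's variables."""
    if set_network.keys() != general_network.keys():
        raise ValueError("both networks must have the same functional units")
    for name in set_network:
        if len(set_network[name]) != len(general_network[name]):
            raise ValueError(f"unit {name!r} has mismatched shapes")
    # Deep copy so that later debugging does not modify the general network.
    return copy.deepcopy(general_network)


general_net = {"layer1": [0.1, 0.2], "layer2": [0.3], "layer3": [0.4, 0.5]}  # p1, p2, p3
set_net = {"layer1": [0.0, 0.0], "layer2": [0.0], "layer3": [0.0, 0.0]}
default_net = roll_back(set_net, general_net)
```

The structure check reflects the requirement that the two networks have the same network structure; the copy keeps the general network unchanged while the default network is debugged further.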
Under some possible design ideas, in order for the intermediate sensitive text processing network to adjust the sensitive data representation form of the Online session resource text to the form required by the big data protection condition, the Online session resource text may be adopted to debug the default sensitive text processing network, thereby obtaining the intermediate sensitive text processing network.
Illustratively, the step of debugging the default sensitive text processing network using the target sensitive data characterization vector to obtain an intermediate sensitive text processing network may include: acquiring an Online session resource debugging text; adopting the target sensitive data characterization vector and the default sensitive text processing network to carry out sensitive data anonymous protection on the Online session resource debugging text, and obtaining a text anonymous protection prediction result; determining debugging cost data between the text anonymous protection prediction result and a template Online session resource text; and improving network variables of the default sensitive text processing network through the debugging cost data to obtain the intermediate sensitive text processing network.
The Online session resource debugging text can be understood as a training sample of the Online session resource text. Debugging cost data may be understood as loss information or loss data.
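The role of the debugging cost data can be sketched with a deliberately tiny model. The sketch assumes a mean squared error as the cost and plain gradient descent as the way the network variables are improved; the text fixes neither choice, and the element-wise linear model and all names below are hypothetical stand-ins for the real network.

```python
from typing import List


def predict(weights: List[float], features: List[float]) -> List[float]:
    """Toy stand-in for the network: element-wise linear model."""
    return [w * x for w, x in zip(weights, features)]


def debug_cost(prediction: List[float], template: List[float]) -> float:
    """Mean squared error between the prediction and the template (assumed cost)."""
    return sum((p - t) ** 2 for p, t in zip(prediction, template)) / len(prediction)


def debug_step(weights: List[float], features: List[float],
               template: List[float], lr: float) -> List[float]:
    """One gradient-descent update of the weights with respect to the cost."""
    n = len(weights)
    grads = [2.0 * (w * x - t) * x / n
             for w, x, t in zip(weights, features, template)]
    return [w - lr * g for w, g in zip(weights, grads)]


features = [1.0, 2.0]   # stands in for the debugging text
template = [1.0, 4.0]   # stands in for the template text
weights = [0.0, 0.0]    # network variables before debugging
initial_cost = debug_cost(predict(weights, features), template)
for _ in range(200):
    weights = debug_step(weights, features, template, lr=0.1)
final_cost = debug_cost(predict(weights, features), template)
```

Each iteration compares the prediction with the template, derives the cost, and adjusts the variables in the direction that lowers it, which is the loop the debugging step describes.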
Under some possible design ideas, the idea of debugging the default sensitive text processing network (the initialized sensitive text processing network obtained after the network variable rollback processing) can be a process of implementing sensitive data anonymity protection on Online session resource debugging texts by adopting continuous learning of the default sensitive text processing network. Exemplary, the step of "performing sensitive data anonymization protection on the Online session resource debugging text by using the target sensitive data characterization vector and the default sensitive text processing network to obtain a text anonymization protection prediction result" may include: adopting the default sensitive text processing network to conduct text feature extraction on the Online session resource debugging text to obtain a sensitive data characterization vector sample; performing sensitive element generalization operation on the sensitive data representation vector sample by adopting the target sensitive data representation vector to obtain a sensitive text generalization vector sample of the Online session resource debugging text; and generating a text anonymous protection prediction result of the Online session resource debugging text through the sensitive text generalization vector sample by adopting the default sensitive text processing network.
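The three stages named above (text feature extraction, sensitive element generalization, and generation of the prediction result) can be sketched as a pipeline of interchangeable functions. Everything below is a toy stand-in under loud assumptions: whitespace tokens replace learned features, a two-entry list replaces the target sensitive data characterization vector, and the `toy_*` functions are hypothetical.

```python
from typing import Callable, List, Sequence

Tokens = List[str]


def anonymize(text: str, style: Sequence[float],
              encode: Callable[[str], Tokens],
              generalize: Callable[[Tokens, Sequence[float]], Tokens],
              decode: Callable[[Tokens], str]) -> str:
    """Run the three stages in order: encode, generalize, decode."""
    features = encode(text)
    generalized = generalize(features, style)
    return decode(generalized)


def toy_encode(text: str) -> Tokens:
    return text.split()


def toy_generalize(tokens: Tokens, style: Sequence[float]) -> Tokens:
    # style[0] > 0.5 masks numeric tokens; style[1] > 0.5 masks e-mail-like tokens.
    mask_numbers, mask_emails = style[0] > 0.5, style[1] > 0.5
    out: Tokens = []
    for tok in tokens:
        if mask_numbers and tok.isdigit():
            out.append("<NUM>")
        elif mask_emails and "@" in tok:
            out.append("<EMAIL>")
        else:
            out.append(tok)
    return out


def toy_decode(tokens: Tokens) -> str:
    return " ".join(tokens)


prediction = anonymize("user 12345 wrote to a@b.com", [1.0, 0.0],
                       toy_encode, toy_generalize, toy_decode)
```

The point of the sketch is the data flow: the same encoder and decoder are reused, and only the vector passed to the generalization stage decides which representation form the output takes.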
Under some possible design ideas, when text feature extraction is performed on the Online session resource debugging text by using a default sensitive text processing network, text feature extraction may be performed on the Online session resource debugging text by using a text feature extraction subnet (feature encoder), so as to obtain a sensitive data characterization vector sample.
Under some possible design ideas, when the default sensitive text processing network is adopted to perform text feature extraction on the Online session resource debugging text, the default sensitive text processing network can first be adopted to perform sensitive data vector mining on the Online session resource debugging text, so as to obtain text description data (which can be understood as feature information) of the Online session resource debugging text. The sensitive data characterization vector sample is then obtained from the text description data of the Online session resource debugging text. Illustratively, the step of "adopting the default sensitive text processing network to conduct text feature extraction on the Online session resource debugging text to obtain a sensitive data characterization vector sample" may include: adopting the default sensitive text processing network to perform sensitive data vector mining processing on the Online session resource debugging text to obtain text description data of the Online session resource debugging text; performing a region projection operation on the text description data by adopting the default sensitive text processing network to obtain a text region positioning tag of the text description data; and generating a sensitive data characterization vector sample of the Online session resource debugging text through the text region positioning tag by adopting the default sensitive text processing network.
For example, the region projection operation may be understood as a location mapping process, and text region location tags are used to reflect the location distribution characteristics of text description data. Further, the text description data of the Online session resource debug text includes information that can represent an Online session resource text feature of the Online session resource debug text.
Under some possible design ideas, because the content of different Online session resource debugging texts differs, the mining bias of the feature vectors also differs when sensitive data vector mining is performed on the Online session resource debugging text.
Under some possible design ideas, various schemes can be adopted to perform sensitive data vector mining on the Online session resource debugging text. For example, a gradient unit of the universal sensitive text processing network to be debugged can be used to perform moving average processing on the Online session resource debugging text, so as to obtain text description data of the Online session resource debugging text. For another example, a moving average operator of the universal sensitive text processing network to be debugged can be used to perform moving average processing on the Online session resource debugging text, so as to obtain text description data of the Online session resource debugging text.
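By way of a non-limiting illustration, the moving average processing mentioned above may be sketched as follows. The sketch takes the term "moving average" literally and assumes that the text has already been mapped to a numeric sequence (for example, one score per token). The function name `moving_average`, the window size, and the sample scores are hypothetical and are not specified by the embodiment.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over a sliding window.

    Assumption: `values` is a numeric sequence already derived from the text;
    how that mapping is done is outside the scope of this sketch.
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])       # sum of the first window
    averaged = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one position: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        averaged.append(running / window)
    return averaged


# Hypothetical per-token scores for a short debugging text.
token_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
text_description = moving_average(token_scores, window=3)
```

The smoothed sequence plays the role of the text description data in this sketch; a real network would likely use learned operators instead.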
Under some possible design ideas, after text description data of the Online session resource debugging text is obtained, the text description data can be subjected to region projection operation to obtain a text region positioning tag of the text description data.
Under some possible design ideas, a default sensitive text processing network can be adopted, and a sensitive data characterization vector sample of the Online session resource debugging text can be generated through a text region positioning label.
For example, the text region positioning tag may be adjusted to the sensitive data characterization vector sample using preset intermediate features. The preset intermediate features comprise feature vectors configured in advance in the default sensitive text processing network, and these feature vectors can adjust the text region positioning tag into the sensitive data characterization vector sample.
Under some possible design ideas, after the sensitive data characterization vector sample of the Online session resource debugging text is obtained, the target sensitive data characterization vector can be used to perform the sensitive element generalization operation on the sensitive data characterization vector sample, so as to obtain the sensitive text generalization vector sample of the Online session resource debugging text. Various schemes can be adopted when the target sensitive data characterization vector is used to perform the sensitive element generalization operation.
For example, the target sensitive data characterization vector and the sensitive data characterization vector sample may be summed to obtain the sensitive text generalization vector sample (the text feature vector after further anonymization processing) of the Online session resource debugging text. For another example, regularization processing can be performed on the target sensitive data characterization vector and the sensitive data characterization vector sample, so as to obtain the sensitive text generalization vector sample of the Online session resource debugging text.
For example, the evaluation indices (mean and variance) of the sensitive data characterization vector sample may be aligned to the evaluation indices (mean and variance) of the general sensitive data characterization vector, so as to obtain the sensitive text generalization vector sample.
Under some possible design ideas, after the sensitive text generalization vector sample is obtained, the default sensitive text processing network can be used to generate the text anonymous protection prediction result of the Online session resource debugging text from the sensitive text generalization vector sample. Illustratively, the step of "generating a text anonymous protection prediction result of the Online session resource debugging text by using the default sensitive text processing network and the sensitive text generalization vector sample" may include: using the default sensitive text processing network to perform a text recovery operation (text feature decoding processing) on the sensitive text generalization vector sample to obtain an Online session resource recovery text (feature decoding text); performing a content discrimination operation on the Online session resource debugging text to obtain a content discrimination result of the Online session resource debugging text; and using the content discrimination result to perform content saliency adjustment on the Online session resource recovery text to obtain the text anonymous protection prediction result. The content discrimination operation can be understood as a semantic splitting process.
Under some possible design ideas, when the set sensitive text processing network is a generative adversarial network, a feature decoding unit in the generative adversarial network can be used to perform the text recovery operation on the sensitive text generalization vector sample, so as to obtain the Online session resource recovery text.
The Online session resource recovery text comprises Online session resource text with the target sensitive data characterization vector. However, because of the difference of the contents of the debugging texts of different Online session resources, vector reinforcement can be performed on the Online session resource recovery text through the contents of the Online session resource debugging texts, so that the obtained text anonymous protection prediction result is as complete and accurate as possible.
Under some possible design ideas, content discrimination operation can be performed on the Online session resource debugging text to obtain a content discrimination result of the Online session resource debugging text. And then, adopting a content discrimination result to carry out content significance adjustment on the Online session resource recovery text, and obtaining a text anonymous protection prediction result.
Under some possible design ideas, when the content discrimination result is adopted to perform content saliency adjustment (feature strengthening treatment) on the Online session resource recovery text, the content discrimination result and text description data of the Online session resource recovery text can be overlapped, so that the content saliency adjustment on the Online session resource recovery text is realized.
Under some possible design ideas, after the text anonymous protection prediction result is obtained, the debugging cost data between the text anonymous protection prediction result and the template Online session resource text can be determined, so that the network variables of the set sensitive text processing network can be adjusted according to the debugging cost data to obtain the intermediate sensitive text processing network.
The debugging cost data is used to judge the degree of similarity of the sensitive data characterization form between the text anonymous protection prediction result and the template Online session resource text. For example, the debugging cost data can be a variable value. A smaller value indicates a higher degree of similarity of the sensitive data characterization form between the text anonymous protection prediction result and the template Online session resource text, and therefore better running quality of the network. Conversely, a larger value indicates a lower degree of similarity and poorer running quality of the network.
Under some possible design ideas, a cost function (such as a cross entropy cost function) may be used to determine the debugging cost data between the text anonymous protection prediction result and the template Online session resource text.
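By way of a non-limiting illustration, a cross entropy cost between a predicted distribution and a template distribution may be sketched as follows. Representing the prediction result and the template text as probability distributions is an assumption of the sketch, as are the function name `cross_entropy` and the constant `eps`.

```python
import math


def cross_entropy(predicted, template, eps=1e-12):
    """Cross entropy between a predicted distribution and a template one.

    Both arguments are assumed to be probability distributions of equal
    length; `eps` guards against log(0).
    """
    if len(predicted) != len(template):
        raise ValueError("distributions must have the same length")
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, template))


# Hypothetical template and two candidate predictions.
template_dist = [1.0, 0.0]
close_cost = cross_entropy([0.9, 0.1], template_dist)   # prediction near template
far_cost = cross_entropy([0.6, 0.4], template_dist)     # prediction further away
```

Consistent with the description above, the prediction that is closer to the template yields the smaller cost value.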
Under some possible design ideas, when the set sensitive text processing network is a generative adversarial network, a decision subnet in the generative adversarial network can be used to determine the debugging cost data between the text anonymous protection prediction result and the template Online session resource text.
Under some possible design ideas, after the debugging cost data is obtained, the network variables of the set sensitive text processing network can be improved according to the debugging cost data, so as to obtain the intermediate sensitive text processing network.
For example, when the debugging cost data is large, the network variables of the set sensitive text processing network can be adjusted. The adjusted network is then debugged again to see whether the debugging cost data has improved. Debugging is repeated in this manner until the debugging cost data meets the requirement, at which point the current network is determined as the intermediate sensitive text processing network.
Under some possible design ideas, because the general sensitive data characterization form can include the sensitive data characterization forms of various Online session resource texts, the general sensitive data characterization vector can include the sensitive data characterization vectors of various Online session resource texts, which ensures the richness and volume of the debugging sample set when the untrained sensitive text processing network is debugged. Therefore, the general sensitive data characterization vector can be used to debug the untrained sensitive text processing network, so as to obtain the general sensitive text processing network. For the process of debugging the untrained sensitive text processing network using Online session resource text in the general sensitive data characterization form, reference may be made to the process of debugging the default sensitive text processing network.
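By way of a non-limiting illustration, the cyclic debugging described above may be sketched as a loop that adjusts the network variables until the debugging cost data falls below a threshold. The update rule (plain gradient descent), the learning rate, the threshold, and the toy quadratic cost are all assumptions of the sketch; the embodiment does not specify how the variables are adjusted.

```python
def debug_until_converged(variables, cost_fn, grad_fn,
                          threshold=1e-4, lr=0.1, max_rounds=1000):
    """Repeat 'evaluate cost, adjust variables' until the cost is acceptable.

    Returns the final variables, the final cost, and the number of
    adjustment rounds that were performed.
    """
    for round_index in range(max_rounds):
        cost = cost_fn(variables)
        if cost <= threshold:            # debugging cost meets the requirement
            return variables, cost, round_index
        grads = grad_fn(variables)
        # Assumed update rule: move each variable against its gradient.
        variables = [v - lr * g for v, g in zip(variables, grads)]
    return variables, cost_fn(variables), max_rounds


# Toy stand-in for a network: the cost is the squared distance to a target.
target = [1.0, -2.0]


def toy_cost(vs):
    return sum((v - t) ** 2 for v, t in zip(vs, target))


def toy_grad(vs):
    return [2.0 * (v - t) for v, t in zip(vs, target)]


final_vars, final_cost, rounds = debug_until_converged([0.0, 0.0], toy_cost, toy_grad)
```

The network obtained when the loop exits corresponds, in this sketch, to the intermediate sensitive text processing network.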
In the embodiment of the invention, the universal sensitive text processing network to be debugged is debugged, so that the universal sensitive text processing network can grasp the characteristics of various sensitive data representation forms and has universal performance of adjusting the sensitive data representation forms of the Online session resource texts. And then, the intermediate sensitive text processing network to be debugged is jointly debugged by adopting the universal sensitive data characterization vector, so that the timeliness of network debugging is improved.
In the process of debugging the network, text strengthening is carried out on the Online session resource text by adopting the content discrimination result of the Online session resource debugging text, so that the running quality of the network can be further improved, the Online session resource text subjected to sensitive data representation form adjustment through the network is matched with big data protection conditions, and personalized and targeted data anonymous protection is realized.
Step 104, combining the network variables of the general sensitive text processing network, and performing an improvement operation on the network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network.
Under some possible design ideas, in order to further improve the performance of the intermediate sensitive text processing network, the intermediate sensitive text processing network can be adjusted using the general sensitive text processing network, which has better usability, so as to obtain the final sensitive text processing network. The final sensitive text processing network has better quality, and Online session resource text whose sensitive data characterization form has been adjusted by the final sensitive text processing network can more closely meet the big data protection condition.
Under some possible design ideas, the step of "combining network variables of the general-purpose sensitive text processing network to perform an improvement operation on network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network" may include: extracting at least one network element to be improved from the intermediate sensitive text processing network; extracting a corresponding improved auxiliary unit from the universal sensitive text processing network through the to-be-improved network unit; and carrying out improvement operation on the element configuration variables of the to-be-improved network element by combining the element configuration variables of the improved auxiliary element to obtain the final sensitive text processing network.
Those skilled in the art will appreciate that the functional units/network elements are part of a neural network, each functional unit having a different function.
Under some possible design ideas, when the network variables of the intermediate sensitive text processing network are improved using the network variables of the general sensitive text processing network, at least one network element to be improved may first be extracted from the intermediate sensitive text processing network. The network element to be improved comprises a functional unit whose performance is to be improved. For example, when the performance of a sensitive data vector mining layer (mining function unit) in the intermediate sensitive text processing network is poor, the sensitive data vector mining layer can be determined as a network element to be improved and extracted. For another example, when the performance of both the sensitive data vector mining layer and the downsampling layer in the intermediate sensitive text processing network is poor, both layers may be extracted and determined as network elements to be improved.
Under some possible design ideas, the corresponding improved auxiliary units can be extracted from the general sensitive text processing network through the to-be-improved network units. Wherein the improvement assisting unit comprises a functional unit which serves as a reference when adjusting the network element to be improved.
For example, when the performance of the sensitive data vector mining layer and the downsampling layer in the intermediate sensitive text processing network is poor, the sensitive data vector mining layer and the downsampling layer in the general sensitive text processing network can be extracted accordingly and determined as improvement auxiliary units.
Under some possible design ideas, after the network element to be improved and the improvement auxiliary unit are extracted, the unit configuration variables of the improvement auxiliary unit can be used to improve the unit configuration variables of the network element to be improved, so as to obtain the final sensitive text processing network. When doing so, the unit configuration variables of the improvement auxiliary unit and the unit configuration variables of the network element to be improved can be fused. Illustratively, the step of "performing an improvement operation on the unit configuration variables of the network element to be improved using the unit configuration variables of the improvement auxiliary unit to obtain a final sensitive text processing network" may include: determining a unit configuration variable weighting factor of the network element to be improved and a unit configuration variable weighting factor of the improvement auxiliary unit; and performing parameter vector aggregation on the unit configuration variables of the network element to be improved and the unit configuration variables of the improvement auxiliary unit according to the two weighting factors, so as to obtain the final sensitive text processing network.
For example, the unit configuration variable of the network element to be improved is in1, and the unit configuration variable of the improvement auxiliary unit is in2. The unit configuration variable weighting factor of the network element to be improved is x1, and the unit configuration variable weighting factor of the improvement auxiliary unit is y1. When parameter vector aggregation is performed on the two unit configuration variables, the latest unit configuration variable in = in1 × x1 + in2 × y1 can be obtained. In this way, the unit configuration variables of the intermediate sensitive text processing network are changed, so that the final sensitive text processing network can not only adjust the sensitive data characterization form of the Online session resource text to meet the big data protection condition, but also gain, from the general sensitive text processing network, improved performance in adjusting the sensitive data characterization form of the Online session resource text.
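By way of a non-limiting illustration, the weighted aggregation in = in1 × x1 + in2 × y1 may be sketched as follows for unit configuration variables stored as flat lists. The function name and the example weights are hypothetical; in particular, the embodiment does not require the two weighting factors to sum to 1, which is merely the choice made in the example.

```python
def aggregate_unit_variables(in1, in2, x1, y1):
    """Weighted aggregation of two sets of unit configuration variables.

    in1: variables of the network element to be improved
    in2: variables of the improvement auxiliary unit
    x1, y1: their respective weighting factors
    """
    if len(in1) != len(in2):
        raise ValueError("both units must have the same number of variables")
    return [a * x1 + b * y1 for a, b in zip(in1, in2)]


# Hypothetical variables of one layer in each network, weighted equally.
intermediate_layer = [1.0, 2.0]
general_layer = [3.0, 4.0]
final_layer = aggregate_unit_variables(intermediate_layer, general_layer, 0.5, 0.5)
```

A larger x1 keeps the result closer to the intermediate network, while a larger y1 pulls it toward the general network.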
Step 105, obtaining a general sensitive data characterization vector and an Online session resource text to be anonymized, and performing a characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector.
Under some possible design ideas, the Online session resource text to be anonymized may include Online session resource text on which sensitive data characterization form adjustment is to be performed. The embodiment of the invention does not limit the sensitive data characterization form or the content of the Online session resource text to be anonymized.
Wherein, since various different sensitive data characterization forms can be regarded as the general sensitive data characterization form, the general sensitive data characterization vector can also include features of the various different sensitive data characterization forms. For example, the generic sensitive data representation comprises 5 different sensitive data representations, and the generic sensitive data representation vector may comprise 5 different sensitive data representation vectors.
Under some possible design ideas, the general sensitive data characterization vector and the target sensitive data characterization vector can be subjected to characterization vector splicing operation to obtain a sensitive data splicing vector.
The general sensitive data characterization vector may include a plurality of sensitive data characterization vectors, so the general sensitive data characterization vector may first be summarized and then integrated with the target sensitive data characterization vector. Illustratively, the step of "performing a characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector" may include: summarizing the general sensitive data characterization vector to obtain a summarized sensitive data characterization vector; and performing vector aggregation on the summarized sensitive data characterization vector and the target sensitive data characterization vector to obtain the sensitive data splicing vector.
Various schemes can be adopted to summarize the general sensitive data characterization vectors. For example, a plurality of sensitive data characterization vectors may be averaged to obtain a summarized sensitive data characterization vector. For another example, the variances of a plurality of sensitive data characterization vectors may be determined, resulting in summarized sensitive data characterization vectors.
After the summarized sensitive data characterization vector is obtained, vector aggregation can be carried out on the summarized sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector.
For example, the summarized sensitive data characterization vector and the target sensitive data characterization vector may be summed to obtain the sensitive data splice vector.
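By way of a non-limiting illustration, the summarizing and aggregation steps described above may be sketched as follows. The sketch assumes that all characterization vectors have the same length and that both operations are applied element by element; the function names and sample values are hypothetical.

```python
from statistics import pvariance


def summarize_by_mean(vectors):
    """Element-wise average of several characterization vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def summarize_by_variance(vectors):
    """Element-wise (population) variance of several characterization vectors."""
    return [pvariance(column) for column in zip(*vectors)]


def splice(summarized, target):
    """Aggregate the summarized vector with the target vector by summation."""
    if len(summarized) != len(target):
        raise ValueError("vectors must have the same length")
    return [s + t for s, t in zip(summarized, target)]


# Hypothetical general vectors (two characterization forms) and a target vector.
general_vectors = [[1.0, 2.0], [3.0, 4.0]]
target_vector = [0.5, 0.5]
summarized = summarize_by_mean(general_vectors)
splice_vector = splice(summarized, target_vector)
```

Either summary function can feed `splice`; the mean is used here only because it is the first scheme mentioned above.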
Under some possible design ideas, the order of implementation is not limited between the step of "performing a characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector" and the step of "performing an improvement operation on the network variables of the intermediate sensitive text processing network using the network variables of the general sensitive text processing network to obtain a final sensitive text processing network". For example, either of the two steps may be performed first. For another example, the two steps may also be performed synchronously.
Step 106, performing sensitive data anonymity protection on the Online conversation resource text to be anonymized using the final sensitive text processing network and the sensitive data splicing vector to obtain the Online conversation resource desensitized text meeting the big data protection condition.
Under some possible design ideas, the final sensitive text processing network and the sensitive data splicing vector can be used to perform sensitive data anonymity protection on the Online conversation resource text to be anonymized, so as to obtain the Online conversation resource desensitized text meeting the big data protection condition. Illustratively, the step of "performing sensitive data anonymity protection on the Online conversation resource text to be anonymized using the final sensitive text processing network and the sensitive data splicing vector to obtain an Online conversation resource desensitized text meeting the big data protection condition" may include: using the final sensitive text processing network to extract text features of the Online conversation resource text to be anonymized to obtain a sensitive text vector to be anonymized of the Online conversation resource text to be anonymized; performing a sensitive element generalization operation on the sensitive text vector to be anonymized using the sensitive data splicing vector to obtain a sensitive text generalization vector; and using the final sensitive text processing network to perform a text recovery operation on the sensitive text generalization vector to obtain the Online conversation resource desensitized text meeting the big data protection condition.
Under some possible design ideas, when the final sensitive text processing network is a generative adversarial network, a text feature extraction subnet (feature encoder) in the generative adversarial network can be used to extract text features of the Online conversation resource text to be anonymized, so as to obtain the sensitive text vector to be anonymized of the Online conversation resource text to be anonymized.
Under some possible design ideas, the final sensitive text processing network can be used to perform sensitive data vector mining on the Online conversation resource text to be anonymized, so as to obtain text description data of the Online conversation resource text to be anonymized. The sensitive text vector to be anonymized is then obtained from the text description data of the Online conversation resource text to be anonymized. Illustratively, the step of "using the final sensitive text processing network to extract text features of the Online conversation resource text to be anonymized to obtain a sensitive text vector to be anonymized of the Online conversation resource text to be anonymized" may include: using the final sensitive text processing network to perform sensitive data vector mining on the Online conversation resource text to be anonymized to obtain text description data of the Online conversation resource text to be anonymized; using the final sensitive text processing network to perform a region projection operation on the text description data to obtain a text region positioning tag of the text description data; and using the final sensitive text processing network to generate the sensitive text vector to be anonymized of the Online conversation resource text to be anonymized from the text region positioning tag.
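By way of a non-limiting illustration, the data flow of the three sub-steps above (feature extraction, sensitive element generalization, text recovery) may be sketched as follows. The three callables are placeholders that only demonstrate how data passes between the stages; they are not a trained network, and their names and behavior are assumptions of the sketch.

```python
from typing import Callable, List

Vector = List[float]


def anonymize(text: str,
              encode: Callable[[str], Vector],
              generalize: Callable[[Vector, Vector], Vector],
              decode: Callable[[Vector], str],
              splice_vector: Vector) -> str:
    """Run the three-stage flow: encode, generalize, decode."""
    pending_vector = encode(text)                         # feature extraction
    generalized = generalize(pending_vector, splice_vector)  # generalization
    return decode(generalized)                            # text recovery


# Placeholder stages (hypothetical, for demonstration of the flow only).
def toy_encode(text: str) -> Vector:
    return [float(len(word)) for word in text.split()]


def toy_generalize(vector: Vector, splice: Vector) -> Vector:
    return [v + s for v, s in zip(vector, splice)]


def toy_decode(vector: Vector) -> str:
    return " ".join(f"<{value:.1f}>" for value in vector)


desensitized = anonymize("call me", toy_encode, toy_generalize, toy_decode, [1.0, 1.0])
```

In a real deployment each placeholder would be replaced by the corresponding part of the final sensitive text processing network, such as the feature encoder and the feature decoding unit.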
The sensitive data vector mining can be performed on the to-be-anonymous Online session resource text by adopting various schemes. For example, a gradient unit of a final sensitive text processing network can be adopted to carry out moving average processing on the to-be-anonymous Online session resource text, so as to obtain text description data of the to-be-anonymous Online session resource text. For another example, a moving average operator of the final sensitive text processing network may be used to perform a moving average processing on the to-be-anonymous Online session resource text, so as to obtain text description data of the to-be-anonymous Online session resource text. When the text description data is subjected to the region projection operation, the preset distribution condition can be adopted to carry out the region projection operation on the text description data. When the sensitive text vector to be anonymized of the Online conversation resource text to be anonymized is generated according to the text region locating label, the text region locating label can be adjusted to the sensitive text vector to be anonymized by setting intermediate features.
Under some possible design ideas, sensitive element generalization operation (feature generalization processing) is carried out on a sensitive text vector to be anonymized by adopting a sensitive data splicing vector through various schemes to obtain a sensitive text generalization vector.
For example, the sensitive data stitching vector and the sensitive text vector to be anonymized can be integrated, so that a sensitive text generalization vector is obtained.
Under some possible design ideas, when the final sensitive text processing network is a generative adversarial network, a feature decoding unit in the generative adversarial network can be used to perform the text recovery operation on the sensitive text generalization vector, so as to obtain the Online conversation resource desensitized text meeting the big data protection condition.
Under some possible design ideas, after the Online conversation resource desensitization text is obtained, the quality degree of the Online conversation resource desensitization text can be analyzed, and when the quality degree of the Online conversation resource desensitization text is poor, the quality degree of the Online conversation resource desensitization text can be improved. Illustratively, the method may further comprise: text detail analysis is carried out on the Online session resource desensitization text to obtain a text detail analysis result of the Online session resource desensitization text; and carrying out text detail reconstruction on the Online session resource desensitization text according to the text detail analysis result to obtain the Online session resource desensitization reconstruction text.
The text detail analysis result of the Online conversation resource desensitization text comprises information which can represent the quality of the Online conversation resource text. For example, the text detail parsing result may include information such as word accuracy of the Online conversation resource desensitization text, online conversation resource text size, and the like.
Under some possible design ideas, text detail reconstruction can be carried out on the Online conversation resource desensitization text through text detail analysis results, so that the reconstructed Online conversation resource desensitization text is obtained, and the detail quality of the Online conversation resource desensitization text is improved.
Under some possible design ideas, an AI text reconstruction network can be used for text reconstruction of the Online session resource desensitized text. The AI text reconstruction network may be a deep learning model, or may be another type of neural network model. The network layer structure of the AI text reconstruction network can be flexibly adjusted according to actual requirements by a person skilled in the art.
Under some possible design ideas, before the AI text reconstruction network is used to perform text reconstruction on the Online conversation resource desensitized text, a set AI text reconstruction network can be obtained and debugged, so as to obtain the AI text reconstruction network. The step of debugging the set AI text reconstruction network may include: obtaining a reconstructed text sample and the set AI text reconstruction network to be debugged; performing a disturbance adding operation on the reconstructed text sample to obtain a disturbance text sample; and debugging the set AI text reconstruction network using the disturbance text sample to obtain the AI text reconstruction network.
The reconstructed text sample can comprise optimized Online session resource text in any sensitive data characterization form. The AI text reconstruction network can be built from a convolutional neural network, a Bayesian network, and an activation function.
Under some possible design ideas, because reconstructed text samples are scarce, a disturbance adding operation (noise adding processing) may be performed on the reconstructed text sample to obtain a disturbance text sample. The AI text reconstruction network to be debugged is then debugged using the disturbance text sample, so as to obtain the AI text reconstruction network. The disturbance adding operation includes a process of deliberately reducing the quality of the reconstructed text sample, and it can be carried out in various ways. For example, erroneous words or sentences may be added to the reconstructed text sample.
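By way of a non-limiting illustration, a disturbance adding operation that inserts erroneous words into a clean sample may be sketched as follows. The distractor word list, the error rate, and the fixed random seed are assumptions introduced for reproducibility of the sketch.

```python
import random


def add_perturbation(text, distractors, error_rate=0.3, seed=0):
    """Insert erroneous words after randomly chosen words of `text`.

    A fixed seed keeps the sketch reproducible; `distractors` is a
    hypothetical list of words to inject.
    """
    rng = random.Random(seed)
    noisy_words = []
    for word in text.split():
        noisy_words.append(word)
        if rng.random() < error_rate:
            # Deliberately lower the quality by adding an erroneous word.
            noisy_words.append(rng.choice(distractors))
    return " ".join(noisy_words)


# Build a (disturbed, clean) pair that a reconstruction network could learn from.
clean_sample = "please confirm the delivery address"
noisy_sample = add_perturbation(clean_sample, ["zzz", "qqq"], error_rate=0.5)
training_pair = (noisy_sample, clean_sample)
```

Each clean sample can yield many disturbed variants by changing the seed, which is one way to enlarge a scarce sample set.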
After the disturbance text sample is obtained, the AI text reconstruction network to be debugged can be debugged by adopting the disturbance text sample, so that the AI text reconstruction network is obtained.
Under some possible design ideas, after the AI text reconstruction network is obtained, the AI text reconstruction network can be adopted to reconstruct text details of the Online conversation resource desensitized text. For example, if the Online conversation resource desensitization text has the problem of lower word accuracy, an AI text reconstruction network can be adopted to reconstruct text details of the Online conversation resource desensitization text, so that the quality degree of the Online conversation resource desensitization text is improved.
The embodiment of the invention provides a user session resource data protection method applying an AI decision, which comprises the following steps: obtaining a general sensitive text processing network, an intermediate sensitive text processing network, a general sensitive data characterization vector, a target sensitive data characterization vector and a to-be-anonymous Online session resource text, wherein the general sensitive text processing network is used for adjusting a sensitive data characterization form into a general sensitive data characterization form, and the intermediate sensitive text processing network is used for further adjusting the sensitive data characterization form according to big data protection conditions; performing an improvement operation on the network variables of the intermediate sensitive text processing network by adopting the network variables of the general sensitive text processing network to obtain a final sensitive text processing network; performing a characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector; and performing sensitive data anonymization protection on the to-be-anonymous Online session resource text by adopting the final sensitive text processing network and the sensitive data splicing vector to obtain the Online conversation resource desensitized text meeting the big data protection condition.
The network variables of the intermediate sensitive text processing network are improved by adopting the network variables of the general sensitive text processing network, so that the Online conversation resource desensitized text generated by the final sensitive text processing network better matches the big data protection conditions, and the accuracy and rationality of the data anonymous desensitization protection can be ensured when the data anonymous desensitization protection is carried out on the original Online conversation resource text.
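This passage does not state how the improvement operation combines the two sets of network variables. Purely as an assumption for illustration, the sketch below reads it as a linear interpolation, in which each variable of the intermediate network is pulled toward the corresponding variable of the general network by a hypothetical factor `alpha`. The function name `improve_variables` and the dictionary-of-lists layout are likewise illustrative and are not taken from the embodiment.

```python
def improve_variables(general, intermediate, alpha=0.3):
    """Blend general-network variables into intermediate-network variables.

    Both arguments map a variable name to a flat list of floats.
    ASSUMPTION: the improvement operation is modeled as
    final = alpha * general + (1 - alpha) * intermediate.
    """
    if general.keys() != intermediate.keys():
        raise ValueError("both networks must expose the same variable names")
    final = {}
    for name, inter_vals in intermediate.items():
        gen_vals = general[name]
        if len(gen_vals) != len(inter_vals):
            raise ValueError("variable %r differs in size" % name)
        final[name] = [
            alpha * g + (1.0 - alpha) * v for g, v in zip(gen_vals, inter_vals)
        ]
    return final
```

With `alpha = 0` the final network equals the intermediate network, and with `alpha = 1` it equals the general network; intermediate values keep the task-specific adjustment while staying close to the general characterization form.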
In addition, the sensitive data splicing vector is also employed in generating the Online session resource desensitized text. Because the sensitive data splicing vector is obtained by splicing the general sensitive data characterization vector and the target sensitive data characterization vector, the sensitive data splicing vector can be compatible with different requirements of data anonymous protection, and the precision and the reliability of adjusting the sensitive data characterization form of the Online session resource text can be further improved.
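A minimal sketch of the characterization vector splicing operation is given below, under the assumption that splicing means plain concatenation of the two vectors. The function name `splice_vectors` and the example values are hypothetical.

```python
def splice_vectors(general_vec, target_vec):
    """Concatenate the general and target sensitive data characterization vectors.

    ASSUMPTION: "splicing" is read as concatenation, so the result keeps
    every component of both inputs, general components first.
    """
    return list(general_vec) + list(target_vec)


# Hypothetical example values, for illustration only.
general = [0.1, 0.2]
target = [0.9]
spliced = splice_vectors(general, target)
```

Under this reading the splicing vector has a dimension equal to the sum of the two input dimensions, so neither the general nor the target characterization is lost, which is consistent with the compatibility argument made above.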
In addition, the embodiment of the invention can reconstruct details of the Online session resource desensitized text, thereby improving the quality of the Online session resource desensitized text.
In the embodiment of the invention, the big data AI decision server can obtain a general sensitive text processing network, an intermediate sensitive text processing network, a general sensitive data characterization vector, a target sensitive data characterization vector and a to-be-anonymous Online session resource text; the big data AI decision server adopts the network variables of the general sensitive text processing network to perform an improvement operation on the network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network; the big data AI decision server performs a characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector; and the big data AI decision server adopts the final sensitive text processing network and the sensitive data splicing vector to perform sensitive data anonymous protection on the to-be-anonymous Online session resource text to obtain the Online session resource desensitized text meeting the big data protection condition, so that the accuracy and rationality of the data anonymous desensitization protection can be ensured when the data anonymous desensitization protection is carried out on the original Online session resource text.
Based on the foregoing, in some independent embodiments, after performing sensitive data anonymity protection on the to-be-anonymous Online session resource text by using the final sensitive text processing network and the sensitive data splicing vector to obtain an Online session resource desensitized text meeting a big data protection condition, the method further includes: responding to a session resource pushing request sent by a pushing platform system, and determining a question-answer preference label of an Online session client pointed to by the session resource pushing request; and pushing the Online session resource desensitized text to the Online session client when the question-answer preference label matches the Online session resource desensitized text.
Therefore, before the Online session resource desensitization text is pushed, the accuracy of data pushing can be guaranteed through the matching processing of the question-answer preference labels, and the resource waste caused by pushing deviation is reduced.
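The matching check before pushing can be sketched as below. The embodiment does not define what it means for a question-answer preference label to match a desensitized text, so this sketch assumes that each desensitized text carries a set of topic tags and that a match is simple membership. The names `handle_push_request`, `client_labels` and `text_tags` are hypothetical.

```python
def handle_push_request(client_id, client_labels, text, text_tags):
    """Return the desensitized text if it should be pushed, else None.

    client_labels: maps a client id to its question-answer preference label.
    text_tags: topic tags attached to the desensitized text (ASSUMPTION).
    """
    label = client_labels.get(client_id)
    if label is not None and label in text_tags:
        return text  # label matches, so the text is pushed
    return None  # no label or no match, so nothing is pushed


# Hypothetical example data.
labels = {"client-1": "refund", "client-2": "shipping"}
pushed = handle_push_request("client-1", labels, "[desensitized text]", {"refund", "billing"})
skipped = handle_push_request("client-2", labels, "[desensitized text]", {"refund", "billing"})
```

Returning `None` for a non-matching client reflects the stated goal of avoiding pushes that would be wasted on a client whose preference does not fit the text.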
Based on the foregoing, in some independent embodiments, the determining, in response to a session resource pushing request sent by a pushing platform system, a question-answer preference label of an Online session client to which the session resource pushing request points includes steps 201-206.
Step 201, acquiring an online question-answer information set of the online session client in response to the question-answer record of the online session client, wherein the online question-answer information set comprises W groups of online question-answer information with time sequence, and W is an integer greater than or equal to 1.
Step 202, acquiring an additional question-answer information set according to the online question-answer information set, wherein the additional question-answer information set comprises W groups of additional question-answer information with time sequence.
Step 203, acquiring an online question-answer interaction description set through a first dialogue recognition component included in an online dialogue analysis algorithm based on the online question-answer information set, wherein the online question-answer interaction description set comprises W online question-answer interaction descriptions.
Step 204, acquiring an additional question-answer interaction description set through a second dialogue recognition component included in the online dialogue analysis algorithm based on the additional question-answer information set, wherein the additional question-answer interaction description set comprises W additional question-answer interaction descriptions.
Step 205, based on the online question-answer interaction description set and the additional question-answer interaction description set, obtaining a preference analysis weight corresponding to the online question-answer information set through a preference analysis component included in the online dialogue analysis algorithm.
Step 206, determining the question-answer preference label of the online question-answer information set according to the preference analysis weight.
Therefore, by combining the online question-answer information and the additional question-answer information to determine the question-answer preference label, the user question-answer requirements characterized by the online question-answer interaction descriptions and the additional question-answer interaction descriptions can be fully considered when the preference analysis weight is output, so that the reliability of the preference analysis weight can be ensured and the accuracy of the question-answer preference label is further improved.
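The data flow of steps 201 to 206 can be sketched as a small pipeline. The components themselves (how additional question-answer information is derived, the two dialogue recognition components and the preference analysis component) are not specified here, so the sketch takes them as caller-supplied functions. It further assumes that the preference analysis component returns one score per candidate label and that the label with the highest score is chosen. All names are hypothetical.

```python
def determine_preference_label(qa_set, derive_additional, describe_qa,
                               describe_additional, analyze, labels):
    """Run steps 202-206 on an online question-answer information set.

    qa_set: W groups of online question-answer information (step 201).
    analyze: returns one score per entry of `labels` (ASSUMPTION).
    """
    additional_set = [derive_additional(item) for item in qa_set]       # step 202
    qa_desc = [describe_qa(item) for item in qa_set]                    # step 203
    add_desc = [describe_additional(item) for item in additional_set]   # step 204
    weights = analyze(qa_desc, add_desc)                                # step 205
    best = max(range(len(labels)), key=lambda i: weights[i])            # step 206
    return labels[best]


# Toy stand-ins for the real components, for illustration only.
candidate_labels = ["refund", "shipping"]
toy_label = determine_preference_label(
    qa_set=["Refund status?", "Refund delay?"],
    derive_additional=lambda q: q + " (follow-up)",
    describe_qa=str.lower,
    describe_additional=str.lower,
    analyze=lambda qa, add: [
        sum(d.count(name) for d in qa + add) for name in candidate_labels
    ],
    labels=candidate_labels,
)
```

Both description sets have exactly W entries because each is computed item by item from a set of W groups, which mirrors the counts stated in steps 203 and 204.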
Based on the foregoing, in some independent embodiments, the obtaining, by a preference analysis component included in the online dialogue analysis algorithm, a preference analysis weight corresponding to the online question-answer information set based on the online question-answer interaction description set and the additional question-answer interaction description set includes: based on the online question-answer interaction description set, acquiring W first description vectors through a first scene attention module included in the online dialogue analysis algorithm, wherein each first description vector corresponds to one online question-answer interaction description; based on the additional question-answer interaction description set, acquiring W second description vectors through a second scene attention module included in the online dialogue analysis algorithm, wherein each second description vector corresponds to one additional question-answer interaction description; performing splicing processing on the W first description vectors and the W second description vectors to obtain W target description vectors, wherein each target description vector comprises a first description vector and a second description vector; and based on the W target description vectors, acquiring the preference analysis weight corresponding to the online question-answer information set through the preference analysis component included in the online dialogue analysis algorithm.
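The splicing of the W first description vectors with the W second description vectors can be sketched as a pairwise concatenation. The internals of the preference analysis component are not given, so the second function below is only a placeholder assumption: it mean-pools the W target description vectors, applies a hypothetical linear scoring vector and squashes the result with a sigmoid. The names `splice_descriptions`, `preference_weight` and `scoring_weights` are illustrative.

```python
import math


def splice_descriptions(first_vectors, second_vectors):
    """Pair the i-th first and second description vectors into one target vector."""
    if len(first_vectors) != len(second_vectors):
        raise ValueError("both sets must contain W vectors")
    return [list(f) + list(s) for f, s in zip(first_vectors, second_vectors)]


def preference_weight(target_vectors, scoring_weights):
    """Placeholder preference analysis component (ASSUMPTION, not the source's).

    Mean-pools the W target vectors, takes a dot product with
    `scoring_weights`, and maps the score into (0, 1) with a sigmoid.
    """
    w = len(target_vectors)
    dim = len(scoring_weights)
    pooled = [sum(vec[d] for vec in target_vectors) / w for d in range(dim)]
    score = sum(p * c for p, c in zip(pooled, scoring_weights))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical vectors with W = 2.
first = [[1.0, 0.0], [0.0, 1.0]]
second = [[0.5], [0.5]]
targets = splice_descriptions(first, second)
```

Each target description vector keeps its first and second description vectors side by side, so W inputs of each kind always yield exactly W target vectors, as the paragraph above requires.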
The embodiment of the invention also provides a software product for realizing the user session resource data protection method applying the AI decision, which comprises a computer program/instructions, wherein the method is implemented when the computer program/instructions are executed.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which, when run, performs the above method.
In summary, a user session resource data protection method and a software product applying AI decision are provided. The method and the software product combine big data of resource text to carry out joint debugging on the AI neural network so as to ensure the performance of the AI neural network. In the application stage of the AI neural network, the Online session resource text is subjected to big data anonymous desensitization protection by combining the big data anonymous desensitization technology, so that the accuracy and the rationality of the big data anonymous desensitization protection can be improved. In addition, because the Online session resource text can relate to fields such as the metaverse and digital services, the method and the software product have high reusability and high expandability.
The foregoing is only a specific embodiment of the present invention. Variations and alternatives will occur to those skilled in the art based on the detailed description provided herein and are intended to be included within the scope of the invention.