CN114610772A - User portrait mining method based on big data and cloud computing server - Google Patents

Info

Publication number
CN114610772A
Authority
CN
China
Prior art keywords
data
service data
service
attribute
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210076579.4A
Other languages
Chinese (zh)
Inventor
龚世燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202210076579.4A
Publication of CN114610772A
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

When the method is implemented, the service data to be analyzed is anonymized based on reference service data, so the attribute privacy level and the privacy risk index (the degree of negative impact caused if user information is stolen) can both be taken into account. During anonymization, the service data to be analyzed can therefore be divided into items of user attribute information that are each processed accordingly. This ensures that user privacy is not leaked while the anonymized service data still reflects, as far as possible, the group attributes of most users. The individual privacy of users is thus protected and the user portrait mining requirements of the service provider platform are met.

Description

User portrait mining method based on big data and cloud computing server
The application is a divisional application of the application numbered "202110109932.X", filed on January 26, 2021, and entitled "Big data analysis method and cloud computing server under the associated cloud service scene".
Technical Field
The application relates to the technical field of cloud services and big data, in particular to a user portrait mining method based on big data and a cloud computing server.
Background
From some perspectives, the advent of the big data era is the result of combining massive data with powerful computing capability. In particular, the mobile internet, the internet of things and the like generate massive data, and big data computing technology addresses the collection, storage, calculation and analysis of that data. At present, some enterprises and companies have established departments dedicated to big data, which shows how highly they value it.
Generally, the value of big data lies in mining and applying it. Scenario applications of big data can be divided mainly into the following categories: first, function; second, data source; third, data analysis; fourth, industry; and fifth, user portrait. For function-oriented big data application scenarios, the application of big data analysis in various functional fields is introduced from the vertical perspective of big data scenario application, mainly covering scenarios and cases of precision marketing, data risk control, efficiency improvement, decision support and product operation. For data-source-oriented big data application scenarios, the companies that currently own data sources on the market, together with their data sources, data types and data application cases, are introduced from the perspective of data type and data source. For data-analysis-oriented big data application scenarios, common data mining and statistical analysis methods, models and algorithms are introduced from the perspective of data analysis. Industry and user portraits are the two mainstream big data scenario applications at present. With the continuous improvement of cloud computing services, big data scenarios now need to be applied in combination with various associated cloud service scenarios. For service providers, the application of big data lies mainly in mining user portraits, yet this has long been a pain point for many of them.
Through research and analysis, the inventor has found that the main cause of the above problem is a mismatch between service data and data application scenarios. Therefore, to meet the user portrait mining requirements of service providers, the data application scenarios corresponding to the service data need to be determined accurately.
Disclosure of Invention
One of the embodiments of the present application provides a big data-based user portrait mining method, which is applied to a cloud computing server, where the cloud computing server communicates with a user end device and a service provider platform, and the method includes: extracting associated service data corresponding to the service interaction event identifier from the acquired service data to be analyzed containing the service interaction event identifier; performing data analysis and scene recognition on the associated service data corresponding to the service interaction event identification through a service data analysis model which is trained in advance to obtain a recognition result of i data application scenes; and the identification result of the data application scene is used for indicating the service provider platform to carry out user portrait mining so as to realize optimization of service products.
The second embodiment of the present application provides a cloud computing server, which includes a processing engine, a network module, and a memory; the processing engine and the memory communicate through the network module, and the processing engine reads the computer program from the memory and operates to perform the above-described method.
In the embodiment of the application, data analysis and scene recognition are performed by a service data analysis model trained in advance, and the associated service data processed by the model is extracted from the acquired service data to be analyzed and corresponds to the service interaction event identifier. The artificial intelligence model can therefore take different interaction conditions of the service data into account, which ensures accurate recognition of different data application scenes and a high correlation between the service data to be analyzed and the data application scenes. In this way, the service data and the data application scenes are closely matched, and the service provider platform can mine user portraits from the service data according to the recognition results of the different data application scenes, meeting the user portrait mining requirements of the service provider and enabling optimization of service products.
In the description that follows, additional features will be set forth, in part, in the description. These features will be in part apparent to those skilled in the art upon examination of the following and the accompanying drawings, or may be learned by production or use. The features of the present application may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations particularly pointed out in the detailed examples that follow.
Drawings
The present application will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is an exemplary system scene architecture diagram, shown in accordance with some embodiments of the present invention;
FIG. 2 is a flow diagram illustrating an exemplary big data based user portrait mining method and/or process, according to some embodiments of the invention;
FIG. 3 is a block diagram of an exemplary big data based user representation mining device, according to some embodiments of the invention; and
FIG. 4 is a schematic diagram illustrating hardware and software components in an exemplary cloud computing server, according to some embodiments of the invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the application; a person skilled in the art can apply the application to other similar scenarios based on these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein are terms for distinguishing different components, elements, parts, portions or assemblies at different levels. However, these words may be replaced by other expressions that accomplish the same purpose.
As used in this application and the appended claims, the terms "a," "an," and "the" do not refer only to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.
Flow charts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to the processes, or one or more steps may be removed from them.
The user portrait mining method based on big data according to the embodiment of the present invention may be applied to a system scene architecture as shown in fig. 1, where the system scene architecture includes a user end device 10, a cloud computing server 20, and a service provider platform 30.
The user end device 10 may correspond to a cloud service interaction state. It is configured to generate the service data or operation data corresponding to a service user in the cloud service interaction state, and to transmit the generated service data or operation data to the cloud computing server 20. The cloud service interaction state may be understood as a state in which the user end device 10 and the service provider platform 30 interact with each other, with the service provided by the service provider platform 30.
The cloud computing server 20 is configured to process service data generated by the user end device 10, and perform data application scenario analysis on a service interaction event identifier included in the service data; or the cloud computing server 20 is configured to determine a service interaction event identifier from the operation data generated by the user end device 10, and perform data application scenario analysis on the service interaction event identifier. The cloud computing server 20 may also send the data application scenario analysis result to the service provider platform 30, or generate an application scenario analysis report according to the data application scenario analysis result and then send the report to the service provider platform 30. It is to be appreciated that the service provider platform 30 can perform the corresponding user portrait mining based on the data application scenario analysis result, thereby enabling later products or services to be updated and optimized. The user portrait mining method based on big data provided by the embodiment of the invention can be executed by the cloud computing server 20.
The service provider platform 30 may be a central service device of a service provider in a cloud service interaction state. Through the data application scenario analysis result or the application scenario analysis report of the cloud computing server 20, the service provider can learn the user portrait labels of service users in the corresponding service interaction process, or can carry out targeted product research and development, service updates and the like for a certain group of service users. For example, for one business service product, the business data generated by most business users while interacting with the product corresponds to a data application scenario of a shopping category, while for another business service product, the business data generated by most business users corresponds to a data application scenario of an office category, so that the service provider can perform targeted product service upgrades on the data application scenario of the shopping category in the two business service products. Alternatively, for a certain business service product, most of the business users corresponding to the data application scenario of the shopping category may be young males, so that a product service upgrade can be targeted at young males, with further analysis of potential interests and needs and a targeted product service upgrade carried out based on the data application scenario of the shopping category corresponding to the business data.
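The product-level example above can be illustrated with a minimal sketch that counts, for each business service product, which recognized data application scenario occurs most often. The record layout, product names and scenario labels are hypothetical and are not taken from the application.

```python
from collections import Counter, defaultdict

# Hypothetical recognition results: (product, user, recognized scenario).
results = [
    ("product_a", "u1", "shopping"),
    ("product_a", "u2", "shopping"),
    ("product_a", "u3", "office"),
    ("product_b", "u4", "office"),
    ("product_b", "u5", "office"),
    ("product_b", "u6", "game"),
]


def dominant_scenario_per_product(rows):
    """Return the most frequent recognized scenario for each product."""
    counts = defaultdict(Counter)
    for product, _user, scenario in rows:
        counts[product][scenario] += 1
    return {product: c.most_common(1)[0][0] for product, c in counts.items()}


dominant = dominant_scenario_per_product(results)
```

A service provider could use such a summary to decide which scenario category a product upgrade should target.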
Data transmission between the user end device 10, the cloud computing server 20, and the service provider platform 30 may be performed through a wireless network or a wired network, where the wireless network may be, for example, a wireless local area network (WLAN), a cellular network, or the like, which is not limited herein.
Of course, the user portrait mining method based on big data provided in the embodiment of the present invention is not limited to be used in the system scenario architecture shown in fig. 1, and may also be used in other possible system scenario architectures, which is not limited in the embodiment of the present invention.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
It can be understood that the technical solution of the user portrait mining method based on big data provided by the embodiment of the present invention can be summarized as follows: extracting associated service data corresponding to the service interaction event identifier from the acquired service data to be analyzed containing the service interaction event identifier; performing data analysis and scene recognition on the associated service data corresponding to the service interaction event identification through a service data analysis model which is trained in advance to obtain a recognition result of i data application scenes; and the identification result of the data application scene is used for indicating the service provider platform to carry out user portrait mining so as to realize the optimization of the service product.
In the embodiment of the invention, data analysis and scene recognition are performed by a service data analysis model trained in advance, and the associated service data processed by the model is extracted from the acquired service data to be analyzed and corresponds to the service interaction event identifier. The artificial intelligence model can therefore take different interaction conditions of the service data into account, which ensures accurate recognition of different data application scenes and a high correlation between the service data to be analyzed and the data application scenes. In this way, the service data and the data application scenes are closely matched, and the service provider platform can mine user portraits from the service data according to the recognition results of the different data application scenes, meeting the user portrait mining requirements of the service provider and enabling optimization of service products.
Optionally, the technical solution of the user portrait mining method based on big data provided by the embodiment of the present invention can be summarized as follows: determining associated service data based on the service data to be analyzed; determining the recognition result of i data application scenes corresponding to the associated service data based on a preset service data analysis model; and the identification result of the data application scene is used for indicating the service provider platform to carry out user portrait mining so as to realize the optimization of the service product.
It is to be understood that both of the above summaries of the technical solution can be read with reference to the method steps shown in FIG. 2.
Referring to fig. 2, another embodiment of a method for mining a user portrait based on big data according to an embodiment of the present invention is provided, and the method may be applied to the system scenario architecture shown in fig. 1, and the method may be executed by a data application scenario analysis device provided by an embodiment of the present invention, and the data application scenario analysis device may be implemented by, for example, the cloud computing server 20 shown in fig. 1. The flow of the method is described below.
Step 201: Extract the associated service data corresponding to the service interaction event identifier from the acquired service data to be analyzed, which contains the service interaction event identifier.
In this embodiment of the present invention, the service data to be analyzed may be service data generated by the user end device 10 shown in FIG. 1, service data containing a service interaction event identifier determined from operation data generated by the user end device 10, or service data uploaded to the cloud computing server 20 by a service user or another service provider platform through a communication network. The service data to be analyzed may be a static data set that is not time-sensitive, or a time-sensitive data stream. For example, the service data generated by the user end device 10 is usually operation data, in which case the service data to be analyzed may be a time-sensitive data stream that is determined from the operation data and contains the same service interaction event identifier. Of course, the service data to be analyzed may also be obtained in other possible manners, which is not limited in this embodiment of the present invention.
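As a rough illustration of deriving a time-ordered data stream per service interaction event identifier from operation data, consider the sketch below. The field names `event_id`, `ts` and `payload` are assumptions made for this example; the application does not define a record layout.

```python
from collections import defaultdict

# Hypothetical operation records; "event_id" stands in for the service
# interaction event identifier and "ts" for a timestamp.
operation_data = [
    {"event_id": "E1", "ts": 3, "payload": "pay"},
    {"event_id": "E2", "ts": 1, "payload": "open_doc"},
    {"event_id": "E1", "ts": 1, "payload": "browse"},
    {"event_id": None, "ts": 2, "payload": "heartbeat"},
    {"event_id": "E1", "ts": 2, "payload": "add_to_cart"},
]


def build_streams(records):
    """Group records by event identifier and order each group by time."""
    streams = defaultdict(list)
    for record in records:
        if record.get("event_id") is not None:
            streams[record["event_id"]].append(record)
    for event_id in streams:
        streams[event_id].sort(key=lambda r: r["ts"])
    return dict(streams)


streams = build_streams(operation_data)
```

Each value in `streams` then plays the role of service data to be analyzed that shares one identifier.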
In order to improve the efficiency of the data application scenario analysis process, after the service data is obtained, appropriate data preprocessing may be performed on the service data first, and then the service data after the data preprocessing is used as the service data to be analyzed input to the service data analysis model.
In a specific implementation process, when service data is input, it may be detected whether the input service data contains a service interaction event identifier. If it does, data preprocessing is performed on the input service data; if no service interaction event identifier is detected, the service data is skipped and the next group of input service data is processed. Extracting the associated service data corresponding to the service interaction event identifier from the acquired service data to be analyzed containing the service interaction event identifier specifically comprises the following steps:
Step 301: Detect an identification feature in the service interaction event identifier.
Detecting the identification feature means determining the relative position of the identification feature in the service interaction event identifier, where the identification feature may be a number or a letter, or a spliced segment of a plurality of service data feature contents on the service interaction event identifier.
Step 302: Verify the service interaction event identifier.
Since the service interaction event identifier in the input service data may be deviated, the identification feature can be used to determine whether a deviation exists. For example, when numbers are used as the identification features, the correlation index between the queue centers of two number queues may be compared with a preset reference index. If the two differ by a certain amount, the service interaction event identifier is deviated, and the service data may be corrected so that the difference between the correlation index and the preset reference index is eliminated as far as possible. This difference can be reduced by adjusting the data structure of the service data.
Step 303: Reduce the service interaction event identifier.
Specifically, procedural identifiers other than the service interaction event identifier are removed to obtain the associated service data corresponding to the service interaction event identifier. This reduces the interference of redundant service data feature contents with the data application scene analysis and also reduces the amount of calculation in the training process or the analysis process. When the identifier reduction is performed, the service data may be reduced according to a preset data traffic processing threshold, where the preset data traffic processing threshold may be XXXmb/s, for example.
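The detection-and-skip logic and the identifier reduction of step 303 might be sketched as follows. This is only an illustration under assumed names: `event_id` and the set of "procedural" fields are hypothetical, and the verification of step 302 and the traffic threshold are deliberately left out because the text does not specify them.

```python
# Hypothetical field names; the application does not define a record layout.
EVENT_ID_FIELD = "event_id"
PROCEDURAL_FIELDS = {"session_id", "trace_id", "request_seq"}


def preprocess(groups):
    """Yield reduced records for input groups that carry an event identifier."""
    for record in groups:
        if not record.get(EVENT_ID_FIELD):
            # No identifier detected: skip and continue with the next group.
            continue
        # Drop the other procedural identifiers, keep everything else.
        yield {k: v for k, v in record.items() if k not in PROCEDURAL_FIELDS}


incoming = [
    {"event_id": "E1", "session_id": "s9", "trace_id": "t1", "action": "pay"},
    {"session_id": "s3", "action": "heartbeat"},
    {"event_id": "E2", "request_seq": 7, "action": "open_doc"},
]
associated = list(preprocess(incoming))
```

Writing the step as a generator lets groups be processed one at a time, which suits both a static data set and a stream.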
In the embodiment of the invention, the acquired service data is likely to be temporary data, but whether the data is temporary is not a decisive factor for the data application scene analysis result. Therefore, when temporary data is acquired, it can be converted accordingly, for example into data blocks that meet the analysis conditions, which can significantly reduce the amount of calculation in the training process or the analysis process. The conversion of the temporary data may be completed before the identification feature is detected, or may be performed after the identifier is reduced, which is not limited in the embodiment of the present invention.
Step 202: extracting local service data characteristic content and global service data characteristic content from the associated service data corresponding to the service interaction event identifier through a service data analysis model, wherein the local service data characteristic content comprises portrait information of an interaction event label in the associated service data corresponding to the service interaction event identifier and a detection result of the association degree of each data fragment, and the global service data characteristic content comprises the change condition of the interaction event state in the associated service data corresponding to the service interaction event identifier.
In the embodiment of the present invention, after the data preprocessing of the input service data is completed, the associated service data corresponding to the service interaction event identifier obtained by the data preprocessing may be input into a service data parsing model trained in advance, and then the local service data feature content and the global service data feature content are extracted from the associated service data corresponding to the service interaction event identifier through the service data parsing model, where the local service data feature content may include portrait information of an interaction event tag in the associated service data corresponding to the service interaction event identifier and a detection result of an association degree of each data fragment, and the global service data feature content may include a change condition of an interaction event state in the associated service data corresponding to the service interaction event identifier.
The business data analysis model is obtained by carrying out sample training on a training sample set corresponding to a plurality of business interaction event identifications, and the training sample set corresponding to each business interaction event identification is marked with the recognition result of i data application scenes in advance. The training process of the service data analysis model will be specifically described in the following embodiments, and will not be described in detail herein.
Step 203: splicing the extracted service data characteristic contents through a service data analysis model, and carrying out scene recognition on the spliced service data characteristic contents according to a scene recognition network obtained through sample training in the service data analysis model to obtain recognition results of i data application scenes, wherein i is a positive integer greater than 1.
In the embodiment of the invention, splicing can be performed according to the extracted service data characteristic content, and the identification result of the i data application scenes in the associated service data corresponding to the service interaction event identifier is judged according to the spliced service data characteristic content and the scene identification network obtained by sample training.
Specifically, the i data application scenarios may be categories of common data application scenarios, for example, the i data application scenarios may include 7 data application scenarios, which are a shopping category scenario, an office category scenario, an industrial production category scenario, a government service category scenario, a game category scenario, a smart city monitoring category scenario, and a smart medical category scenario, and of course, the i data application scenarios may also include other possible data application scenarios, which are not described herein repeatedly.
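To make the splicing and scene recognition of step 203 more concrete, the following sketch concatenates a local and a global feature vector and scores the seven example scenarios with a single linear layer followed by a softmax. The linear layer, the feature values and the weights are all illustrative assumptions; the application does not disclose the structure of the scene recognition network at this level of detail.

```python
import math

SCENARIOS = [
    "shopping", "office", "industrial_production", "government_service",
    "game", "smart_city_monitoring", "smart_medical",
]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(local_features, global_features, weights, biases):
    """Splice the two feature vectors and rank the scenarios by probability."""
    spliced = list(local_features) + list(global_features)
    scores = [
        sum(w * x for w, x in zip(row, spliced)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    return sorted(zip(SCENARIOS, probs), key=lambda pair: pair[1], reverse=True)


# Toy inputs and parameters, chosen only so the example has a clear ranking.
local = [1.0, 0.0]
glob = [0.5]
weights = [
    [2.0, 0.0, 1.0],  # shopping
    [1.0, 0.0, 1.0],  # office
    [0.0, 1.0, 0.0],  # industrial_production
    [0.5, 0.0, 1.0],  # government_service
    [0.0, 1.0, 0.0],  # game
    [0.0, 1.0, 0.0],  # smart_city_monitoring
    [0.0, 1.0, 0.0],  # smart_medical
]
biases = [0.0] * len(SCENARIOS)
ranked = recognize(local, glob, weights, biases)
```

The ranked list gives one recognition result per scenario, so a main scenario and secondary scenarios can be read off in order.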
In the embodiment of the invention, the business data analysis model is obtained by training a training sample set corresponding to a plurality of business interaction event identifications, and the training sample set corresponding to each business interaction event identification is marked with the recognition result of i data application scenes in advance. The model training refers to a process of performing data application scene analysis on a training sample set corresponding to a service interaction event identifier in the training sample set through an original machine learning model, then performing comparison analysis on a data application scene analysis result and an actual data application scene result, and continuously updating model parameters of the original machine learning model according to a difference comparison result between the data application scene analysis result and the actual data application scene result until the test accuracy of the finally obtained model can meet a set accuracy requirement.
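The training procedure described above (predict, compare with the labelled result, update the parameters, stop once the accuracy requirement is met) can be sketched with a multi-class perceptron. The perceptron is a stand-in chosen for brevity and is not the model of the application; the samples, the three-class setup and the 0.95 threshold are assumptions, and accuracy is measured on the training samples themselves rather than on a separate test set.

```python
# Toy labelled samples: (feature vector, index of the labelled scenario).
# Both the features and the three-class setup are made up for illustration.
SAMPLES = [
    ([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2),
    ([0.9, 0.1, 0.0], 0), ([0.1, 0.9, 0.0], 1), ([0.0, 0.1, 0.9], 2),
]


def predict(weights, x):
    """Return the index of the highest-scoring class."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return scores.index(max(scores))


def accuracy(weights, samples):
    """Fraction of samples whose prediction matches the label."""
    hits = sum(1 for x, y in samples if predict(weights, x) == y)
    return hits / len(samples)


def train(samples, n_classes, required_accuracy=0.95, max_epochs=100):
    """Update parameters until the accuracy requirement is met."""
    n_features = len(samples[0][0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    acc, epoch = 0.0, 0
    for epoch in range(1, max_epochs + 1):
        for x, y in samples:
            guess = predict(weights, x)
            if guess != y:
                # Difference between prediction and label drives the update.
                weights[y] = [w + v for w, v in zip(weights[y], x)]
                weights[guess] = [w - v for w, v in zip(weights[guess], x)]
        acc = accuracy(weights, samples)
        if acc >= required_accuracy:
            break
    return weights, acc, epoch


weights, final_accuracy, epochs_used = train(SAMPLES, n_classes=3)
```

In practice the stopping check would use held-out data, matching the "test accuracy" mentioned in the text.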
Before the training of the service data analysis model is performed through the training sample set, the training sample set corresponding to each service interaction event identifier in the training sample set needs to be labeled in advance.
Specifically, for a training sample set corresponding to a service interaction event identifier, take as an example the case where the i data application scenarios are 7 data application scenarios, namely a shopping category, an office category, an industrial production category, a government affair service category, a game category, a smart city monitoring category and a smart medical category. Each data application scenario of the training sample set corresponding to the service interaction event identifier can be pre-labeled using historical processing records, so that, after pre-labeling, a distribution of 7 groups of data application scenario recognition results is obtained for the training sample set corresponding to each service interaction event identifier. For example, in the recognition results of the 7 data application scenarios obtained by pre-labeling the training sample set corresponding to one service interaction event identifier, the shopping category is the main data application scenario, the office category is the secondary data application scenario, the recognition result of the government affair service category ranks below the office category, and the recognition results of the other data application scenarios are all marked as not matched.
Due to the fact that the pre-marking of the data application scenes is weak in objectivity, in order to enable the distribution of the recognition results of the pre-marked data application scenes to be more accurate, the training sample set corresponding to each service interaction event identifier can be pre-marked by a plurality of pre-marking strategies, and finally the global fusion result of the recognition results of each data application scene pre-marked by the plurality of pre-marking strategies is taken as the final recognition result. Illustratively, for a set of traffic data, pre-tagging is performed by several pre-tagging policies.
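The fusion of several pre-labeling strategies can be illustrated with a minimal sketch. The source does not say how the "global fusion result" is computed; the per-scenario arithmetic mean used here, the function name `fuse_prelabels` and the example scores are all assumptions made for illustration only.

```python
from statistics import mean

# Scenario order follows the 7-category example in the description.
SCENARIOS = ["shopping", "office", "industrial_production", "government_affairs",
             "game", "smart_city_monitoring", "smart_medical"]

def fuse_prelabels(label_sets):
    """Fuse several pre-labeling results by averaging each scenario's score.

    Averaging is an assumed fusion rule; the source only requires some
    global fusion of the per-strategy recognition results.
    """
    if not label_sets:
        raise ValueError("at least one pre-labeling result is required")
    width = len(label_sets[0])
    if any(len(labels) != width for labels in label_sets):
        raise ValueError("all pre-labeling results must cover the same scenarios")
    # zip(*label_sets) groups the scores that belong to the same scenario.
    return [mean(column) for column in zip(*label_sets)]

# Hypothetical scores from three pre-labeling strategies for one sample set.
strategy_a = [1.0, 0.6, 0.0, 0.2, 0.0, 0.0, 0.0]
strategy_b = [0.8, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0]
strategy_c = [0.9, 0.5, 0.0, 0.3, 0.0, 0.0, 0.0]

fused = fuse_prelabels([strategy_a, strategy_b, strategy_c])
for name, score in zip(SCENARIOS, fused):
    print(f"{name}: {score:.2f}")
```

Other fusion rules, such as a median or a weighted vote, would fit the same interface.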
It is understood that the original machine learning model may include a data input layer, j content extraction layers, j content correction layers, a full connection layer, and a model evaluation layer (loss layer), j being a positive integer. The model training process of the embodiment of the present invention is described below with reference to this original machine learning model; the processing performed by each layer is explained as part of the description of the training process and is not described separately here.
In the embodiment of the invention, training the model is a process of performing deep learning multiple times on the training sample sets corresponding to the service interaction event identifiers, and each round of deep learning is also a process of analyzing the training sample sets corresponding to the service interaction event identifiers. In a specific implementation, because the number of training sample sets corresponding to service interaction event identifiers is large, a single machine learning pass over all of them consumes a large amount of time; therefore, each learning pass may learn only the training sample sets corresponding to part of the service interaction event identifiers. Specifically, the training sample sets corresponding to part of the service interaction event identifiers may be randomly selected from the training sample set. Their number may be set according to experience or according to the total number of training sample sets corresponding to service interaction event identifiers included in the training sample set. In addition, the training sample set may be randomly sampled through a preset probability distribution algorithm for training.
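Selecting only part of the training sample sets for each learning pass can be sketched as follows. This is a minimal illustration assuming uniform random selection without replacement; the function name `sample_training_subset`, the `fraction` parameter and the identifier strings are hypothetical and do not appear in the source.

```python
import random

def sample_training_subset(sample_sets, fraction=0.1, seed=None):
    """Randomly pick part of the training sample sets for one learning pass.

    `fraction` stands in for the count that the description says may be
    set from experience or from the total number of sample sets.
    """
    if not sample_sets:
        return []
    rng = random.Random(seed)
    count = max(1, round(len(sample_sets) * fraction))
    # Uniform sampling without replacement; a different preset probability
    # distribution could be substituted here.
    return rng.sample(sample_sets, count)

# Hypothetical identifiers standing in for per-event training sample sets.
all_sets = [f"event-{n}" for n in range(100)]
subset = sample_training_subset(all_sets, fraction=0.1, seed=0)
print(len(subset))
```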
In the embodiment of the present invention, the learning/training processes of the training sample sets corresponding to different service interaction event identifiers are the same, so the learning/training process is described below by taking the training sample set corresponding to one service interaction event identifier as an example.
Step 601: The data input layer receives a training sample set corresponding to the service interaction event identifier.
Specifically, the data input layer may perform data preprocessing on the training sample set corresponding to the received service interaction event identifier to obtain associated service data corresponding to the service interaction event identifier, or the service data received by the data input layer may also be associated service data corresponding to the service interaction event identifier after data preprocessing, and for the data preprocessing process, reference may be made to the description of the above-described embodiment, which is not described herein again.
Step 602: the content extraction processing is performed j times by j content extraction layers.
After the associated service data corresponding to the service interaction event identifier is input into the data input layer, the first content extraction layer is entered to perform content extraction processing. For the computer device performing data processing, the training sample set corresponding to the service interaction event identifier is stored in a form of a service data stream record, so that the subsequent processing on the training sample set corresponding to the service interaction event identifier is also performed based on the service data stream record. Correspondingly, in the content extraction layer, the service data stream record of the associated service data corresponding to the service interaction event identifier is subjected to content extraction processing according to a content extraction algorithm with a preset time interval and a preset data traffic processing threshold.
The content extraction algorithm is a processing algorithm (which can be generally understood as a convolution process) for locally associated traffic data streams in the traffic data stream record. For the service data, the locally associated service data streams are closely related in time, for example, the service data streams with the closer generation times generally have the same data characteristics, so that the correlation between the service data streams with the closer generation times is stronger, whereas the correlation between the service data streams with the longer time interval between the generation times is weaker, and therefore, the relevant characteristic content of the corresponding service data can be obtained by processing the locally associated service data streams of the service data and splicing the locally associated service data streams.
The content extraction processing combines the service records covered by the current round of the content extraction algorithm, whose extent is given by the preset data traffic processing threshold, with the generalization index records of that round, and then performs content identification and analysis. The algorithm then moves to the next round according to the preset time interval, combines the service records covered by the next round with the generalization index records of that round, and again performs content identification and analysis. The smaller the data traffic processing threshold of the content extraction algorithm, the higher the identification accuracy for the service data and the greater the amount of feature content information acquired from the service data, but correspondingly the larger the calculation amount of the whole content extraction process. The data traffic processing threshold of the content extraction algorithm can therefore be chosen according to the actual situation; for example, the preset data traffic processing threshold may be 10Mb/s, and it may of course take other possible values.
Generally, the preset time interval may be set to 1min, and of course, the preset time interval may also be set to other values, for example, the preset time interval may be set to 2min or 3min, which is not limited in this embodiment of the present invention.
For example, when the content extraction layer performs content extraction processing, the traffic processing threshold of the service data stream record of the training sample set corresponding to the service interaction event identifier may be, for example, 15Mb/s. For convenience of illustration, only part of the sub-records in the service data stream record are used here, for example a service data stream record with a traffic processing threshold of 3Mb/s, where the preset data traffic processing threshold of the content extraction algorithm is 5Mb/s and the preset time interval is 1min.
When content extraction processing is performed on the service data stream record of the training sample set corresponding to the service interaction event identifier, the first round of the content extraction algorithm is started: the service records covered by the first round are combined with the corresponding part of the generalization index records of that round, and content identification and analysis are then performed to obtain an analysis result r4. After all rounds of the content extraction algorithm have been processed, the service data characteristic content block after content extraction processing is obtained. The service data feature content block comprises the local service data feature content and the global service data feature content extracted through the content extraction layers; when the service data feature contents differ, the identification results of the corresponding data application scenarios may differ. Among the j content extraction layers, the earlier layers usually extract the local service data feature content and the later layers extract the global service data feature content; the split may be set according to the actual application. For example, when j is 24, the local service data feature content may be extracted through the first 8 content extraction layers, and the global service data feature content through the last 16 content extraction layers.
In the original machine learning model, the generalized index records of each content extraction algorithm are randomly configured, and then the recorded information in the generalized index records is continuously updated by performing sample training on the original machine learning model.
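Since the description says the content extraction algorithm can be understood as a convolution process, the round-by-round processing can be sketched as a one-dimensional sliding window. This is only an illustration under stated assumptions: the window length stands in for the preset data traffic processing threshold, the stride for the preset time interval, the randomly initialised `weights` for the generalization index records, and the numeric stream is invented.

```python
import random

def extract_content(stream, weights, stride=1):
    """Slide a weighted window along the stream, one weighted sum per round."""
    window = len(weights)
    if window == 0 or stride < 1:
        raise ValueError("weights must be non-empty and stride must be >= 1")
    outputs = []
    # Each start position is one round of the content extraction algorithm.
    for start in range(0, len(stream) - window + 1, stride):
        segment = stream[start:start + window]
        outputs.append(sum(w * x for w, x in zip(weights, segment)))
    return outputs

# Random initial weights, mirroring the randomly configured index records
# that training would later update.
rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(3)]

# Hypothetical numeric encoding of a service data stream record.
stream = [0.2, 0.4, 0.1, 0.9, 0.7, 0.3, 0.5, 0.8]
features = extract_content(stream, weights, stride=1)
print(len(features))  # 8 inputs, window 3, stride 1 -> 6 positions
```

A shorter window gives more output positions and therefore more computation, which matches the trade-off the description attributes to a smaller data traffic processing threshold.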
In the embodiment of the invention, the number j of the content extraction layers can be adaptively adjusted according to the prior relevant business processing, or updated according to the modeling training process of the actual machine learning model. For example j may be 10, although j may also be other possible values, such as 5, 15, 20, 21, 22, 23, 24, 25, 30, etc. Generally, it is reasonable to choose between 5 and 20, but it is not excluded that some computer devices with higher computational performance may set the number of content extraction layers to 30.
Step 603: the content correction processing is performed j times by j content correction layers.
After the content extraction processing, the dispersion of the content in the obtained business data feature content blocks after the content extraction processing may be large, and the interference degree between the content blocks may also be large, which is not beneficial to the convergence of the machine learning model, so that a content correction layer may be arranged after each content extraction layer to convert the content blocks in the business data feature content blocks after the content extraction processing into more concentrated content blocks, thereby accelerating the convergence speed of the model.
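One way to make a content block "more concentrated" is to standardise it to zero mean and unit variance, as normalisation layers in neural networks commonly do. The source does not name the correction method, so the standardisation below, the function name `correct_content` and the `eps` stabiliser are assumptions for illustration.

```python
from statistics import fmean, pstdev

def correct_content(block, eps=1e-5):
    """Rescale a feature block to roughly zero mean and unit variance.

    Standardisation is an assumed stand-in for the content correction
    processing, which the source describes only by its effect.
    """
    mu = fmean(block)
    sigma = pstdev(block, mu)
    # eps avoids division by zero when every value in the block is equal.
    return [(x - mu) / (sigma + eps) for x in block]

# A widely dispersed block becomes a compact one centred on zero.
dispersed = [120.0, -40.0, 3.0, 77.0]
corrected = correct_content(dispersed)
print([round(x, 3) for x in corrected])
```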
Step 604: Obtain the service data characteristic content records of the i groups through at least one full connection layer.
In the embodiment of the invention, the processing of a full connection layer combines the service data characteristic content block after content correction with a preset content record. The at least one full connection layer may include several high-dimensional full connection layers and one i-dimensional full connection layer. The number of high-dimensional full connection layers may be, for example, 3, and their dimension may be 256 or 512, or another possible value. The dimension i of the i-dimensional full connection layer equals the number i of data application scenario types; for example, if there are 7 types of data application scenario, the dimension of this full connection layer is also 7. It is understood that the service data feature content records of the i groups may be understood as a service data feature content record with dimension i, for example: {record 1, record 2, record 3, ..., record i-1, record i}.
In the embodiment of the invention, the mapping processing based on the i-dimensional characteristics can be finally carried out on the service data characteristic contents in the service data characteristic content block after the content correction through the combination with the preset content record of the i group, so that the i data generalization indexes in the obtained service data characteristic content record of the i group correspond to the identification results of the i data application scenes one by one, and the identification results of the i data application scenes are obtained.
The processing process of the full connection layer is substantially the process of splicing and classifying the previously extracted service data characteristic contents, that is, the preset content record can correspond to a scene recognition network, and the process of continuously updating the preset content record in the training process can be understood as the process of training a sample to obtain the scene recognition network, so that the finally obtained preset content record in the service data analysis model can achieve the technical effects of splicing and classifying the service data characteristic contents accurately enough.
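The mapping performed by the final i-dimensional full connection layer can be sketched as a weighted sum per output dimension. In this minimal illustration the `weights` and `biases` stand in for the preset content record; their values, the input features and the function name `fully_connected` are hypothetical.

```python
def fully_connected(features, weights, biases):
    """Map a feature block to one score per data application scenario.

    Each row of `weights` plus its bias yields one of the i outputs.
    """
    if len(weights) != len(biases):
        raise ValueError("one bias is required per output dimension")
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the feature length")
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

# Hypothetical block of 2 features mapped to i = 3 scenario scores.
features = [1.0, 2.0]
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]
biases = [0.0, 0.0, 1.0]
print(fully_connected(features, weights, biases))  # [1.0, 2.0, 4.0]
```

In a full model the high-dimensional full connection layers would be stacked before this i-dimensional layer in the same way.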
Step 605: Determine, through the model evaluation layer, a difference comparison result between the predicted identification results of the i data application scenarios and the pre-marked identification results of the i data application scenarios, and update the model parameters of the original machine learning model according to the difference comparison result to obtain the service data analysis model.
The predicted identification result of the i data application scenarios refers to the i data generalization indexes in the service data feature content records of the i groups.
In the embodiment of the invention, the difference comparison result between the predicted identification results of the i data application scenarios and the pre-marked identification results of the i data application scenarios can be determined through a cross entropy loss algorithm. Generally, $f(\text{previous-result}, \text{text-result}) = -\sum_{k=1}^{i} \text{previous-result}_k \log(\text{text-result}_k)$, where previous-result represents the pre-marked recognition results of the i data application scenarios, text-result represents the predicted recognition results of the i data application scenarios, and f(previous-result, text-result) is the cross entropy of previous-result and text-result, that is, the difference comparison result between the predicted and the pre-marked recognition results of the i data application scenarios. The smaller the cross entropy, the smaller the difference degree corresponding to the difference comparison result.
Illustratively, if i is 3, and is a shopping category scene, an office category scene and a game category scene respectively, the predicted recognition results of the 3 data application scenes are 0.6, 0.8 and 0.24 in sequence, and the recognition results of the 3 data application scenes marked in advance are 1, 0 and 1 in sequence, then:
$$f(\text{previous-result}, \text{text-result}) = -(1 \cdot \log_{10} 0.6 + 0 \cdot \log_{10} 0.8 + 1 \cdot \log_{10} 0.24) = 0.8416.$$
That is, the difference degree corresponding to the difference comparison result between the predicted recognition results of the i data application scenarios and the pre-marked recognition results of the i data application scenarios is 0.8416. In the embodiment of the present invention, the difference comparison result between the predicted identification results and the pre-marked identification results of the i data application scenarios may also be obtained through a Euclidean distance algorithm, and may of course be determined through other possible loss algorithms, which are not enumerated here.
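The worked example can be checked numerically. The sketch below assumes base-10 logarithms, because that is the base that reproduces the stated value 0.8416; the function name `cross_entropy` is chosen here for illustration.

```python
import math

def cross_entropy(labels, predictions):
    """Cross entropy between pre-marked labels and predicted scores.

    Base-10 logarithms are assumed so that the result matches the worked
    example; terms whose label is 0 contribute nothing and are skipped.
    """
    return -sum(y * math.log10(p)
                for y, p in zip(labels, predictions) if y != 0)

# Shopping, office and game scenarios from the example (i = 3).
pre_marked = [1, 0, 1]
predicted = [0.6, 0.8, 0.24]
print(round(cross_entropy(pre_marked, predicted), 4))  # 0.8416
```

With natural logarithms the same inputs would give a different number, although the ordering of losses between candidate predictions is unchanged.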
In the embodiment of the invention, if the difference degree corresponding to the obtained difference comparison result is determined to be greater than or equal to the preset difference degree threshold value, the model parameters of the original machine learning model are updated according to the difference comparison result. The model parameters of the original machine learning model mainly include the parameters corresponding to the generalization index records of each content extraction algorithm in the content extraction layers and the parameters corresponding to the at least one preset content record in the full connection layer; if the content correction layer further includes a content correction coefficient, the model parameters also include the content correction coefficient. Specifically, the model parameters may be updated by a gradient descent algorithm. The learning rate in the gradient descent algorithm is an important parameter in machine learning: it affects how fast the model parameters are updated based on the loss gradient. Generally, the higher the learning rate, the faster the model learns, but when the learning rate is too high the model parameters may not be updated accurately, so an appropriate value needs to be set.
In the embodiment of the invention, after the update record of the model parameters of the original machine learning model is obtained, the updated model parameters can be determined according to the update record. Sample training is then repeated with the updated original machine learning model until the difference degree corresponding to the difference comparison result is smaller than the preset difference degree threshold value, and the most recently updated original machine learning model is used as the business data analysis model. The parameters may be updated through a back propagation (BP) algorithm or through other algorithms, which is not limited herein.
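The stopping rule described above, repeating updates until the difference degree falls below a preset threshold, can be sketched on a toy model. Everything here is an assumption made for illustration: a one-feature sigmoid scorer stands in for the full network, binary cross entropy with natural logarithms stands in for the loss, and the sample values, `learning_rate` and `threshold` are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_until_threshold(samples, labels, learning_rate=0.5,
                          threshold=0.1, max_rounds=10000):
    """Gradient descent on p = sigmoid(w*x + b) until the loss < threshold."""
    w, b = 0.0, 0.0
    loss = float("inf")
    n = len(samples)
    for _ in range(max_rounds):
        preds = [sigmoid(w * x + b) for x in samples]
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for y, p in zip(labels, preds)) / n
        if loss < threshold:
            break  # difference degree is below the preset threshold
        # Gradients of the mean binary cross entropy for a sigmoid output.
        grad_w = sum((p - y) * x for p, y, x in zip(preds, labels, samples)) / n
        grad_b = sum(p - y for p, y in zip(preds, labels)) / n
        # The learning rate scales how far each update moves the parameters.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss

w, b, loss = train_until_threshold([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
print(loss < 0.1)  # True
```

In a multi-layer model the gradients would be obtained by back propagation rather than written out by hand as they are here.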
In an alternative embodiment, since the process of processing the service data stream record or the service data feature content block is low in noise resistance, the obtained service data analysis model is also a machine learning model with low noise resistance. However, the processing of business data in practical applications is relatively complex and variable, and therefore, it is difficult to accurately express the business data only through a machine learning model with low noise resistance, and it is necessary to introduce relevant assistance processing indexes of noise content to improve the expression capability, generalization capability and model stability of the machine learning model. Further, on the basis of the above, the noise content rejection process may be added by the following steps 801 to 806.
Step 801: The data input layer receives a training sample set corresponding to the service interaction event identifier.
Step 802: The content extraction processing is performed j times by j content extraction layers.
Step 803: the content correction processing is performed j times by j content correction layers.
Step 804: Perform noise content elimination processing on the service data characteristic content block after content correction to obtain the service data characteristic content block after noise content elimination.
Step 805: Obtain the service data characteristic content records of the i groups through at least one full connection layer.
Step 806: Determine, through the model evaluation layer, a difference comparison result between the predicted identification results of the i data application scenarios and the pre-marked identification results of the i data application scenarios, and update the parameters of the original machine learning model according to the difference comparison result to obtain the service data analysis model.
Steps 801 to 803 and steps 805 to 806 are the same as the corresponding embodiments, and therefore, the description of the corresponding parts is referred to for these steps, and will not be repeated herein.
In the embodiment of the invention, noise content elimination processing can be carried out once after each content correction layer to obtain the service data characteristic content block with the noise content eliminated; the block that is input to the full connection layer is the one obtained from the last noise content elimination processing. Specifically, the noise content elimination processing is realized by a noise content elimination algorithm, and the noise content elimination algorithm can adopt a rectification algorithm with low noise resistance.
By carrying out noise content elimination processing, relevant assistance processing indexes of noise content are added to the trained machine learning model, the noise content processing capacity of the machine learning model is increased, the complexity of the business data analysis model is increased, and the accuracy of the business data analysis model is further improved.
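The source names only "a rectification algorithm" for noise content elimination. One plausible reading, offered here purely as an assumption, is an element-wise rectifier of the kind used as an activation function in neural networks, which zeroes negative values and passes the rest unchanged. The function name `eliminate_noise` is hypothetical.

```python
def eliminate_noise(block):
    """Element-wise rectification: negative values are replaced by zero.

    Reading "rectification algorithm" as a rectified linear unit is an
    assumption; the source does not define the algorithm further.
    """
    return [max(0.0, x) for x in block]

# Applied to a corrected feature block before the next layer.
print(eliminate_noise([-1.2, 0.5, 0.0, 2.3]))  # [0.0, 0.5, 0.0, 2.3]
```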
In an alternative embodiment, since some service data feature contents with time delay exist in the service data feature content block, it is necessary to perform a timing update process on the service data feature content block. For example, steps 901 to 906 are related steps to which the timing update process is added.
Step 901: The data input layer receives a training sample set corresponding to the service interaction event identifier.
Step 902: the content extraction processing is performed j times by j content extraction layers.
Step 903: the content correction processing is performed j times by j content correction layers.
Step 904: Perform time sequence updating processing on the service data characteristic content block after content correction to obtain the service data characteristic content block after time sequence updating.
Step 905: Obtain the service data characteristic content records of the i groups through at least one full connection layer.
Step 906: Determine, through the model evaluation layer, a difference comparison result between the predicted identification results of the i data application scenarios and the pre-marked identification results of the i data application scenarios, and update the parameters of the original machine learning model according to the difference comparison result to obtain the service data analysis model.
In the embodiment of the present invention, a time sequence updating layer may be added after one or more content extraction layers of the j content extraction layers to obtain the service data feature content block after time sequence updating, and then the service data feature content block after time sequence updating is input to the content correction layer. For example, if the number of content extraction layers is 15, a timing update layer may be set after the 3 rd, 6 th, 9 th and 12 th content extraction layers. Or, a time sequence updating layer may be added after one or more content correction layers of the j content correction layers to obtain a service data feature content block after time sequence updating, and then the service data feature content block after time sequence updating is input to the full connection layer.
Steps 901 to 903 and steps 905 to 906 are the same as the contents of the above embodiments, and therefore, the description of the corresponding parts for these steps is referred to and will not be described in detail herein.
Therefore, through time sequence updating processing, the strongly correlated service data characteristic content in the service data characteristic content block can be processed in time sequence, so that the processed block keeps the main service data characteristic content and removes the service data characteristic content with time delay, which reduces the influence of the delayed service data characteristic content on model training. In addition, the number of service data characteristic contents is reduced and the subsequent calculation amount is reduced accordingly, so that model training and data application scenario analysis are accelerated and real-time service requirements are met.
In the embodiment of the present invention, the noise content elimination processing and the time sequence updating processing may also both be added to the original machine learning model for training, and a person skilled in the art may flexibly select between them according to actual requirements, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, after the service data analysis model is obtained by training, whether the data application scenario identification results produced by the service data analysis model are accurate can be verified using a verification sample set. The verification process is substantially similar to the training process and is therefore not repeated here.
If the accuracy of the business data analysis model obtained through the verification of the verification sample set can meet the set business requirements, the business data analysis model can be used for analyzing the data application scene.
It can be understood that, based on the service data analysis model obtained by the training, the flow of analyzing the service data to be analyzed to obtain the data application scenario analysis result is as follows.
Step 1201: The data input layer extracts the associated service data corresponding to the service interaction event identifier from the acquired service data to be analyzed.
Step 1202: Perform content extraction processing j times on the service data stream record of the associated service data corresponding to the service interaction event identifier through j content extraction layers. In the embodiment of the present invention, after the training of the service data analysis model is completed, the generalization index records used by each content extraction algorithm in the j content extraction layers are already determined. In the analysis of the service data to be analyzed, each round of the content extraction algorithm therefore combines the service data stream record of the associated service data corresponding to the service interaction event identifier with the generalization index record at the corresponding position determined in the service data analysis model. The data traffic processing threshold and the time interval of the content extraction algorithm are likewise determined in the service data analysis model. The content extraction layers can extract, from the associated service data corresponding to the service interaction event identifier, service data characteristic content such as the portrait information of the interaction event label, the relevance detection result of each data segment, and the changes of the interaction event state, and provide this service data characteristic content to the subsequent network layers for scene recognition of the data application scenario.
Step 1203: Perform content correction processing j times on the service data characteristic content blocks after content extraction processing through j content correction layers. After each content extraction layer, one content correction layer performs content correction processing on the service data characteristic content block output by that content extraction layer, which speeds up convergence of the processing and thus the analysis.
Step 1204: Perform noise content elimination processing on the service data characteristic content block after content correction to obtain the service data characteristic content block after noise content elimination.
Step 1205: Perform time sequence updating processing on the service data characteristic content block with the noise content removed to obtain a service data characteristic content block after time sequence updating. In the embodiment of the present invention, the time sequence updating layer may follow the content correction layer, the content extraction layer, or the noise content elimination algorithm; the above steps take the case where it follows the noise content elimination algorithm as an example. The time sequence updating layer can perform time sequence processing on the strongly correlated service data characteristic content in the service data characteristic content block and reduce the service data characteristic content with time delay, so that the interference of the delayed service data characteristic content with the analysis result is reduced and the robustness of the trained model is improved.
Step 1206: Obtain the service data characteristic content records of the i groups through at least one full connection layer. Similarly, after the training of the service data analysis model is completed, the preset content record in the at least one full connection layer is also determined, so in the analysis of the service data to be analyzed, the service data feature content block input into the full connection layer is combined with the preset content record determined in the service data analysis model. Using the preset content records obtained by training, the full connection layer can splice the service data characteristic contents extracted by the preceding layers and perform scene recognition on them, and thereby outputs i groups of service data characteristic content records. Each group of records represents the identification result for one data application scenario dimension, so that the identification results of the i data application scenarios are obtained.
In the embodiment of the invention, in the trained service data analysis model, the local and global service data feature contents in the associated service data corresponding to the original service interaction event identifier are extracted and processed mainly through the content extraction layers, the content correction layers, the noise content elimination algorithm, the time sequence updating layer and the like, and the extracted service data feature contents are spliced and classified through the full connection layer, so that the identification results of i data application scenarios in the associated service data corresponding to the original service interaction event identifier are obtained. Because the processing of each layer in the analysis process is the same as the corresponding part of the training process, reference may be made to the description of the training process, and the details are not repeated here. It should be understood that, although step 1204 and step 1205 are shown in the above flow, they are optional rather than mandatory steps; furthermore, those skilled in the art can flexibly adjust the execution position of step 1205, which is not limited herein.
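The order of steps 1201 to 1206 can be sketched end to end on a toy numeric stream. Every concrete choice below is an assumption made for illustration: sliding-window sums for content extraction, standardisation for content correction, rectification for noise content elimination, window-wise maximum for time sequence updating, and hand-picked weights in place of trained records. The two flags treat steps 1204 and 1205 as switchable, which follows from their being introduced as alternative embodiments.

```python
def extract(stream, weights, stride=1):
    window = len(weights)
    return [sum(w * x for w, x in zip(weights, stream[s:s + window]))
            for s in range(0, len(stream) - window + 1, stride)]

def correct(block, eps=1e-5):
    mu = sum(block) / len(block)
    var = sum((x - mu) ** 2 for x in block) / len(block)
    return [(x - mu) / (var ** 0.5 + eps) for x in block]

def eliminate_noise(block):
    return [max(0.0, x) for x in block]

def timing_update(block, size=2):
    # Keep the strongest value in each window, shrinking the block.
    return [max(block[s:s + size]) for s in range(0, len(block), size)]

def fully_connect(block, weights, biases):
    if any(len(row) != len(block) for row in weights):
        raise ValueError("each weight row must match the block length")
    return [sum(w * x for w, x in zip(row, block)) + b
            for row, b in zip(weights, biases)]

def analyze(stream, kernels, fc_weights, fc_biases,
            use_noise_elimination=True, use_timing_update=True):
    block = list(stream)                         # step 1201: input data
    for kernel in kernels:                       # steps 1202 and 1203
        block = correct(extract(block, kernel))
    if use_noise_elimination:                    # step 1204
        block = eliminate_noise(block)
    if use_timing_update:                        # step 1205
        block = timing_update(block)
    return fully_connect(block, fc_weights, fc_biases)  # step 1206

stream = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]    # j = 2 extraction layers
fc_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # i = 3 scenarios
fc_biases = [0.0, 0.0, 0.0]
print(len(analyze(stream, kernels, fc_weights, fc_biases)))  # 3
```

Switching the time sequence update off changes the block length reaching the full connection layer, so the full connection weights would have to be sized for whichever layer arrangement was used in training.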
Data application scene analysis is not limited to product optimization and upgrading for a given business service product. The service provider can also use the analysis results to learn how business users evaluate the corresponding business handling or business interaction. If many business users give negative evaluations, the root cause of that negative feedback can be traced from the specific data application scene analysis results, so that the business can be optimized and handled in time and the business requirements of the business users are met as far as possible.
In summary, in the embodiment of the present invention, a service data analysis model may be used to perform data application scenario analysis on a service interaction event identifier in service data to be analyzed, so as to output recognition results for multiple data application scenarios occurring in the service interaction event identifier. Firstly, because multiple data application scenarios may exist in the service interaction event identifier at the same time, expressing the data application scenario of the service interaction event identifier through the distribution of the recognition results over those scenarios can be more accurate.
Secondly, in the analysis process of the embodiment of the invention, the data application scene analysis result can be obtained simply by inputting the service data to be analyzed into the service data analysis model. Compared with prior technical schemes that first extract the service data feature content and then classify it, the operation steps are simpler and more convenient.
In addition, in the service data analysis model provided by the embodiment of the invention, complex service data feature content representations are learned directly from the training sample set corresponding to the service interaction event identifier through the content extraction layer, the content correction layer and the fully connected layer, so that the resulting service data analysis model has stronger expressive capability. Noise content elimination increases the model's robustness to noise and further enhances the expressive capability of the service data analysis model.
With this design, after the identification results of the data application scenes are obtained, they can be issued to the corresponding service provider platform, so that the service provider platform can use them to perform user portrait mining on the service data for different data application scenes. This preserves the correlation among the data mining results, the service data and the data service scenes, and can provide an accurate and reliable decision basis for subsequent product and service optimization by the service provider platform.
It can be understood that, on the basis of the above, the service provider platform may request the service data to be analyzed and the identification result of the corresponding data application scenario in order to perform user portrait mining. User portrait mining must not reveal the individual privacy of users, so the cloud computing server needs to anonymize the service data to be analyzed before sending it to the service provider platform, thereby avoiding privacy leaks caused by excessive mining that the service provider platform might carry out. To this end, the technical solution may further include the following: responding to a calling request uploaded by the service provider platform, wherein the calling request is used for requesting to call the service data to be analyzed and the identification result of the data application scenario; performing data protection processing on the service data to be analyzed based on the calling request and the identification result of the data application scenario to obtain target service data; and issuing the target service data and the identification result of the data application scenario to the service provider platform, so that the service provider platform performs user portrait mining based on the target service data. Because the issued target service data has undergone data protection processing, leakage of user privacy through excessive mining by the service provider platform can be avoided. For example, the data protection processing may be anonymization, such as hiding, deleting, or modifying part of the data in the service data to be analyzed.
Further, data anonymization must not only ensure that user privacy is not revealed, but also allow the anonymized service data to reflect the group attributes of most users to the greatest extent possible. To achieve this, the above step of performing data protection processing on the service data to be analyzed based on the calling request and the identification result of the data application scenario, so as to obtain target service data, may include the following contents:
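The three data protection operations named above (hiding, deleting, or modifying part of the data) can be sketched as follows. This is only an illustration under stated assumptions: the field names, the `POLICY` table and the helper functions are hypothetical, and the disclosure does not prescribe which fields receive which treatment.

```python
import copy

# Hypothetical policy: which fields to hide, delete, or modify (generalize).
POLICY = {
    "phone": "hide",
    "id_number": "delete",
    "age": "modify",
}


def hide(value, visible=3):
    """Mask everything after the first `visible` characters."""
    text = str(value)
    return text[:visible] + "*" * max(len(text) - visible, 0)


def generalize_age(age, width=10):
    """Replace an exact age with a coarse range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def protect(record, policy=POLICY):
    """Return an anonymized copy of `record`; the input is left untouched."""
    target = copy.deepcopy(record)
    for field_name, action in policy.items():
        if field_name not in target:
            continue
        if action == "hide":
            target[field_name] = hide(target[field_name])
        elif action == "delete":
            del target[field_name]
        elif action == "modify":
            target[field_name] = generalize_age(target[field_name])
    return target


record = {
    "user": "u-001",
    "phone": "13812345678",
    "id_number": "X123",
    "age": 34,
    "event": "order_paid",
}
target_record = protect(record)
```

Here `target_record` stands in for the target service data issued to the service provider platform: the phone number is masked, the identity number is removed, and the age is coarsened to a range, while non-sensitive fields such as `event` pass through unchanged.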
determining first user attribute information corresponding to service data to be analyzed and second user attribute information corresponding to reference service data according to a request item message corresponding to the calling request and application scene tag information of an identification result of the data application scene, wherein the first user attribute information and the second user attribute information respectively comprise a plurality of attribute content blocks with different attribute privacy levels, and the reference service data is used for carrying out anonymization processing analysis on the service data to be analyzed; the determining first user attribute information corresponding to the service data to be analyzed and determining second user attribute information corresponding to the reference service data include: determining the first user attribute information corresponding to the service data to be analyzed according to an attribute privacy level threshold, wherein the attribute privacy level average value of user attributes between any two uninterrupted attribute content blocks in the first user attribute information is the attribute privacy level threshold; determining second user attribute information corresponding to the reference service data according to an attribute privacy level threshold, wherein the attribute privacy level average value of user attributes between any two uninterrupted attribute content blocks in the second user attribute information is the attribute privacy level threshold;
extracting an original user attribute label of the service data to be analyzed in any attribute content block of the first user attribute information, and determining an attribute content block with the minimum attribute privacy level in the second user attribute information as a target attribute content block; mapping the original user attribute label to the target attribute content block according to a preset anonymization processing index and a data call record, obtaining an original mapping label in the target attribute content block, and generating data pairing indication information between the service data to be analyzed and the reference service data according to the original user attribute label and the original mapping label; acquiring a sensitive user attribute fragment in the target attribute content block by taking the original mapping tag as a reference tag, mapping the sensitive user attribute fragment to the attribute content block where the original user attribute tag is located according to the inverse data pairing indication information corresponding to the data pairing indication information, acquiring the target user attribute fragment corresponding to the sensitive user attribute fragment in the attribute content block where the original user attribute tag is located, and determining the reference tag of the target user attribute fragment as a target user attribute tag;
obtaining an attribute label matching result of the original user attribute label mapped to the target attribute content block; according to the privacy correlation degree between the target user attribute segment and the candidate user attribute segment corresponding to the multiple data security items to be matched on the attribute label matching result, traversing the target attribute content features corresponding to the target user attribute label in the second user attribute information until the obtained privacy risk index of the attribute content block where the target attribute content features are located is consistent with the privacy risk index of the target user attribute label in the first user attribute information, stopping obtaining the target attribute content features in the next attribute content block, and carrying out anonymization processing on the service data to be analyzed according to the attribute matching result between the target user attribute label and the last obtained target attribute content features to obtain the target service data.
It can be understood that, when the above is implemented, the service data to be analyzed is anonymized based on the reference service data, so that both the attribute privacy level and the privacy risk index (the degree of negative impact caused if the user information is stolen) can be taken into account. In this way, during anonymization the service data to be analyzed can be split into user attribute information that is processed accordingly. This not only keeps user privacy from being leaked, but also lets the anonymized service data reflect the group attributes of most users, so that individual privacy is protected while the user portrait mining requirement of the service provider platform is met.
Next, for the above big-data-based user portrait mining method, an embodiment of the present invention further provides an exemplary big-data-based user portrait mining apparatus 300. As shown in fig. 3, the apparatus 300 may include the following functional modules.
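Two small pieces of the anonymization procedure described above can be sketched in code: choosing the attribute content block with the minimum attribute privacy level as the target block, and checking the privacy level threshold condition. The sketch reads "any two uninterrupted attribute content blocks" as "any two consecutive blocks", which is an interpretation rather than something the disclosure states; the class `AttributeContentBlock` and both functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeContentBlock:
    name: str
    privacy_level: float
    labels: list = field(default_factory=list)


def select_target_block(blocks):
    """Pick the block with the minimum attribute privacy level."""
    return min(blocks, key=lambda block: block.privacy_level)


def adjacent_average_matches(blocks, threshold, tol=1e-9):
    """Check that every pair of consecutive blocks averages to `threshold`.

    Treating "uninterrupted" blocks as consecutive ones is an assumption.
    """
    return all(
        abs((a.privacy_level + b.privacy_level) / 2 - threshold) <= tol
        for a, b in zip(blocks, blocks[1:])
    )


# Hypothetical second user attribute information (reference service data).
reference_blocks = [
    AttributeContentBlock("contact", 3.0, ["phone"]),
    AttributeContentBlock("region", 1.0, ["city"]),
    AttributeContentBlock("identity", 3.0, ["id_number"]),
]

target_block = select_target_block(reference_blocks)
threshold_ok = adjacent_average_matches(reference_blocks, threshold=2.0)
```

The label mapping, the inverse data pairing and the privacy risk index traversal are not modelled here, because the disclosure does not give enough detail to reconstruct them.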
The data extraction module 310 is configured to extract associated service data corresponding to the service interaction event identifier from the acquired service data to be analyzed that includes the service interaction event identifier.
The feature extraction module 320 is configured to extract, through the service data analysis model, local service data feature content and global service data feature content from associated service data corresponding to the service interaction event identifier, where the local service data feature content includes portrait information of an interaction event tag in the associated service data corresponding to the service interaction event identifier and a detection result of an association degree of each data segment, and the global service data feature content includes a change condition of an interaction event state in the associated service data corresponding to the service interaction event identifier.
The scene recognition module 330 is configured to splice the extracted service data feature contents through the service data analysis model, and perform scene recognition on the spliced service data feature contents according to a scene recognition network obtained through sample training in the service data analysis model to obtain recognition results of i data application scenes, where i is a positive integer greater than 1.
It is understood that the above description of the functional modules may refer to the above description of the corresponding method embodiments.
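The data flow between the three functional modules can be sketched as a simple pipeline. Everything below is a placeholder chosen for illustration: the record fields, the way local and global features are computed, and the `toy_scorer` that stands in for the trained scene recognition network are all assumptions, not the disclosed model.

```python
def extract_associated_data(service_data, event_id):
    """Module 310 (sketch): keep records tied to one interaction event identifier."""
    return [r for r in service_data if r.get("event_id") == event_id]


def extract_features(associated):
    """Module 320 (sketch): placeholder local and global features.

    Local: number of tags per record. Global: number of state changes
    across the records, standing in for the interaction event state change.
    """
    local = [len(r.get("tags", [])) for r in associated]
    states = [r.get("state") for r in associated]
    changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return local, [changes]


def recognize(local, global_features, scorer):
    """Module 330 (sketch): splice the features and hand them to a scorer."""
    spliced = list(local) + list(global_features)
    return scorer(spliced)


def toy_scorer(spliced):
    """Hypothetical stand-in for the trained scene recognition network (i = 2)."""
    return {"scene_a": sum(spliced), "scene_b": len(spliced)}


service_data = [
    {"event_id": "e1", "tags": ["a", "b"], "state": "open"},
    {"event_id": "e2", "tags": [], "state": "open"},
    {"event_id": "e1", "tags": ["c"], "state": "closed"},
]

associated = extract_associated_data(service_data, "e1")
local, global_features = extract_features(associated)
results = recognize(local, global_features, toy_scorer)
```

Separating the three stages in this way mirrors the module boundaries of the apparatus 300, so that the scorer can be replaced by a trained model without touching the extraction steps.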
Further, referring to fig. 4 in combination, the cloud computing server 20 may include a processing engine 21, a network module 22 and a memory 23, and the processing engine 21 and the memory 23 communicate through the network module 22.
Processing engine 21 may process the relevant information and/or data to perform one or more of the functions described herein. For example, in some embodiments, processing engine 21 may include at least one processor (e.g., a single-core processor or a multi-core processor). By way of example only, the processing engine 21 may include a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), an Application-Specific Instruction Set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.
Network module 22 may facilitate the exchange of information and/or data. In some embodiments, the network module 22 may be any type of wired or wireless network or combination thereof. Merely by way of example, the network module 22 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a Wireless Personal Area Network, a Near Field Communication (NFC) network, or the like, or any combination thereof. In some embodiments, network module 22 may include at least one network access point. For example, the network module 22 may include wired or wireless network access points, such as base stations and/or network access points.
The memory 23 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 23 is configured to store a program, and the processing engine 21 executes the program after receiving an execution instruction.
It is to be understood that the configuration shown in fig. 4 is merely illustrative, and that cloud computing server 20 may include more or fewer components than shown in fig. 4, or have a different configuration than shown in fig. 4. The components shown in fig. 4 may be implemented in hardware, software, or a combination thereof.
The foregoing disclosure of the embodiments of the present invention will be clear to those skilled in the art. It should be understood that those skilled in the art can derive and interpret technical terms that are not explained above on the basis of the contents described in the present application, so the above is not to be taken as a judgment on the inventiveness of the overall scheme.
It should be appreciated that the system and its modules shown above may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules of the present application may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips or transistors, or programmable hardware devices such as field programmable gate arrays or programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (e.g., firmware).
It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.
Also, this application uses specific language to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereon. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of the present application may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages, and the like. The program code may execute entirely on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).
Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to require more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having fewer than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the numbers allow for adaptive variation. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
Each patent, patent application, patent application publication, and other material, such as articles, books, specifications, publications, documents, and the like, cited in this application is hereby incorporated by reference in its entirety. Excepted are application history documents that are inconsistent with or contrary to the contents of this application, and documents (currently or later appended to this application) that limit the broadest scope of the claims of this application. It is noted that if the descriptions, definitions, and/or use of terms in the material attached to this application are inconsistent with or contrary to those in this application, the descriptions, definitions, and/or use of terms in this application shall control.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of embodiments of the present application. Other variations are also possible within the scope of the present application. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the present application can be viewed as being consistent with the teachings of the present application. Accordingly, the embodiments of the present application are not limited to only those embodiments explicitly described and depicted herein.

Claims (7)

1. A user portrait mining method based on big data is applied to a cloud computing server, the cloud computing server is communicated with user side equipment and a service provider platform, and the method comprises the following steps:
responding to a calling request uploaded by a service provider platform, wherein the calling request is used for requesting to call service data to be analyzed and an identification result of a data application scene;
performing data protection processing on the service data to be analyzed based on the calling request and the identification result of the data application scene to obtain target service data; the data protection processing comprises hiding, deleting or modifying part of data in the service data to be analyzed;
and issuing the target service data and the identification result of the data application scene to the service provider platform so as to enable the service provider platform to carry out user portrait mining based on the target service data.
2. The method of claim 1, wherein prior to the step of responding to the invocation request uploaded by the facilitator platform, the method further comprises:
extracting associated service data corresponding to the service interaction event identifier from the acquired service data to be analyzed containing the service interaction event identifier;
performing data analysis and scene recognition on the associated service data corresponding to the service interaction event identification through a service data analysis model which is trained in advance to obtain a recognition result of i data application scenes; and the identification result of the data application scene is used for indicating the service provider platform to carry out user portrait mining so as to realize optimization of service products.
3. The method of claim 2, wherein the service data to be analyzed is a time-sensitive data stream.
4. The method of claim 2, wherein performing data analysis and scene recognition on associated service data corresponding to the service interaction event identifier through a service data parsing model which is trained in advance to obtain recognition results of i types of data application scenes comprises:
extracting local service data characteristic content and global service data characteristic content from the associated service data corresponding to the service interaction event identifier through a service data analysis model, wherein the local service data characteristic content comprises portrait information of an interaction event label in the associated service data corresponding to the service interaction event identifier and a detection result of the association degree of each data segment, and the global service data characteristic content comprises a change condition of an interaction event state in the associated service data corresponding to the service interaction event identifier;
splicing the extracted service data characteristic contents through the service data analysis model, and carrying out scene recognition on the spliced service data characteristic contents according to a scene recognition network obtained through sample training in the service data analysis model to obtain recognition results of i data application scenes, wherein i is a positive integer greater than 1; the business data analysis model is obtained by carrying out sample training on a training sample set corresponding to a plurality of business interaction event identifications, and the recognition result of i data application scenes is marked in advance in the training sample set corresponding to each business interaction event identification.
5. The method according to claim 1, wherein performing data protection processing on the service data to be analyzed based on the invocation request and the recognition result of the data application scenario to obtain target service data comprises:
determining first user attribute information corresponding to service data to be analyzed and second user attribute information corresponding to reference service data according to a request item message corresponding to the calling request and application scene tag information of an identification result of the data application scene, wherein the first user attribute information and the second user attribute information respectively comprise a plurality of attribute content blocks with different attribute privacy levels, and the reference service data is used for carrying out anonymization processing analysis on the service data to be analyzed;
extracting an original user attribute label of the service data to be analyzed in any attribute content block of the first user attribute information, and determining an attribute content block with the minimum attribute privacy level in the second user attribute information as a target attribute content block;
mapping the original user attribute label to the target attribute content block according to a preset anonymization processing index and a data call record, obtaining an original mapping label in the target attribute content block, and generating data pairing indication information between the service data to be analyzed and the reference service data according to the original user attribute label and the original mapping label;
acquiring a sensitive user attribute fragment in the target attribute content block by taking the original mapping tag as a reference tag, mapping the sensitive user attribute fragment to the attribute content block where the original user attribute tag is located according to the inverse data pairing indication information corresponding to the data pairing indication information, acquiring the target user attribute fragment corresponding to the sensitive user attribute fragment in the attribute content block where the original user attribute tag is located, and determining the reference tag of the target user attribute fragment as a target user attribute tag;
obtaining an attribute label matching result of the original user attribute label mapped to the target attribute content block; according to the privacy correlation degree between the target user attribute segment and the candidate user attribute segment corresponding to the multiple data security items to be matched on the attribute label matching result, traversing the target attribute content features corresponding to the target user attribute label in the second user attribute information until the obtained privacy risk index of the attribute content block where the target attribute content features are located is consistent with the privacy risk index of the target user attribute label in the first user attribute information, stopping obtaining the target attribute content features in the next attribute content block, and carrying out anonymization processing on the service data to be analyzed according to the attribute matching result between the target user attribute label and the last obtained target attribute content features to obtain the target service data.
6. The method according to claim 5, wherein the determining first user attribute information corresponding to the service data to be analyzed and determining second user attribute information corresponding to the reference service data comprise:
determining the first user attribute information corresponding to the service data to be analyzed according to an attribute privacy level threshold, wherein the average attribute privacy level of the user attributes between any two consecutive attribute content blocks in the first user attribute information equals the attribute privacy level threshold;
and determining the second user attribute information corresponding to the reference service data according to the attribute privacy level threshold, wherein the average attribute privacy level of the user attributes between any two consecutive attribute content blocks in the second user attribute information equals the attribute privacy level threshold.
7. A cloud computing server comprising a processing engine, a network module, and a memory, wherein the processing engine and the memory communicate through the network module, and the processing engine reads a computer program from the memory and runs it to perform the method of any one of claims 1-6.
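The traversal recited at the end of claim 5 can be illustrated with a minimal sketch. It is not the applicant's implementation. The names `ContentBlock`, `find_matching_feature`, and `anonymize` are hypothetical, as are the integer risk index and the masking rule (replacing any field equal to the matched feature). The sketch walks the attribute content blocks in order, remembers the last feature found for the target tag, and stops once a block's privacy risk index is consistent with the target tag's index.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ContentBlock:
    """Hypothetical attribute content block of the second user attribute information."""
    features: Dict[str, str]   # attribute tag -> attribute content feature
    privacy_risk_index: int    # assumed integer risk index of this block


def find_matching_feature(
    blocks: Iterable[ContentBlock], target_tag: str, target_risk_index: int
) -> Optional[str]:
    """Traverse blocks in order and return the last feature obtained for the tag.

    Traversal stops as soon as a block holding the tag has the same privacy
    risk index as the target tag; later blocks are not inspected.
    """
    last_feature = None
    for block in blocks:
        feature = block.features.get(target_tag)
        if feature is None:
            continue  # this block has no feature for the target tag
        last_feature = feature
        if block.privacy_risk_index == target_risk_index:
            break  # risk indices are consistent: stop before the next block
    return last_feature


def anonymize(
    record: Dict[str, str], matched_feature: Optional[str], mask: str = "***"
) -> Dict[str, str]:
    """Assumed anonymization rule: mask every field equal to the matched feature."""
    if matched_feature is None:
        return dict(record)
    return {k: (mask if v == matched_feature else v) for k, v in record.items()}
```

For example, given blocks with risk indices 1, 3, and 3 that each hold a `phone` feature, a target index of 3 stops the traversal at the second block, and its feature is the one masked in the service data record.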
CN202210076579.4A 2021-01-26 2021-01-26 User portrait mining method based on big data and cloud computing server Withdrawn CN114610772A (en)

Priority Applications (1)

Application Number Publication Number Priority Date Filing Date Title
CN202210076579.4A CN114610772A (en) 2021-01-26 2021-01-26 User portrait mining method based on big data and cloud computing server

Applications Claiming Priority (2)

Application Number Publication Number Priority Date Filing Date Title
CN202210076579.4A CN114610772A (en) 2021-01-26 2021-01-26 User portrait mining method based on big data and cloud computing server
CN202110109932.XA CN112818023B (en) 2021-01-26 2021-01-26 Big data analysis method and cloud computing server in associated cloud service scene

Related Parent Applications (1)

Application Number Relation Publication Number Priority Date Filing Date Title
CN202110109932.XA Division CN112818023B (en) 2021-01-26 2021-01-26 Big data analysis method and cloud computing server in associated cloud service scene

Publications (1)

Publication Number Publication Date
CN114610772A (en) 2022-06-10

Family

ID=75859671

Family Applications (3)

Application Number Legal Status Publication Number Priority Date Filing Date Title
CN202210076579.4A Withdrawn CN114610772A (en) 2021-01-26 2021-01-26 User portrait mining method based on big data and cloud computing server
CN202110109932.XA Active CN112818023B (en) 2021-01-26 2021-01-26 Big data analysis method and cloud computing server in associated cloud service scene
CN202210076607.2A Withdrawn CN114610773A (en) 2021-01-26 2021-01-26 Data application scene recognition method based on big data and cloud computing server

Family Applications After (2)

Application Number Legal Status Publication Number Priority Date Filing Date Title
CN202110109932.XA Active CN112818023B (en) 2021-01-26 2021-01-26 Big data analysis method and cloud computing server in associated cloud service scene
CN202210076607.2A Withdrawn CN114610773A (en) 2021-01-26 2021-01-26 Data application scene recognition method based on big data and cloud computing server

Country Status (1)

Country Link
CN (3) CN114610772A (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113391867B (en) * 2021-06-16 2022-07-01 刘叶 Big data service processing method and service server based on digitization and visualization
CN113553609B (en) * 2021-09-17 2022-08-02 支付宝(杭州)信息技术有限公司 Method and system for predicting service by combining multiple parties based on privacy protection
CN114168632A (en) * 2021-12-07 2022-03-11 泰康保险集团股份有限公司 Abnormal data identification method and device, electronic equipment and storage medium
CN114168973A (en) * 2021-12-21 2022-03-11 江西省锐华互联网科技有限公司 APP security vulnerability analysis method based on cloud computing and server
CN114415829B (en) * 2021-12-29 2022-08-19 广州市影擎电子科技有限公司 Cross-platform equipment universal interface implementation method and system
CN114417405B (en) * 2022-01-11 2022-10-14 中软数智信息技术(武汉)有限公司 Privacy service data analysis method based on artificial intelligence and server
CN114281553B (en) * 2022-03-08 2022-05-13 开泰远景信息科技有限公司 Business processing method and system and cloud platform
CN114710542B (en) * 2022-03-23 2023-12-26 中国工商银行股份有限公司 Generalized routing mock method and device based on rpc
CN114648364B (en) * 2022-03-30 2023-04-18 成都净蓝科技有限公司 Method and system for analyzing sales data of electronic commerce website

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908606A (en) * 2017-10-31 2018-04-13 上海壹账通金融科技有限公司 Method and system based on different aforementioned sources automatic report generation
CN110365755A (en) * 2019-06-28 2019-10-22 深圳数位传媒科技有限公司 A kind of information recommendation method and device triggered in real time based on key scenes
CN110413882B (en) * 2019-07-15 2023-10-31 创新先进技术有限公司 Information pushing method, device and equipment
CN111191041A (en) * 2019-11-22 2020-05-22 腾讯云计算(北京)有限责任公司 Characteristic data acquisition method, data storage method, device, equipment and medium
CN111091351A (en) * 2019-12-16 2020-05-01 北京政信1890智能科技有限公司 User portrait construction method and device, electronic equipment and readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422593A (en) * 2022-09-14 2022-12-02 戴丽 Information optimization processing method and server based on Internet and digital technology
CN117150551A (en) * 2023-09-04 2023-12-01 北京超然聚力网络科技有限公司 User privacy protection method and system based on big data
CN117150551B (en) * 2023-09-04 2024-02-27 东方魂数字科技(北京)有限公司 User privacy protection method and system based on big data

Also Published As

Publication number Publication date
CN112818023A (en) 2021-05-18
CN112818023B (en) 2022-03-18
CN114610773A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
CN112818023B (en) Big data analysis method and cloud computing server in associated cloud service scene
CN108427939B (en) Model generation method and device
CN112633962B (en) Service recommendation method and device, computer equipment and storage medium
CN113298121B (en) Message sending method and device based on multi-data source modeling and electronic equipment
CN111371767A (en) Malicious account identification method, malicious account identification device, medium and electronic device
CN108090351A (en) For handling the method and apparatus of request message
CN112749181B (en) Big data processing method aiming at authenticity verification and credible traceability and cloud server
CN110929806A (en) Picture processing method and device based on artificial intelligence and electronic equipment
CN111324738A (en) Method and system for determining text label
CN110955770A (en) Intelligent dialogue system
CN112684396A (en) Data preprocessing method and system for electric energy meter operation error monitoring model
CN115687934A (en) Intention recognition method and device, computer equipment and storage medium
CN110489730A (en) Text handling method, device, terminal and storage medium
CN113472860A (en) Service resource allocation method and server under big data and digital environment
CN113626826A (en) Intelligent contract security detection method, system, equipment, terminal and application
CN109726398B (en) Entity identification and attribute judgment method, system, equipment and medium
CN111241297A (en) Map data processing method and device based on label propagation algorithm
WO2023077815A1 (en) Method and device for processing sensitive data
WO2021151354A1 (en) Word recognition method and apparatus, computer device, and storage medium
CN111786937B (en) Method, apparatus, electronic device and readable medium for identifying malicious request
CN114493850A (en) Artificial intelligence-based online notarization method, system and storage medium
CN112784990A (en) Training method of member inference model
CN114528496B (en) Multimedia data processing method, device, equipment and storage medium
CN111506510B (en) Software quality determining method and related device
CN113837183B (en) Multi-stage certificate intelligent generation method, system and medium based on real-time mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220610