CN115563654A - Digital marketing big data processing method - Google Patents
- Publication number: CN115563654A
- Application number: CN202211469771.6A
- Authority: CN (China)
- Prior art keywords: entry, feature, big data, individual, digital marketing
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioethics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the technical field of big data processing and provides a digital marketing big data processing method comprising the following steps: acquiring digital marketing big data and establishing a database; performing preliminary feature cleaning on the digital marketing big data in the database; extracting the features of all the digital marketing big data, obtaining positive connection parameters and negative connection parameters from the distribution of the features across the entries in the database, obtaining the profitability of each feature from the density with which it occurs in the database, obtaining the connectivity of each feature from its positive and negative connection parameters, and quantifying the sensitivity of each feature from its connectivity and profitability; obtaining the sensitivity of each entry in the database from the feature sensitivities, thereby identifying the entries that correspond to sensitive data; and applying security processing to the sensitive data so identified. The invention aims to solve the problem that encrypting digital marketing big data takes too long because of the huge data volume.
Description
Technical Field
The application relates to the field of big data processing, in particular to a digital marketing big data processing method.
Background
With the development of science and technology and the arrival of the digital era, traditional marketing, such as offline promotion in physical stores, no longer dominates the sale of goods because its coverage is small, whereas digital marketing has become more popular because of its precision and wide coverage. Digital marketing generates big data for each enterprise's goods, and this data is very important for updating and promoting the enterprise's subsequent products. The security of digital marketing big data is therefore an important concern for the enterprise, and the data needs corresponding security processing.
Disclosure of Invention
The invention provides a digital marketing big data processing method that aims to solve the problem that encrypting digital marketing big data with existing algorithms takes too long because of the huge data volume, and adopts the following technical solution:
One embodiment of the invention provides a digital marketing big data processing method comprising the following steps:
constructing a database of the digital marketing big data, and performing feature cleaning on all entries of the digital marketing big data in the database;
extracting the features of all entries; obtaining the feature relevance of each feature in each entry from the positional relationship between different features in the same entry; taking the mean, over all entries, of the feature relevance of each feature as the positive connection parameter of that feature; obtaining the negative connection parameter of each feature from the overall occurrence counts of the features that never appear in the same entry as it and from their occurrence frequencies within a given entry range; and obtaining the connectivity of each feature from its positive and negative connection parameters;
obtaining the profitability of each feature from its inter-entry density (occurrences across different entries) and its intra-entry density (occurrences within the same entry), and obtaining the sensitivity of each feature from its connectivity and profitability;
and, using the feature sensitivities, taking the sum of the sensitivities of all features in an entry as the sensitivity of that entry, identifying the sensitive data contained in the entries according to entry sensitivity, and applying security processing to the sensitive data.
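The final step can be illustrated with a minimal Python sketch. It assumes that feature sensitivities are already available as a dictionary, that each distinct feature is counted once per entry, and that an entry counts as sensitive when its summed sensitivity exceeds a threshold; the function names, the sample values, and the threshold rule are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List


def entry_sensitivity(entry_features: List[str],
                      feature_sensitivity: Dict[str, float]) -> float:
    """Sum the sensitivities of the distinct features found in one entry."""
    return sum(feature_sensitivity.get(f, 0.0) for f in set(entry_features))


def select_sensitive_entries(entries: List[List[str]],
                             feature_sensitivity: Dict[str, float],
                             threshold: float) -> List[int]:
    """Return the indices of entries whose sensitivity exceeds the threshold."""
    return [idx for idx, feats in enumerate(entries)
            if entry_sensitivity(feats, feature_sensitivity) > threshold]


# Hypothetical feature sensitivities and entries (each entry is its feature list).
feature_sensitivity = {"price": 0.9, "discount": 0.6, "color": 0.1}
entries = [["price", "discount"], ["color"], ["price", "color"]]
sensitive = select_sensitive_entries(entries, feature_sensitivity, threshold=0.8)
print(sensitive)  # [0, 2]
```

Only the selected entries would then go through security processing such as encryption, which is where the claimed time saving comes from.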
Optionally, the database of the digital marketing big data is constructed as follows:
acquiring the digital marketing big data, classifying it by source and establishing a database for each source, and structuring the data from the same source into table entries ordered by acquisition time, to obtain the preprocessed digital marketing big data.
Optionally, the feature cleaning comprises:
identifying the repeated characters in the entries corresponding to all digital marketing big data in the database, and removing the characters corresponding to the small proportion of features that do not repeat, thereby reducing the workload of subsequent feature extraction and feature sensitivity calculation.
Optionally, the features of all entries are obtained as follows:
taking the text data of each entry as the input of a named entity recognition technique, and taking the output entities as the features of the digital marketing big data.
Optionally, the method for obtaining the feature relevance of each feature in each entry comprises:
wherein $R_{j,i}$ denotes the feature relevance of the $i$-th feature in the $j$-th entry, $n_j$ is the total number of features in the $j$-th entry, and $r_{j,i,k}$ denotes the feature association parameter between the $i$-th feature and the $k$-th feature in the $j$-th entry, which is obtained from the positional relationship of the two features appearing in the same entry.
Optionally, the method for obtaining the positive connection parameter of each feature comprises:
wherein $A_i$ denotes the positive connection parameter of the $i$-th feature, $N$ is the number of structured entries of the digital marketing big data in the database, $c_{j,i}$ denotes the number of times the $i$-th feature occurs in the $j$-th entry, $C_{j,i}$ denotes the total number of occurrences in the $j$-th entry of all features other than the $i$-th feature, and $R_{j,i}$ denotes the feature relevance of the $i$-th feature in the $j$-th entry.
Optionally, the method for obtaining the negative connection parameter of each feature comprises:
wherein $B_i$ denotes the negative connection parameter of the $i$-th feature, $v$ indexes the $v$-th of the features that never appear in the same entry as the $i$-th feature, $V_i$ denotes the total number of features that never appear in the same entry as the $i$-th feature, $T_i$ denotes the total number of times the $i$-th feature appears in the database, $T_v$ denotes the total number of times the $v$-th such feature appears in the database, $f_{q,i}$ denotes the occurrence frequency of the $i$-th feature within the $q$-th entry range, $f_{q,v}$ denotes the occurrence frequency of the $v$-th such feature within the $q$-th entry range, and $Q$ denotes the total number of entry ranges, an entry range being a range formed by a certain number of entries.
Optionally, the method for obtaining the connectivity of each feature comprises:
$$L_i = \hat{A}_i - \hat{B}_i$$
wherein $L_i$ is the connectivity of the $i$-th feature, $\hat{A}_i$ is the normalized positive connection parameter of the $i$-th feature, and $\hat{B}_i$ is the normalized negative connection parameter of the $i$-th feature.
Optionally, the method for obtaining the profitability of each feature comprises:
wherein $\rho^{\mathrm{inter}}_i$ is the inter-entry density of the $i$-th feature, $d_{i,t}$ is the distance between the two entries in which the $t$-th pair of adjacent occurrences of the $i$-th feature is located, and $t_{\max}$ is the maximum number of adjacent occurrences; the intra-entry density $\rho^{\mathrm{intra}}_i$ of the $i$-th feature is calculated as follows:
wherein $\rho^{\mathrm{intra}}_i$ is the intra-entry density of the $i$-th feature, $c_{j,i}$ denotes the number of occurrences of the $i$-th feature in the $j$-th entry, $p_{j,i,u}$ and $p_{j,i,u'}$ denote the positions of the $u$-th and $u'$-th occurrences of the $i$-th feature in the $j$-th entry, and $l_j$ denotes the length of the $j$-th entry; the profitability is obtained as the product of the feature's inter-entry density, intra-entry density, and overall occurrence frequency.
The invention has the following advantages: by extracting features from the digital marketing big data and quantifying the sensitivity of the data through those features, a large amount of computation in screening sensitive data is saved; by calculating sensitivity from positive and negative connectivity and feature profitability, the sensitive data in the digital marketing big data is screened more accurately before security processing is applied, which greatly reduces the amount of data to be processed and shortens the processing time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive labor.
Fig. 1 is a schematic flow chart of a digital marketing big data processing method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of a method for processing digital marketing big data according to an embodiment of the present invention is shown, where the method includes the following steps:
Step S001: acquire the digital marketing big data and establish a database.
Compared with the structured data in a database, raw digital marketing big data is scattered and irregular in structure, which makes subsequent feature extraction and feature sensitivity calculation inconvenient. Specifically, feature extraction would have to search through irregular data (irregular data structures in particular), which greatly increases the amount of computation. In addition, data from different sources are only weakly connected; if feature identification and feature sensitivity calculation were run across sources, too many features would be extracted, making the sensitivity calculation inaccurate and causing the curse of dimensionality. A database therefore needs to be established per data source, and the digital marketing big data in each database then needs to be structured.
First, the digital marketing big data is acquired: the enterprise records the data as it is collected and classifies it by data source, with data from the same source forming one class.
A database is established for the digital marketing big data of each source, preferably using existing technology such as HBase, which is well known and is not described in detail here.
The digital marketing big data from the same source in each database is structured into table entries according to acquisition time, yielding $N$ entries. The total number of entries may differ between databases; for convenience of description, it is uniformly denoted $N$ here.
The preprocessed digital marketing big data is thus obtained by acquiring the data, classifying it and establishing databases by source, and structuring it accordingly.
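A minimal Python sketch of this preprocessing step is given below. It uses an in-memory dictionary as a stand-in for the per-source databases (the patent prefers HBase), and the `Record` fields and sample data are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    source: str     # where the data came from (hypothetical field)
    timestamp: int  # acquisition time, e.g. a Unix timestamp (hypothetical field)
    text: str       # raw marketing text


def build_tables(records: List[Record]) -> Dict[str, List[Record]]:
    """Group records by source, then order each group by acquisition time."""
    tables: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        tables[rec.source].append(rec)
    return {src: sorted(recs, key=lambda r: r.timestamp)
            for src, recs in tables.items()}


records = [
    Record("web", 3, "spring sale on shoes"),
    Record("app", 1, "new phone launch"),
    Record("web", 1, "shoes discount coupon"),
]
tables = build_tables(records)
print([r.text for r in tables["web"]])
```

Each list in `tables` plays the role of one source-specific database whose entries are ordered by time, so its length corresponds to $N$ for that source.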
The entries of the structured digital marketing big data in each database differ in sensitivity with respect to the goods. Specifically, the features extracted from the entries differ in their connectivity with one another and in the revenue they contribute to marketing. Features with stronger connectivity describe the goods more precisely, and features with weaker connectivity describe them more vaguely; the greater a feature's effect on marketing revenue, the more important it is among all the features of the goods.
Step S002: perform preliminary feature cleaning on the digital marketing big data in the database and obtain the features of all the digital marketing big data.
When the entries in the database are analyzed, an entry as a whole may be too long and may contain noise in the form of non-valid information. The method therefore performs preliminary feature cleaning on all entries corresponding to the digital marketing big data in the database, extracts the features from the cleaned entries using named entity recognition, and uses the features as labels of the entries when calculating data sensitivity.
The preliminary feature cleaning of the entries corresponding to all digital marketing big data in the database proceeds as follows. The characters that repeat across the entries are identified. Because features describe the important words in an entry, the characters corresponding to most features are repeated. A small proportion of features correspond to characters that do not repeat, but these features are insignificant in big data, since big data analysis is concerned with the overall trend rather than with a handful of data points. Cleaning them out reduces the workload of subsequent feature extraction and feature sensitivity calculation and eliminates the few features that are irrelevant to the overall trend.
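The cleaning rule can be sketched in Python as follows. The sketch assumes whitespace tokenization and treats a token as "repeated" when it occurs at least twice across all entries; both the tokenization and the `min_count` cut-off are illustrative assumptions, since the patent does not state them.

```python
from collections import Counter
from typing import List


def preliminary_clean(entries: List[str], min_count: int = 2) -> List[List[str]]:
    """Drop tokens that occur fewer than `min_count` times across all entries."""
    tokenized = [entry.split() for entry in entries]
    counts = Counter(tok for toks in tokenized for tok in toks)
    return [[tok for tok in toks if counts[tok] >= min_count]
            for toks in tokenized]


entries = [
    "shoes discount coupon",
    "shoes spring sale",
    "discount on shoes",
]
print(preliminary_clean(entries))
# [['shoes', 'discount'], ['shoes'], ['discount', 'shoes']]
```

Tokens that survive the filter are the candidates passed on to feature extraction, so the later per-feature calculations run over a smaller vocabulary.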
Next, features are extracted from the cleaned data using named entity recognition. The input is the text of an entry, and the entities output by the recognizer are taken as the features of the digital marketing big data; each feature appears in the entry as a word.
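The input and output of this step can be illustrated with a small Python sketch. It does not run a real named entity recognizer; a lookup against a hypothetical entity list stands in for the recognizer, only to show that each entry's text goes in and an ordered list of feature words comes out.

```python
from typing import List, Set


def extract_features(entry: str, known_entities: Set[str]) -> List[str]:
    """Return, in order of appearance, the tokens of `entry` that are known entities.

    This lookup is a stand-in for a real named entity recognizer.
    """
    return [tok for tok in entry.lower().split() if tok in known_entities]


known_entities = {"shoes", "discount", "coupon"}  # hypothetical entity list
print(extract_features("Shoes discount for new shoes buyers", known_entities))
# ['shoes', 'discount', 'shoes']
```

Keeping repeated occurrences and their order matters here, because the later positional calculations depend on where and how often each feature appears in the entry.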
Specifically, applying this method to all entries in the database after the digital marketing big data has been structured yields the full feature set $F = \{F_1, F_2, \ldots, F_M\}$,
in which the subscripts denote different features; for example, $F_i$ denotes the $i$-th feature of all the digital marketing big data in the database, $i = 1, 2, \ldots, M$, and $M$ is the maximum number of features extracted after preliminary feature cleaning of the digital marketing big data in the current database. $M$ may differ between databases; for convenience of description, it is uniformly denoted $M$ here.
Step S003: quantify the sensitivity of the features obtained from all the digital marketing big data.
Sensitivity is a parameter that quantifies how important an extracted feature is in the digital marketing big data, that is, how necessary security processing is. It is calculated from the connectivity between features and from the profitability of each feature's contribution to marketing. The more strongly a feature is connected with the remaining features, the more important it is relative to the others in the digital marketing process, because digital marketing is carried out through the cooperation of most features, which in turn generates the big data for those features. Likewise, the more frequently and the more evenly a feature appears, the more revenue it yields in marketing. Such a feature is more sensitive, the digital marketing big data corresponding to it is more sensitive, and the need for security processing is stronger.
Connectivity between features comprises positive connectivity and negative connectivity. Positive connectivity means that the appearance of one feature is usually accompanied by the appearance of the remaining features; negative connectivity means that when one feature appears, most other features usually do not. These two properties are used to quantify the connectivity between features.
Further, when features describe goods, the stronger the connectivity between two features, the smaller the Euclidean distance between them within an entry should be, i.e. one feature reinforces the other; the weaker the connectivity between two features, the larger their Euclidean distance within the same entry, i.e. one feature merely supplements the other. Therefore, for every entry that contains the $i$-th feature $F_i$, the Euclidean distances between features are calculated to determine the positive connection of $F_i$ with the remaining features.
Specifically, taking the $i$-th feature $F_i$ as an example, its positive connection parameter $A_i$ is quantified as follows:
wherein $A_i$ denotes the positive connection parameter of the $i$-th feature, $N$ is the number of structured entries of the digital marketing big data in the database, $c_{j,i}$ denotes the number of times the $i$-th feature occurs in the $j$-th entry, $C_{j,i}$ denotes the total number of occurrences in the $j$-th entry of all features other than the $i$-th feature, and $R_{j,i}$ denotes the feature relevance of the $i$-th feature in the $j$-th entry.
The feature relevance $R_{j,i}$ of the $i$-th feature $F_i$ in the $j$-th entry is calculated as follows:
wherein $R_{j,i}$ denotes the feature relevance of the $i$-th feature in the $j$-th entry, $n_j$ is the total number of features in the $j$-th entry, and $r_{j,i,k}$ denotes the feature association parameter between the $i$-th feature and the $k$-th feature in the $j$-th entry.
The feature association parameter $r_{j,i,k}$ between the $i$-th feature $F_i$ and the $k$-th feature $F_k$ in the $j$-th entry is calculated as follows:
wherein $c_{j,i}$ denotes the number of times the $i$-th feature $F_i$ occurs in the $j$-th entry, $p_{j,i,u}$ denotes the position of the $u$-th occurrence of $F_i$ in the $j$-th entry, and $q_{j,k,u}$ denotes the position of the occurrence of the $k$-th feature $F_k$ that is nearest to that $u$-th occurrence. Note that $F_k$ does not necessarily occur $c_{j,i}$ times in the entry, so the nearest occurrence of $F_k$ may be the same position for different occurrences of $F_i$.
To explain: the words contained in the $j$-th entry form a word sequence from left to right. The feature $F_i$ is itself a word of the entry and may appear several times in the sequence; the position of that word in the sequence is the position at which $F_i$ occurs in the $j$-th entry, and the positions at which $F_k$ occurs in the $j$-th entry are obtained in the same way. The larger $r_{j,i,k}$ is, the closer together the $i$-th and $k$-th features appear in the same entry and the stronger the connectivity between them; the smaller $r_{j,i,k}$ is, the farther apart the two features appear in the same entry and the weaker their connectivity;
the larger $R_{j,i}$ is, the closer the $i$-th feature $F_i$ lies to all other features in the same entry and the more other features are near it, which shows that the feature is more strongly related to the other features in the entry; the smaller $R_{j,i}$ is, the farther the feature lies from all other features in the entry and the fewer other features are near it, which shows that its relation to the other features in the entry is weaker;
the larger $A_i$ is, the stronger the connection between the $i$-th feature and all other features across all entries, the more important and sensitive the feature is in the database, and the better its changes reflect the overall trend of the digital marketing big data.
The overall relevance of a feature to the other features within the same entry is regarded as its positive connection: the stronger the relevance, the higher the feature's sensitivity and the more important it is in the database.
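The chain from positions to positive connection can be sketched in Python. Because the formula images are not preserved in this text, every functional form below is an assumption chosen only to match the described behaviour: the association parameter is taken as the mean of $1/(1+\text{distance})$ to the nearest occurrence of the other feature, the relevance as the mean association with the other features present in the entry, and the positive connection as the plain mean of relevance over all entries (the occurrence-count weighting mentioned in the text is omitted).

```python
from typing import Dict, List


def positions(entry: List[str], feature: str) -> List[int]:
    """Indices at which `feature` occurs in the tokenized entry."""
    return [idx for idx, tok in enumerate(entry) if tok == feature]


def association(entry: List[str], fi: str, fk: str) -> float:
    """Assumed association parameter: mean of 1 / (1 + distance to nearest fk)."""
    pi, pk = positions(entry, fi), positions(entry, fk)
    if not pi or not pk:
        return 0.0
    return sum(1.0 / (1 + min(abs(p - q) for q in pk)) for p in pi) / len(pi)


def relevance(entry: List[str], fi: str, features: List[str]) -> float:
    """Assumed feature relevance: mean association with the other features present."""
    others = [f for f in features if f != fi and f in entry]
    if fi not in entry or not others:
        return 0.0
    return sum(association(entry, fi, fk) for fk in others) / len(others)


def positive_connection(entries: List[List[str]],
                        features: List[str]) -> Dict[str, float]:
    """Mean, over all entries, of each feature's relevance (0 where absent)."""
    n = len(entries)
    return {f: sum(relevance(e, f, features) for e in entries) / n
            for f in features}


entries = [["shoes", "discount", "coupon"], ["shoes", "sale", "discount"]]
features = ["shoes", "discount", "coupon"]
result = positive_connection(entries, features)
print(result)
```

In this toy data "discount" sits next to other features in both entries and so receives the largest positive connection, while "coupon" appears in only one entry and receives the smallest.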
Further, the negative connection $B_i$ of the $i$-th feature $F_i$ is quantified. The remaining features and $F_i$ all describe the same digital marketing, i.e. all features belong to the digital marketing process. Some of them, however, stand in opposition: the appearance of $F_i$ is at odds with the appearance of certain other features. If $F_i$ occurs many times and such a feature also occurs many times, so that both have large occurrence counts, yet the two still never appear in the same entry, the negative connection between them is large. Accordingly, an overall negative relation is calculated from the total number of occurrences of $F_i$ and that of each conflicting feature, a local negative relation is calculated from their occurrence frequencies within each entry range, and the overall and local negative relations are multiplied to represent the negative connection between $F_i$ and the remaining features.
Specifically, taking the $i$-th feature $F_i$ as an example, its negative connection parameter $B_i$ is quantified as follows:
wherein $B_i$ denotes the negative connection parameter of the $i$-th feature, $v$ indexes the $v$-th of the features that never appear in the same entry as the $i$-th feature, $V_i$ denotes the total number of features that never appear in the same entry as the $i$-th feature, $T_i$ denotes the total number of times the $i$-th feature appears in the database, $T_v$ denotes the total number of times the $v$-th such feature appears in the database, $f_{q,i}$ denotes the occurrence frequency of the $i$-th feature within the $q$-th entry range, $f_{q,v}$ denotes the occurrence frequency of the $v$-th such feature within the $q$-th entry range, and $Q$ denotes the total number of entry ranges, an entry range being a range formed by a certain number of entries.
Preferably, the entry range is given an empirical value of 100 entries; specifically, every 100 of the $N$ entries are divided into one group, giving $Q$ groups, i.e. $Q$ entry ranges.
The larger the ratio of the number of occurrences of feature $F_i$ to the total number of occurrences of a feature that never appears in the same entry as $F_i$, the more it shows that, despite large occurrence counts, the two still never appear together, i.e. the stronger the negative connection between them. Likewise, the ratio, within a given entry range, of the occurrence frequency of $F_i$ to that of a feature that never appears in the same entry as $F_i$ also indicates the strength of the negative connection between the two. The larger $B_i$ is, the more often the feature and the features unrelated to it occur without ever appearing in the same entry, which indicates that the feature has many unrelated features; its overall importance in the database is therefore reduced, and so is its sensitivity.
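The negative connection can be sketched in Python under explicit assumptions, since the original formula is not preserved. Here the overall relation is taken as the ratio of total occurrence counts, the local relation as the mean over entry ranges of the product of the two per-range frequencies (a product is used rather than a ratio to avoid division by zero), and the result is averaged over the features that never share an entry with the given feature. These forms, the function name, and the sample data are illustrative only.

```python
from math import ceil
from typing import Dict, List


def negative_connection(entries: List[List[str]], features: List[str],
                        range_size: int = 100) -> Dict[str, float]:
    """Assumed negative connection: mean over never-co-occurring features of
    (ratio of total counts) x (mean product of per-range frequencies)."""
    total = {f: sum(e.count(f) for e in entries) for f in features}
    n_ranges = ceil(len(entries) / range_size)
    ranges = [entries[q * range_size:(q + 1) * range_size] for q in range(n_ranges)]
    # Occurrence frequency of each feature within each entry range.
    freq = [{f: sum(e.count(f) for e in rng) / len(rng) for f in features}
            for rng in ranges]
    result = {}
    for fi in features:
        # Features that occur in the database but never share an entry with fi.
        never = [fv for fv in features
                 if fv != fi and total[fv] > 0
                 and not any(fi in e and fv in e for e in entries)]
        if not never or total[fi] == 0:
            result[fi] = 0.0
            continue
        score = 0.0
        for fv in never:
            overall = total[fi] / total[fv]
            local = sum(fr[fi] * fr[fv] for fr in freq) / n_ranges
            score += overall * local
        result[fi] = score / len(never)
    return result


entries = [["shoes", "discount"], ["phone"], ["shoes"], ["phone", "discount"]]
features = ["shoes", "discount", "phone"]
neg = negative_connection(entries, features, range_size=2)
print(neg)  # {'shoes': 0.25, 'discount': 0.0, 'phone': 0.25}
```

With `range_size=2` the four entries form two entry ranges; in the patent's preferred setting the range size would be 100.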
Further, the positive and negative connection parameters of all features are calculated by the above method and then normalized in order to calculate the connectivity.
Specifically, taking the connectivity $L_i$ of the $i$-th feature as an example, it is calculated as follows:
$$L_i = \hat{A}_i - \hat{B}_i$$
wherein $L_i$ is the connectivity of the $i$-th feature, $\hat{A}_i$ is the normalized positive connection parameter of the $i$-th feature, and $\hat{B}_i$ is the normalized negative connection parameter of the $i$-th feature.
The connectivity of all features is obtained by applying this calculation to each of them. The positive connection of a feature raises its sensitivity and the negative connection lowers it, so the overall connectivity is obtained by subtracting the negative connection from the positive connection. The larger the positive connection and the smaller the negative connection, i.e. the more related features and the fewer unrelated features a feature has, the greater its connectivity and the greater its importance and sensitivity in the database. Conversely, if the positive connection is small and the negative connection is large, unrelated features far outnumber related ones, so the feature's importance in the database is greatly reduced and its sensitivity is low.
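The subtraction of normalized parameters can be sketched in Python. The text says only that both parameters are normalized; min-max scaling to $[0, 1]$ is assumed here, and the input values are made up for illustration.

```python
from typing import Dict


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize to [0, 1]; all-equal inputs map to 0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def connectivity(positive: Dict[str, float],
                 negative: Dict[str, float]) -> Dict[str, float]:
    """Normalized positive connection minus normalized negative connection."""
    pos_n, neg_n = min_max(positive), min_max(negative)
    return {f: pos_n[f] - neg_n[f] for f in positive}


# Hypothetical positive and negative connection parameters per feature.
positive = {"shoes": 0.4, "discount": 0.5, "phone": 0.1}
negative = {"shoes": 0.25, "discount": 0.0, "phone": 0.25}
conn = connectivity(positive, negative)
print(conn)
```

With this normalization the connectivity lies in $[-1, 1]$: "discount" has the highest positive and the lowest negative connection and reaches the maximum, while "phone" reaches the minimum.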
Further, the profitability of each feature is calculated. The profitability of a feature refers to the revenue attributable to that feature when the digital marketing big data is used for marketing. The underlying logic is that the more often a feature appears across all entries, the greater the revenue obtained when the digital marketing big data is used for marketing.
Further, the profitability of the $i$-th feature is calculated from its density and its number of occurrences in the whole database. Because data is entered into the database in time order, the denser and more uniform the occurrences of the $i$-th feature, the more relevant the digital marketing big data in the database is to that feature at each marketing run, that is, the more the feature contributes and the greater the corresponding revenue. The feature profitability is composed of two parts, density and overall number of occurrences. The density is first calculated from the distances between the different entries in which the $i$-th feature appears; the larger this value, the more often the feature occurs. This is the inter-entry density. It is then multiplied by the intra-entry density, which accounts for repeated occurrences within a single entry: the more often the $i$-th feature occurs within an entry, the more important the feature is in that entry. Finally, the product with the overall number of occurrences of the $i$-th feature is taken as the feature profitability of the $i$-th feature.
Specifically, taking the $i$-th feature as an example, the feature profitability $S_i$ is calculated as follows:

$$S_i = \frac{\rho^{\mathrm{inter}}_i \cdot \rho^{\mathrm{intra}}_i \cdot n_i}{N}$$

where $\rho^{\mathrm{inter}}_i$ is the inter-entry density of the $i$-th feature, $\rho^{\mathrm{intra}}_i$ is the intra-entry density of the $i$-th feature, $n_i$ denotes the total number of occurrences of the $i$-th feature, and $N$ is the total number of entries. The inter-entry density $\rho^{\mathrm{inter}}_i$ of the $i$-th feature is calculated from the distances between the entries in which the feature appears.
Here $d_{i,k}$ is the distance between the two entries that contain the $k$-th adjacent occurrence of the $i$-th feature, and $K$ is the maximum number of adjacent occurrences. The meaning of $d_{i,k}$ is as follows: suppose the $i$-th feature first occurs in one entry, occurs a second time in a later entry, and occurs a third time in a still later entry. The first and second occurrences are adjacent and form the first adjacent occurrence, with distance $d_{i,1}$; the second and third occurrences are adjacent and form the second adjacent occurrence, with distance $d_{i,2}$.
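The adjacent-occurrence distances can be illustrated with a short Python sketch. The entry indices below are made up, and the exact density formula is not recoverable from this text: taking the reciprocal of the mean distance is an assumption, chosen only because the description says a smaller mean distance corresponds to a larger profitability. The names `adjacent_gaps` and `inter_entry_density` are hypothetical.

```python
def adjacent_gaps(entry_indices):
    """Distances between consecutive entries that contain the feature."""
    ordered = sorted(set(entry_indices))
    return [b - a for a, b in zip(ordered, ordered[1:])]


def inter_entry_density(entry_indices):
    """Assumed form: reciprocal of the mean gap, 0.0 with fewer than two entries."""
    gaps = adjacent_gaps(entry_indices)
    if not gaps:
        return 0.0
    return len(gaps) / sum(gaps)


# Hypothetical feature that appears in entries 2, 5 and 9.
print(adjacent_gaps([2, 5, 9]))
print(inter_entry_density([2, 5, 9]))
```

In this made-up case the first adjacent occurrence spans entries 2 and 5 and the second spans entries 5 and 9, so the two distances are 3 and 4.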
The intra-entry density $\rho^{\mathrm{intra}}_i$ of the $i$-th feature is calculated from the following quantities: $c_{i,j}$ denotes the number of occurrences of the $i$-th feature in the $j$-th entry, $p_{i,j,k+1}$ and $p_{i,j,k}$ denote the positions of two consecutive occurrences (the $(k+1)$-th and the $k$-th) of the $i$-th feature in the $j$-th entry, and $l_j$ denotes the length of the $j$-th entry.
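A minimal Python sketch of the per-entry ratio follows, based on the description of the intra-entry density as the sum of the distances between consecutive occurrences divided by the entry length. How the per-entry ratios are combined across entries is not recoverable from this text, so averaging them is an assumption, and the function names are hypothetical.

```python
def intra_entry_ratio(positions, entry_length):
    """Sum of gaps between consecutive occurrences, divided by the entry length."""
    ordered = sorted(positions)
    gap_sum = sum(b - a for a, b in zip(ordered, ordered[1:]))
    return gap_sum / entry_length


def intra_entry_density(occurrences):
    """Assumed aggregation: mean ratio over the entries that contain the feature.

    `occurrences` is a list of (positions, entry_length) pairs, one per entry.
    """
    ratios = [intra_entry_ratio(p, n) for p, n in occurrences]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Hypothetical feature occurring at positions 0, 4, 8 of a 16-character entry
# and at positions 0, 5 of a 20-character entry.
print(intra_entry_density([([0, 4, 8], 16), ([0, 5], 20)]))
```

Because the consecutive gaps telescope, the per-entry ratio equals the span from the first to the last occurrence divided by the entry length.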
Further, taking the feature profitability $S_i$ of the $i$-th feature as an example, the normalized feature profitability $S'_i$ is obtained after normalization. The feature profitability comprises the inter-entry density and the intra-entry density. The inter-entry density is obtained from the mean distance between pairs of entries in which the feature appears: the smaller the mean distance between two entries containing the same feature, the more uniformly the entries containing the feature are distributed and the larger the feature profitability. The intra-entry density is obtained from the ratio of the sum of the distances between consecutive occurrences of the same feature within an entry to the total length of the entry: the larger the ratio, the more sparsely the feature is spread within the entry, so that even a small number of occurrences contributes more, and the feature profitability is also larger.
The feature profitability of all features is calculated by this method, and the profitability of all features is obtained after normalization.
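Combining the two densities with the occurrence count can be sketched as below. The product-over-total-entries form follows the wording of claim 9; the min-max normalization and the names `feature_profit` and `normalize` are assumptions made for illustration, since the text does not specify the normalization.

```python
def feature_profit(inter_density, intra_density, total_occurrences, total_entries):
    """Product of both densities and the occurrence count, over the entry count."""
    return inter_density * intra_density * total_occurrences / total_entries


def normalize(values):
    """Min-max scaling to [0, 1] (an assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical densities and counts for three features in a 4-entry database.
raw = [
    feature_profit(0.5, 0.5, 8, 4),
    feature_profit(0.5, 0.75, 16, 4),
    feature_profit(0.25, 1.0, 16, 4),
]
print(normalize(raw))
```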
Further, the sensitivity $M_i$ of the $i$-th feature is calculated from its connection with the other features and from its overall profitability. Specifically, taking the $i$-th feature as an example, its sensitivity $M_i$ is calculated from $L_i$ and $S'_i$, where $M_i$ is the sensitivity of the $i$-th feature, $L_i$ is the connectivity of the $i$-th feature, and $S'_i$ is the profitability of the $i$-th feature.
Applying this method to every feature yields the sensitivity of all features. The stronger the connection between a feature and the other features, the more important the feature is to the whole digital marketing process; the greater its profitability, the more it contributes to the whole digital marketing process. Such a feature is more sensitive and requires security processing. Conversely, a feature with low sensitivity is less important and does not need to be processed.
Step S004: using the quantified feature sensitivities of the digital marketing big data, acquire the entries in the database that correspond to the structured sensitive data of the digital marketing big data.
Specifically, the sensitivity of each feature has been obtained in the above process, and the overall sensitivity of each entry is now calculated. Taking the $j$-th entry as an example, the calculation method is as follows:

$$E_j = \sum_{i=1}^{n_j} m_{j,i}$$

where $E_j$ denotes the sensitivity of the $j$-th entry, $n_j$ denotes the number of all features in the $j$-th entry, and $m_{j,i}$ denotes the sensitivity of the $i$-th feature in the entry. The larger the entry sensitivity $E_j$, the more sensitive the entry. Preferably, a first threshold is given for this judgment.
The overall sensitivity of each entry is calculated by this method, and the entries corresponding to sensitive data are identified according to the first threshold: the data contained in an entry whose sensitivity is greater than the first preset threshold is sensitive data.
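The entry-level scoring and thresholding can be sketched as follows. The feature names, sensitivity values and threshold are hypothetical, and the function names `entry_sensitivity` and `split_by_threshold` are introduced only for this illustration.

```python
def entry_sensitivity(entry_features, feature_sensitivity):
    """Sum the sensitivities of all features found in one entry."""
    return sum(feature_sensitivity[f] for f in entry_features)


def split_by_threshold(entries, feature_sensitivity, threshold):
    """Return (sensitive, non_sensitive) lists of entry ids.

    An entry is sensitive when its summed sensitivity exceeds the threshold.
    """
    sensitive, non_sensitive = [], []
    for entry_id, features in entries.items():
        score = entry_sensitivity(features, feature_sensitivity)
        if score > threshold:
            sensitive.append(entry_id)
        else:
            non_sensitive.append(entry_id)
    return sensitive, non_sensitive


# Hypothetical per-feature sensitivities and three entries.
sens = {"name": 0.5, "phone": 0.25, "weather": 0.0}
entries = {1: ["name", "phone"], 2: ["weather"], 3: ["phone"]}
print(split_by_threshold(entries, sens, threshold=0.5))
```

Only the entries in the first list would then go on to security processing, which is what keeps the volume of data to be processed small.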
Across all the digital marketing big data, the sensitive data is the data that is most relevant to and most strongly connected with the marketing process for the database as a whole, and it is more important than the other data in the database. Applying security processing only to this data in the subsequent steps greatly reduces the volume of data to be processed and shortens the processing time.
Step S005: perform security processing on the sensitive data acquired from the digital marketing big data.
Specifically, the digital marketing big data is partitioned into sensitive data and non-sensitive data. Security processing is then applied to the sensitive data, which completes the security processing of the digital marketing big data as a whole. For example, the sensitive data may be encrypted using the AES algorithm.
The present invention is not limited to the above-described preferred embodiments, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (9)
1. A digital marketing big data processing method is characterized by comprising the following steps:
constructing a database of the digital marketing big data, and performing feature cleaning on all entries of the digital marketing big data in the database;
acquiring the features of all entries; acquiring the feature relevance of each feature in each entry according to the positional relationship between different features in the same entry; taking the mean of the feature relevance of each feature over all entries as the positive connection parameter of that feature; acquiring the negative connection parameter of each feature according to the overall numbers of occurrences of features that never appear in the same entry and the frequencies of occurrence of the features within a given entry range; and acquiring the connectivity of each feature according to the positive connection parameter and the negative connection parameter;
obtaining the profitability of each feature according to the inter-entry density of the feature appearing in different entries and the intra-entry density of the feature appearing in the same entry, and obtaining the sensitivity of each feature according to the connectivity and the profitability of the feature;
and, using the sensitivity of the features in the digital marketing big data, taking the sum of the sensitivities of all features in the same entry as the sensitivity of the entry, acquiring the sensitive data contained in the entries according to the sensitivity of the entries, and performing security processing on the sensitive data.
2. The digital marketing big data processing method of claim 1, wherein the step of constructing the database of the digital marketing big data is:
acquiring the digital marketing big data, classifying it by source and establishing the database accordingly, and structuring the digital marketing big data of the same source in the database as table entries according to the time at which the big data was obtained, to obtain the preprocessed digital marketing big data.
3. The digital marketing big data processing method of claim 1, wherein the step of performing feature cleaning comprises:
acquiring the characters that recur in the entries corresponding to all the digital marketing big data in the database, and cleaning out the characters corresponding to the small number of non-recurring features, thereby reducing the workload of subsequent feature extraction and feature sensitivity calculation.
4. The method for processing the digital marketing big data according to claim 1, wherein the method for acquiring the characteristics of all entries comprises the following steps:
taking the text data of each entry as the input of a named entity recognition technique, and taking the entities it outputs as the features of the digital marketing big data.
5. The method for processing the digital marketing big data according to claim 1, wherein the method for acquiring the feature relevance of each feature in each entry comprises the following steps:
wherein $R_{j,i}$ denotes the feature relevance of the $i$-th feature in the $j$-th entry, $n_j$ is the total number of all features in the $j$-th entry, and $r_{j,i,k}$ denotes the feature association parameter between the $i$-th feature and the $k$-th feature in the $j$-th entry, the feature association parameter being obtained from the positional relationship of two features appearing in the same entry.
6. The digital marketing big data processing method of claim 1, wherein the positive connection parameter of each feature is obtained by:
wherein $Z_i$ denotes the positive connection parameter of the $i$-th feature, $N$ is the number of structured entries of the digital marketing big data in the database, $c_{j,i}$ denotes the number of times the $i$-th feature occurs in the $j$-th entry, $o_{j,i}$ denotes the total number of occurrences in the $j$-th entry of the features other than the $i$-th feature, and $R_{j,i}$ denotes the feature relevance of the $i$-th feature in the $j$-th entry.
7. The digital marketing big data processing method of claim 1, wherein the method for acquiring the negative connection parameter of each feature is as follows:
wherein $F_i$ denotes the negative connection parameter of the $i$-th feature; $u$ indexes the features that never appear in the same entry as the $i$-th feature, and $U$ denotes the total number of features that never appear in the same entry as the $i$-th feature; $n_i$ denotes the total number of times the $i$-th feature appears in the database, and $n_u$ denotes the total number of occurrences of the $u$-th such feature in the database; $f_{q,i}$ denotes the frequency of occurrence of the $i$-th feature within the $q$-th entry range, and $f_{q,u}$ denotes the frequency of occurrence of the $u$-th such feature within the $q$-th entry range; and $Q$ denotes the total number of entry ranges, an entry range being a range formed by a certain number of entries.
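8. The digital marketing big data processing method according to claim 1, wherein the method for acquiring the connectivity of each feature is as follows:

$$L_i = Z'_i - F'_i$$

wherein $L_i$ is the connectivity of the $i$-th feature, $Z'_i$ is the normalized positive connection parameter of the $i$-th feature, and $F'_i$ is the normalized negative connection parameter of the $i$-th feature.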
9. The digital marketing big data processing method of claim 1, wherein the method for acquiring the profitability of each feature comprises the following steps:
wherein $\rho^{\mathrm{inter}}_i$ is the inter-entry density of the $i$-th feature, $d_{i,k}$ is the distance between the two entries that contain the $k$-th adjacent occurrence of the $i$-th feature, and $K$ is the maximum number of adjacent occurrences; the intra-entry density $\rho^{\mathrm{intra}}_i$ of the $i$-th feature is calculated as follows:
wherein $\rho^{\mathrm{intra}}_i$ is the intra-entry density of the $i$-th feature, $c_{i,j}$ denotes the number of occurrences of the $i$-th feature in the $j$-th entry, $p_{i,j,k+1}$ and $p_{i,j,k}$ denote the positions of two consecutive occurrences (the $(k+1)$-th and the $k$-th) of the $i$-th feature in the $j$-th entry, and $l_j$ denotes the length of the $j$-th entry; and the profitability of the feature is obtained as the ratio of the product of the inter-entry density, the intra-entry density and the total number of occurrences of the feature to the total number of entries.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211469771.6A CN115563654B (en) | 2022-11-23 | 2022-11-23 | Digital marketing big data processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115563654A (en) | 2023-01-03 |
CN115563654B (en) | 2023-03-31 |
Family
ID=84770775
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211469771.6A Active CN115563654B (en) | 2022-11-23 | 2022-11-23 | Digital marketing big data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115563654B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102012985A (en) * | 2010-11-19 | 2011-04-13 | 国网电力科学研究院 | Sensitive data dynamic identification method based on data mining |
CN105404886A (en) * | 2014-09-16 | 2016-03-16 | 株式会社理光 | Feature model generating method and feature model generating device |
US20210303725A1 (en) * | 2020-03-30 | 2021-09-30 | Google Llc | Partially customized machine learning models for data de-identification |
CN113157678A (en) * | 2021-04-19 | 2021-07-23 | 中国人民解放军91977部队 | Multi-source heterogeneous data association method |
Non-Patent Citations (2)
Title |
---|
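Zhu Shiling; Zheng Yan: "Research on an improved text feature selection algorithm" (改进的文本特征选取算法研究) * |
Yang Yunlu: "Research and implementation of data mining methods supporting privacy protection" (支持隐私保护的数据挖掘方法研究及实现) * |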
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115795112A (en) * | 2023-02-08 | 2023-03-14 | 吉林交通职业技术学院 | Data transmission method in scientific research innovation platform |
CN115795112B (en) * | 2023-02-08 | 2023-04-11 | 吉林交通职业技术学院 | Data transmission method in scientific research innovation platform |
Also Published As
Publication number | Publication date |
---|---|
CN115563654B (en) | 2023-03-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||