CN111652280A - Behavior-based target object data analysis method and device and storage medium - Google Patents

Behavior-based target object data analysis method and device and storage medium

Info

Publication number
CN111652280A
Authority
CN
China
Prior art keywords
data
target object
characteristic
vector
analysis model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010370884.5A
Other languages
Chinese (zh)
Other versions
CN111652280B (en)
Inventor
孙侨侨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010370884.5A
Publication of CN111652280A
Application granted
Publication of CN111652280B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to data processing technology and discloses a behavior-based target object data analysis method, which comprises the following steps: acquiring basic data of a user and characteristic data of a target object; performing word segmentation and encoding on the basic data and/or the characteristic data to obtain a word vector set; calculating characteristic values of the word vector set; selecting optimization vectors from the results of the characteristic value calculation to obtain an optimization vector set; training a constructed initial target object analysis model with the optimization vector set to obtain a standard target object analysis model; analyzing the basic data of a user to be analyzed and the characteristic data of a target object to be analyzed with the standard target object analysis model to obtain an analysis result; and adjusting the characteristic data of the target object to be analyzed corresponding to the user according to the analysis result. The invention further relates to blockchain technology: the basic data and/or characteristic data can be stored in blockchain nodes. The invention can improve the efficiency and accuracy of adjusting target object data.

Description

Behavior-based target object data analysis method and device and storage medium
Technical Field
The invention relates to the technical field of data processing, and in particular to a behavior-based target object data analysis method and device, an electronic device, and a computer-readable storage medium.
Background
With the development of science and technology, the objects people need have become increasingly complex and personalized. Although more and more products and services are available, most of them struggle to meet the differing requirements of a massive user base, so increasing attention is being paid to how the data of existing objects (such as the size and type of a product, or the timing and type of a service) can be adjusted to meet the needs of the public.
For example, a user may prefer a certain price range or a certain provision time period for a product and/or service, and pricing the product and/or service incorrectly or providing it in the wrong time period may cause the user to be lost. At present, the mainstream strategy for adjusting target object data is manual statistical adjustment, which relies heavily on manpower, lacks timeliness, and has low accuracy. How to adjust target object data efficiently and accurately has therefore become an increasingly important problem.
Disclosure of Invention
The invention provides a behavior-based target object data analysis method and device, an electronic device, and a computer-readable storage medium, with the main aim of improving the efficiency and accuracy of target object data adjustment.
In order to achieve the above object, the present invention provides a behavior-based target data analysis method, including:
acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprises behavior data of the user relative to the target object;
performing word segmentation processing on the basic data and/or the characteristic data, and encoding the resulting word segmentation data to obtain a word vector set;
calculating characteristic values of the word vectors in the word vector set;
selecting an optimization vector from the result of the characteristic value calculation to obtain an optimization vector set;
constructing an initial target object analysis model, and training the initial target object analysis model by using the optimization vector set to obtain a standard target object analysis model;
acquiring basic data of a user to be analyzed and characteristic data of a target object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result;
and adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
Optionally, calculating the characteristic values of the word vectors in the word vector set includes:
performing word vector sampling multiple times on the word vector set to obtain multiple training sets containing word vectors, wherein the sampling is random sampling with replacement;
classifying the training sets containing the word vectors respectively to obtain a plurality of classification results, wherein the classification results contain a characteristic vector set and/or a non-characteristic vector set;
calculating the information entropy of the feature vector set in each classification result, and selecting the classification results whose feature vector sets have an information entropy greater than a preset entropy threshold to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
Optionally, selecting optimization vectors from the results of the characteristic value calculation to obtain an optimization vector set includes:
adding the first characteristic value and the second characteristic value of each characteristic vector to obtain a total characteristic value for each characteristic vector in the different characteristic vector sets;
sorting all the characteristic vectors in the different characteristic vector sets by total characteristic value to obtain a vector sequence;
and sequentially selecting a plurality of characteristic vectors in the vector sequence, and collecting the plurality of characteristic vectors into the optimized vector set.
Optionally, constructing an initial target analysis model and training it with the optimization vector set to obtain a standard target analysis model includes:
performing random sampling with replacement on the optimization vector set a preset number of times to obtain a plurality of training sets;
generating a plurality of decision trees corresponding to the plurality of training sets by using the plurality of training sets;
aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm;
training the initial target object analysis model by using the optimization vector set to obtain a training target object analysis model;
and performing parameter tuning on the training target object analysis model to obtain the standard target object analysis model.
Optionally, said aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm comprises:
aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm as follows:
Figure BDA0002475718680000031
wherein $F$ represents the set of the plurality of decision trees, $F_k$ represents the $k$th decision tree of the plurality of decision trees, $K$ represents the total number of the plurality of decision trees, and Figure BDA0002475718680000032 represents the initial target analysis model.
In order to solve the above problems, the present invention also provides a behavior-based target data analysis device, the device including:
the data acquisition module is used for acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprises behavior data of the user relative to the target object;
the data word segmentation module is used for performing word segmentation processing on the basic data and/or the characteristic data and encoding the resulting word segmentation data to obtain a word vector set;
the characteristic value calculation module is used for calculating the characteristic values of the word vectors in the word vector set;
the vector screening module is used for selecting an optimized vector from the result of the characteristic value calculation to obtain an optimized vector set;
the model training module is used for constructing an initial target object analysis model and training the initial target object analysis model by using the optimization vector set to obtain a standard target object analysis model;
the data analysis module is used for acquiring basic data of a user to be analyzed and characteristic data of a target object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result;
and the data adjusting module is used for adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
Optionally, the feature value calculation module is specifically configured to:
performing word vector sampling multiple times on the word vector set to obtain multiple training sets containing word vectors, wherein the sampling is random sampling with replacement;
classifying the training sets containing the word vectors respectively to obtain a plurality of classification results, wherein the classification results contain a characteristic vector set and/or a non-characteristic vector set;
calculating the information entropy of the feature vector set in each classification result, and selecting the classification results whose feature vector sets have an information entropy greater than a preset entropy threshold to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
Optionally, the model training module is specifically configured to:
performing random sampling with replacement on the optimization vector set a preset number of times to obtain a plurality of training sets;
generating a plurality of decision trees corresponding to the plurality of training sets by using the plurality of training sets;
aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm;
training the initial target object analysis model by using the optimization vector set to obtain a training target object analysis model;
and performing parameter tuning on the training target object analysis model to obtain the standard target object analysis model.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the behavior-based target data analysis method of any of the above.
In order to solve the above problems, the present invention also provides a computer-readable storage medium comprising a data storage area and a program storage area, wherein the data storage area stores data created through the use of blockchain nodes and the program storage area stores a computer program; when executed by a processor, the computer program implements the target object data analysis method described above.
In the embodiments of the invention, after the basic data of a user and the characteristic data of a target object are acquired, they are segmented and encoded to obtain a word vector set; an optimization vector set is obtained from the word vector set, and an initial target object analysis model is constructed; the initial target object analysis model is trained with the optimization vector set to obtain a standard target object analysis model; the basic data of a user to be analyzed and the characteristic data of a target object to be analyzed are analyzed with the standard target object analysis model to obtain an analysis result; and the characteristic data of the target object to be analyzed corresponding to the user is adjusted according to the analysis result. Because the characteristic data of the target object to be analyzed is adjusted based on the output of a model rather than by manual statistics, the efficiency of target object data adjustment is improved. Meanwhile, because the model is built on the user's behavior data and trained on both that behavior data and the characteristic data of the target object, its analysis results fit individual users more closely; the trained model can therefore perform accurate individualized analysis, and adjusting the target object data through the model improves accuracy. The behavior-based target object data analysis method, device, and computer-readable storage medium provided by the invention thus improve both the efficiency and the accuracy of target object data adjustment.
Drawings
FIG. 1 is a schematic flow chart of a behavior-based target data analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a standard target analysis model obtained by training according to an embodiment of the present invention;
FIG. 3 is a block diagram of an apparatus for behavior-based analysis of target data according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of an internal structure of an electronic device implementing a behavior-based target object data analysis method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The execution subject of the behavior-based target object data analysis method provided by the embodiments of the present application includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method. In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster.
Blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain, essentially a decentralized database, is a series of data blocks linked by cryptographic methods; each data block contains information about a batch of network transactions, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
The underlying blockchain platform may include processing modules such as user management, basic services, smart contracts, and operation monitoring.
The user management module is responsible for managing the identity information of all blockchain participants, including public and private key generation and maintenance (account management), key management, and maintenance of the correspondence between users' real identities and blockchain addresses (permission management); where authorized, it also supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk control audit).
The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and to record valid requests to storage once consensus on them is reached. For a new service request, the basic service first performs interface adaptation parsing and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records and stores it.
The smart contract module is responsible for contract registration, issuance, triggering, and execution. Developers can define contract logic in a programming language and publish it to the blockchain (contract registration); the contract is then triggered by keys or other events and executed according to the logic of its clauses, and the module also provides functions for upgrading and canceling contracts.
The operation monitoring module is mainly responsible for deployment, configuration modification, contract setting, and cloud adaptation during product release, and for visual output of real-time status during product operation, such as alarms, network condition monitoring, and node device health monitoring.
The invention provides a behavior-based target object data analysis method. Fig. 1 is a schematic flow chart of a behavior-based target object data analysis method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the behavior-based target object data analysis method includes:
S1, acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprises behavior data of the user relative to the target object.
In this embodiment of the present invention, the basic data includes the user's behavior data with respect to the target object, that is, data generated by the user's actions on the target object, such as the user's click data and message feedback data (for example, the user's vote on a product and/or service in a certain price interval).
Further, in other alternative embodiments of the present invention, the basic information of the user may further include, but is not limited to, identity information of the user (such as a user name and a user age), consumption information of the user, and the like.
In the embodiment of the present invention, the target object refers to a certain product or service, and the characteristic data of the target object refers to data of a certain product and/or service, such as price, duration, quality, and the like of the product and/or service.
In another alternative embodiment of the present invention, the basic data of the user includes identity data, price inquiry data and claim data, and the characteristic data of the object includes vehicle data and policy data.
In one preferred embodiment of the present invention, the basic data and/or the characteristic data may be stored in a blockchain node.
Specifically, the basic information of the user and the feature data of the target object may be retrieved, using a pre-written Java statement, from one or more blockchain database nodes that store the basic information of multiple users and the feature data of multiple target objects.
Further, in another optional embodiment of the present invention, after the basic information of the user is acquired, a user portrait is generated from that information and displayed in visual form, for example, as an Excel table and/or a proportion chart.
By generating the user portrait based on the basic information of the user, the feature information of different users can be displayed more intuitively.
S2, performing word segmentation processing on the basic data and/or the characteristic data, and encoding the resulting word segmentation data to obtain a word vector set.
Preferably, in the embodiment of the present invention, the basic data and/or the feature data are segmented with the THULAC word segmentation tool to obtain word segmentation result data, that is, a word segmentation information set, and each word in that set is encoded with one-hot encoding, so that the basic data and/or feature data are converted into a word vector set.
Specifically, one-hot encoding uses an N-bit state register to encode the N states in the word segmentation result data: each state is represented by its own register bit, and at any time only one bit is valid, that is, exactly one bit is 1 and all the others are 0.
Word segmentation splits the user's behavior data into word vectors that each contain less data, which reduces the computing resources occupied when the computer later recognizes them; encoding converts the behavior data into a form the computer can recognize, so that its content can be identified quickly, which facilitates rapid data analysis later.
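The one-hot scheme described above can be sketched in Python as follows. This is a minimal illustration, assuming the word segmentation step (THULAC in this embodiment) has already produced a list of tokens; the token list and the helper names `build_vocabulary` and `one_hot_encode` are hypothetical, and register bits are assigned in first-seen order purely for illustration.

```python
from typing import Dict, List


def build_vocabulary(tokens: List[str]) -> Dict[str, int]:
    """Give each distinct token (state) its own register bit, in first-seen order."""
    vocab: Dict[str, int] = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def one_hot_encode(tokens: List[str]) -> List[List[int]]:
    """Encode each token as an N-bit vector in which exactly one bit is 1."""
    vocab = build_vocabulary(tokens)
    n = len(vocab)
    vectors = []
    for token in tokens:
        vec = [0] * n
        vec[vocab[token]] = 1  # only the token's own bit is set
        vectors.append(vec)
    return vectors


# Hypothetical segmentation output; a real pipeline would take it from the segmenter.
segmented = ["user", "clicked", "price", "interval", "price"]
word_vectors = one_hot_encode(segmented)
```

Repeated tokens map to the same vector, so the two occurrences of "price" share one register bit.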
S3, calculating the feature values of the word vectors in the word vector set.
In detail, calculating the feature values of the word vectors in the word vector set includes:
performing word vector sampling multiple times on the word vector set to obtain multiple training sets containing word vectors, wherein the sampling is random sampling with replacement;
classifying the training sets containing the word vectors respectively to obtain a plurality of classification results, wherein the classification results contain a characteristic vector set and/or a non-characteristic vector set;
calculating the information entropy of the feature vector set in each classification result, and selecting the classification results whose feature vector sets have an information entropy greater than a preset entropy threshold to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
In this embodiment, one or more word vectors may be drawn in each round of sampling; preferably, at least two word vectors are drawn each time.
Multiple rounds of sampling yield multiple training sets containing word vectors: the word vectors drawn in each round form one training set.
In this embodiment, classifying the training sets means classifying the word vectors in each training set, that is, judging whether each word vector is a feature vector or a non-feature vector, which yields a feature vector set and/or a non-feature vector set for that training set. Specifically, the word vectors are classified by a pre-trained convolutional neural network with a feature judgment function.
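The repeated sampling with replacement can be sketched as follows. This is a minimal illustration, assuming each word vector is represented as a tuple and that every training set has the same fixed size; `bootstrap_training_sets`, `num_sets`, and `set_size` are hypothetical names, and the seeded generator is used only to make runs reproducible.

```python
import random
from typing import List, Sequence, Tuple

WordVector = Tuple[int, ...]


def bootstrap_training_sets(
    vectors: Sequence[WordVector], num_sets: int, set_size: int, seed: int = 0
) -> List[List[WordVector]]:
    """Draw `num_sets` training sets, each built by sampling `set_size` vectors with replacement."""
    if not vectors:
        raise ValueError("cannot sample from an empty word vector set")
    rng = random.Random(seed)  # seeded only so the example is reproducible
    # choices() samples with replacement, so a vector may repeat within one set.
    return [rng.choices(vectors, k=set_size) for _ in range(num_sets)]


vectors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
# Two vectors per round, following the "at least two per sampling" preference above.
training_sets = bootstrap_training_sets(vectors, num_sets=5, set_size=2)
```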
For a plurality of classification results, a plurality of feature vector sets and/or non-feature vector sets are obtained, that is, each classification result has a corresponding feature vector set and/or non-feature vector set.
When the feature vector sets in the classification results are evaluated, the information entropy contained in each feature vector set may be calculated. Specifically, in the embodiment of the present invention, the information entropy $H(Y, X)$ contained in the feature vector sets in the classification results is calculated with the following information entropy algorithm:
Figure BDA0002475718680000081
wherein $X$ is the feature vector set, $Y$ is the classification result corresponding to the feature vector set, $X_i$ is the $i$th feature vector in the feature vector set, $k$ is the number of feature vectors in the feature vector set, and Figure BDA0002475718680000082 is the frequency with which the $i$th feature vector occurs in the feature vector set.
Preferably, the present invention calculates the first feature value of each feature vector with the following first feature value algorithm:
Figure BDA0002475718680000091
wherein $n$ is the number of classification results, $k$ is any feature vector in the $m$th classification result, $k'$ is any vector in the $m$th classification result other than $k$, $p_{mk} p_{mk'}$ denotes the probability that two vectors drawn at random from the $m$th classification result belong to different classes, and $GI_m$ is the first feature value of the feature vector $k$.
The embodiment of the invention calculates the second feature value of each feature vector with the following second feature value algorithm:
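The where-clause above describes $GI_m$ through products $p_{mk} p_{mk'}$ over pairs of different classes, which matches the Gini index: $GI_m = \sum_{k}\sum_{k' \neq k} p_{mk} p_{mk'} = 1 - \sum_{k} p_{mk}^2$, where the identity holds because the class probabilities sum to 1. The sketch below computes both forms, assuming the classes are the feature/non-feature labels assigned during classification; this reading of the formula image is an assumption, and the function names and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_probabilities(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Empirical probability p_mk of each class in the m-th classification result."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}


def gini_pairwise(labels: Sequence[Hashable]) -> float:
    """Sum of p_mk * p_mk' over every ordered pair of different classes k != k'."""
    if not labels:
        return 0.0
    p = class_probabilities(labels)
    return sum(p[k] * p[k2] for k in p for k2 in p if k2 != k)


def gini_closed_form(labels: Sequence[Hashable]) -> float:
    """Equivalent closed form 1 - sum_k p_mk**2."""
    if not labels:
        return 0.0
    p = class_probabilities(labels)
    return 1.0 - sum(v * v for v in p.values())


# Hypothetical labels for the vectors of the m-th classification result.
labels_m = ["feature", "feature", "non-feature", "feature"]
gi_m = gini_closed_form(labels_m)  # 1 - (0.75**2 + 0.25**2)
```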
Figure BDA0002475718680000092
wherein Figure BDA0002475718680000093 corresponds to the feature vector $j$ in the feature vector set $m$ of any classification result, $M$ is the set of the feature vector sets in all classification results, $m$ is the feature vector set in any classification result, and Figure BDA0002475718680000094 is the second feature value of the feature vector $j$.
And S4, selecting an optimization vector from the result of the characteristic value calculation to obtain an optimization vector set.
Specifically, selecting optimization vectors from the results of the feature value calculation to obtain an optimization vector set includes:
adding the first characteristic value and the second characteristic value of each characteristic vector to obtain a total characteristic value for each characteristic vector in the different characteristic vector sets;
sorting all the characteristic vectors in the different characteristic vector sets by total characteristic value to obtain a vector sequence;
and sequentially selecting a plurality of characteristic vectors in the vector sequence, and collecting the plurality of characteristic vectors into the optimized vector set.
Specifically, because each feature vector has both a first feature value and a second feature value, the two values of each feature vector are added to give its total feature value, which yields the total feature value of every feature vector in the different feature vector sets.
In this embodiment, the feature vectors may be selected from the vector sequence in descending order of total feature value. For example, the first 100 feature vectors in the vector sequence are selected and collected into the optimized vector set.
The basic data and the feature data may contain a large amount of useless information, so the word vector set obtained after encoding may also contain many useless word vectors. Selecting optimized vectors therefore reduces the computing resources occupied in the subsequent analysis and improves analysis efficiency.
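The selection step in S4 (sum the two feature values, sort in descending order, keep the leading vectors) can be sketched in Python as follows. This is an illustration only: the vector identifiers, the dictionaries holding the two feature values, and the function name `select_optimized_vectors` are hypothetical, and `top_k` plays the role of the "first 100" in the example.

```python
def select_optimized_vectors(first_values, second_values, top_k):
    """Sum the two feature values per vector, sort descending, and keep the first top_k ids.

    first_values / second_values: hypothetical dicts mapping a vector id to its feature value.
    """
    totals = {vid: first_values[vid] + second_values[vid] for vid in first_values}
    # Vector sequence: ids ordered by total feature value, largest first.
    sequence = sorted(totals, key=totals.get, reverse=True)
    return sequence[:top_k]


first = {"v1": 0.30, "v2": 0.10, "v3": 0.25, "v4": 0.05}
second = {"v1": 0.20, "v2": 0.50, "v3": 0.05, "v4": 0.10}
print(select_optimized_vectors(first, second, top_k=2))  # ['v2', 'v1']
```

Here the totals are 0.5, 0.6, 0.3 and 0.15, so `v2` and `v1` form the optimized vector set.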
S5, constructing an initial target object analysis model, and training the initial target object analysis model by using the optimized vector set to obtain a standard target object analysis model.
In this embodiment, a more accurate analysis model can be obtained by constructing an initial target analysis model and training the initial target analysis model using the optimized vector set.
Referring to Fig. 2, Fig. 2 is a schematic flow chart of training to obtain the standard target object analysis model according to an embodiment of the present invention.
In detail, S5 includes:
S51, performing random sampling with replacement on the optimized vector set a preset number of times to obtain a plurality of training sets.
In the embodiment of the present invention, the preset number of times is set in advance, for example, 8 times. Specifically, the Bagging (bootstrap aggregating) method may be used for the random sampling with replacement: the result of each sampling is one training set, so a plurality of training sets are obtained through multiple rounds of sampling.
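A minimal sketch of the sampling in S51, assuming each bootstrap training set has the same size as the optimized vector set (the usual Bagging convention; the source does not state the size). `random.choices` draws with replacement, so a vector may appear several times in one set. The function name `bootstrap_training_sets` and the string vector ids are hypothetical.

```python
import random


def bootstrap_training_sets(vectors, num_sets=8, seed=None):
    """Draw num_sets training sets, each len(vectors) long, by sampling with replacement."""
    rng = random.Random(seed)  # seeded generator so runs can be reproduced
    return [rng.choices(vectors, k=len(vectors)) for _ in range(num_sets)]


optimized = ["v1", "v2", "v3", "v4", "v5"]
training_sets = bootstrap_training_sets(optimized, num_sets=8, seed=42)
print(len(training_sets), [len(s) for s in training_sets][:3])  # 8 [5, 5, 5]
```

Each of the 8 resulting training sets would then be used to grow one decision tree.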
S52, generating a plurality of decision trees corresponding to the plurality of training sets by using the plurality of training sets.
In detail, each training set obtained by sampling generates one corresponding decision tree; that is, random sampling is performed on the optimized vector set, and a decision function is used to generate a decision tree from each training set obtained by the sampling.
S53, aggregating the plurality of decision trees into the initial target object analysis model by using an aggregation algorithm.
Preferably, in an embodiment of the present invention, aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm includes:
aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
wherein F represents the set of the plurality of decision trees, $f_k$ represents the kth decision tree of the plurality of decision trees, K represents the total number of the plurality of decision trees, $x_i$ is the ith optimized vector, and $\hat{y}_i$ represents the initial target analysis model.
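A minimal sketch of the aggregation step in S53, assuming the aggregation formula has the additive form $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$. Each tree is modelled as a plain function from an input vector to a score; the two depth-1 trees and all function names are hypothetical.

```python
from typing import Callable, List, Sequence

# A tree is modelled as any function mapping an input vector to a numeric score.
Tree = Callable[[Sequence[float]], float]


def aggregate_trees(trees: List[Tree]) -> Tree:
    """Return a model whose output for x is the sum of the K tree outputs f_k(x)."""
    def model(x: Sequence[float]) -> float:
        return sum(tree(x) for tree in trees)
    return model


def stump_a(x: Sequence[float]) -> float:
    # Hypothetical depth-1 tree splitting on the first component.
    return 0.4 if x[0] > 0.5 else -0.2


def stump_b(x: Sequence[float]) -> float:
    # Hypothetical depth-1 tree splitting on the second component.
    return 0.1 if x[1] > 1.0 else 0.3


model = aggregate_trees([stump_a, stump_b])
print(model([0.7, 2.0]))
```

Summing tree outputs is the additive (boosting-style) form. A classic bagged random forest averages its trees instead, so which form is intended depends on the formula in the original drawing.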
S54, training the initial target object analysis model by using the optimized vector set to obtain a trained target object analysis model.
Specifically, the embodiment of the present invention trains the initial target analysis model by using the following objective function:
$$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \beta(f_k)$$
wherein Obj is the value of the objective function, $y_i$ is the label value contained in the ith optimized vector of the optimized vector set, $\hat{y}_i$ is the output of the initial target analysis model, $l(y_i, \hat{y}_i)$ is the loss between the label value and the output, K represents the total number of decision trees, $f_k$ denotes the kth decision tree, and $\beta(f_k)$ is a preset regularization term.
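A minimal sketch of evaluating an objective of the form "per-sample loss plus one regularization term $\beta(f_k)$ per tree". Both concrete choices are assumptions, not taken from the source: squared error as the loss, and an XGBoost-style $\beta(f) = \gamma T + \tfrac{1}{2}\lambda \sum_j w_j^2$, where T is the number of leaves and $w_j$ are the leaf weights. The parameter names `gamma` and `lam` and the function name `objective` are hypothetical.

```python
from typing import List, Sequence


def objective(y_true: Sequence[float], y_pred: Sequence[float],
              leaf_weights_per_tree: List[Sequence[float]],
              gamma: float = 1.0, lam: float = 1.0) -> float:
    """Obj = sum_i l(y_i, yhat_i) + sum_k beta(f_k), under the assumed loss and regularizer."""
    # Assumed loss l: squared error between each label and the model output.
    loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Assumed beta: gamma * (number of leaves) + 0.5 * lam * (sum of squared leaf weights).
    reg = sum(gamma * len(w) + 0.5 * lam * sum(v ** 2 for v in w)
              for w in leaf_weights_per_tree)
    return loss + reg


print(objective([1.0, 0.0], [0.5, 0.0], [[0.4, -0.2], [0.1, 0.3]]))
```

In the example, the loss contributes 0.25 and the two trees contribute 2.1 and 2.05, for a total of about 4.4.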
S55, performing parameter tuning on the trained target object analysis model to obtain the standard target object analysis model.
Preferably, the present invention uses a model parameter tuning method to tune the parameters of the trained target object analysis model.
Specifically, the model parameter tuning method includes, but is not limited to, tuning of general parameters, tuning of Booster parameters, and tuning of learning objective parameters.
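The source does not say how the candidate parameter values are searched. One common approach is an exhaustive grid search, sketched below as a swapped-in technique rather than the patented method. The keys `max_depth` and `eta` are named after XGBoost booster parameters and serve only as illustrative labels. The toy loss function stands in for "train the model with these parameters and return a validation loss", and all function names are hypothetical.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in `grid`; return (best_params, best_score), lower is better."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params):
    # Hypothetical stand-in for a real validation loss; minimal at max_depth=5, eta=0.1.
    return abs(params["max_depth"] - 5) + abs(params["eta"] - 0.1)


grid = {"max_depth": [3, 5, 7], "eta": [0.1, 0.3]}
print(grid_search(toy_loss, grid))  # ({'max_depth': 5, 'eta': 0.1}, 0.0)
```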
S6, obtaining basic data of a user to be analyzed and characteristic data of a target object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result.
In this embodiment, the basic data of the user to be analyzed and the feature data of the target object to be analyzed may be acquired from the target database, or may be acquired from a business system such as a sales system.
Specifically, the number of the users to be analyzed may be one or more, and the number of the target objects to be analyzed may also be one or more.
Further, in another optional embodiment of the present invention, the method further comprises performing statistical analysis on the analysis result after it is obtained, which includes:
carrying out statistical calculation on the analysis result to obtain a statistical result;
and displaying the statistical result in a visual form.
The statistical calculations include, but are not limited to, calculating the mean, the variance, and the standard deviation of the analysis results.
The statistical result may be displayed visually, for example as a histogram and/or a pie chart.
S7, adjusting the feature data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
For example, when the target object is a product and/or service and the analysis result shows that the user prefers a certain price interval, the price of the target object can be adjusted according to the analysis result, specifically, into the price interval shown by the analysis result. Likewise, if the analysis result indicates that the user prefers to receive a certain service within a certain time interval, the service time can be adjusted into the time interval shown by the analysis result.
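A minimal sketch of the statistical calculation using Python's standard `statistics` module. It assumes the analysis results are plain numbers, and it uses the population variance and standard deviation (`pvariance`, `pstdev`). The source does not specify population or sample statistics; `statistics.variance` and `statistics.stdev` give the sample versions. The function name `summarize` is hypothetical.

```python
import statistics


def summarize(results):
    """Return mean, population variance and population standard deviation of numeric results."""
    return {
        "mean": statistics.mean(results),
        "variance": statistics.pvariance(results),
        "std_dev": statistics.pstdev(results),
    }


print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```

For this example input, the mean is 5, the variance is 4, and the standard deviation is 2. These values could then be passed to a plotting library for the histogram or pie chart.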
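The source says the price (or service time) is "adjusted to" the preferred interval but does not say where within the interval. A minimal sketch, assuming the value is moved to the nearest interval boundary and left unchanged if it already lies inside. The function name `adjust_to_interval` is hypothetical.

```python
def adjust_to_interval(value, low, high):
    """Move `value` into the closed interval [low, high]; values already inside are unchanged."""
    if low > high:
        raise ValueError("low must not exceed high")
    return min(max(value, low), high)


print(adjust_to_interval(120.0, 80.0, 100.0))  # 100.0
```

The same function applies to a preferred time interval, for example with hours of the day as the values.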
In the embodiment of the invention, after the basic data of the user and the feature data of the target object are acquired, they are segmented and encoded to obtain a word vector set; an optimized vector set is obtained from the word vector set, and an initial target object analysis model is constructed; the initial target object analysis model is trained with the optimized vector set to obtain a standard target object analysis model; the standard target object analysis model analyzes the basic data of the user to be analyzed and the feature data of the target object to be analyzed to obtain an analysis result; and the feature data of the target object to be analyzed corresponding to the user is adjusted according to the analysis result. Because the feature data of the target object is adjusted based on the output of a model, the efficiency of adjusting target object data is improved. Meanwhile, because the model is built on the user's behavior data and trained on both that behavior data and the feature data of the target object, its analysis results fit different users more closely, the trained model can perform accurate individualized analysis, and the accuracy of adjusting the target object data through the model is improved.
Fig. 3 is a block diagram of the behavior-based target data analysis device according to the present invention.
The behavior-based target object data analysis apparatus 100 according to the present invention may be installed in an electronic device. According to the functions it implements, the apparatus may comprise a data acquisition module 101, a data word segmentation module 102, a feature value calculation module 103, a vector screening module 104, a model training module 105, a data analysis module 106 and a data adjustment module 107. A module according to the present invention, which may also be referred to as a unit, is a series of computer program segments that are stored in the memory of the electronic device and can be executed by a processor of the electronic device to perform a fixed function.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the data acquisition module 101 is configured to acquire basic data of a user and feature data of a target object, where the basic data includes behavior data of the user relative to the target object;
the data word segmentation module 102 is configured to perform word segmentation processing on the basic data and/or the feature data, and encode word segmentation result data after word segmentation to obtain a word vector set;
the feature value calculation module 103 is configured to perform feature value calculation on the word vectors in the word vector set;
the vector screening module 104 is configured to select optimized vectors from the result of the feature value calculation to obtain an optimized vector set;
the model training module 105 is configured to construct an initial target analysis model, and train the initial target analysis model by using the optimized vector set to obtain a standard target analysis model;
the data analysis module 106 is configured to obtain basic data of a user to be analyzed and feature data of a target object to be analyzed corresponding to the user to be analyzed, and analyze the basic data of the user to be analyzed and the feature data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result;
the data adjusting module 107 is configured to adjust the feature data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
In detail, the specific implementation of each module of the behavior-based target object data analysis device is as follows:
the data acquisition module 101 acquires basic data of a user and characteristic data of a target object, wherein the basic data includes behavior data of the user relative to the target object.
In this embodiment of the present invention, the basic data includes behavior data of the user, which includes behavior data generated by the user on the target object, such as click data of the user and message feedback data of the user (for example, a user's vote on a product and/or service in a certain price interval).
Further, in other alternative embodiments of the present invention, the basic information of the user may further include, but is not limited to, identity information of the user (such as a user name and a user age), consumption information of the user, and the like.
In the embodiment of the present invention, the target object refers to a certain product or service, and the characteristic data of the target object refers to data of a certain product and/or service, such as price, duration, quality, and the like of the product and/or service.
In another alternative embodiment of the present invention, the basic data of the user includes identity data, price inquiry data and claim data, and the characteristic data of the object includes vehicle data and policy data.
In one preferred embodiment of the present invention, the basic data and/or the feature data may be stored in a blockchain node. Specifically, a pre-written Java statement may be used to retrieve the basic information of the user and the feature data of the target object from a database node of one or more blockchains, the database node being used to store the basic information of a plurality of users and the feature data of a plurality of target objects.
Further, in another optional embodiment of the present invention, after the basic information of the user is acquired, a user portrait is generated from the data in the basic information and displayed in a visualized form, for example, as an Excel table and/or a proportion chart.
By generating the user portrait based on the basic information of the user, the feature information of different users can be displayed more intuitively.
The data word segmentation module 102 is configured to perform word segmentation processing on the basic data and/or the feature data, and encode word segmentation result data after word segmentation to obtain a word vector set.
Preferably, in the embodiment of the present invention, the THULAC word segmentation tool is used to segment the basic data and/or the feature data into word segmentation result data, i.e., a word segmentation information set, and each word in the set is encoded with one-hot encoding, converting the basic data and/or the feature data into a word vector set.
One-hot encoding works as follows: an N-bit state register is used to encode the N states in the word segmentation result data, each state is represented by its own register bit, and at any time only one bit is valid, that is, exactly one bit is 1 and the rest are 0.
Word segmentation splits the user's behavior data into word vectors that each contain less data, which reduces the computing resources occupied during subsequent recognition by the computer. Encoding then converts the behavior data into a machine-readable form, so the computer can identify its content quickly, which facilitates fast data analysis later on.
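A minimal sketch of the one-hot encoding described above. It assumes word segmentation has already been done: a hand-written token list stands in for THULAC output, and THULAC itself is not called. Each of the N distinct tokens gets an N-bit vector with exactly one bit set. The vocabulary is sorted only to make the bit assignment deterministic, and the function name `one_hot_encode` is hypothetical.

```python
from typing import Dict, List


def one_hot_encode(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each distinct token to an N-bit vector with a single 1 (N = vocabulary size)."""
    vocab = sorted(set(tokens))  # sorted so the bit assigned to each token is deterministic
    n = len(vocab)
    # Token at position i in the vocabulary gets bit i set; all other bits are 0.
    return {tok: [1 if j == i else 0 for j in range(n)] for i, tok in enumerate(vocab)}


tokens = ["vehicle", "policy", "claim", "policy"]  # hypothetical segmentation result
print(one_hot_encode(tokens))
```

With the sorted vocabulary `claim`, `policy`, `vehicle`, the three words map to `[1, 0, 0]`, `[0, 1, 0]` and `[0, 0, 1]` respectively.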
The feature value calculation module 103 is configured to perform feature value calculation on the word vectors in the word vector set.
In detail, the feature value calculation module 103 is specifically configured to:
performing multiple rounds of word vector sampling on the word vector set to obtain a plurality of training sets containing word vectors, wherein the sampling is random sampling with replacement;
classifying the training sets containing the word vectors respectively to obtain a plurality of classification results, wherein the classification results contain a characteristic vector set and/or a non-characteristic vector set;
calculating information entropies contained in the feature vector sets in the classification results, and selecting the classification results corresponding to the feature vector sets with the information entropies larger than a preset entropy threshold value to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
In this embodiment, when performing multiple word vector sampling, one or more word vectors may be sampled at a time, and preferably, at least two word vectors are sampled at a time.
Multiple rounds of word vector sampling yield multiple training sets containing word vectors: the word vectors drawn in each round form one training set.
In this embodiment, classifying the training sets containing word vectors means classifying the word vectors in each training set, that is, determining whether each word vector is a feature vector or a non-feature vector, so that the word vectors in each training set are divided into a feature vector set and/or a non-feature vector set. Specifically, the word vectors are classified by a pre-trained convolutional neural network capable of judging features.
For a plurality of classification results, a plurality of feature vector sets and/or non-feature vector sets are obtained, that is, each classification result has a corresponding feature vector set and/or non-feature vector set.
For the feature vector sets in the plurality of classification results, the information entropy contained in each feature vector set may be calculated. Specifically, in the embodiment of the present invention, the information entropy H(Y, X) contained in the feature vector sets in the plurality of classification results is calculated with the following information entropy algorithm:
$$H(Y, X) = -\sum_{i=1}^{k} p(x_i) \log p(x_i)$$
wherein X is the feature vector set, Y is the classification result corresponding to the feature vector set, $x_i$ is the ith feature vector in the feature vector set, k is the number of feature vectors in the feature vector set, and $p(x_i)$ is the frequency with which the ith feature vector occurs in the feature vector set.
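A minimal sketch of the information entropy of one feature vector set. It assumes that $p(x_i)$ is computed by counting identical vectors in the set, and it uses a base-2 logarithm because the base is not stated in the source. Vectors are represented as tuples so they can be counted. The function name `information_entropy` is hypothetical.

```python
import math
from collections import Counter


def information_entropy(feature_vectors, base=2.0):
    """H = -sum_i p(x_i) * log p(x_i), where p(x_i) is the frequency of x_i within the set."""
    counts = Counter(feature_vectors)  # identical vectors are counted together
    total = len(feature_vectors)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


vectors = [(1, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
print(information_entropy(vectors))
```

Here the frequencies are 1/2, 1/4 and 1/4, which gives 1.5 bits. A classification result would be kept only if this value exceeds the preset entropy threshold.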
Preferably, the present invention calculates the first feature value of each feature vector with the following first feature value algorithm:
$$GI_m = \sum_{k=1}^{n} \sum_{k' \neq k} p_{mk}\, p_{mk'}$$
wherein n is the number of the plurality of classification results, k is any feature vector in the mth classification result, k' is any vector in the mth classification result different from k, $p_{mk}p_{mk'}$ indicates the probability that the classes of two vectors randomly drawn from the mth classification result are not the same, and $GI_m$ is the first feature value of the feature vector k.
In this embodiment of the invention, the second feature value of each feature vector is calculated using the following second feature value algorithm:
$$S_j = \sum_{m \in M} S_{jm}$$
wherein $S_{jm}$ is the term corresponding to the feature vector j in the feature vector set m of any classification result, M is the set of the feature vector sets in all the classification result sets, m is the feature vector set in any classification result, and $S_j$ is the second feature value of the feature vector j.
The vector screening module 104 is configured to select optimized vectors from the result of the feature value calculation to obtain an optimized vector set.
Specifically, the vector screening module 104 is configured to:
adding the first feature value and the second feature value of each feature vector to obtain a total feature value of each feature vector in the different feature vector sets;
sorting all the feature vectors in the different feature vector sets by total feature value to obtain a vector sequence;
and selecting a plurality of feature vectors from the vector sequence in order, and collecting the selected feature vectors into the optimized vector set.
Specifically, because each feature vector has both a first feature value and a second feature value, the two values of each feature vector are added to give its total feature value, which yields the total feature value of every feature vector in the different feature vector sets.
In this embodiment, the feature vectors may be selected from the vector sequence in descending order of total feature value. For example, the first 100 feature vectors in the vector sequence are selected and collected into the optimized vector set.
The basic data and the feature data may contain a large amount of useless information, so the word vector set obtained after encoding may also contain many useless word vectors. Selecting optimized vectors therefore reduces the computing resources occupied in the subsequent analysis and improves analysis efficiency.
The model training module 105 is configured to construct an initial target analysis model, and train the initial target analysis model using the optimized vector set to obtain a standard target analysis model.
In this embodiment, a more accurate analysis model can be obtained by constructing an initial target analysis model and training the initial target analysis model using the optimized vector set.
Further, the model training module 105 is specifically configured to:
performing random sampling with replacement on the optimized vector set a preset number of times to obtain a plurality of training sets;
generating a plurality of decision trees corresponding to the plurality of training sets by using the plurality of training sets;
aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm;
training the initial target object analysis model by using the optimized vector set to obtain a trained target object analysis model;
and performing parameter tuning on the trained target object analysis model to obtain the standard target object analysis model.
In the embodiment of the present invention, the preset number of times is set in advance, for example, 8 times. Specifically, the Bagging (bootstrap aggregating) method may be used for the random sampling with replacement: the result of each sampling is one training set, so a plurality of training sets are obtained through multiple rounds of sampling.
In detail, each training set obtained by sampling generates one corresponding decision tree; that is, random sampling is performed on the optimized vector set, and a decision function is used to generate a decision tree from each training set obtained by the sampling.
Preferably, in an embodiment of the present invention, aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm includes:
aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
wherein F represents the set of the plurality of decision trees, $f_k$ represents the kth decision tree of the plurality of decision trees, K represents the total number of the plurality of decision trees, $x_i$ is the ith optimized vector, and $\hat{y}_i$ represents the initial target analysis model.
Specifically, the embodiment of the present invention trains the initial target analysis model by using the following objective function:
$$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \beta(f_k)$$
wherein Obj is the value of the objective function, $y_i$ is the label value contained in the ith optimized vector of the optimized vector set, $\hat{y}_i$ is the output of the initial target analysis model, $l(y_i, \hat{y}_i)$ is the loss between the label value and the output, K represents the total number of decision trees, $f_k$ denotes the kth decision tree, and $\beta(f_k)$ is a preset regularization term.
The data analysis module 106 is configured to obtain basic data of a user to be analyzed and feature data of a target object to be analyzed corresponding to the user to be analyzed, and analyze the basic data of the user to be analyzed and the feature data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result.
In this embodiment, the basic data of the user to be analyzed and the feature data of the target object to be analyzed may be acquired from the target database, or may be acquired from a business system such as a sales system.
Specifically, the number of the users to be analyzed may be one or more, and the number of the target objects to be analyzed may also be one or more.
Further, in another optional embodiment of the present invention, the apparatus further includes a statistics module configured to perform statistical analysis on the analysis result after it is obtained.
Specifically, the statistics module is configured to perform statistical calculation on the analysis result to obtain a statistical result, and to display the statistical result in a visual form.
The statistical calculations include, but are not limited to, calculating the mean, the variance, and the standard deviation of the analysis results.
The statistical result may be displayed visually, for example as a histogram and/or a pie chart.
The data adjusting module 107 is configured to adjust the feature data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
For example, when the target object is a product and/or service and the analysis result shows that the user prefers a certain price interval, the price of the target object can be adjusted according to the analysis result, specifically, into the price interval shown by the analysis result. Likewise, if the analysis result indicates that the user prefers to receive a certain service within a certain time interval, the service time can be adjusted into the time interval shown by the analysis result.
In the embodiment of the invention, after the basic data of the user and the feature data of the target object are acquired, they are segmented and encoded to obtain a word vector set; an optimized vector set is obtained from the word vector set, and an initial target object analysis model is constructed; the initial target object analysis model is trained with the optimized vector set to obtain a standard target object analysis model; the standard target object analysis model analyzes the basic data of the user to be analyzed and the feature data of the target object to be analyzed to obtain an analysis result; and the feature data of the target object to be analyzed corresponding to the user is adjusted according to the analysis result. Because the feature data of the target object is adjusted based on the output of a model, the efficiency of adjusting target object data is improved. Meanwhile, because the model is built on the user's behavior data and trained on both that behavior data and the feature data of the target object, its analysis results fit different users more closely, the trained model can perform accurate individualized analysis, and the accuracy of adjusting the target object data through the model is improved.
Fig. 4 is a schematic structural diagram of an electronic device for implementing a behavior-based target object data analysis method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a behavior-based object data analysis program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a removable hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the behavior-based object data analysis program 12, but also to temporarily store data that has been output or is to be output.
In some embodiments, the processor 10 may be composed of integrated circuits, for example, a single packaged integrated circuit, or a plurality of packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the components of the electronic device through various interfaces and lines, and executes the functions and processes the data of the electronic device 1 by running or executing programs or modules stored in the memory 11 (for example, the behavior-based object data analysis program) and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 4 shows only an electronic device with some components. Those skilled in the art will understand that the structure shown in Fig. 4 does not limit the electronic device 1, which may comprise fewer or more components than shown, combine some components, or arrange components differently.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that charge management, discharge management, power consumption management and other functions are implemented through the power management device. The power supply may also include any components such as one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The memory 11 in the electronic device 1 stores a behavior-based target object data analysis program 12, which is a combination of instructions that, when executed by the processor 10, implement the following steps:
acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprises behavior data of the user relative to the target object;
performing word segmentation on the basic data and/or the characteristic data, and encoding the segmentation results to obtain a word vector set;
calculating characteristic values of the word vectors in the word vector set;
selecting an optimization vector from the result of the characteristic value calculation to obtain an optimization vector set;
constructing an initial target object analysis model, and training the initial target object analysis model by using the optimization vector set to obtain a standard target object analysis model;
acquiring basic data of a user to be analyzed and characteristic data of a target object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result;
and adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
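The text does not say how segmentation and encoding are performed. The sketch below is a minimal illustration under stated assumptions: records are space-delimited (Chinese text would need a dedicated word segmenter instead of the regex used here), and each token is encoded as a one-hot vector over a vocabulary built from the records. The function names and sample records are hypothetical.

```python
import re
from typing import Dict, List


def segment(text: str) -> List[str]:
    """Split a record into lower-cased word tokens (assumes space-delimited text)."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token a stable integer index in first-seen order."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def one_hot(token: str, vocab: Dict[str, int]) -> List[int]:
    """Encode one token as a one-hot vector over the vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


def word_vector_set(records: List[str]) -> List[List[int]]:
    """Segment every record and encode each token, giving the word vector set."""
    token_lists = [segment(r) for r in records]
    vocab = build_vocabulary(token_lists)
    return [one_hot(tok, vocab) for tokens in token_lists for tok in tokens]


# Hypothetical behavior records for illustration only.
records = ["clicked policy renewal", "viewed policy price"]
vectors = word_vector_set(records)
print(len(vectors), len(vectors[0]))  # 6 5  (6 tokens, vocabulary of 5 words)
```

One-hot encoding is the simplest possible choice; a learned embedding would give dense vectors instead, but the text does not specify which encoding is used.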
Further, if the integrated modules/units of the electronic device 1 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like, and the storage data area may store data created according to the use of blockchain nodes, and the like.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only a logical functional division, and other divisions are possible in an actual implementation.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims should not be construed as limiting the claims concerned.
Blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
Furthermore, the word "comprising" clearly does not exclude other elements or steps, and the singular does not exclude the plural. Multiple units or means recited in a system claim may also be implemented by a single unit or means in software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
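The cryptographic linking of data blocks described above can be illustrated with a minimal sketch. It assumes SHA-256 over a JSON serialization of each block and deliberately omits consensus, signatures, and peer-to-peer transmission; the block fields and function names are illustrative and are not part of the described system.

```python
import hashlib
import json
from typing import Dict, List


def block_hash(block: Dict) -> str:
    """SHA-256 over a canonical (key-sorted) JSON serialization of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict], transactions: List[str]) -> None:
    """Append a block that commits to the hash of its predecessor."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "transactions": transactions})


def is_valid(chain: List[Dict]) -> bool:
    """Recompute every link; tampering with an earlier block breaks the chain."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))


chain: List[Dict] = []
append_block(chain, ["tx-a", "tx-b"])
append_block(chain, ["tx-c"])
print(is_valid(chain))  # True
chain[0]["transactions"].append("forged")
print(is_valid(chain))  # False
```

Because each block stores the hash of the previous one, changing any earlier transaction changes that block's hash and invalidates every later link, which is the anti-counterfeiting property the paragraph refers to.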
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for behavior-based target data analysis, the method comprising:
acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprises behavior data of the user relative to the target object;
performing word segmentation on the basic data and/or the characteristic data, and encoding the segmentation results to obtain a word vector set;
calculating characteristic values of the word vectors in the word vector set;
selecting an optimization vector from the result of the characteristic value calculation to obtain an optimization vector set;
constructing an initial target object analysis model, and training the initial target object analysis model by using the optimization vector set to obtain a standard target object analysis model;
acquiring basic data of a user to be analyzed and characteristic data of a target object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result;
and adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
2. The behavior-based object data analysis method of claim 1, wherein the performing feature value calculations on word vectors in the set of word vectors comprises:
sampling the word vector set multiple times to obtain multiple training sets containing word vectors, wherein the sampling is random sampling with replacement;
classifying each of the training sets separately to obtain a plurality of classification results, wherein each classification result contains a characteristic vector set and/or a non-characteristic vector set;
calculating the information entropy of the characteristic vector set in each classification result, and selecting the classification results whose characteristic vector sets have an information entropy greater than a preset entropy threshold, to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
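The sampling and entropy-filtering steps of claim 2 can be sketched as follows, under several assumptions the text does not state: each word vector carries a class label, the classifier that splits a training set into a characteristic vector set and a non-characteristic vector set is supplied as a hypothetical `is_feature` callable, and the information entropy is the Shannon entropy of the labels in the characteristic vector set. The first and second characteristic values are not defined in the text, so they are not sketched.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Assumed layout: (word vector, class label).
Sample = Tuple[Tuple[float, ...], str]


def bootstrap(data: Sequence[Sample], n_sets: int, rng: random.Random) -> List[List[Sample]]:
    """Draw n_sets training sets, each as large as data, by random sampling with replacement."""
    return [rng.choices(data, k=len(data)) for _ in range(n_sets)]


def shannon_entropy(labels: Sequence[str]) -> float:
    """H = sum(p * log2(1 / p)) over the label distribution; 0.0 for an empty set."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def select_classification_results(
    data: Sequence[Sample],
    is_feature: Callable[[Sample], bool],  # hypothetical classifier
    n_sets: int,
    entropy_threshold: float,
    seed: int = 0,
) -> List[List[Sample]]:
    """Bootstrap training sets, classify each, and keep the characteristic
    vector sets whose label entropy exceeds the preset threshold."""
    rng = random.Random(seed)
    kept: List[List[Sample]] = []
    for training_set in bootstrap(data, n_sets, rng):
        feature_set = [s for s in training_set if is_feature(s)]
        if shannon_entropy([label for _, label in feature_set]) > entropy_threshold:
            kept.append(feature_set)
    return kept


# Hypothetical labelled word vectors for illustration only.
data = [((1.0, 0.0), "renew"), ((0.0, 1.0), "lapse"), ((1.0, 1.0), "renew"), ((0.0, 0.0), "lapse")]
result = select_classification_results(data, lambda s: sum(s[0]) > 0, n_sets=5, entropy_threshold=0.5)
print(len(result))
```

A high label entropy here means the characteristic vector set still mixes outcomes; whether the patent intends this particular entropy, or a different one, is not stated.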
3. The behavior-based object data analysis method of claim 2, wherein selecting optimization vectors from the result of the characteristic value calculation to obtain an optimization vector set comprises:
adding the first characteristic value and the second characteristic value of each characteristic vector to obtain a total characteristic value of each characteristic vector in the different characteristic vector sets;
sorting all the characteristic vectors in the different characteristic vector sets by total characteristic value to obtain a vector sequence;
and selecting a plurality of characteristic vectors from the vector sequence in order, and collecting them into the optimized vector set.
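The ranking step of claim 3 can be sketched as below. This is a minimal illustration assuming that characteristic vectors are identified by hypothetical string ids, that both value maps share the same ids, that sorting is in descending order of total characteristic value (the text says only "according to the size"), and that "a plurality" means a preset `top_n`.

```python
from typing import Dict, List, Tuple


def select_optimization_vectors(
    first: Dict[str, float],
    second: Dict[str, float],
    top_n: int,
) -> List[str]:
    """Rank characteristic vectors by first + second value and keep the top_n ids."""
    # Total characteristic value per vector.
    totals: Dict[str, float] = {vid: first[vid] + second[vid] for vid in first}
    # Descending by total; ties broken by id so the vector sequence is deterministic.
    sequence: List[Tuple[str, float]] = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [vid for vid, _ in sequence[:top_n]]


# Hypothetical characteristic values for illustration only.
first = {"v1": 0.2, "v2": 0.7, "v3": 0.4}
second = {"v1": 0.5, "v2": 0.1, "v3": 0.6}
print(select_optimization_vectors(first, second, top_n=2))  # ['v3', 'v2']
```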
4. The behavior-based target data analysis method of any one of claims 1 to 3, wherein constructing an initial target analysis model and training the initial target analysis model using the optimization vector set to obtain a standard target analysis model comprises:
performing random sampling with replacement on the optimized vector set a preset number of times to obtain a plurality of training sets;
generating a plurality of decision trees corresponding to the plurality of training sets by using the plurality of training sets;
aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm;
training the initial target object analysis model by using the optimization vector set to obtain a training target object analysis model;
and performing parameter tuning on the training target object analysis model to obtain the standard target object analysis model.
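A minimal sketch of the bootstrap-and-aggregate construction in claim 4 follows. To stay short and dependency-free it uses depth-1 regression trees (decision stumps) instead of full decision trees and aggregates them by averaging; the text does not specify the tree type or the aggregation rule, and the subsequent training and parameter-tuning steps are not shown. The row layout and function names are assumptions.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Assumed layout: (optimization-vector features, target value).
Row = Tuple[List[float], float]


def fit_stump(rows: Sequence[Row]) -> Callable[[List[float]], float]:
    """Fit a depth-1 regression tree: the single split with the lowest squared error."""
    best = None
    n_features = len(rows[0][0])
    for j in range(n_features):
        for threshold in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or err < best[0]:
                best = (err, j, threshold, lm, rm)
    if best is None:  # no valid split (all feature values identical): predict the mean
        m = mean(y for _, y in rows)
        return lambda x: m
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] <= t else rm


def bagged_model(rows: Sequence[Row], n_trees: int, seed: int = 0) -> Callable[[List[float]], float]:
    """Draw n_trees bootstrap training sets, fit one stump per set, average the outputs."""
    rng = random.Random(seed)
    trees = [fit_stump(rng.choices(rows, k=len(rows))) for _ in range(n_trees)]
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Hypothetical training rows for illustration only.
rows = [([0.0], 0.0), ([1.0], 0.0), ([2.0], 1.0), ([3.0], 1.0)]
model = bagged_model(rows, n_trees=10)
print(model([0.0]), model([3.0]))
```

Library implementations such as scikit-learn's `RandomForestRegressor` follow the same pattern with full trees: each tree is grown on a bootstrap sample and the ensemble prediction is the average of the tree predictions.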
5. The behavior-based target data analysis method of claim 4, wherein the aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm comprises:
aggregating the plurality of decision trees into the initial target analysis model using the following aggregation algorithm:
[Formula image FDA0002475718670000021]
wherein F represents the set of the plurality of decision trees, F_k represents the k-th decision tree of the plurality of decision trees, K represents the total number of decision trees, and
[Formula image FDA0002475718670000022]
represents the initial target analysis model.
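The formula in claim 5 survives only as image references. One form consistent with the symbols defined there, assumed here rather than confirmed by the source, is the additive tree-ensemble model used in gradient-boosted trees, with $x_i$ (a symbol introduced here) the input vector of the i-th sample and $\hat{y}_i$ the model output:

```latex
\hat{y}_i = \sum_{k=1}^{K} F_k(x_i), \qquad F_k \in F
```

For a bagging ensemble built from bootstrap samples, as in claim 4, the sum is commonly divided by $K$ so that $\hat{y}_i$ is the average of the individual tree outputs.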
6. A behavior-based target data analysis apparatus, the apparatus comprising:
the data acquisition module is used for acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprises behavior data of the user relative to the target object;
the data word segmentation module is used for performing word segmentation processing on the basic data and/or the characteristic data and encoding word segmentation result data after word segmentation to obtain a word vector set;
the characteristic value calculation module is used for calculating the characteristic values of the word vectors in the word vector set;
the vector screening module is used for selecting an optimized vector from the result of the characteristic value calculation to obtain an optimized vector set;
the model training module is used for constructing an initial target object analysis model and training the initial target object analysis model by using the optimization vector set to obtain a standard target object analysis model;
the data analysis module is used for acquiring basic data of a user to be analyzed and characteristic data of a target object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result;
and the data adjusting module is used for adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
7. The behavior-based object data analysis device of claim 6, wherein the characteristic value calculation module is specifically configured to perform:
sampling the word vector set multiple times to obtain multiple training sets containing word vectors, wherein the sampling is random sampling with replacement;
classifying each of the training sets separately to obtain a plurality of classification results, wherein each classification result contains a characteristic vector set and/or a non-characteristic vector set;
calculating the information entropy of the characteristic vector set in each classification result, and selecting the classification results whose characteristic vector sets have an information entropy greater than a preset entropy threshold, to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
8. The behavior-based object data analysis device of claim 6, wherein the model training module is specifically configured to:
performing random sampling with replacement on the optimized vector set a preset number of times to obtain a plurality of training sets;
generating a plurality of decision trees corresponding to the plurality of training sets by using the plurality of training sets;
aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm;
training the initial target object analysis model by using the optimization vector set to obtain a training target object analysis model;
and performing parameter tuning on the training target object analysis model to obtain the standard target object analysis model.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a behavior-based target data analysis method as claimed in any one of claims 1 to 5.
10. A computer-readable storage medium, comprising a storage data area and a storage program area, wherein the storage data area stores data created according to the use of blockchain nodes and the storage program area stores a computer program; and wherein the computer program, when executed by a processor, implements the behavior-based target data analysis method as claimed in any one of claims 1 to 5.
CN202010370884.5A 2020-04-30 2020-04-30 Behavior-based target object data analysis method, device and storage medium Active CN111652280B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010370884.5A CN111652280B (en) 2020-04-30 2020-04-30 Behavior-based target object data analysis method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010370884.5A CN111652280B (en) 2020-04-30 2020-04-30 Behavior-based target object data analysis method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111652280A CN111652280A (en) 2020-09-11
CN111652280B CN111652280B (en) 2023-10-27

Family

ID=72351971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010370884.5A Active CN111652280B (en) 2020-04-30 2020-04-30 Behavior-based target object data analysis method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111652280B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184241A (en) * 2020-09-27 2021-01-05 中国银联股份有限公司 Identity authentication method and device
CN113240036A (en) * 2021-05-28 2021-08-10 北京达佳互联信息技术有限公司 Object classification method and device, electronic equipment and storage medium
CN113505280A (en) * 2021-07-28 2021-10-15 全知科技(杭州)有限责任公司 Sensitive key information identification and extraction technology for general scene
CN113656559A (en) * 2021-10-18 2021-11-16 印象(山东)大数据有限公司 Data analysis method and device based on metering platform and electronic equipment
CN114064440A (en) * 2022-01-18 2022-02-18 恒生电子股份有限公司 Training method of credibility analysis model, credibility analysis method and related device
CN114844788A (en) * 2022-04-25 2022-08-02 中国电信股份有限公司 Network data analysis method, system, device and storage medium
CN112988893B (en) * 2021-03-15 2023-05-12 中国联合网络通信集团有限公司 Information management method, system, block chain node and medium based on block chain

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100121671A1 (en) * 2008-11-11 2010-05-13 Combinenet, Inc. Automated Channel Abstraction for Advertising Auctions
CN110033383A (en) * 2019-02-18 2019-07-19 阿里巴巴集团控股有限公司 A kind of data processing method, equipment, medium and device
CN110827069A (en) * 2019-10-28 2020-02-21 阿里巴巴(中国)有限公司 Data processing method, device, medium, and electronic apparatus
CN110910199A (en) * 2019-10-16 2020-03-24 中国平安人寿保险股份有限公司 Item information sorting method and device, computer equipment and storage medium

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184241A (en) * 2020-09-27 2021-01-05 中国银联股份有限公司 Identity authentication method and device
CN112184241B (en) * 2020-09-27 2024-02-20 中国银联股份有限公司 Identity authentication method and device
CN112988893B (en) * 2021-03-15 2023-05-12 中国联合网络通信集团有限公司 Information management method, system, block chain node and medium based on block chain
CN113240036A (en) * 2021-05-28 2021-08-10 北京达佳互联信息技术有限公司 Object classification method and device, electronic equipment and storage medium
CN113240036B (en) * 2021-05-28 2023-11-07 北京达佳互联信息技术有限公司 Object classification method and device, electronic equipment and storage medium
CN113505280A (en) * 2021-07-28 2021-10-15 全知科技(杭州)有限责任公司 Sensitive key information identification and extraction technology for general scene
CN113505280B (en) * 2021-07-28 2023-08-22 全知科技(杭州)有限责任公司 Sensitive key information identification and extraction technology for general scene
CN113656559A (en) * 2021-10-18 2021-11-16 印象(山东)大数据有限公司 Data analysis method and device based on metering platform and electronic equipment
CN113656559B (en) * 2021-10-18 2022-01-25 印象(山东)大数据有限公司 Data analysis method and device based on metering platform and electronic equipment
CN114064440A (en) * 2022-01-18 2022-02-18 恒生电子股份有限公司 Training method of credibility analysis model, credibility analysis method and related device
CN114844788A (en) * 2022-04-25 2022-08-02 中国电信股份有限公司 Network data analysis method, system, device and storage medium
CN114844788B (en) * 2022-04-25 2023-10-31 中国电信股份有限公司 Network data analysis method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN111652280B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN111652280B (en) Behavior-based target object data analysis method, device and storage medium
CN112148577B (en) Data anomaly detection method and device, electronic equipment and storage medium
CN112541745B (en) User behavior data analysis method and device, electronic equipment and readable storage medium
CN109784736A (en) A kind of analysis and decision system based on big data
CN111625713A (en) Resource recommendation method and device based on big data, electronic equipment and medium
CN113688923B (en) Order abnormity intelligent detection method and device, electronic equipment and storage medium
CN111756760B (en) User abnormal behavior detection method based on integrated classifier and related equipment
CN114398557B (en) Information recommendation method and device based on double images, electronic equipment and storage medium
CN113626606B (en) Information classification method, device, electronic equipment and readable storage medium
CN113656690B (en) Product recommendation method and device, electronic equipment and readable storage medium
CN111696663A (en) Disease risk analysis method and device, electronic equipment and computer storage medium
CN112306835A (en) User data monitoring and analyzing method, device, equipment and medium
CN114491047A (en) Multi-label text classification method and device, electronic equipment and storage medium
CN113051480A (en) Resource pushing method and device, electronic equipment and storage medium
CN113868529A (en) Knowledge recommendation method and device, electronic equipment and readable storage medium
CN112269875A (en) Text classification method and device, electronic equipment and storage medium
CN113628043B (en) Complaint validity judging method, device, equipment and medium based on data classification
CN114742412A (en) Software technology service system and method
CN117155771B (en) Equipment cluster fault tracing method and device based on industrial Internet of things
CN113344415A (en) Deep neural network-based service distribution method, device, equipment and medium
CN111460293B (en) Information pushing method and device and computer readable storage medium
CN111583215A (en) Intelligent damage assessment method and device for damage image, electronic equipment and storage medium
CN116843395A (en) Alarm classification method, device, equipment and storage medium of service system
CN113706019B (en) Service capability analysis method, device, equipment and medium based on multidimensional data
CN112580505B (en) Method and device for identifying network point switch door state, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant