CN111652280B - Behavior-based target object data analysis method, device and storage medium - Google Patents

Behavior-based target object data analysis method, device and storage medium

Info

Publication number
CN111652280B
Authority
CN
China
Prior art keywords
data
target object
characteristic
analyzed
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010370884.5A
Other languages
Chinese (zh)
Other versions
CN111652280A (en)
Inventor
孙侨侨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010370884.5A
Publication of CN111652280A
Application granted
Publication of CN111652280B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data

Abstract

The invention relates to a data processing technology, and discloses a target object data analysis method based on behaviors, which comprises the following steps: acquiring basic data of a user and characteristic data of a target object; performing word segmentation processing and encoding on the basic data and/or the characteristic data to obtain a word vector set; calculating a characteristic value of the word vector set; selecting an optimization vector from the result of the eigenvalue calculation to obtain an optimization vector set; training the initial target object analysis model by using the constructed optimized vector set to obtain a standard target object analysis model; analyzing the basic data of the user to be analyzed and the characteristic data of the object to be analyzed by using a standard object analysis model to obtain an analysis result; and adjusting the characteristic data of the target object to be analyzed corresponding to the user according to the analysis result. Furthermore, the present invention relates to blockchain techniques, wherein the base data and/or the feature data may be stored in a blockchain node. The invention can improve the efficiency and accuracy of adjusting the target data.

Description

Behavior-based target object data analysis method, device and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a behavior-based target object data analysis method, a behavior-based target object data analysis device, an electronic device, and a computer readable storage medium.
Background
With the development of technology, the objects people need are becoming more complex and personalized. Although more and more products and services are offered, most of them struggle to meet people's differing demands. How to adjust the data of existing target objects (such as the size and type of a product, or the time and type of a service) to meet those demands is therefore receiving increasing attention.
For example, a user may prefer a certain price interval or offer period for a certain product and/or service, and pricing a product and/or service incorrectly, or offering it during the wrong period, would result in a loss for the user. The current mainstream strategy for adjusting target object data is manual statistical adjustment. This approach depends too heavily on manual operation, lacks timeliness, and has low accuracy, so how to adjust target object data efficiently and accurately has become an increasingly important problem.
Disclosure of Invention
The invention provides a behavior-based target object data analysis method, a behavior-based target object data analysis device, an electronic device and a computer-readable storage medium, and mainly aims to improve efficiency and accuracy of adjusting target object data.
In order to achieve the above object, the present invention provides a behavior-based target data analysis method, including:
acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprise behavior data of the user relative to the target object;
performing word segmentation on the basic data and/or the characteristic data, and encoding word segmentation result data after word segmentation to obtain a word vector set;
calculating characteristic values of word vectors in the word vector set;
selecting an optimization vector from the result of the eigenvalue calculation to obtain an optimization vector set;
constructing an initial target object analysis model, and training the initial target object analysis model by utilizing the optimized vector set to obtain a standard target object analysis model;
acquiring basic data of a user to be analyzed and characteristic data of an object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the object to be analyzed by utilizing the standard object analysis model to obtain an analysis result;
and adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
Optionally, the calculating the eigenvalue of the word vector in the word vector set includes:
performing word vector sampling for a plurality of times on the word vector set to obtain a plurality of training sets containing word vectors, wherein the sampling is random sampling with replacement;
respectively classifying the training sets containing the word vectors to obtain a plurality of classification results, wherein the classification results contain a characteristic vector set and/or a non-characteristic vector set;
calculating information entropy contained in the feature vector sets in the plurality of classification results, and selecting a classification result corresponding to the feature vector set with the information entropy larger than a preset entropy threshold value to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
Optionally, selecting an optimization vector from the results of the feature value calculation to obtain an optimization vector set, including:
correspondingly adding the first characteristic value and the second characteristic value respectively to obtain total characteristic values of all characteristic vectors in the different characteristic vector sets;
sorting all the feature vectors in the different feature vector sets according to the total feature value to obtain a vector sequence;
and sequentially selecting a plurality of eigenvectors in the vector sequence, and collecting the eigenvectors into the optimized vector set.
Optionally, the constructing an initial target object analysis model, training the initial target object analysis model by using the optimized vector set, to obtain a standard target object analysis model, including:
performing random sampling with replacement on the optimized vector set a preset number of times to obtain a plurality of training sets;
generating a plurality of decision trees corresponding to the training sets by utilizing the training sets;
utilizing an aggregation algorithm to aggregate the plurality of decision trees into the initial target object analysis model;
training the initial target object analysis model by using the optimized vector set to obtain a training target object analysis model;
and performing parameter tuning on the training target object analysis model to obtain the standard target object analysis model.
Optionally, the aggregating the plurality of decision trees into the initial target analysis model using an aggregation algorithm comprises:
aggregating the plurality of decision trees into the initial target object analysis model using the following aggregation algorithm:
wherein $F$ represents the set of the plurality of decision trees, $f_k$ represents the $k$-th decision tree in the plurality of decision trees, $K$ represents the total number of decision trees, and the aggregated result is the initial target object analysis model.
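The aggregation formula itself is not reproduced in this text. One additive form that would be consistent with the symbol definitions above is sketched here; the symbol $\hat{y}$ for the aggregated model, the input $x$, and the choice of a plain sum are assumptions for illustration, not the confirmed formula of the patent.

$$\hat{y}(x) = \sum_{k=1}^{K} f_k(x), \qquad f_k \in F$$

Under this reading, each decision tree $f_k$ contributes its own output for the input $x$, and the initial model is the combination of all $K$ outputs.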
In order to solve the above problems, the present invention also provides a behavior-based object data analysis apparatus, the apparatus comprising:
the data acquisition module is used for acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprise behavior data of the user relative to the target object;
the data word segmentation module is used for carrying out word segmentation processing on the basic data and/or the characteristic data and encoding word segmentation result data after word segmentation to obtain a word vector set;
the characteristic value calculation module is used for calculating characteristic values of the word vectors in the word vector set;
the vector screening module is used for selecting an optimization vector from the result of the eigenvalue calculation to obtain an optimization vector set;
the model training module is used for constructing an initial target object analysis model, and training the initial target object analysis model by utilizing the optimized vector set to obtain a standard target object analysis model;
the data analysis module is used for acquiring basic data of a user to be analyzed and characteristic data of an object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the object to be analyzed by utilizing the standard object analysis model to obtain an analysis result;
and the data adjustment module is used for adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
Optionally, the feature value calculating module is specifically configured to:
performing word vector sampling for a plurality of times on the word vector set to obtain a plurality of training sets containing word vectors, wherein the sampling is random sampling with replacement;
respectively classifying the training sets containing the word vectors to obtain a plurality of classification results, wherein the classification results contain a characteristic vector set and/or a non-characteristic vector set;
calculating information entropy contained in the feature vector sets in the plurality of classification results, and selecting a classification result corresponding to the feature vector set with the information entropy larger than a preset entropy threshold value to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
Optionally, the model training module is specifically configured to:
performing random sampling with replacement on the optimized vector set a preset number of times to obtain a plurality of training sets;
generating a plurality of decision trees corresponding to the training sets by utilizing the training sets;
utilizing an aggregation algorithm to aggregate the plurality of decision trees into the initial target object analysis model;
training the initial target object analysis model by using the optimized vector set to obtain a training target object analysis model;
and performing parameter tuning on the training target object analysis model to obtain the standard target object analysis model.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the behavior-based object data analysis method of any one of the above.
In order to solve the above-described problems, the present invention also provides a computer-readable storage medium including a storage data area storing data created according to use of blockchain nodes and a storage program area storing a computer program; wherein the computer program when executed by the processor implements the method for analyzing object data described above.
In the embodiment of the invention, after the basic data of the user and the characteristic data of the target object are acquired, analysis and coding processing are carried out to obtain a word vector set; further obtaining an optimized vector set based on the word vector set, and constructing an initial target object analysis model; training an initial target object analysis model by using the obtained optimized vector set to obtain a standard target object analysis model; analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using a standard target object analysis model to obtain an analysis result; and adjusting the characteristic data of the target object to be analyzed corresponding to the user according to the analysis result. The characteristic data of the target object to be analyzed is adjusted based on the analysis result by establishing a model and analyzing the model, so that the efficiency of adjusting the data of the target object is improved; meanwhile, a model is built based on the behavior data of the user, the built model is trained according to the behavior data of the user and the characteristic data of the target object, the degree of fit between the result of model analysis and different users is improved, the model obtained through training can be subjected to accurate individual analysis, further the target object data is adjusted through the model, and the accuracy of adjusting the target object data is improved. Therefore, the object data analysis method, the object data analysis device and the computer readable storage medium based on the behaviors can achieve the aim of improving the efficiency and the accuracy of adjusting the object data.
Drawings
FIG. 1 is a flow chart of a behavior-based target data analysis method according to an embodiment of the present application;
FIG. 2 is a flow chart of a standard target analysis model obtained by training in accordance with an embodiment of the present application;
FIG. 3 is a schematic block diagram of a behavior-based object data analysis device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an internal structure of an electronic device for implementing a behavior-based object data analysis method according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The execution subject of the behavior-based object data analysis method provided by the embodiment of the application includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method. In other words, the behavior-based object data analysis method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform may include processing modules for user management, basic services, smart contracts, operation monitoring, and the like. The user management module is responsible for identity information management of all blockchain participants, including maintaining public and private key generation (account management), key management, and maintaining the correspondence between a user's real identity and blockchain address (authority management); where authorized, it also supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices and verifies the validity of service requests, recording a valid request to storage after it is identified; for a new service request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the service information through an identification algorithm (identification management), transmits the encrypted service information completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution; a developer can define contract logic in a programming language and publish it to the blockchain (contract registration), after which key invocations or other events trigger execution according to the logic of the contract terms, and the module also provides a function for registering contract upgrades. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract settings, cloud adaptation, and visual output of real-time status during product operation, for example alarms, network condition monitoring, and node device health monitoring.
The invention provides a behavior-based target object data analysis method. Referring to fig. 1, a flow chart of a behavior-based object data analysis method according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the behavior-based target object data analysis method includes:
S1, acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprise behavior data of the user relative to the target object.
In the embodiment of the invention, the basic data comprises behavior data of the user, wherein the behavior data of the user comprises behavior data of the user relative to the target object, namely the behavior data comprise behavior data generated by the user on the target object, such as click data of the user, message feedback data of the user (such as voting of the user on products and/or services in a certain price interval) and the like.
Further, in other optional embodiments of the present invention, the basic information of the user may further include, but is not limited to, identity information of the user (such as a user name and a user age), consumption information of the user, and the like.
In the embodiment of the invention, the target object refers to a certain product or service, and the characteristic data of the target object refers to data of a certain product and/or service, such as price, term, quality and the like of the product and/or service.
In another alternative embodiment of the present invention, the user's basic data includes identity data, price inquiry data, and pay data, and the object's characteristic data includes vehicle data and policy data.
In one preferred embodiment of the present invention, the base data and/or the feature data may be stored in a blockchain node.
Specifically, the present invention may use pre-written Java statements to retrieve the basic data of the user and the characteristic data of the target object from a database node of one or more blockchains, where the database node is used to store the basic data of a plurality of users and the characteristic data of a plurality of target objects. In this embodiment, the basic data of a plurality of users and the characteristic data of a plurality of target objects can therefore be obtained.
Further, in another optional embodiment of the present invention, after the basic information of the user is obtained, each item of data in the basic information is generated into a user portrait, and the user portrait is displayed in a visual form, for example, the basic information of the user is displayed in an Excel form and/or a proportion diagram.
By generating the user portrait based on the basic information of the user, the feature information of different users can be displayed more intuitively.
S2, performing word segmentation on the basic data and/or the characteristic data, and encoding word segmentation result data after word segmentation to obtain a word vector set.
Preferably, the embodiment of the invention uses the THULAC word segmentation tool to perform word segmentation processing on the basic data and/or the characteristic data to obtain word segmentation result data, namely a word segmentation information set, and uses one-hot encoding to encode each word in the word segmentation information set so as to convert the basic data and/or the characteristic data into a word vector set.
One-hot encoding uses an N-bit state register to encode the N states in the word segmentation result data. Each state has its own independent register bit, and at any time only one bit is valid, that is, exactly one bit is 1 and the rest are 0.
Word segmentation cuts the user's behavior data into word vectors that each contain less data, which reduces the computing resources needed when a computer subsequently processes them. Encoding converts the behavior data into a form the computer can recognize, so the computer can quickly identify the content of the behavior data, which facilitates rapid data analysis.
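The encoding step described above can be sketched as follows. This is a minimal illustration, assuming the text has already been segmented into tokens; the function names `build_vocabulary` and `one_hot_encode` and the sample tokens are hypothetical, and no particular segmentation tool is called.

```python
from typing import Dict, List


def build_vocabulary(tokens: List[str]) -> Dict[str, int]:
    """Assign each distinct token a bit position, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def one_hot_encode(tokens: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Encode each token as an N-bit vector with exactly one bit set to 1."""
    size = len(vocab)
    vectors = []
    for token in tokens:
        vector = [0] * size
        vector[vocab[token]] = 1
        vectors.append(vector)
    return vectors


# Hypothetical, already-segmented record; a real pipeline would take the
# tokens from a word segmentation tool instead of a hand-written list.
segmented = ["click", "product", "price", "click"]
vocabulary = build_vocabulary(segmented)
word_vectors = one_hot_encode(segmented, vocabulary)
```

With N distinct tokens, every word vector has length N, so repeated tokens such as "click" map to the same vector.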
S3, calculating characteristic values of the word vectors in the word vector set.
In detail, the calculating the eigenvalue of the word vector in the word vector set includes:
performing word vector sampling for a plurality of times on the word vector set to obtain a plurality of training sets containing word vectors, wherein the sampling is random sampling with replacement;
respectively classifying the training sets containing the word vectors to obtain a plurality of classification results, wherein the classification results contain a characteristic vector set and/or a non-characteristic vector set;
calculating information entropy contained in the feature vector sets in the plurality of classification results, and selecting a classification result corresponding to the feature vector set with the information entropy larger than a preset entropy threshold value to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
In this embodiment, each of the multiple sampling passes may draw one or more word vectors, preferably at least two.
The word vectors drawn in one sampling pass form one training set, so multiple sampling passes yield a plurality of training sets containing word vectors.
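The sampling described above can be sketched as follows. This is a minimal illustration of random sampling with replacement; the function name `bootstrap_training_sets`, its parameters, and the fixed seed are assumptions chosen for the example.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_training_sets(
    items: Sequence[T], num_sets: int, set_size: int, seed: int = 0
) -> List[List[T]]:
    """Draw num_sets training sets, each by sampling set_size items with replacement."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    return [[rng.choice(items) for _ in range(set_size)] for _ in range(num_sets)]


# Hypothetical word vectors; each sampling pass draws two vectors.
vectors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
training_sets = bootstrap_training_sets(vectors, num_sets=4, set_size=2)
```

Because each draw is independent, the same word vector can appear more than once within a training set and across training sets.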
In this embodiment, when classifying a plurality of training sets including word vectors, the word vectors in each training set are specifically classified, that is, whether the word vectors in each training set are feature vectors or non-feature vectors is determined, so that the word vectors in each training set are classified to obtain a feature vector set and/or a non-feature vector set. Specifically, the embodiment of the invention classifies the word vectors through a convolutional neural network with a feature judgment function which is trained in advance.
The plurality of classification results thus yield a plurality of feature vector sets and/or non-feature vector sets; that is, each classification result has a corresponding feature vector set and/or non-feature vector set.
For the feature vector sets in the plurality of classification results, the information entropy contained in each feature vector set may be calculated. Specifically, the embodiment of the invention calculates the information entropy $H(Y, X)$ contained in the feature vector sets in the plurality of classification results by using the following information entropy algorithm:
wherein $X$ is the feature vector set, $Y$ is the classification result corresponding to the feature vector set, $x_i$ is the $i$-th feature vector in the feature vector set, $k$ is the number of feature vectors in the feature vector set, and the frequency term is the frequency with which the $i$-th feature vector appears in the feature vector set.
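The entropy formula itself is not reproduced in this text. The sketch below assumes the standard Shannon entropy over the frequencies of the vectors in a set, which matches the quantities named above but is not confirmed as the exact formula; the names `shannon_entropy` and `filter_by_entropy`, the base-2 logarithm, and the threshold value are all assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def shannon_entropy(feature_vectors: Sequence[Hashable]) -> float:
    """Entropy in bits of the empirical distribution of vectors in one set."""
    total = len(feature_vectors)
    if total == 0:
        return 0.0
    counts = Counter(feature_vectors)
    # c / total is the frequency with which a distinct vector appears in the set.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_by_entropy(
    feature_sets: Sequence[Sequence[Hashable]], threshold: float
) -> List[Sequence[Hashable]]:
    """Keep only the sets whose entropy is larger than the preset threshold."""
    return [s for s in feature_sets if shannon_entropy(s) > threshold]


# Hypothetical feature vector sets; vectors are tuples so they can be counted.
sets = [
    [(1, 0), (0, 1)],  # two distinct vectors, each with frequency 1/2
    [(1, 0), (1, 0)],  # one repeated vector
]
selected = filter_by_entropy(sets, threshold=0.5)
```

Only the first set passes the hypothetical threshold, since a set made of one repeated vector carries no information under this measure.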
Preferably, the present invention implements the calculation of the first eigenvalue of the eigenvector by using the following first eigenvalue algorithm:
wherein $n$ is the number of the plurality of classification results, $k$ is any feature vector in the $m$-th classification result, $k'$ is any vector different from $k$ in the $m$-th classification result, $p_{mk} p_{mk'}$ represents the probability that two vectors randomly drawn from the $m$-th classification result belong to different categories, and $GI_m$ is the first eigenvalue of eigenvector $k$.
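The first eigenvalue formula is not reproduced in this text, but the product of probabilities for two different categories described above has the shape of a Gini index. The sketch below assumes that reading; the function name `gini_index` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_index(labels: Sequence[Hashable]) -> float:
    """Probability that two independent draws from labels fall in different categories."""
    total = len(labels)
    if total == 0:
        return 0.0
    probs = [count / total for count in Counter(labels).values()]
    # Sum p_k * p_k' over every ordered pair of distinct categories k != k'.
    return sum(
        p * q
        for i, p in enumerate(probs)
        for j, q in enumerate(probs)
        if i != j
    )


# Hypothetical category labels for the vectors in one classification result.
mixed = gini_index(["a", "a", "b", "b"])
pure = gini_index(["a", "a", "a"])
```

The pairwise sum equals one minus the sum of squared category probabilities, so a result containing a single category scores 0 and an evenly mixed two-category result scores 0.5.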
The embodiment of the invention calculates the second eigenvalue of the eigenvector by using the following second eigenvalue algorithm:
wherein $j$ is a feature vector in the feature vector set $m$ of any classification result, $M$ is the set of the feature vector sets in all classification results, $m$ is the feature vector set in any one classification result, and the result of the formula is the second eigenvalue of eigenvector $j$.
S4, selecting an optimization vector from the result of the eigenvalue calculation to obtain an optimization vector set.
Specifically, the selecting an optimization vector from the results of the eigenvalue calculation to obtain an optimization vector set includes:
correspondingly adding the first characteristic value and the second characteristic value respectively to obtain total characteristic values of all characteristic vectors in the different characteristic vector sets;
sorting all the feature vectors in the different feature vector sets according to the total feature value to obtain a vector sequence;
and sequentially selecting a plurality of eigenvectors in the vector sequence, and collecting the eigenvectors into the optimized vector set.
Specifically, each feature vector has a first characteristic value and a second characteristic value. Adding the two values for each feature vector gives that vector's total characteristic value, which yields the total characteristic values of all feature vectors in the different feature vector sets.
In this embodiment, when selecting the feature vectors, a plurality of feature vectors in the vector sequence may be selected sequentially in descending order of total characteristic value. For example, the first 100 feature vectors are selected from the vector sequence and assembled into the optimized vector set.
Since the basic data and the characteristic data possibly contain a large amount of useless information, the word vector set obtained after encoding also contains a large amount of useless word vectors, and therefore, the occupation of computing resources in the subsequent analysis process can be reduced and the analysis efficiency can be improved by selecting the optimized vector.
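The selection in step S4 can be illustrated with a minimal Python sketch. The function name `select_optimized_vectors`, the use of string identifiers to stand for feature vectors, and the two input mappings are assumptions made for illustration only; the patent does not prescribe a data layout.

```python
from typing import Dict, List


def select_optimized_vectors(
    first_values: Dict[str, float],
    second_values: Dict[str, float],
    top_n: int = 100,
) -> List[str]:
    """Return the identifiers of the top_n feature vectors by total feature value.

    Keys are hypothetical identifiers standing in for feature vectors.
    """
    # Total feature value = first feature value + second feature value.
    totals = {key: first_values[key] + second_values[key] for key in first_values}
    # Sort in descending order of total feature value to form the vector sequence.
    ranked = sorted(totals, key=lambda key: totals[key], reverse=True)
    # Take the leading top_n entries as the optimized vector set.
    return ranked[:top_n]


if __name__ == "__main__":
    first = {"a": 0.1, "b": 0.5, "c": 0.3}
    second = {"a": 0.2, "b": 0.1, "c": 0.6}
    print(select_optimized_vectors(first, second, top_n=2))
```

The default of 100 mirrors the example given in this embodiment; any other cut-off could be passed as `top_n`.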
S5, constructing an initial target object analysis model, and training the initial target object analysis model by using the optimized vector set to obtain a standard target object analysis model.
In this embodiment, by constructing an initial target object analysis model and training the initial target object analysis model by using the optimized vector set, a more accurate analysis model can be obtained.
Referring to FIG. 2, FIG. 2 is a flowchart of training to obtain the standard target object analysis model in an embodiment of the invention.
In detail, the S5 includes:
S51, performing random sampling with replacement on the optimized vector set a preset number of times to obtain a plurality of training sets.
In the embodiment of the present invention, the number of sampling passes is set in advance, for example to 8. The Bagging method may be used to perform the random sampling with replacement; each sampling pass yields one training set, so multiple passes yield a plurality of training sets.
S52, generating, from the plurality of training sets, a plurality of decision trees corresponding to the training sets.
In detail, each sampled training set is used to generate one corresponding decision tree.
Specifically, a decision tree may be generated, using a decision function, from a training set obtained by random sampling of the optimized vector set.
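The sampling in step S51 can be sketched in Python as follows. The function name `bootstrap_training_sets`, the fixed seed, and the choice to make each training set the same size as the optimized vector set are assumptions for illustration; the patent only states that sampling is random, with replacement, and repeated a preset number of times.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_training_sets(
    vectors: Sequence[T], rounds: int = 8, seed: int = 0
) -> List[List[T]]:
    """Draw `rounds` training sets from `vectors` by sampling with replacement."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    size = len(vectors)        # assumption: each training set matches the input size
    # random.Random.choices samples with replacement, so items may repeat.
    return [rng.choices(vectors, k=size) for _ in range(rounds)]


if __name__ == "__main__":
    training_sets = bootstrap_training_sets(list(range(10)))
    for index, training_set in enumerate(training_sets, start=1):
        print(index, training_set)
```

Each returned list would then be handed to whatever decision tree learner is in use, one tree per training set, as in step S52.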
S53, aggregating the plurality of decision trees into the initial target object analysis model by using an aggregation algorithm.
Preferably, in the embodiment of the present invention, the plurality of decision trees are aggregated into the initial target object analysis model by using the following aggregation algorithm:
where F represents the set of the plurality of decision trees, f_k represents the k-th decision tree of the plurality of decision trees, K represents the total number of decision trees, and the aggregated output is the initial target object analysis model.
S54, training the initial target object analysis model by using the optimized vector set to obtain a training target object analysis model.
Specifically, the embodiment of the invention trains the initial target object analysis model by using the following objective function:
where Obj is the objective function value, y_i is the label value contained in the i-th optimized vector of the optimized vector set, the predicted-value term is the output of the initial target object analysis model, K represents the total number of decision trees, f_k represents the k-th decision tree, and β(f_k) is a preset regularization term.
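As a rough illustration of steps S53 and S54, the Python sketch below treats the aggregated model as the sum of the outputs of the K decision trees and evaluates an objective made of a loss term plus the sum of β(f_k). The additive form, the squared-error loss, and the names `predict` and `objective` are assumptions; the text above only names the symbols involved.

```python
from typing import Callable, List, Sequence

# A "tree" here is any callable mapping a feature vector to a score (assumption).
Tree = Callable[[Sequence[float]], float]


def predict(trees: List[Tree], x: Sequence[float]) -> float:
    """Aggregate the trees by summing their outputs (assumed additive form)."""
    return sum(tree(x) for tree in trees)


def objective(
    trees: List[Tree],
    xs: Sequence[Sequence[float]],
    ys: Sequence[float],
    beta: Callable[[Tree], float],
) -> float:
    """Loss over all samples plus the regularization term of every tree."""
    # Squared error is an assumed loss; the patent does not name one.
    loss = sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys))
    penalty = sum(beta(tree) for tree in trees)
    return loss + penalty


if __name__ == "__main__":
    trees = [lambda x: x[0], lambda x: 1.0]
    print(objective(trees, [[1.0], [2.0]], [2.0, 2.0], beta=lambda tree: 0.5))
```

In practice β would depend on the structure of each tree, for example its number of leaves; a constant is used here only to keep the sketch short.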
S55, performing parameter tuning on the training target object analysis model to obtain the standard target object analysis model.
Preferably, the invention adopts a model parameter tuning method to perform parameter tuning on the training target object analysis model.
Specifically, the model parameter tuning method includes, but is not limited to: a general parameter tuning method, a boost parameter tuning method and a learning target parameter tuning method.
S6, acquiring basic data of a user to be analyzed and characteristic data of an object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the object to be analyzed by utilizing the standard object analysis model to obtain an analysis result.
In this embodiment, the basic data of the user to be analyzed and the feature data of the object to be analyzed may be obtained from the object database, or may be obtained from a business system such as a sales system.
Specifically, the number of the users to be analyzed may be one or more, and the number of the target objects to be analyzed may be one or more.
Further, in another optional embodiment of the present invention, the method further includes performing mathematical statistics on the analysis result after it is obtained, including:
performing statistical calculation on the analysis result to obtain a statistical result;
and displaying the statistical result in visual form.
The statistical calculation includes, but is not limited to, calculating the mean, the variance, and the standard deviation of the analysis results.
When the statistical result is displayed in visual form, it may be displayed as a histogram and/or a pie chart.
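The statistical calculation can be sketched with Python's standard `statistics` module. The function name `summarize` and the choice of population (rather than sample) variance are assumptions; the analysis results are taken to be plain numbers for the purpose of the sketch.

```python
import statistics
from typing import Dict, Sequence


def summarize(results: Sequence[float]) -> Dict[str, float]:
    """Compute the mean, variance, and standard deviation of analysis results."""
    return {
        "mean": statistics.fmean(results),
        # Population variance and standard deviation are an assumed choice.
        "variance": statistics.pvariance(results),
        "std_dev": statistics.pstdev(results),
    }


if __name__ == "__main__":
    print(summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

The resulting dictionary could then be passed to any plotting tool to draw the histogram or pie chart mentioned above.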
S7, adjusting the feature data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
For example, when the target object is a certain product and/or service and the analysis result shows that the user prefers a certain price interval, the price of the target object can be adjusted according to the analysis result; specifically, the price can be adjusted to fall within the price interval indicated by the analysis result. Alternatively, if the analysis result shows that the user prefers to receive a certain service within a certain time interval, the service providing time can be adjusted to fall within the time interval indicated by the analysis result.
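One simple way to bring a price or a service time into the preferred interval is to clamp it to the interval bounds, as in the Python sketch below. The function name `adjust_to_interval` and the clamping rule are assumptions; the embodiment only requires that the adjusted value fall within the interval.

```python
def adjust_to_interval(value: float, low: float, high: float) -> float:
    """Move `value` to the nearest point inside the closed interval [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    # Values already inside the interval are returned unchanged.
    return min(max(value, low), high)


if __name__ == "__main__":
    # A price of 120 with a preferred interval of 80 to 100 is lowered to 100.
    print(adjust_to_interval(120.0, 80.0, 100.0))
```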
In the embodiment of the invention, after the basic data of the user and the feature data of the target object are acquired, they are parsed and encoded to obtain a word vector set; an optimized vector set is then obtained from the word vector set, and an initial target object analysis model is constructed; the initial target object analysis model is trained with the optimized vector set to obtain a standard target object analysis model; the basic data of the user to be analyzed and the feature data of the target object to be analyzed are analyzed with the standard target object analysis model to obtain an analysis result; and the feature data of the target object to be analyzed corresponding to the user is adjusted according to the analysis result. Because the feature data is adjusted on the basis of a model's analysis result, the efficiency of adjusting target object data is improved. Moreover, because the model is built and trained on the user's behavior data together with the feature data of the target object, its analysis results fit different users more closely, so the trained model can perform accurate individualized analysis, and adjusting the target object data through the model improves the accuracy of that adjustment.
FIG. 3 is a schematic block diagram of a behavior-based object data analysis device according to the present invention.
The behavior-based object data analysis apparatus 100 of the present invention may be installed in an electronic device. Depending on the implemented functionality, the behavior-based object data analysis device may include a data acquisition module 101, a data word segmentation module 102, a feature value calculation module 103, a vector screening module 104, a model training module 105, a data analysis module 106, and a data adjustment module 107. The module of the present invention may also be referred to as a unit, meaning a series of computer program segments capable of being executed by the processor of the electronic device and of performing fixed functions, stored in the memory of the electronic device.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the data acquisition module 101 is configured to acquire basic data of a user and feature data of a target object, where the basic data includes behavior data of the user relative to the target object;
the data word segmentation module 102 is configured to perform word segmentation on the basic data and/or the feature data, and encode word segmentation result data after word segmentation to obtain a word vector set;
The feature value calculating module 103 is configured to perform feature value calculation on the word vectors in the word vector set;
the vector screening module 104 is configured to select an optimization vector from the result of the feature value calculation, so as to obtain an optimization vector set;
the model training module 105 is configured to construct an initial target object analysis model, and train the initial target object analysis model by using the optimized vector set to obtain a standard target object analysis model;
the data analysis module 106 is configured to obtain basic data of a user to be analyzed and feature data of an object to be analyzed corresponding to the user to be analyzed, and analyze the basic data of the user to be analyzed and the feature data of the object to be analyzed by using the standard object analysis model to obtain an analysis result;
the data adjustment module 107 is configured to adjust feature data of an object to be analyzed corresponding to the user to be analyzed according to the analysis result.
In detail, the specific implementation modes of each module of the behavior-based object data analysis device are as follows:
the data acquisition module 101 acquires basic data of a user and feature data of a target object, wherein the basic data includes behavior data of the user relative to the target object.
In the embodiment of the invention, the basic data includes behavior data of the user, and that behavior data includes behavior data of the user relative to the target object, that is, data generated by the user's actions on the target object, such as the user's click data and message feedback data (for example, the user's votes on products and/or services in a certain price interval).
Further, in other optional embodiments of the present invention, the basic data of the user may also include, but is not limited to, identity information of the user (such as user name and age), consumption information of the user, and the like.
In the embodiment of the invention, the target object refers to a certain product or service, and the characteristic data of the target object refers to data of a certain product and/or service, such as price, term, quality and the like of the product and/or service.
In another alternative embodiment of the present invention, the user's basic data includes identity data, price inquiry data, and pay data, and the object's characteristic data includes vehicle data and policy data.
In one preferred embodiment of the present invention, the basic data and/or the feature data may be stored in a blockchain node. Specifically, the present invention may use a pre-written Java statement to retrieve the basic data of the user and the feature data of the target object from database nodes of one or more blockchains, where the database nodes store the basic data of a plurality of users and the feature data of a plurality of target objects. In this embodiment, therefore, the basic data of a plurality of users and the feature data of a plurality of target objects can be obtained.
Further, in another optional embodiment of the present invention, after the basic data of the user is obtained, a user portrait is generated from the items of data in the basic data and displayed in visual form, for example as an Excel table and/or a proportion chart.
Generating the user portrait from the basic data of the user allows the feature information of different users to be displayed more intuitively.
The data word segmentation module 102 is configured to perform word segmentation on the basic data and/or the feature data, and encode word segmentation result data after word segmentation to obtain a word vector set.
Preferably, the embodiment of the invention uses the THULAC word segmentation tool to segment the basic data and/or the feature data, obtaining word segmentation result data, namely a word segmentation information set, and uses one-hot encoding to encode each word in the word segmentation information set, thereby converting the basic data and/or the feature data into a word vector set.
Specifically, one-hot encoding uses an N-bit state register to encode the N states in the word segmentation result data. Each state has its own register bit, and at any time exactly one bit is valid, that is, one bit is 1 and all the others are 0.
Segmenting the user's behavior data into words yields word vectors that each contain less data, which reduces the computing resources occupied in subsequent machine processing. Encoding converts the behavior data into a machine-readable form, so that a computer can quickly identify its content, which facilitates fast data analysis.
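The one-hot encoding described above can be sketched in Python as follows. The sketch starts from an already segmented list of words (segmentation itself, for example with THULAC, is not shown), and the function name `one_hot_encode` and the alphabetical ordering of the vocabulary are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def one_hot_encode(words: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode each word as an N-bit vector with exactly one bit set to 1."""
    # N distinct words give N states; sorting fixes a reproducible bit order.
    vocabulary = sorted(set(words))
    position = {word: index for index, word in enumerate(vocabulary)}
    vectors = []
    for word in words:
        vector = [0] * len(vocabulary)
        vector[position[word]] = 1  # the single valid bit for this state
        vectors.append(vector)
    return vocabulary, vectors


if __name__ == "__main__":
    vocabulary, vectors = one_hot_encode(["price", "click", "price"])
    print(vocabulary)
    print(vectors)
```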
The feature value calculating module 103 is configured to perform feature value calculation on the word vectors in the word vector set.
In detail, the feature value calculating module 103 is specifically configured to:
sampling word vectors from the word vector set a plurality of times to obtain a plurality of training sets containing word vectors, wherein the sampling is random sampling with replacement;
respectively classifying the training sets containing the word vectors to obtain a plurality of classification results, wherein the classification results contain a characteristic vector set and/or a non-characteristic vector set;
calculating information entropy contained in the feature vector sets in the plurality of classification results, and selecting a classification result corresponding to the feature vector set with the information entropy larger than a preset entropy threshold value to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each characteristic vector in different characteristic vector sets contained in the classification result set.
In this embodiment, each of the multiple sampling passes may sample one or more word vectors, preferably at least two.
The multiple sampling passes produce a plurality of training sets containing word vectors: the word vectors drawn in each pass form one training set, so multiple passes yield a plurality of training sets.
In this embodiment, classifying the plurality of training sets containing word vectors means classifying the word vectors in each training set, that is, determining whether each word vector is a feature vector or a non-feature vector, so that each training set is divided into a feature vector set and/or a non-feature vector set. Specifically, the embodiment of the invention classifies the word vectors by means of a pre-trained convolutional neural network with a feature judgment function.
The plurality of classification results thus yield a plurality of feature vector sets and/or non-feature vector sets; that is, each classification result has a corresponding feature vector set and/or non-feature vector set.
For the feature vector sets in the plurality of classification results, the information entropy contained in each feature vector set may be calculated. Specifically, the embodiment of the invention calculates the information entropy H(Y, X) contained in the feature vector sets in the plurality of classification results by using the following information entropy algorithm:
where X is the feature vector set, Y is the classification result corresponding to the feature vector set, x_i is the i-th feature vector in the feature vector set, k is the number of feature vectors in the feature vector set, and the frequency term is the frequency with which the i-th feature vector appears in the feature vector set.
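To make the entropy calculation concrete, the Python sketch below computes an entropy from the frequency with which each feature vector appears in a feature vector set. It assumes the standard Shannon form with a base-2 logarithm, which may differ in detail from the algorithm of this embodiment; the function name `information_entropy` and the representation of vectors as tuples are likewise assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def information_entropy(vectors: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `vectors`."""
    counts = Counter(vectors)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        frequency = count / total  # frequency of this feature vector in the set
        entropy -= frequency * math.log2(frequency)
    return entropy


if __name__ == "__main__":
    feature_vectors = [(1, 0), (1, 0), (0, 1), (0, 1)]
    print(information_entropy(feature_vectors))
```

A feature vector set would then be kept when this value exceeds the preset entropy threshold.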
Preferably, the embodiment of the invention calculates the first feature value of a feature vector by using the following first feature value algorithm:
where n is the number of the plurality of classification results, k is any vector in the m-th classification result, k′ is any vector in the m-th classification result other than k, p_mk·p_mk′ represents the probability that two vectors drawn at random from the m-th classification result belong to different categories, and GI_m is the first feature value of feature vector k.
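The quantity described here, the probability that two randomly drawn items belong to different categories, corresponds to the standard Gini impurity, which equals one minus the sum of the squared category proportions. The Python sketch below computes that standard quantity; treating GI_m as exactly this value, the function name `gini_impurity`, and the use of category labels as input are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Probability that two draws (with replacement) have different labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty set is treated as pure (assumption)
    # 1 - sum(p_k^2) equals the sum over k != k' of p_k * p_k'.
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


if __name__ == "__main__":
    print(gini_impurity(["a", "a", "b", "b"]))
```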
The embodiment of the invention calculates the second feature value of a feature vector by using the following second feature value algorithm:
where j is a feature vector in the feature vector set m of any classification result, M is the set of the feature vector sets of all classification results, m is the feature vector set of any one classification result, and the value obtained is the second feature value of feature vector j.
The vector screening module 104 is configured to select optimized vectors from the results of the feature value calculation to obtain an optimized vector set.
Specifically, the vector screening module 104 is configured to:
add the first feature value and the second feature value of each feature vector to obtain the total feature value of every feature vector in the different feature vector sets;
sort all the feature vectors in the different feature vector sets by total feature value to obtain a vector sequence;
and select a plurality of feature vectors from the vector sequence in order, and collect them into the optimized vector set.
Specifically, each feature vector has a first feature value and a second feature value; adding the two for each feature vector gives that vector's total feature value, and doing so for every vector gives the total feature values of the feature vectors in the different feature vector sets.
In this embodiment, the feature vectors may be selected from the vector sequence in descending order of total feature value. For example, the first 100 feature vectors in the vector sequence are selected and collected into the optimized vector set.
Because the basic data and the feature data may contain a large amount of useless information, the word vector set obtained after encoding may also contain many useless word vectors. Selecting optimized vectors therefore reduces the computing resources occupied in subsequent analysis and improves analysis efficiency.
The model training module 105 is configured to construct an initial target object analysis model, and train the initial target object analysis model by using the optimized vector set to obtain a standard target object analysis model.
In this embodiment, by constructing an initial target object analysis model and training the initial target object analysis model by using the optimized vector set, a more accurate analysis model can be obtained.
Further, the model training module 105 is specifically configured to:
performing random sampling with replacement on the optimized vector set a preset number of times to obtain a plurality of training sets;
generating a plurality of decision trees corresponding to the training sets by utilizing the training sets;
utilizing an aggregation algorithm to aggregate the plurality of decision trees into the initial target object analysis model;
training the initial target object analysis model by using the optimized vector set to obtain a training target object analysis model;
and performing parameter tuning on the training target object analysis model to obtain the standard target object analysis model.
In the embodiment of the present invention, the number of sampling passes is set in advance, for example to 8. The Bagging method may be used to perform the random sampling with replacement; each sampling pass yields one training set, so multiple passes yield a plurality of training sets.
In detail, each sampled training set is used to generate one corresponding decision tree.
Specifically, a decision tree may be generated, using a decision function, from a training set obtained by random sampling of the optimized vector set.
Preferably, in the embodiment of the present invention, the plurality of decision trees are aggregated into the initial target object analysis model by using the following aggregation algorithm:
where F represents the set of the plurality of decision trees, f_k represents the k-th decision tree of the plurality of decision trees, K represents the total number of decision trees, and the aggregated output is the initial target object analysis model.
Specifically, the embodiment of the invention trains the initial target object analysis model by using the following objective function:
where Obj is the objective function value, y_i is the label value contained in the i-th optimized vector of the optimized vector set, the predicted-value term is the output of the initial target object analysis model, K represents the total number of decision trees, f_k represents the k-th decision tree, and β(f_k) is a preset regularization term.
The data analysis module 106 is configured to obtain basic data of a user to be analyzed and feature data of an object to be analyzed corresponding to the user to be analyzed, and analyze the basic data of the user to be analyzed and the feature data of the object to be analyzed by using the standard object analysis model to obtain an analysis result.
In this embodiment, the basic data of the user to be analyzed and the feature data of the object to be analyzed may be obtained from the object database, or may be obtained from a business system such as a sales system.
Specifically, the number of the users to be analyzed may be one or more, and the number of the target objects to be analyzed may be one or more.
Further, in another optional embodiment of the present invention, the apparatus further includes a statistics module configured to perform mathematical statistics on the analysis result after it is obtained.
Specifically, the statistics module is configured to perform statistical calculation on the analysis result to obtain a statistical result, and to display the statistical result in visual form.
The statistical calculation includes, but is not limited to, calculating the mean, the variance, and the standard deviation of the analysis results.
When the statistical result is displayed in visual form, it may be displayed as a histogram and/or a pie chart.
The data adjustment module 107 is configured to adjust feature data of an object to be analyzed corresponding to the user to be analyzed according to the analysis result.
For example, when the target object is a certain product and/or service and the analysis result shows that the user prefers a certain price interval, the price of the target object can be adjusted according to the analysis result; specifically, the price can be adjusted to fall within the price interval indicated by the analysis result. Alternatively, if the analysis result shows that the user prefers to receive a certain service within a certain time interval, the service providing time can be adjusted to fall within the time interval indicated by the analysis result.
In the embodiment of the invention, after the basic data of the user and the feature data of the target object are acquired, they are parsed and encoded to obtain a word vector set; an optimized vector set is then obtained from the word vector set, and an initial target object analysis model is constructed; the initial target object analysis model is trained with the optimized vector set to obtain a standard target object analysis model; the basic data of the user to be analyzed and the feature data of the target object to be analyzed are analyzed with the standard target object analysis model to obtain an analysis result; and the feature data of the target object to be analyzed corresponding to the user is adjusted according to the analysis result. Because the feature data is adjusted on the basis of a model's analysis result, the efficiency of adjusting target object data is improved. Moreover, because the model is built and trained on the user's behavior data together with the feature data of the target object, its analysis results fit different users more closely, so the trained model can perform accurate individualized analysis, and adjusting the target object data through the model improves the accuracy of that adjustment.
Fig. 4 is a schematic structural diagram of an electronic device for implementing the behavior-based object data analysis method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as a behavior-based object data analysis program 12.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of the behavior-based object data analysis program 12, but also for temporarily storing data that has been output or is to be output.
In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or a plurality of packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the parts of the entire electronic device using various interfaces and lines, runs or executes the programs or modules stored in the memory 11 (e.g., the behavior-based object data analysis program), and invokes the data stored in the memory 11 to perform the various functions of the electronic device 1 and to process data.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
In the embodiment of the invention, after the basic data of the user and the feature data of the target object are acquired, they are parsed and encoded to obtain a word vector set; an optimized vector set is then obtained from the word vector set, and an initial target object analysis model is constructed; the initial target object analysis model is trained with the optimized vector set to obtain a standard target object analysis model; the basic data of the user to be analyzed and the feature data of the target object to be analyzed are analyzed with the standard target object analysis model to obtain an analysis result; and the feature data of the target object to be analyzed corresponding to the user is adjusted according to the analysis result. Because the feature data is adjusted on the basis of a model's analysis result, the efficiency of adjusting target object data is improved. Moreover, because the model is built and trained on the user's behavior data together with the feature data of the target object, its analysis results fit different users more closely, so the trained model can perform accurate individualized analysis, and adjusting the target object data through the model improves the accuracy of that adjustment.
Fig. 4 shows only an electronic device with components, it being understood by a person skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface. Optionally, the network interface may comprise a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.), and is typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a display or an input unit such as a keyboard, and may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited to this configuration.
The behavior-based target object data analysis program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed by the processor 10, may implement:
acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprises behavior data of the user relative to the target object;
performing word segmentation on the basic data and/or the characteristic data, and encoding the word segmentation result data to obtain a word vector set;
calculating characteristic values of the word vectors in the word vector set;
selecting optimized vectors according to the result of the characteristic value calculation to obtain an optimized vector set;
constructing an initial target object analysis model, and training the initial target object analysis model by using the optimized vector set to obtain a standard target object analysis model;
acquiring basic data of a user to be analyzed and characteristic data of a target object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result;
and adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
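By way of illustration only, the word segmentation and encoding steps listed above might be sketched as follows. The sketch assumes whitespace splitting as a stand-in for word segmentation and a bag-of-words count vector as the encoding; the specification does not prescribe either choice, and the function names and sample records are hypothetical.

```python
from typing import Dict, List


def segment(text: str) -> List[str]:
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(documents: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in documents:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Encode one segmented record as a bag-of-words count vector."""
    vector = [0] * len(vocab)
    for token in tokens:
        if token in vocab:
            vector[vocab[token]] += 1
    return vector


# Hypothetical records standing in for user behavior data and
# target object characteristic data.
records = [
    "user viewed policy twice",
    "policy premium annual",
]
segmented = [segment(record) for record in records]
vocab = build_vocabulary(segmented)
word_vector_set = [encode(tokens, vocab) for tokens in segmented]
```

Here `word_vector_set` holds one count vector per record; a practical embodiment processing Chinese text would substitute a dedicated segmentation tool and possibly a learned embedding for the two stand-ins.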
Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM).
Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; the division of the modules is merely a division by logical function, and other manners of division are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware together with software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in the claims should not be construed as limiting the claim concerned.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic means, in which each data block contains a batch of network transaction information used both to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
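By way of illustration only, the cryptographic linking of data blocks described above might be sketched as follows. The sketch assumes SHA-256 over a JSON serialization as the linking function and omits consensus and networking entirely; the block fields and function names are hypothetical and are not taken from the specification.

```python
import hashlib
import json
from typing import Any, Dict, List

Block = Dict[str, Any]


def block_hash(block: Block) -> str:
    """Hash a block's full contents with SHA-256 over canonical JSON."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Block], transactions: List[str]) -> None:
    """Append a block that stores the hash of its predecessor."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append(
        {"index": len(chain), "prev_hash": prev_hash, "transactions": transactions}
    )


def is_valid(chain: List[Block]) -> bool:
    """Check that every block still matches the hash recorded by its successor."""
    return all(
        current["prev_hash"] == block_hash(previous)
        for previous, current in zip(chain, chain[1:])
    )


chain: List[Block] = []
append_block(chain, ["record A"])
append_block(chain, ["record B"])
append_block(chain, ["record C"])
```

Because each block records the hash of the block before it, altering the contents of an earlier block causes `is_valid` to return `False`, which is the anti-counterfeiting property referred to above.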
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by a single unit or means through software or hardware. Terms such as "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (6)

1. A behavior-based target object data analysis method, the method comprising:
acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprises behavior data of the user relative to the target object;
performing word segmentation on the basic data and/or the characteristic data, and encoding the word segmentation result data to obtain a word vector set;
calculating characteristic values of the word vectors in the word vector set;
selecting optimized vectors according to the result of the characteristic value calculation to obtain an optimized vector set;
constructing an initial target object analysis model, and training the initial target object analysis model by using the optimized vector set to obtain a standard target object analysis model;
acquiring basic data of a user to be analyzed and characteristic data of a target object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result;
adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result;
wherein constructing an initial target object analysis model and training the initial target object analysis model by using the optimized vector set to obtain a standard target object analysis model comprises: performing random sampling with replacement on the optimized vector set a preset number of times to obtain a plurality of training sets; generating a plurality of decision trees corresponding to the training sets by using the training sets; aggregating the plurality of decision trees into the initial target object analysis model by using an aggregation algorithm; training the initial target object analysis model by using the optimized vector set to obtain a trained target object analysis model; and performing parameter tuning on the trained target object analysis model to obtain the standard target object analysis model;
wherein aggregating the plurality of decision trees into the initial target object analysis model by using an aggregation algorithm comprises: aggregating the plurality of decision trees into the initial target object analysis model by using the following aggregation algorithm:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
wherein $x_i$ is the $i$-th feature vector in the training set, $F$ represents the set of the plurality of decision trees, $f_k$ represents the $k$-th decision tree of the plurality of decision trees, $K$ represents the total number of the plurality of decision trees, and $\hat{y}_i$ is the initial target object analysis model.
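By way of illustration only, the sampling, tree generation and aggregation recited in claim 1 might be sketched as follows. The sketch assumes depth-1 decision trees (stumps) that each output a vote of +1 or -1, with the aggregate taken as the sum of the outputs of all trees on a feature vector; the claim fixes neither the tree type nor the label encoding, and the data and function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, int]          # (feature vector, label in {-1, +1})
Tree = Callable[[Vector], int]


def bootstrap(data: List[Sample], rng: random.Random) -> List[Sample]:
    """Random sampling with replacement, same size as the input set."""
    return [rng.choice(data) for _ in data]


def fit_stump(samples: List[Sample]) -> Tree:
    """Fit a depth-1 tree: the (feature, threshold, polarity) with most hits."""
    best = None  # (number correct, feature index, threshold, polarity)
    for j in range(len(samples[0][0])):
        for threshold in sorted({x[j] for x, _ in samples}):
            for polarity in (1, -1):
                correct = sum(
                    1 for x, y in samples
                    if (polarity if x[j] > threshold else -polarity) == y
                )
                if best is None or correct > best[0]:
                    best = (correct, j, threshold, polarity)
    _, j, threshold, polarity = best
    return lambda x: polarity if x[j] > threshold else -polarity


def build_ensemble(data: List[Sample], n_trees: int, seed: int = 0) -> List[Tree]:
    """Draw one bootstrap training set per tree and fit a tree on each."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap(data, rng)) for _ in range(n_trees)]


def aggregate(trees: List[Tree], x: Vector) -> int:
    """Additive aggregation: the sum over all trees of f_k(x)."""
    return sum(tree(x) for tree in trees)


# Hypothetical optimized vector set with binary labels.
data: List[Sample] = [
    ((0.1, 1.0), -1), ((0.2, 0.8), -1), ((0.3, 0.9), -1),
    ((0.7, 0.2), 1), ((0.8, 0.1), 1), ((0.9, 0.3), 1),
]
trees = build_ensemble(data, n_trees=5)
score = aggregate(trees, (0.85, 0.15))
```

The sign of `score` may be read as the ensemble's decision. Parameter tuning, such as choosing the number of trees, is outside the scope of this sketch.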
2. The behavior-based target object data analysis method of claim 1, wherein calculating characteristic values of the word vectors in the word vector set comprises:
sampling word vectors from the word vector set a plurality of times to obtain a plurality of training sets containing word vectors, wherein the sampling is random sampling with replacement;
classifying each of the training sets containing word vectors to obtain a plurality of classification results, wherein each classification result contains a feature vector set and/or a non-feature vector set;
calculating the information entropy of the feature vector sets in the plurality of classification results, and selecting the classification results whose feature vector set has an information entropy larger than a preset entropy threshold to obtain a classification result set;
and calculating a first characteristic value and a second characteristic value of each feature vector in the different feature vector sets contained in the classification result set.
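By way of illustration only, the entropy-based selection recited in claim 2 might be sketched as follows. The sketch assumes that information entropy means the Shannon entropy, in bits, of the empirical distribution of the items in a feature vector set, and it uses short strings as hashable stand-ins for feature vectors; the threshold value and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def information_entropy(items: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    total = len(items)
    if total == 0:
        return 0.0
    counts = Counter(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_by_entropy(
    feature_sets: List[Sequence[Hashable]], threshold: float
) -> List[Sequence[Hashable]]:
    """Keep only the feature sets whose entropy exceeds the preset threshold."""
    return [s for s in feature_sets if information_entropy(s) > threshold]


# Hypothetical feature vector sets obtained from three classification results.
feature_sets = [
    ["click", "click", "click", "click"],   # a single repeated item
    ["click", "buy", "click", "buy"],       # two items, evenly split
    ["click", "buy", "renew", "cancel"],    # four distinct items
]
selected = select_by_entropy(feature_sets, threshold=0.5)
```

With a threshold of 0.5 bits, the set consisting of a single repeated item is discarded and the other two sets are retained for the subsequent characteristic value calculation.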
3. The behavior-based target object data analysis method according to claim 2, wherein selecting optimized vectors according to the result of the characteristic value calculation to obtain an optimized vector set comprises:
adding the first characteristic value and the second characteristic value of each feature vector to obtain a total characteristic value for every feature vector in the different feature vector sets;
sorting all the feature vectors in the different feature vector sets according to the total characteristic value to obtain a vector sequence;
and selecting a plurality of feature vectors in order from the vector sequence, and collecting the selected feature vectors into the optimized vector set.
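By way of illustration only, the summation, sorting and selection recited in claim 3 might be sketched as follows. The sketch assumes descending order of total characteristic value and a fixed number of vectors to keep; the claim states neither the sort direction nor how many vectors are selected, and the identifiers and values are hypothetical.

```python
from typing import Dict, List


def select_optimized_vectors(
    first: Dict[str, int], second: Dict[str, int], top_n: int
) -> List[str]:
    """Rank vector ids by first + second characteristic value, keep the top_n."""
    totals = {name: first[name] + second[name] for name in first}
    ranked = sorted(totals, key=lambda name: totals[name], reverse=True)
    return ranked[:top_n]


# Hypothetical characteristic values keyed by feature vector id.
first_values = {"v1": 3, "v2": 1, "v3": 2}
second_values = {"v1": 1, "v2": 5, "v3": 0}
optimized = select_optimized_vectors(first_values, second_values, top_n=2)
```

In this example the totals are 4, 6 and 2 for `v1`, `v2` and `v3` respectively, so `optimized` contains `v2` followed by `v1`.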
4. A behavior-based target object data analysis apparatus for implementing the target object data analysis method according to any one of claims 1 to 3, characterized in that the apparatus comprises:
a data acquisition module, used for acquiring basic data of a user and characteristic data of a target object, wherein the basic data comprises behavior data of the user relative to the target object;
a data word segmentation module, used for performing word segmentation on the basic data and/or the characteristic data and encoding the word segmentation result data to obtain a word vector set;
a characteristic value calculation module, used for calculating characteristic values of the word vectors in the word vector set;
a vector screening module, used for selecting optimized vectors according to the result of the characteristic value calculation to obtain an optimized vector set;
a model training module, used for constructing an initial target object analysis model and training the initial target object analysis model by using the optimized vector set to obtain a standard target object analysis model;
a data analysis module, used for acquiring basic data of a user to be analyzed and characteristic data of a target object to be analyzed corresponding to the user to be analyzed, and analyzing the basic data of the user to be analyzed and the characteristic data of the target object to be analyzed by using the standard target object analysis model to obtain an analysis result;
and a data adjustment module, used for adjusting the characteristic data of the target object to be analyzed corresponding to the user to be analyzed according to the analysis result.
5. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the behavior-based target object data analysis method of any one of claims 1 to 3.
6. A computer-readable storage medium comprising a storage data area storing data created according to use of blockchain nodes and a storage program area storing a computer program; wherein the computer program, when executed by a processor, implements the object data analysis method as claimed in any one of claims 1 to 3.
CN202010370884.5A 2020-04-30 2020-04-30 Behavior-based target object data analysis method, device and storage medium Active CN111652280B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010370884.5A CN111652280B (en) 2020-04-30 2020-04-30 Behavior-based target object data analysis method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111652280A (en) 2020-09-11
CN111652280B (en) 2023-10-27

Family

ID=72351971

Country Status (1)

Country Link
CN (1) CN111652280B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant