CN110264311A - Method and system for precise recommendation of business promotion information based on deep learning - Google Patents
Method and system for precise recommendation of business promotion information based on deep learning
- Publication number
- CN110264311A (application CN201910461767.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- business promotion
- neural network
- information
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method and system for precise recommendation of business promotion information based on deep learning. An LSTM neural network is trained on a training sample data set and its accuracy is tested on a test sample data set; the trained LSTM neural network serves as the recommender system, yielding a fully functional classifier for that system. This shortens the time needed to develop customers and improves the precision with which customers are found. The disclosed method and system form a mature, efficient recommender system oriented toward regional industries. In every region, many regional enterprises are concentrated in the same area, and offline customer development is close to saturation. Mining potential customers offline requires considerable manpower and material resources, and the success rate is low. Such enterprises urgently need a mature system to guide them and to reduce the cost of developing customers.
Description
Technical field
This disclosure relates to the technical field of machine-learning recommendation algorithms and deep learning, and in particular to a method and system for precise recommendation of business promotion information based on deep learning.
Background
Conventional recommender systems generally recommend commodities to customers, offering different commodities to different customers. Traditional recommendation algorithms include content-based recommendation (Content Based, CB), collaborative filtering (Collaborative Filtering, CF), hybrid recommendation methods, and so on. This patent departs from the conventional recommendation mode: it realizes recommendation between upstream and downstream enterprises, using a recommendation algorithm based on deep learning to handle the complex relationships between them.
The following two aspects apply to current conventional recommender systems:
1) At this stage, domestic research on recommendation algorithms concentrates mainly on the precise recommendation of commodities; research on customer recommendation is still scarce. Domestic recommender-system research generally uses conventional machine learning algorithms, whose learning efficiency is gradually failing to meet the demands of increasingly complex business relationships and increasingly diverse data relationships.
2) Breaking away from the framework of traditional data mining, data can be crawled and screened automatically from web pages, which reduces the cost of collecting data or resolves the problem of lacking data.
Summary of the invention
To solve the above problems, the disclosure provides a technical solution in the form of a method and system for precise recommendation of business promotion information based on deep learning. An LSTM neural network is trained on a training sample data set and its accuracy is tested on a test sample data set; the trained LSTM neural network serves as the recommender system, yielding a fully functional classifier for that system. Based on deep learning and network technology, the solution reduces the cost an enterprise incurs when developing customers, shortens the time required to develop customers, and improves the precision with which customers are found.
To achieve the above goals, according to one aspect of the disclosure, a method for precise recommendation of business promotion information based on deep learning is provided. The method comprises the following steps:
Step 1: acquire business promotion information data;
Step 2: preprocess and clean the collected business promotion information data to obtain a business promotion information data set;
Step 3: perform dimensionality reduction and feature selection on the business promotion information data set;
Step 4: divide the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
Step 5: pass the training sample data set and the test sample data set through a word2vec model to obtain word vectors for training;
Step 6: train an LSTM (Long Short-Term Memory) neural network on the training sample data set, test the accuracy of the LSTM neural network on the test sample data set, and use the trained LSTM neural network as the recommender system.
Further, in step 1, methods of acquiring business promotion information data include but are not limited to: using data sets from open-source data set websites, such as Kaggle, as business promotion information data; using customized (secondary-developed) web crawler technology to crawl Taobao-like shop-owner web pages and same-city transaction information websites to obtain business promotion information data; and using the txt-format web page backups retained in Baidu snapshots, extracting the required information from them as business promotion information data.
Further, in step 2, the collected business promotion information data is preprocessed and cleaned to obtain the business promotion information data set as follows. Because the acquired business promotion information data is extremely large, mixed, and partly useless, it must be preprocessed. Preprocessing is the necessary processing performed before the collected data is classified or grouped, such as auditing, screening, and sorting: auditing the completeness and accuracy of the data, data screening, data sorting, data cleaning, data integration, data transformation, and data reduction. Cleaning the collected business promotion information data means data cleaning, the final procedure for finding and correcting identifiable errors in the data files, which includes checking data consistency and handling invalid and missing values. Through preprocessing and cleaning, dirty data is converted, using mathematical statistics, data mining, or predefined cleaning rules, into data that meets the data quality requirements, namely the cleaned business promotion information data set.
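The cleaning described above can be illustrated with a minimal sketch. The record schema (the field names `company`, `location`, and `business_scope`) and the specific rules (drop records without a company name, drop exact duplicates) are assumptions made for illustration only; the patent does not specify them.

```python
# Hypothetical field names; the patent does not define a record schema.
REQUIRED_FIELDS = ("company", "location", "business_scope")


def clean_records(records: list[dict]) -> list[dict]:
    """Normalise, validate, de-duplicate and sort raw promotion records."""
    cleaned = []
    seen = set()
    for record in records:
        # Normalise: strip whitespace; empty strings become None (missing value).
        row = {}
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if isinstance(value, str):
                value = value.strip() or None
            row[field] = value
        # Invalid value handling (assumed rule): a record without a company
        # name cannot be used, so it is dropped.
        if row["company"] is None:
            continue
        # Consistency check (assumed rule): exact duplicates are dropped.
        key = tuple(row[field] for field in REQUIRED_FIELDS)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    # Sorting gives a reproducible order for the later steps.
    cleaned.sort(key=lambda r: r["company"])
    return cleaned
```

Missing values in the other fields are kept as `None` here rather than imputed, so that a later step can still measure how complete each feature is.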
Further, in step 3, the method of performing dimensionality reduction and feature selection on the business promotion information data set includes but is not limited to any one of the following dimensionality reduction methods: principal component analysis (Principal Component Analysis, PCA), independent component analysis (Independent Component Analysis, ICA), linear discriminant analysis (Linear Discriminant Analysis, LDA), locally linear embedding (Locally Linear Embedding, LLE), Laplacian eigenmaps (Laplacian Eigenmaps), multidimensional scaling (Multidimensional Scaling, MDS), and isometric mapping (Isomap). Feature selection uses an improved algorithm based on the individually optimal feature selection method. The features are vectors obtained by text vectorization of information in the business promotion information data set, such as a company's location, business scope, and registration information. The individually optimal feature selection algorithm computes the separability criterion value of each feature when it is used alone, sorts the features in descending order of that value, and takes the 30 features with the largest separability criterion values as the feature combination. In practice, however, the collected business promotion information data set contains missing information, so the degree to which each individual feature is missing must be considered when selecting features. The improved algorithm based on the individually optimal feature selection method is given by the following formula:
Where X = (x(1), x(2), x(3), ..., x(n)), x(i) denotes the i-th feature, n is the number of features, J(X) denotes the separability criterion of the feature set, N(x(i)) denotes the number of data items in which the i-th feature is not missing, and M denotes the total number of data items. N(x(i))/M reflects the degree to which the i-th feature is missing from the data (a smaller ratio means more missing values), which mitigates the effect of missing information.
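As a rough illustration of the idea, the sketch below ranks features by their individual separability criterion value weighted by the completeness ratio N(x(i))/M. The formula itself is not available in this text, so the multiplicative weighting, the function name `rank_features`, and the way criterion values are supplied are all assumptions, not the patent's definition.

```python
def rank_features(columns: dict[str, list],
                  criterion: dict[str, float],
                  top_k: int = 30) -> list[str]:
    """Rank features by criterion value scaled by completeness N(x(i))/M.

    columns   -- feature name -> list of values, with None marking a missing value
    criterion -- feature name -> separability criterion value of that feature alone
    """
    scores = {}
    for name, values in columns.items():
        total = len(values)                            # M: total number of data items
        present = sum(v is not None for v in values)   # N(x(i)): non-missing items
        completeness = present / total if total else 0.0
        # Assumed combination rule: criterion value times completeness.
        scores[name] = criterion[name] * completeness
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]
```

With this weighting, a feature that separates classes well but is missing in most records can rank below a slightly weaker feature that is almost always present.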
Further, in step 4, the business promotion information data set obtained by feature selection is divided into a training sample data set and a test sample data set by any one of the following methods: the hold-out method, cross-validation, or the bootstrap method.
The hold-out method directly divides the business promotion information data set into two mutually exclusive sets; one set serves as the training sample data set and the other as the test sample data set.
Cross-validation divides the business promotion information data set into k mutually exclusive subsets of equal size, each of which preserves the data distribution as far as possible, that is, each is obtained by stratified sampling. In each round, the union of k-1 subsets serves as the training sample data set and the remaining subset serves as the test sample data set.
The bootstrap method generates the data sets by sampling from the business promotion information data set: each time, one sample is selected at random from the data set and a copy of it is placed in the training sample data set, while the original data set remains unchanged; this process is repeated multiple times. Some samples therefore appear more than once in the training sample data set, while the samples that never appear in it form the test sample data set.
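The three splitting strategies can be sketched as follows. This is a simplified illustration: the function names, the 30% test ratio, and the default k=5 are assumptions, and stratified sampling is omitted for brevity, so the folds here do not attempt to preserve the class distribution.

```python
import random


def hold_out(data, test_ratio=0.3, seed=0):
    """Split data into two mutually exclusive sets (training, test)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def k_fold(data, k=5):
    """Yield (training, test) pairs; each fold is the test set exactly once."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def bootstrap(data, seed=0):
    """Sample with replacement; samples never drawn form the test set."""
    rng = random.Random(seed)
    picked = [rng.randrange(len(data)) for _ in data]
    chosen = set(picked)
    train = [data[i] for i in picked]
    test = [x for i, x in enumerate(data) if i not in chosen]
    return train, test
```

In the bootstrap sketch the number of draws equals the size of the data set, which is a common convention but is not stated in the text above.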
The system needs to output a class of customers with similar attributes who are likely to purchase the services or products of a specific enterprise. Because records in the data set may have different numbers of attributes, an LSTM is used as the core network to train the model. LSTM is a special type of RNN (Recurrent Neural Network) that can learn long-term dependencies. LSTM uses "gates" to control how much useless information is discarded and how much useful information is reinforced, and adds memory cells inside the model to collect and remember earlier relevant information, thereby realizing forgetting and remembering. This built-in memory gives the network a considerable advantage for long-term use of the product: the network keeps learning during use and improves its memory capability, which extends the service life of the product.
Further, in step 5, passing the training sample data set and the test sample data set through the word2vec model to obtain word vectors for training comprises the following steps:
Step 5.1, word segmentation: because of the particular nature of Chinese, the sentences in the business promotion information are segmented with a word segmentation library to obtain a vocabulary; segmentation libraries include but are not limited to the Jieba, IK, mmseg, and word segmenters;
Step 5.2, word frequency counting: traverse the vocabulary formed by segmentation in step 5.1, count the frequency of each word that occurs, and number the words;
Step 5.3, tree construction: build a Huffman tree according to the occurrence probability of each word from step 5.2;
Step 5.4, binary code generation for each node: the occurrence probability of each word is converted into a binary code that represents each node of the Huffman tree from step 5.3;
Step 5.5, initialization of the intermediate vector of each non-leaf node and the word vector of each leaf node: each node of the Huffman tree stores a vector of length m, but the vectors in leaf nodes and non-leaf nodes have different meanings. A leaf node stores the word vector of a word, which serves as input to the neural network; a non-leaf node stores an intermediate vector, which corresponds to the parameters of the hidden layer in the neural network and, together with the input, determines the classification result;
Step 5.6, training of the intermediate vectors and word vectors: using the CBOW (Continuous Bag-Of-Words) model or the Skip-Gram model, the word vectors of the n-1 words near word A are added together as the input of the system; classification is then performed step by step according to the binary code generated for word A in step 5.4, and the intermediate vectors and word vectors are trained according to the classification results, finally yielding the word vectors corresponding to the business promotion information. Word A is any given word. The training process has three main stages: the input layer (input), the projection layer (projection), and the output layer (output). The input layer consists of the word vectors of the n-1 words surrounding word A. If n is 5, then for word A (denoted w(t)) the two preceding and two following words are w(t-2), w(t-1), w(t+1), w(t+2), and the word vectors of these 4 words are denoted v(w(t-2)), v(w(t-1)), v(w(t+1)), v(w(t+2)). The step from the input layer to the projection layer is simple: the n-1 word vectors are added together.
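Steps 5.2 to 5.4 can be illustrated with a small sketch that counts word frequencies, builds a Huffman tree, and reads off a binary code for each word. The function name `huffman_codes` and the convention of labelling the left branch "0" and the right branch "1" are assumptions for illustration; the input is assumed to be a list of already segmented words.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(tokens):
    """Return a dict mapping each word to its Huffman binary code."""
    freq = Counter(tokens)                      # step 5.2: word frequencies
    if not freq:
        return {}
    tie = count()                               # tie-breaker keeps heap entries comparable
    heap = [(n, next(tie), word) for word, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                          # a single word still needs a code
        return {heap[0][2]: "0"}
    while len(heap) > 1:                        # step 5.3: merge the two rarest nodes
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):                     # step 5.4: path from root gives the code
        if isinstance(node, tuple):             # internal (non-leaf) node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                   # leaf node holds a word
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes
```

Frequent words end up close to the root and receive short codes, so the step-by-step classification in step 5.6 visits fewer non-leaf nodes for them.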
Further, in step 6, training the LSTM neural network on the training sample data set, testing its accuracy on the test sample data set, and using the trained LSTM neural network as the recommender system comprises the following steps:
Step 6.1: train the LSTM neural network with the word vectors. The training sample data set is passed through the forget gate of the LSTM neural network, which starts the network's information-discarding action. This action is implemented by the sigmoid layer in the forget gate, which examines the previous output and the current word-vector input and decides whether the information learned in the previous state is retained. The LSTM neural network comprises an input gate, a forget gate, and an output gate;
Step 6.2: the training sample data set after information discarding is passed through the input gate of the LSTM neural network, which starts the network's information-update action. This action is implemented by the sigmoid layer in the input gate; a tanh layer then changes each cell state of the LSTM neural network so that new knowledge is learned;
Step 6.3: the training sample data set after the information update is passed through the output gate of the LSTM neural network, which outputs a vector that depends on the cell state from step 6.2. First, a sigmoid layer is run to obtain a vector that determines which part of the cell state is output; the cell state is then processed by a tanh layer and multiplied by the sigmoid output (a vector) to obtain the output information of the LSTM network;
Step 6.4: the test data is fed into the LSTM network obtained in step 6.3 to produce output results, which are compared with the labels in the test data to verify the accuracy of the network. If the accuracy meets the requirement, training is complete and the trained LSTM neural network is obtained and used as the recommender system.
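The gate operations in steps 6.1 to 6.3 correspond to one forward step of a standard LSTM cell, sketched below in plain Python. This is an illustration of the textbook cell only, not the patent's implementation: the parameter names (`Wf`, `Wi`, `Wc`, `Wo` and the matching biases) are assumptions, and training (back-propagation) is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vector):
    """Compute W @ vector + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new output h and cell state c."""
    z = h_prev + x  # list concatenation: previous output followed by current word vector
    # Step 6.1, forget gate: sigmoid decides how much of the old cell state to keep.
    f = [sigmoid(a) for a in affine(params["Wf"], params["bf"], z)]
    # Step 6.2, input gate: sigmoid selects what to update, tanh proposes new values.
    i = [sigmoid(a) for a in affine(params["Wi"], params["bi"], z)]
    g = [math.tanh(a) for a in affine(params["Wc"], params["bc"], z)]
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    # Step 6.3, output gate: sigmoid selects which part of tanh(cell state) is output.
    o = [sigmoid(a) for a in affine(params["Wo"], params["bo"], z)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

Feeding a sequence of word vectors through `lstm_step` one at a time, carrying `h` and `c` forward, gives the final output vector that a classification layer could consume.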
The present invention also provides a system for precise recommendation of business promotion information based on deep learning. The system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor executes the computer program, which runs in the following units of the system:
a data set acquisition unit, for acquiring business promotion information data;
a data preprocessing unit, for preprocessing and cleaning the collected business promotion information data to obtain a business promotion information data set;
a feature selection unit, for performing dimensionality reduction and feature selection on the business promotion information data set;
a training sample division unit, for dividing the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
a vectorization unit, for passing the training sample data set and the test sample data set through a word2vec model to obtain word vectors for training;
a recommender system obtaining unit, for training an LSTM neural network on the training sample data set, testing the accuracy of the LSTM neural network on the test sample data set, and using the trained LSTM neural network as the recommender system.
Beneficial effects of the disclosure
The present invention provides a method and system for precise recommendation of business promotion information based on deep learning:
1) There is currently no mature online service system for mining potential customers, and offline customer recommendation based on social relationships has essentially been pushed to its limit;
2) Although some enterprises have studied and developed systems for the above situation, such simple systems reportedly cannot adapt to the variety of complex real-world situations and have had little effect. According to our investigation, there is essentially no domestic company that specifically provides information recommendation for companies or enterprises, let alone a mature, efficient recommender system oriented toward regional industries;
3) In every region, many regional enterprises are concentrated in the same area, and offline customer development is close to saturation. Mining potential customers offline requires considerable manpower and material resources, and the success rate is low. Such enterprises urgently need a mature system to guide them and to reduce the cost of developing customers;
4) LSTM has achieved good results in natural language processing. Applied in language translators, it has high accuracy and adaptive ability and can adapt well to complex real-world conditions, which makes it suitable as the core network of this product.
Brief description of the drawings
The above and other features of the disclosure will become more apparent from the detailed description of the embodiments shown in the drawings, in which identical reference labels denote the same or similar elements. The drawings described below are evidently only some embodiments of the disclosure; those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a flow chart of a method for precise recommendation of business promotion information based on deep learning;
Fig. 2 is a diagram of a system for precise recommendation of business promotion information based on deep learning.
Detailed description of the embodiments
The concept, specific structure, and technical effects of the disclosure are described clearly and completely below with reference to the embodiments and drawings, so that the purpose, scheme, and effects of the disclosure can be fully understood. It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with each other.
Fig. 1 is a flow chart of a method for precise recommendation of business promotion information based on deep learning according to the disclosure. A method according to an embodiment of the disclosure is described below with reference to Fig. 1.
The disclosure proposes a method for precise recommendation of business promotion information based on deep learning, which specifically comprises the following steps:
Step 1: acquire business promotion information data;
Step 2: preprocess and clean the collected business promotion information data to obtain a business promotion information data set;
Step 3: perform dimensionality reduction and feature selection on the business promotion information data set;
Step 4: divide the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
Step 5: pass the training sample data set and the test sample data set through a word2vec model to obtain word vectors for training;
Step 6: train an LSTM (Long Short-Term Memory) neural network on the training sample data set, test the accuracy of the LSTM neural network on the test sample data set, and use the trained LSTM neural network as the recommender system.
Further, in step 1, methods of acquiring business promotion information data include but are not limited to: using data sets from open-source data set websites, such as Kaggle, as business promotion information data; using customized (secondary-developed) web crawler technology to crawl Taobao-like shop-owner web pages and same-city transaction information websites to obtain business promotion information data; and using the txt-format web page backups retained in Baidu snapshots, extracting the required information from them as business promotion information data.
Further, in step 2, the collected business promotion information data is preprocessed and cleaned to obtain the business promotion information data set as follows. Because the acquired business promotion information data is extremely large, mixed, and partly useless, it must be preprocessed. Preprocessing is the necessary processing performed before the collected data is classified or grouped, such as auditing, screening, and sorting: auditing the completeness and accuracy of the data, data screening, data sorting, data cleaning, data integration, data transformation, and data reduction. Cleaning the collected business promotion information data means data cleaning, the final procedure for finding and correcting identifiable errors in the data files, which includes checking data consistency and handling invalid and missing values. Through preprocessing and cleaning, dirty data is converted, using mathematical statistics, data mining, or predefined cleaning rules, into data that meets the data quality requirements, namely the cleaned business promotion information data set.
Further, in step 3, the method of performing dimensionality reduction and feature selection on the business promotion information data set includes but is not limited to any one of the following dimensionality reduction methods: principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA), locally linear embedding (LLE), Laplacian eigenmaps, multidimensional scaling (MDS), and isometric mapping (Isomap). Feature selection uses an improved algorithm based on the individually optimal feature selection method. The features are vectors obtained by text vectorization of information in the business promotion information data set, such as a company's location, business scope, and registration information. The individually optimal feature selection algorithm computes the separability criterion value of each feature when it is used alone, sorts the features in descending order of that value, and takes the 30 features with the largest separability criterion values as the feature combination. In practice, however, the collected business promotion information data set contains missing information, so the degree to which each individual feature is missing must be considered when selecting features. The improved algorithm based on the individually optimal feature selection method is given by the following formula:
Where X = (x(1), x(2), x(3), ..., x(n)), x(i) denotes the i-th feature, n is the number of features, J(X) denotes the separability criterion of the feature set, N(x(i)) denotes the number of data items in which the i-th feature is not missing, and M denotes the total number of data items. N(x(i))/M reflects the degree to which the i-th feature is missing from the data (a smaller ratio means more missing values).
Further, in step 4, the business promotion information data set obtained by feature selection is divided into a training sample data set and a test sample data set by any one of the following methods: the hold-out method, cross-validation, or the bootstrap method.
The hold-out method directly divides the business promotion information data set into two mutually exclusive sets; one set serves as the training sample data set and the other as the test sample data set.
Cross-validation divides the business promotion information data set into k mutually exclusive subsets of equal size, each of which preserves the data distribution as far as possible, that is, each is obtained by stratified sampling. In each round, the union of k-1 subsets serves as the training sample data set and the remaining subset serves as the test sample data set.
The bootstrap method generates the data sets by sampling from the business promotion information data set: each time, one sample is selected at random from the data set and a copy of it is placed in the training sample data set, while the original data set remains unchanged; this process is repeated multiple times. Some samples therefore appear more than once in the training sample data set, while the samples that never appear in it form the test sample data set.
The system needs to output a class of customers with similar attributes who are likely to purchase the services or products of a specific enterprise. Because records in the data set may have different numbers of attributes, an LSTM network is used as the core network to train the model. LSTM is a special type of RNN that can learn long-term dependencies. LSTM uses "gates" to control how much useless information is discarded and how much useful information is reinforced, and adds memory cells inside the model to collect and remember earlier relevant information, thereby realizing forgetting and remembering. This built-in memory gives the network a considerable advantage for long-term use of the product: the network keeps learning during use and improves its memory capability, which extends the service life of the product.
Further, in step 5, the method of obtaining the word vectors used for training from the training sample data set and the test sample data set through the word2vec model comprises the following steps:
Step 5.1, word segmentation: because of the particularities of Chinese, the sentences in the business promotion information are segmented with a word segmentation library to obtain a lexicon; segmentation libraries include but are not limited to the Jieba, IK, mmseg and word segmentation libraries;
Step 5.2, word frequency counting: traverse the lexicon formed after segmentation in step 5.1, count the frequency of each word that occurs, and number the words;
Step 5.3, tree construction: build a Huffman tree according to the occurrence probability of each word obtained in step 5.2;
Step 5.4, generating the binary code of each node: the occurrence probability of each word is converted into a binary code that identifies the corresponding node in the Huffman tree of step 5.3;
Step 5.5, initializing the intermediate vector of each non-leaf node and the word vector of each leaf node: every node in the Huffman tree stores a vector of length m, but the vectors in leaf nodes and non-leaf nodes have different meanings. A leaf node stores the word vector of a word, which serves as input to the neural network; a non-leaf node stores an intermediate vector, which corresponds to the parameters of the hidden layer in the neural network and, together with the input, determines the classification result;
Step 5.6, training the intermediate vectors and word vectors: using the CBOW (Continuous Bag-of-Words) model or the Skip-Gram model, the word vectors of the n-1 words around a word A are summed as the input of the system, classification is carried out step by step along the binary code of word A generated in step 5.4, and the intermediate vectors and word vectors are trained according to the classification results, finally yielding the word vectors of the business promotion information. Training has three stages: the input layer, the projection layer and the output layer. The input layer consists of the word vectors of the n-1 words around word A. If n is 5 and word A is denoted w(t), the two preceding and two following words are w(t-2), w(t-1), w(t+1), w(t+2), and their word vectors are denoted v(w(t-2)), v(w(t-1)), v(w(t+1)), v(w(t+2)). The step from the input layer to the projection layer is simple: the n-1 word vectors are added together.
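Steps 5.2 to 5.4 and the input-to-projection step of 5.6 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the disclosure: the function names `huffman_codes` and `cbow_projection`, the toy English tokens, and the two-dimensional toy vectors are all assumptions made for the example. In practice the tokens would come from a segmentation library such as Jieba, and the vectors would be initialized randomly and then trained.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman tree from word frequencies and return {word: binary code}."""
    tiebreak = count()  # unique counter so the heap never compares tree payloads
    heap = [(f, next(tiebreak), word) for word, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two least frequent nodes into one internal (non-leaf) node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: descend left (0) and right (1)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf node: a word
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def cbow_projection(vectors, tokens, t, window=2):
    """Sum the word vectors of the words around position t (input -> projection layer)."""
    m = len(next(iter(vectors.values())))
    total = [0.0] * m
    for j in range(max(0, t - window), min(len(tokens), t + window + 1)):
        if j != t:
            total = [a + b for a, b in zip(total, vectors[tokens[j]])]
    return total


# Hypothetical, already segmented tokens (step 5.1 is assumed to have run).
tokens = ["promo", "offer", "promo", "loan", "promo", "offer", "bank"]
freqs = Counter(tokens)                     # step 5.2: word frequencies
codes = huffman_codes(freqs)                # steps 5.3 and 5.4: tree and binary codes
vectors = {w: [float(i), 1.0] for i, w in enumerate(sorted(freqs))}  # toy vectors, m = 2
projection = cbow_projection(vectors, tokens, t=3)  # context of w(t) with n = 5
```

Frequent words end up close to the root and therefore get short codes, so classifying along the code of a frequent word visits fewer non-leaf nodes.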
Further, in step 6, the method of training the LSTM neural network with the training sample data set, testing the accuracy of the LSTM neural network with the test sample data set, and taking the trained LSTM neural network as the recommender system comprises the following steps:
Step 6.1, training the LSTM neural network with the word vectors: the training sample data set is passed through the forget gate of the LSTM neural network, which starts the information discarding action of the network. This action is realized by the sigmoid layer in the forget gate, which looks at the previous output and the current word vector input and decides whether the information learned in the previous state is retained. The LSTM neural network comprises an input gate, a forget gate and an output gate;
Step 6.2, the training sample data set after information discarding is passed through the input gate of the LSTM neural network, which starts the information update action of the network. This action is realized by the sigmoid layer in the input gate; a tanh layer then changes each cell state of the LSTM neural network, so that new knowledge is learned;
Step 6.3, the training sample data set after the information update is passed through the output gate of the LSTM neural network, which outputs a vector that depends on the cell state of step 6.2. First, a sigmoid layer is run to obtain a vector that determines which part of the cell state is output; the cell state is then processed by a tanh layer and multiplied by the sigmoid output vector, giving the output information of the LSTM network;
Step 6.4, the test data are fed into the LSTM network obtained in step 6.3, and the output results are compared with the labels in the test data to verify the accuracy of the network. If the accuracy meets the requirement, training is complete and the trained LSTM neural network is used as the recommender system.
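The gate operations of steps 6.1 to 6.3 and the accuracy check of step 6.4 can be illustrated with a single LSTM time step written in plain Python. This is a minimal sketch under stated assumptions: it shows only the forward pass of the standard LSTM cell, the names `lstm_step`, `init_params` and `accuracy` are hypothetical, the weights are random rather than trained, and the training procedure itself (backpropagation through time) is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W @ v + b for a matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(params, x, h_prev, c_prev):
    """One LSTM time step; returns the new output h and the new cell state c."""
    z = h_prev + x  # concatenate the previous output and the current word vector
    # Step 6.1: forget gate decides how much of the previous cell state is kept.
    f = [sigmoid(a) for a in affine(params["Wf"], params["bf"], z)]
    # Step 6.2: input gate and tanh layer produce the update to the cell state.
    i = [sigmoid(a) for a in affine(params["Wi"], params["bi"], z)]
    g = [math.tanh(a) for a in affine(params["Wg"], params["bg"], z)]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Step 6.3: output gate selects which part of tanh(cell state) is emitted.
    o = [sigmoid(a) for a in affine(params["Wo"], params["bo"], z)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases for the four layers f, i, g, o."""
    rng = random.Random(seed)
    params = {}
    for name in "figo":
        params["W" + name] = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size + input_size)]
            for _ in range(hidden_size)
        ]
        params["b" + name] = [0.0] * hidden_size
    return params


def accuracy(predicted, labels):
    """Step 6.4: fraction of predictions that match the test labels."""
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Run two hypothetical word vectors (m = 3) through a cell with 2 hidden units.
params = init_params(input_size=3, hidden_size=2)
h, c = [0.0, 0.0], [0.0, 0.0]
for word_vector in ([0.1, 0.2, 0.3], [0.0, -0.1, 0.4]):
    h, c = lstm_step(params, word_vector, h, c)
```

In a complete recommender, the final `h` would feed a classification layer whose predictions are compared with the test labels through `accuracy`; that layer is left out of the sketch.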
This shows how the method uses the recommender system to recommend clients precisely to an enterprise, helping the enterprise develop a large number of potential and accurate clients more efficiently. Using the "Internet + finance" approach and building on deep learning and network technology, it reduces the cost of developing clients, shortens the time required to develop them, and improves the accuracy of client discovery.
In the embodiments of the disclosure, the accuracy (0.12) of the data mined with the deep-learning-based system for accurately recommending business promotion information is much higher than the accuracy (0.067) of the traditional content-based method, and the retention rate is also clearly improved; trending business promotion information is easier to discover. The LSTM deep neural network is applied to business promotion information mining: it forgets irrelevant noise information and strengthens the memory of strongly related information, so that the algorithm keeps the better features and eliminates the worse ones. Using a Huffman tree for the word segmentation of business promotion information greatly increases segmentation speed; the segmentation computation time is roughly 1/20 of that of ordinary exhaustive search.
An embodiment of the disclosure provides a system for accurately recommending business promotion information based on deep learning; Fig. 2 shows a diagram of this system. The system of this embodiment comprises a processor, a memory, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the above embodiment of the deep-learning-based system for accurately recommending business promotion information.
The system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor executes the computer program to run the following units of the system:
a data set acquisition unit, for acquiring business promotion information data;
a data preprocessing unit, for preprocessing and cleaning the acquired business promotion information data to obtain a business promotion information data set;
a feature selection unit, for performing dimensionality reduction and feature selection on the business promotion information data set;
a training sample division unit, for dividing the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
a vectorization unit, for obtaining the word vectors used for training from the training sample data set and the test sample data set through the word2vec model;
a recommender system obtaining unit, for training the LSTM neural network with the training sample data set, testing the accuracy of the LSTM neural network with the test sample data set, and taking the trained LSTM neural network as the recommender system.
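As an illustration of how the data preprocessing unit and the training sample division unit could hand data to each other, the following Python sketch cleans a handful of records and splits them with a simple hold-out split. It is an assumption-laden example: the function names `clean` and `holdout_split`, the 80/20 ratio, the fixed seed and the toy records are all hypothetical, and the cleaning shown (dropping empty and duplicate records) is only a small stand-in for the preprocessing the disclosure describes.

```python
import random


def clean(records):
    """Drop empty and duplicate records (a stand-in for the data preprocessing unit)."""
    seen, cleaned = set(), []
    for text, label in records:
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append((text, label))
    return cleaned


def holdout_split(samples, test_ratio=0.2, seed=42):
    """Shuffle the samples and split them into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (text, label) records as they might arrive from the acquisition unit.
raw = [
    ("promo A", 1), ("promo A", 1), ("   ", 0), ("promo B", 0),
    ("promo C", 1), ("promo D", 0), ("promo E", 1),
]
dataset = clean(raw)
train_set, test_set = holdout_split(dataset, test_ratio=0.2)
```

A fixed seed keeps the split reproducible, which makes accuracy figures from repeated test runs comparable.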
The deep-learning-based system for accurately recommending business promotion information can run on computing devices such as desktop computers, notebooks, palmtop computers and cloud servers. The system may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the example is merely an example of the system and does not constitute a limitation on it; the system may include more or fewer components than the example, combine certain components, or use different components. For instance, the system may also include input/output devices, network access devices, buses and the like.
The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the running system for accurately recommending business promotion information based on deep learning, and connects the parts of the whole running system through various interfaces and lines.
The memory can be used to store the computer program and/or modules. The processor realizes the various functions of the deep-learning-based system for accurately recommending business promotion information by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly comprise a program storage area and a data storage area: the program storage area can store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area can store data created through use of the mobile phone (such as audio data or a phone book). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
Although the description of the disclosure is quite detailed and several embodiments have been described in particular, it is not intended to be limited to any of these details or embodiments or to any particular embodiment; rather, it should be regarded as providing, by reference to the appended claims and in view of the prior art, the broadest possible interpretation of those claims, so as to effectively cover the intended scope of the disclosure. Furthermore, the disclosure has been described above in terms of embodiments foreseeable by the inventor for the purpose of providing a useful description, and insubstantial changes to the disclosure that are not presently foreseen may nonetheless represent equivalents of it.
Claims (7)
1. A method for accurately recommending business promotion information based on deep learning, characterized in that the method comprises the following steps:
Step 1, acquiring business promotion information data;
Step 2, preprocessing and cleaning the acquired business promotion information data to obtain a business promotion information data set;
Step 3, performing dimensionality reduction and feature selection on the business promotion information data set;
Step 4, dividing the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
Step 5, obtaining the word vectors used for training from the training sample data set and the test sample data set through the word2vec model;
Step 6, training the LSTM neural network with the training sample data set, testing the accuracy of the LSTM neural network with the test sample data set, and taking the trained LSTM neural network as the recommender system.
2. The method for accurately recommending business promotion information based on deep learning according to claim 1, characterized in that, in step 2, the method of preprocessing and cleaning the acquired business promotion information data to obtain the business promotion information data set is as follows: because the acquired business promotion information data are extremely large, mixed, and in part useless, they need to be preprocessed. Preprocessing is the necessary processing, such as auditing, screening and sorting, carried out before the acquired business promotion information data are classified or grouped, namely auditing the completeness and accuracy of the data, data screening and data sorting; that is, data cleaning, data integration, data transformation and data reduction. Cleaning the acquired business promotion information data refers to data cleaning, the last procedure for finding and correcting identifiable errors in the data files, including checking data consistency and handling invalid and missing values. Through preprocessing and cleaning, which use mathematical statistics, data mining or predefined cleaning rules, the dirty business promotion information data are converted into data that meet the data quality requirements, namely the cleaned business promotion information data set.
3. The method for accurately recommending business promotion information based on deep learning according to claim 1, characterized in that, in step 3, the method of performing dimensionality reduction and feature selection on the business promotion information data set includes but is not limited to any one of the following dimensionality reduction methods: principal component analysis, independent component analysis, linear discriminant analysis, locally linear embedding, Laplacian eigenmaps, multidimensional scaling and isometric mapping; feature selection uses an improved algorithm based on the individually optimal feature selection method, the features being the vectors obtained by text vectorization of information in the business promotion information data set such as the location, business scope and registration information of a company. The individually optimal feature selection algorithm computes the separability criterion value of each feature when used alone, sorts the features by separability criterion value from largest to smallest, and takes the 30 features with the largest separability criterion values as the feature combination. In practice, however, the acquired business promotion information data set contains missing information, so the degree to which a single feature is missing must be considered when selecting features. The improved algorithm based on the individually optimal feature selection method is given by the following formula: [formula not preserved in the source text] where x(i) = (x(1), x(2), x(3), ..., x(n)), x(i) denotes the i-th feature, n is the number of features, J(X) denotes the separability criterion of the feature set, N(x(i)) denotes the number of data items in which the i-th feature is not missing, M denotes the total number of data items, and N(x(i))/M reflects the degree to which the i-th feature is missing in the data.
4. The method for accurately recommending business promotion information based on deep learning according to claim 3, characterized in that, in step 4, the method of dividing the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set is any one of the hold-out method, cross-validation and the bootstrap method.
5. The method for accurately recommending business promotion information based on deep learning according to claim 4, characterized in that, in step 5, the method of obtaining the word vectors used for training from the training sample data set and the test sample data set through the word2vec model comprises the following steps:
Step 5.1, word segmentation: because of the particularities of Chinese, the sentences in the business promotion information are segmented with a word segmentation library to obtain a lexicon; segmentation libraries include but are not limited to the Jieba, IK, mmseg and word segmentation libraries;
Step 5.2, word frequency counting: traverse the lexicon formed after segmentation in step 5.1, count the frequency of each word that occurs, and number the words;
Step 5.3, tree construction: build a Huffman tree according to the occurrence probability of each word obtained in step 5.2;
Step 5.4, generating the binary code of each node: the occurrence probability of each word is converted into a binary code that identifies the corresponding node in the Huffman tree of step 5.3;
Step 5.5, initializing the intermediate vector of each non-leaf node and the word vector of each leaf node: every node in the Huffman tree stores a vector of length m, but the vectors in leaf nodes and non-leaf nodes have different meanings; a leaf node stores the word vector of a word, which serves as input to the neural network, whereas a non-leaf node stores an intermediate vector, which corresponds to the parameters of the hidden layer in the neural network and, together with the input, determines the classification result;
Step 5.6, training the intermediate vectors and word vectors: the intermediate vectors and word vectors are trained with the CBOW model or the Skip-Gram model, finally yielding the word vectors of the business promotion information.
6. The method for accurately recommending business promotion information based on deep learning according to claim 5, characterized in that, in step 6, the method of training the LSTM neural network with the training sample data set, testing the accuracy of the LSTM neural network with the test sample data set, and taking the trained LSTM neural network as the recommender system comprises the following steps:
Step 6.1, training the LSTM neural network with the word vectors: the training sample data set is passed through the forget gate of the LSTM neural network, which starts the information discarding action of the network; this action is realized by the sigmoid layer in the forget gate, which looks at the previous output and the current word vector input and decides whether the information learned in the previous state is retained; the LSTM neural network comprises an input gate, a forget gate and an output gate;
Step 6.2, the training sample data set after information discarding is passed through the input gate of the LSTM neural network, which starts the information update action of the network; this action is realized by the sigmoid layer in the input gate, and a tanh layer then changes each cell state of the LSTM neural network, so that new knowledge is learned;
Step 6.3, the training sample data set after the information update is passed through the output gate of the LSTM neural network, which outputs a vector that depends on the cell state of step 6.2; first, a sigmoid layer is run to obtain a vector that determines which part of the cell state is output; the cell state is then processed by a tanh layer and multiplied by the sigmoid output, giving the output information of the LSTM network;
Step 6.4, the test data are fed into the LSTM network obtained in step 6.3, and the output results are compared with the labels in the test data to verify the accuracy of the network; if the accuracy meets the requirement, training is complete and the trained LSTM neural network is used as the recommender system.
7. A system for accurately recommending business promotion information based on deep learning, characterized in that the system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor executing the computer program to run the following units of the system:
a data set acquisition unit, for acquiring business promotion information data;
a data preprocessing unit, for preprocessing and cleaning the acquired business promotion information data to obtain a business promotion information data set;
a feature selection unit, for performing dimensionality reduction and feature selection on the business promotion information data set;
a training sample division unit, for dividing the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
a vectorization unit, for obtaining the word vectors used for training from the training sample data set and the test sample data set through the word2vec model;
a recommender system obtaining unit, for training the LSTM neural network with the training sample data set, testing the accuracy of the LSTM neural network with the test sample data set, and taking the trained LSTM neural network as the recommender system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910461767.7A CN110264311B (en) | 2019-05-30 | 2019-05-30 | Business promotion information accurate recommendation method and system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110264311A true CN110264311A (en) | 2019-09-20 |
CN110264311B CN110264311B (en) | 2023-04-18 |
Family
ID=67915965
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910461767.7A Active CN110264311B (en) | 2019-05-30 | 2019-05-30 | Business promotion information accurate recommendation method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110264311B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130204729A1 (en) * | 2012-02-08 | 2013-08-08 | Ebay Inc. | Systems and methods for reseller discovery and analysis |
CN105869024A (en) * | 2016-04-20 | 2016-08-17 | 北京小米移动软件有限公司 | Commodity recommending method and device |
CN107153642A (en) * | 2017-05-16 | 2017-09-12 | 华北电力大学 | A kind of analysis method based on neural network recognization text comments Sentiment orientation |
CN107291693A (en) * | 2017-06-15 | 2017-10-24 | 广州赫炎大数据科技有限公司 | A kind of semantic computation method for improving term vector model |
CN108256052A (en) * | 2018-01-15 | 2018-07-06 | 成都初联创智软件有限公司 | Automobile industry potential customers' recognition methods based on tri-training |
CN108427670A (en) * | 2018-04-08 | 2018-08-21 | 重庆邮电大学 | A kind of sentiment analysis method based on context word vector sum deep learning |
CN109635204A (en) * | 2018-12-21 | 2019-04-16 | 上海交通大学 | Online recommender system based on collaborative filtering and length memory network |
Non-Patent Citations (1)
Title |
---|
Li Aiguo, Ku Xiangyang: "Data Preprocessing", in "Data Mining: Principles, Algorithms and Applications" * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860981A (en) * | 2020-07-03 | 2020-10-30 | 航天信息(山东)科技有限公司 | Enterprise national industry category prediction method and system based on LSTM deep learning |
CN111860981B (en) * | 2020-07-03 | 2024-01-19 | 航天信息(山东)科技有限公司 | Enterprise national industry category prediction method and system based on LSTM deep learning |
CN112465389A (en) * | 2020-12-12 | 2021-03-09 | 广东电力信息科技有限公司 | Word frequency-based similar provider recommendation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110264311B (en) | 2023-04-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||