CN110264311A - Deep-learning-based method and system for precise recommendation of business promotion information - Google Patents

Deep-learning-based method and system for precise recommendation of business promotion information

Info

Publication number
CN110264311A
Authority
CN
China
Prior art keywords
data
business promotion
neural network
information
feature
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910461767.7A
Other languages
Chinese (zh)
Other versions
CN110264311B (en)
Inventor
苏俊健
王东
麦志领
何佳奋
纪淇纯
叶新华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan University
Original Assignee
Foshan University
Application filed by Foshan University
Priority to CN201910461767.7A
Publication of CN110264311A
Application granted
Publication of CN110264311B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a deep-learning-based method and system for precise recommendation of business promotion information. An LSTM neural network is trained on a training sample data set and its accuracy is tested on a test sample data set; the trained LSTM neural network then serves as the recommender system, yielding a fully functional recommender classifier that shortens the time needed to develop customers and improves the precision of customer discovery. The disclosed method and system form a mature, efficient recommender system oriented to regional industries. In each region, many regional enterprises are concentrated in the same area, and offline customer development is close to saturation. Mining potential customers offline requires considerable manpower and material cost, and the success rate is low. Such enterprises urgently need a mature system to guide them and reduce the cost of developing customers.

Description

Deep-learning-based method and system for precise recommendation of business promotion information
Technical field
The present disclosure relates to the technical field of machine learning recommendation algorithms and deep learning, and in particular to a deep-learning-based method and system for precise recommendation of business promotion information.
Background
Conventional recommender systems generally recommend commodities to customers, suggesting different commodities to different customers. Traditional recommendation algorithms include content-based recommendation (Content Based, CB), collaborative filtering (Collaborative Filtering, CF), and hybrid recommendation methods. This patent departs from the conventional recommendation mode and implements recommendation between upstream and downstream enterprises, using a deep-learning-based recommendation algorithm to handle the complex relationships between them.
Current conventional recommender systems present the following two aspects:
1) At present, domestic research on recommendation algorithms is concentrated mainly on the precise recommendation of commodities, and research on customer recommendation is still scarce. Domestic research on recommender systems generally uses machine learning algorithms, but given increasingly complex business relationships and diversified data relationships, the learning efficiency of such machine learning is gradually becoming unable to meet demand.
2) Moving beyond the framework of traditional data mining, data are automatically crawled and screened from web pages, which reduces the cost of collecting data or resolves the predicament of lacking data.
Summary of the invention
To solve the above problems, the present disclosure provides a technical solution for a deep-learning-based method and system for precise recommendation of business promotion information. An LSTM neural network is trained on a training sample data set and its accuracy is tested on a test sample data set; the trained LSTM neural network serves as the recommender system, yielding a fully functional recommender classifier. Based on deep learning and network technology, the solution reduces the cost an enterprise incurs when developing customers, shortens the time required, and improves the precision of customer discovery.
To achieve the above object, according to one aspect of the present disclosure, a deep-learning-based method for precise recommendation of business promotion information is provided, the method comprising the following steps:
Step 1: acquire business promotion information data;
Step 2: preprocess and clean the collected business promotion information data to obtain a business promotion information data set;
Step 3: perform dimensionality reduction and feature selection on the business promotion information data set;
Step 4: divide the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
Step 5: convert the training sample data set and the test sample data set into word vectors for training by means of a word2vec model;
Step 6: train an LSTM (Long Short-Term Memory) neural network on the training sample data set, test the accuracy of the LSTM neural network on the test sample data set, and use the trained LSTM neural network as the recommender system.
Further, in step 1, methods for acquiring business promotion information data include but are not limited to: collecting data sets from open data set websites, such as Kaggle data sets, as business promotion information data; crawling Taobao-like shop-owner web pages and same-city trading information websites with a customized (secondarily developed) web crawler to obtain business promotion information data; and using the txt-format web page backups retained in Baidu snapshots, from which the required information is extracted as business promotion information data.
Further, in step 2, the collected business promotion information data are preprocessed and cleaned to obtain the business promotion information data set as follows. Because the acquired business promotion information data are extremely large, mixed, and in part useless, they must be preprocessed. Preprocessing is the necessary processing performed before the collected data are classified or grouped, such as auditing, screening, and sorting: auditing the integrity and accuracy of the data, data screening, data sorting, data cleaning, data integration, data transformation, and data reduction. Cleaning the collected business promotion information data refers to data cleaning, the final procedure for finding and correcting identifiable errors in the data files, including checking data consistency and handling invalid and missing values. Through preprocessing and cleaning, mathematical statistics, data mining, or predefined cleaning rules convert dirty data into data that meet data quality requirements, namely the cleaned business promotion information data set.
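The cleaning described above can be illustrated with a minimal sketch. The field names (`name`, `location`, `business_scope`), the markers treated as invalid values, and the duplicate key are assumptions made for illustration; the patent does not specify its cleaning rules.

```python
# Hypothetical required fields; the source does not name them.
REQUIRED_FIELDS = ("name", "location", "business_scope")
# Hypothetical markers treated as invalid or missing values.
INVALID_VALUES = ("", "N/A", None)

def clean_record(record):
    """Return a normalized copy of the record, or None if it must be discarded."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = " ".join(value.split())  # collapse stray whitespace
        cleaned[key] = None if value in INVALID_VALUES else value
    # Integrity audit: discard records that lack any required field.
    if any(cleaned.get(field) is None for field in REQUIRED_FIELDS):
        return None
    return cleaned

def clean_dataset(records):
    """Audit, screen, de-duplicate, and sort the raw records."""
    seen = set()
    result = []
    for record in records:
        cleaned = clean_record(record)
        if cleaned is None:
            continue
        key = (cleaned["name"], cleaned["location"])
        if key in seen:  # consistency check: drop duplicate companies
            continue
        seen.add(key)
        result.append(cleaned)
    return sorted(result, key=lambda r: r["name"])

raw = [
    {"name": "  Acme  Tools ", "location": "Foshan", "business_scope": "hardware"},
    {"name": "Acme Tools", "location": "Foshan", "business_scope": "hardware"},
    {"name": "Beta Foods", "location": "", "business_scope": "catering"},
]
cleaned = clean_dataset(raw)
```

In this sample the second record is dropped as a duplicate of the first after whitespace normalization, and the third is dropped because its location is missing.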
Further, in step 3, methods for performing dimensionality reduction and feature selection on the business promotion information data set include but are not limited to any one of the following dimensionality reduction methods: principal component analysis (Principal Component Analysis, PCA), independent component analysis (Independent Component Analysis, ICA), linear discriminant analysis (Linear Discriminant Analysis, LDA), locally linear embedding (Locally Linear Embedding, LLE), Laplacian eigenmaps (Laplacian Eigenmaps), multidimensional scaling (MultiDimensional Scaling, MDS), and isometric mapping (Isomap). Feature selection uses an improved algorithm based on the individually optimal feature selection method. The features are vectors obtained by text vectorization of information in the business promotion information data set such as a company's location, business scope, and registration information. The individually optimal feature selection algorithm computes the separability criterion value of each feature when used alone, sorts the features by this value in descending order, and takes the 30 features with the largest separability criterion values as the feature combination. In practice, however, the collected business promotion information data set contains missing information, so the degree of missingness of each individual feature must be considered when selecting features. The improved algorithm based on the individually optimal feature selection method is given by a formula (not reproduced in this text), in which X = (x(1), x(2), x(3), ..., x(n)), x(i) denotes the i-th feature, n is the number of features, J(X) denotes the separability criterion of the feature set, N(x(i)) denotes the number of data records in which the i-th feature is not missing, M denotes the total number of data records, and N(x(i))/M reflects the degree to which the i-th feature is missing in the data, which mitigates the effect of missing information.
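Because the formula itself does not survive in this text, the following is only a sketch under a stated assumption: each feature's separability criterion value is multiplied by its completeness ratio N(x(i))/M before ranking. The weighting, the function names, and the sample criterion values are hypothetical and are not taken from the patent.

```python
def completeness(column):
    """N(x(i)) / M: fraction of records in which the feature is not missing."""
    return sum(value is not None for value in column) / len(column)

def rank_features(criterion, columns, top_k=30):
    """Rank features by criterion value times completeness (assumed weighting)."""
    scores = {
        name: criterion[name] * completeness(columns[name])
        for name in columns
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores

# Hypothetical per-feature separability criterion values.
criterion = {"location": 0.9, "business_scope": 0.8, "registration": 0.5}
columns = {
    "location": ["Foshan", None, None, None],                    # 1 of 4 present
    "business_scope": ["hardware", "catering", "retail", None],  # 3 of 4 present
    "registration": ["2015", "2018", "2012", "2019"],            # 4 of 4 present
}
selected, scores = rank_features(criterion, columns, top_k=2)
```

Under this assumption a feature with a high criterion value but many missing entries (here `location`) ranks below features that are present in most records.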
Further, in step 4, the business promotion information data set obtained by feature selection is divided into a training sample data set and a test sample data set by any one of the following methods: the hold-out method, cross-validation, or bootstrapping.
The hold-out method divides the business promotion information data set directly into two mutually exclusive sets; one set serves as the training sample data set and the remaining set serves as the test sample data set.
Cross-validation divides the business promotion information data set into several equal-sized, mutually exclusive subsets, each of which preserves the data distribution as far as possible, that is, each is obtained by stratified sampling. Each time, the union of all subsets but one serves as the training sample data set and the remaining subset serves as the test sample data set.
Bootstrapping generates the sets by sampling from the business promotion information data set: each time, a sample is selected at random from the data set and a copy of it is placed in the training sample data set, while the data set itself is left unchanged; this process is repeated a number of times. Some samples appear in the training sample data set repeatedly, and the samples of the business promotion information data set that never appear in it serve as the test sample data set.
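The three partitioning schemes can be sketched with the standard library alone. The function names, the default test ratio, the number of folds, and the number of bootstrap draws (here equal to the data set size) are illustrative assumptions; the source does not state these values.

```python
import random

def hold_out(data, test_ratio=0.3, seed=0):
    """Split data into two mutually exclusive sets (train, test)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold(data, k=5):
    """Yield (train, test) pairs; each subset serves as the test set once."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

def bootstrap(data, seed=0):
    """Draw len(data) samples with replacement; never-drawn items form the test set."""
    rng = random.Random(seed)
    indices = [rng.randrange(len(data)) for _ in data]
    drawn = set(indices)
    train = [data[i] for i in indices]
    test = [x for i, x in enumerate(data) if i not in drawn]
    return train, test

samples = list(range(10))
train, test = hold_out(samples)
folds = list(k_fold(samples, k=5))
boot_train, boot_test = bootstrap(samples)
```

The `k_fold` sketch does not stratify by label; a stratified variant would group samples by class before forming the subsets.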
The system needs to output a class of customers with similar attributes who are likely to purchase the services or products of a specific enterprise, and the data set may contain varying numbers of attributes, so an LSTM is used as the backbone network to train the model. LSTM is a special type of RNN (Recurrent Neural Network) that can learn long-term dependencies. An LSTM uses "gates" to control how much useless information is discarded and how much useful information is reinforced, and it adds memory cells inside the model that collect and remember earlier relevant information, realizing the functions of forgetting and remembering. This built-in memory gives the network a great advantage for long-term use of the product, because the network keeps learning during use and improves its memory capability, which extends the service life of the product.
Further, in step 5, converting the training sample data set and the test sample data set into word vectors for training by means of the word2vec model comprises the following steps:
Step 5.1, word segmentation: because of the particularities of Chinese, the sentences in the business promotion information are segmented with a word segmentation library to obtain a dictionary; segmentation libraries include but are not limited to the Jieba, IK, mmseg, and word segmenters;
Step 5.2, count word frequencies: traverse the dictionary formed by the segmentation in step 5.1, count the frequency of each word that occurs, and number the words;
Step 5.3, build the tree structure: construct a Huffman tree according to the occurrence probability of each word obtained in step 5.2;
Step 5.4, generate the binary code of each node: the occurrence probability of each word is converted into a binary code that represents each node of the Huffman tree from step 5.3;
Step 5.5, initialize the intermediate vector of each non-leaf node and the word vector in each leaf node: every node in the Huffman tree stores a vector of length m, but the vectors in leaf nodes and non-leaf nodes have different meanings. A leaf node stores the word vector of a word, which serves as input to the neural network; a non-leaf node stores an intermediate vector, which corresponds to the parameters of the hidden layer in the neural network and determines the classification result together with the input;
Step 5.6, train the intermediate vectors and word vectors: using the CBOW (Continuous Bag-Of-Words) model or the Skip-Gram model, the word vectors of the n-1 words near word A are summed as the input of the system; classification is performed step by step along the binary code of word A generated in step 5.4, and the intermediate vectors and word vectors are trained according to the classification results, finally yielding the word vectors corresponding to the business promotion information. Word A is a single word. The training process mainly has three stages: the input layer (input), the mapping layer (projection), and the output layer (output). The input layer consists of the word vectors of the n-1 words surrounding word A. If n is 5 and word A is denoted w(t), the two preceding and two following words are w(t-2), w(t-1), w(t+1), w(t+2), and their word vectors are denoted v(w(t-2)), v(w(t-1)), v(w(t+1)), v(w(t+2)). Going from the input layer to the mapping layer is simple: the n-1 word vectors are summed.
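Steps 5.2 to 5.4 can be illustrated with a short sketch that counts word frequencies and derives a binary Huffman code for every word. The token list stands in for the output of a word segmenter and is hypothetical, as is the function name `huffman_codes`; assigning "0" to the left branch is a convention chosen here, not one stated in the source.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(frequencies):
    """Map each word to a binary Huffman code; frequent words get shorter codes."""
    tie_breaker = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tie_breaker), word) for word, freq in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        freq_a, _, left = heapq.heappop(heap)
        freq_b, _, right = heapq.heappop(heap)
        # A non-leaf node is represented as a (left, right) tuple.
        heapq.heappush(heap, (freq_a + freq_b, next(tie_breaker), (left, right)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix  # leaf node: one word of the dictionary
    return codes

# Hypothetical tokens, as they might come out of a word segmenter.
tokens = ["hardware", "wholesale", "hardware", "retail", "hardware", "wholesale"]
frequencies = Counter(tokens)       # step 5.2: count word frequencies
codes = huffman_codes(frequencies)  # steps 5.3 and 5.4: tree and binary codes
```

Each code describes the path from the root to the word's leaf, which is the sequence of binary decisions that step 5.6 classifies along.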
Further, in step 6, training the LSTM neural network on the training sample data set, testing its accuracy on the test sample data set, and using the trained LSTM neural network as the recommender system comprises the following steps:
Step 6.1, train the LSTM neural network with the word vectors: the training sample data set is passed through the forget gate of the LSTM neural network, which starts the information-discarding action of the network. This action is implemented by the sigmoid layer in the forget gate, which examines the previous output and the current word vector input and decides whether the information learned in the previous state is retained. The LSTM neural network comprises an input gate, a forget gate, and an output gate;
Step 6.2, the training sample data set after information discarding is passed through the input gate of the LSTM neural network, which starts the information-update action of the network. This action is implemented by the sigmoid layer in the input gate; a tanh layer then changes each cell state of the LSTM neural network so that new knowledge is learned;
Step 6.3, the training sample data set after the information update is passed through the output gate of the LSTM neural network, which outputs a vector that depends on the cell state from step 6.2. First, a sigmoid layer produces a vector that determines which part of the cell state is output; the cell state is then processed by a tanh layer and multiplied by the sigmoid output vector, giving the output information of the LSTM network;
Step 6.4, the test data are fed into the LSTM network obtained in step 6.3, and the output results are compared with the labels in the test data to verify the accuracy of the network. If the accuracy meets the requirement, training is complete and the trained LSTM neural network is obtained and used as the recommender system.
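The gate operations in steps 6.1 to 6.3 correspond to one forward step of a standard LSTM cell, sketched below with plain Python lists. The parameter layout (a dictionary from gate name to a weight matrix and bias) and all names are assumptions for illustration; the sketch covers only the forward pass, not the training or the accuracy check of step 6.4.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(weights, bias, vector):
    """Compute weights @ vector + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to (weights, bias)."""
    z = h_prev + x  # concatenate the previous output and the current word vector
    f = [sigmoid(a) for a in affine(*params["forget"], z)]        # step 6.1
    i = [sigmoid(a) for a in affine(*params["input"], z)]         # step 6.2
    g = [math.tanh(a) for a in affine(*params["candidate"], z)]   # step 6.2
    o = [sigmoid(a) for a in affine(*params["output"], z)]        # step 6.3
    # New cell state: keep part of the old state, add part of the candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Output: tanh of the cell state, scaled by the output gate.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

hidden, inputs = 2, 3
zero_gate = ([[0.0] * (hidden + inputs) for _ in range(hidden)], [0.0] * hidden)
params = {name: zero_gate for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step([0.5, -0.1, 0.3], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero parameters every sigmoid gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved; trained weights would make each gate depend on the input.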
The present invention also provides a deep-learning-based system for precise recommendation of business promotion information, the system comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to run the following units of the system:
a data set acquisition unit, for acquiring business promotion information data;
a data preprocessing unit, for preprocessing and cleaning the collected business promotion information data to obtain a business promotion information data set;
a feature selection unit, for performing dimensionality reduction and feature selection on the business promotion information data set;
a training sample division unit, for dividing the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
a vectorization unit, for converting the training sample data set and the test sample data set into word vectors for training by means of a word2vec model;
a recommender system obtaining unit, for training an LSTM neural network on the training sample data set, testing the accuracy of the LSTM neural network on the test sample data set, and using the trained LSTM neural network as the recommender system.
Beneficial effects of the disclosure
The present invention provides a deep-learning-based method and system for precise recommendation of business promotion information:
1) There is currently no mature online service system for mining potential customers, while offline customer recommendation based on social relationships has essentially been pushed to its limit;
2) Although some enterprises have studied and developed systems for this situation, such simple systems reportedly cannot adapt to the variety of complex situations in reality and have little effect. According to investigation, there is essentially no company in China dedicated to information recommendation for companies or enterprises, let alone a mature, efficient recommender system oriented to regional industries;
3) In each region, many regional enterprises are concentrated in the same area, and offline customer development is close to saturation. Mining potential customers offline requires considerable manpower and material cost, and the success rate is low. Such enterprises urgently need a mature system to guide them and reduce the cost of developing customers;
4) LSTM has achieved good results in natural language processing, for example in language translators, with high accuracy and adaptive capability; it can adapt well to the variety of complex conditions in reality and is suitable as the backbone network of this product.
Brief description of the drawings
The above and other features of the present disclosure will become more apparent from the detailed description of the embodiments shown in the accompanying drawings, in which identical reference numerals denote identical or similar elements. The drawings described below are evidently only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a flow chart of a deep-learning-based method for precise recommendation of business promotion information;
Fig. 2 is a diagram of a deep-learning-based system for precise recommendation of business promotion information.
Detailed description of the embodiments
The concept, specific structure, and technical effects of the present disclosure are described clearly and completely below with reference to the embodiments and the accompanying drawings, so that the purpose, solutions, and effects of the disclosure can be fully understood. It should be noted that, provided there is no conflict, the embodiments of the present application and the features in them may be combined with one another.
Fig. 1 shows a flow chart of a deep-learning-based method for precise recommendation of business promotion information according to the present disclosure. The method according to an embodiment of the present disclosure is described below with reference to Fig. 1.
The present disclosure proposes a deep-learning-based method for precise recommendation of business promotion information, which specifically comprises the following steps:
Step 1: acquire business promotion information data;
Step 2: preprocess and clean the collected business promotion information data to obtain a business promotion information data set;
Step 3: perform dimensionality reduction and feature selection on the business promotion information data set;
Step 4: divide the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
Step 5: convert the training sample data set and the test sample data set into word vectors for training by means of a word2vec model;
Step 6: train an LSTM (Long Short-Term Memory) neural network on the training sample data set, test the accuracy of the LSTM neural network on the test sample data set, and use the trained LSTM neural network as the recommender system.
Further, in step 1, methods for acquiring business promotion information data include but are not limited to: collecting data sets from open data set websites, such as Kaggle data sets, as business promotion information data; crawling Taobao-like shop-owner web pages and same-city trading information websites with a customized (secondarily developed) web crawler to obtain business promotion information data; and using the txt-format web page backups retained in Baidu snapshots, from which the required information is extracted as business promotion information data.
Further, in step 2, the collected business promotion information data are preprocessed and cleaned to obtain the business promotion information data set as follows. Because the acquired business promotion information data are extremely large, mixed, and in part useless, they must be preprocessed. Preprocessing is the necessary processing performed before the collected data are classified or grouped, such as auditing, screening, and sorting: auditing the integrity and accuracy of the data, data screening, data sorting, data cleaning, data integration, data transformation, and data reduction. Cleaning the collected business promotion information data refers to data cleaning, the final procedure for finding and correcting identifiable errors in the data files, including checking data consistency and handling invalid and missing values. Through preprocessing and cleaning, mathematical statistics, data mining, or predefined cleaning rules convert dirty data into data that meet data quality requirements, namely the cleaned business promotion information data set.
Further, in step 3, methods for performing dimensionality reduction and feature selection on the business promotion information data set include but are not limited to any one of the following dimensionality reduction methods: principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA), locally linear embedding (LLE), Laplacian eigenmaps, multidimensional scaling (MDS), and isometric mapping (Isomap). Feature selection uses an improved algorithm based on the individually optimal feature selection method. The features are vectors obtained by text vectorization of information in the business promotion information data set such as a company's location, business scope, and registration information. The individually optimal feature selection algorithm computes the separability criterion value of each feature when used alone, sorts the features by this value in descending order, and takes the 30 features with the largest separability criterion values as the feature combination. In practice, however, the collected business promotion information data set contains missing information, so the degree of missingness of each individual feature must be considered when selecting features. The improved algorithm based on the individually optimal feature selection method is given by a formula (not reproduced in this text), in which X = (x(1), x(2), x(3), ..., x(n)), x(i) denotes the i-th feature, n is the number of features, J(X) denotes the separability criterion of the feature set, N(x(i)) denotes the number of data records in which the i-th feature is not missing, M denotes the total number of data records, and N(x(i))/M reflects the degree to which the i-th feature is missing in the data.
Further, in step 4, the business promotion information data set obtained by feature selection is divided into a training sample data set and a test sample data set by any one of the following methods: the hold-out method, cross-validation, or bootstrapping.
The hold-out method divides the business promotion information data set directly into two mutually exclusive sets; one set serves as the training sample data set and the remaining set serves as the test sample data set.
Cross-validation divides the business promotion information data set into several equal-sized, mutually exclusive subsets, each of which preserves the data distribution as far as possible, that is, each is obtained by stratified sampling. Each time, the union of all subsets but one serves as the training sample data set and the remaining subset serves as the test sample data set.
Bootstrapping generates the sets by sampling from the business promotion information data set: each time, a sample is selected at random from the data set and a copy of it is placed in the training sample data set, while the data set itself is left unchanged; this process is repeated a number of times. Some samples appear in the training sample data set repeatedly, and the samples of the business promotion information data set that never appear in it serve as the test sample data set.
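The stratified sampling mentioned for cross-validation can be sketched as a split that preserves the proportion of each label in both parts. The function name, the labels, and the test ratio are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_split(samples, labels, test_ratio=0.25, seed=0):
    """Split (sample, label) pairs so each label keeps its share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test += [(sample, label) for sample in group[:n_test]]
        train += [(sample, label) for sample in group[n_test:]]
    return train, test

# Hypothetical data: 8 enterprises labelled as customers, 4 as non-customers.
samples = list(range(12))
labels = ["customer"] * 8 + ["non-customer"] * 4
train, test = stratified_split(samples, labels)
test_counts = Counter(label for _, label in test)
train_counts = Counter(label for _, label in train)
```

Both parts keep the 2:1 ratio of the two labels, which is the consistency of data distribution that the text asks each subset to preserve.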
Due to needing to export a kind of client for having the purchase specific enterprise service or product with like attribute.So examining The attribute that data set may be made there are different numbers is considered, so using LSTM network as core network carrys out training pattern.LSTM It is a kind of type that RNN is special, long-term Dependency Specification can be learnt.LSTM controlled by " door " abandon information useless or The specific gravity of advantageous information is improved, while adding memory cell inside model, collects relevant information before memory, realizes and forgets Or the function of memory, this characteristic for carrying memory makes the network have great advantage to the product is used for a long time, because network exists Also continuous study improves memory capability in use, improves the service life of product.
Further, in step 5, the method of obtaining term vectors for training from the training sample data set and the test sample data set through the word2vec model comprises the following steps:
Step 5.1, word segmentation: because of the particularities of Chinese, the sentences in the business promotion information are segmented by a word segmentation library to obtain a lexicon; word segmentation libraries include but are not limited to the Jieba, IK, mmseg, and word segmentation libraries;
Step 5.2, word frequency counting: the lexicon formed after segmentation in step 5.1 is traversed, the frequency of each word that occurs is counted, and the words are numbered;
Step 5.3, tree construction: a Huffman tree is constructed according to the occurrence probability of each word obtained in step 5.2;
Step 5.4, generating the binary code of each node: the occurrence probability of each word is converted into a binary code that represents the corresponding node of the Huffman tree from step 5.3;
Step 5.5, initializing the intermediate vector of each non-leaf node and the term vector of each leaf node: every node in the Huffman tree stores a vector of length m, but the vectors in leaf nodes and in non-leaf nodes have different meanings. A leaf node stores the term vector of a word, which serves as the input of the neural network; a non-leaf node stores an intermediate vector, which corresponds to the parameters of the hidden layer of the neural network and determines the classification result together with the input;
Step 5.6, training the intermediate vectors and term vectors: using the CBOW (Continuous Bag-of-Words) model or the Skip-Gram model, the term vectors of the n-1 words surrounding word A are summed as the input of the system, classification is performed successively according to the binary code generated for word A in step 5.4, and the intermediate vectors and term vectors are trained according to the classification results, finally yielding the term vectors corresponding to the business promotion information. Here word A is any given word, and the training process has three stages: the input layer (input), the projection layer (projection), and the output layer (output). The input layer consists of the term vectors of the n-1 words around word A. If n is 5 and word A is denoted w(t), the two preceding and the two following words are w(t-2), w(t-1), w(t+1), w(t+2), and their term vectors are denoted v(w(t-2)), v(w(t-1)), v(w(t+1)), v(w(t+2)). The step from the input layer to the projection layer is simple: the n-1 term vectors are added together.
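Steps 5.2 to 5.4 and the input-to-projection step of 5.6 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the word2vec implementation itself: `huffman_codes` and `cbow_projection` are hypothetical helper names, the word frequencies are made-up toy values, and the training of the intermediate vectors is omitted.

```python
import heapq
from itertools import count


def huffman_codes(word_freq):
    """Build a Huffman tree from {word: frequency} and return {word: binary code}."""
    if not word_freq:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(freq, next(tie), {word: ""}) for word, freq in word_freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {word: "0" for word in word_freq}
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def cbow_projection(vectors, sentence, t, window=2):
    """Sum the term vectors of the words within `window` positions of sentence[t]."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for j in range(max(0, t - window), min(len(sentence), t + window + 1)):
        if j != t:  # the centre word itself is not part of the input
            total = [a + b for a, b in zip(total, vectors[sentence[j]])]
    return total


# Toy frequencies (assumed values, for illustration only).
codes = huffman_codes({"enterprise": 5, "service": 3, "loan": 1, "tax": 1})
```

Frequent words receive short codes, so they need fewer binary classifications along the path from the root of the tree during training.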
Further, in step 6, the method of training the LSTM neural network with the training sample data set, testing the accuracy of the LSTM neural network with the test sample data set, and obtaining the trained LSTM neural network as the recommender system comprises the following steps:
Step 6.1, training the LSTM neural network with the term vectors: the training sample data set is passed through the forget gate of the LSTM neural network, which starts the information discarding action of the LSTM neural network. The information discarding action is implemented by the sigmoid layer in the forget gate, which looks at the previous output and the current term vector input and decides whether the information learned in the previous state is retained. The LSTM neural network comprises an input gate, a forget gate, and an output gate;
Step 6.2, the training sample data set after information discarding is passed through the input gate of the LSTM neural network, which starts the information update action of the LSTM neural network; the information update action is implemented by the sigmoid layer in the input gate, after which a tanh layer changes each cell state of the LSTM neural network so that new knowledge is learned;
Step 6.3, the training sample data set after the information update is passed through the output gate of the LSTM neural network, which outputs a vector that depends on the cell state from step 6.2. First, a sigmoid layer is run to obtain a vector that determines which part of the cell state is output; the cell state is then processed by a tanh layer and multiplied by the sigmoid output (the vector) to obtain the output information of the LSTM network;
Step 6.4, the test data are input to the LSTM network obtained in step 6.3 to obtain output results, which are compared with the labels in the test data to verify the accuracy of the network; if the accuracy meets the requirement, training is complete, and the trained LSTM neural network is obtained and used as the recommender system.
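The gate operations of steps 6.1 to 6.3 correspond to one forward step of a standard LSTM cell, which can be sketched as follows. This is a minimal sketch assuming the textbook LSTM formulation; the names `lstm_step`, `init_params`, and the weight keys `Wf`, `Wi`, `Wc`, `Wo` are hypothetical, and training (backpropagation) is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, z):
    """Return W @ z + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]


def lstm_step(x, h_prev, c_prev, p):
    """One forward step of an LSTM cell on term vector x."""
    z = h_prev + x  # concatenate previous output and current input
    # Step 6.1: the forget gate decides how much of the old cell state is kept.
    f = [sigmoid(v) for v in affine(p["Wf"], p["bf"], z)]
    # Step 6.2: the input gate and the tanh layer produce the update to the cell state.
    i = [sigmoid(v) for v in affine(p["Wi"], p["bi"], z)]
    g = [math.tanh(v) for v in affine(p["Wc"], p["bc"], z)]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Step 6.3: the output gate selects which part of tanh(cell state) is output.
    o = [sigmoid(v) for v in affine(p["Wo"], p["bo"], z)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights and zero biases for the four layers (assumed initialization)."""
    rng = random.Random(seed)
    p = {}
    for gate in "fico":
        p["W" + gate] = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim + input_dim)]
                         for _ in range(hidden_dim)]
        p["b" + gate] = [0.0] * hidden_dim
    return p


params = init_params(input_dim=4, hidden_dim=3)
h, c = lstm_step([1.0] * 4, [0.0] * 3, [0.0] * 3, params)
```

In practice the step would be applied to each term vector of a sequence in turn, feeding the returned `h` and `c` into the next step.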
The above illustrates how the method uses the recommender system to recommend customers accurately to an enterprise, helping the enterprise to develop a large number of potential, accurately targeted customers more efficiently. Following the "Internet + finance" approach and building on deep learning and network technology, the method reduces the cost of developing customers, shortens the time needed to develop them, and improves the accuracy of customer discovery.
In an embodiment of the disclosure, the accuracy (0.12) of data mined with the deep-learning-based business promotion information accurate recommendation system is much higher than the accuracy (0.067) of the traditional content-based method, and the retention rate is also clearly improved; hot-spot business promotion information is easier to achieve. Applying the LSTM deep neural network to business promotion information mining forgets irrelevant noise and strengthens the memory of strongly related information, so that the algorithm retains the better features and eliminates the worse ones. Using a Huffman tree for word segmentation of the relevant business promotion information greatly increases segmentation speed; the segmentation computation time is roughly 1/20 of that of a general exhaustive approach.
An embodiment of the disclosure provides a deep-learning-based business promotion information accurate recommendation system; Fig. 2 shows a diagram of this system. The system of this embodiment comprises a processor, a memory, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the above embodiment of the deep-learning-based business promotion information accurate recommendation system are implemented.
The system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor executes the computer program to run in the following units of the system:
Dataset acquisition unit, for acquiring business promotion information data;
Data preprocessing unit, for preprocessing and cleaning the collected business promotion information data to obtain the business promotion information data set;
Feature selection unit, for performing dimensionality reduction and feature selection on the business promotion information data set;
Training sample division unit, for dividing the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
Vectorization unit, for obtaining term vectors for training from the training sample data set and the test sample data set through the word2vec model;
Recommender system obtaining unit, for training the LSTM neural network with the training sample data set, testing the accuracy of the LSTM neural network with the test sample data set, and obtaining the trained LSTM neural network as the recommender system.
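The way the units listed above pass data from one to the next can be sketched as a simple chain of functions. This is only an illustrative sketch: the helper `run_units` and the three toy stages are hypothetical stand-ins, and the feature selection, sample division, and LSTM training units are omitted for brevity.

```python
from typing import Any, Callable, List, Tuple

# A unit is modelled as a (name, function) pair; this representation is an assumption.
Unit = Tuple[str, Callable[[Any], Any]]


def run_units(units: List[Unit], data: Any) -> Any:
    """Feed the output of each unit into the next one, in order."""
    for _name, fn in units:
        data = fn(data)
    return data


units: List[Unit] = [
    # Acquisition: returns toy records in place of real promotion data.
    ("acquisition", lambda _: [" Loan service ", "", "tax SERVICE"]),
    # Preprocessing and cleaning: drop empty records and normalize text.
    ("preprocessing", lambda rows: [r.strip().lower() for r in rows if r.strip()]),
    # Stand-in for vectorization: split each record into tokens.
    ("vectorization", lambda rows: [r.split() for r in rows]),
]

result = run_units(units, None)
```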
The deep-learning-based business promotion information accurate recommendation system can run on computing devices such as desktop computers, notebooks, palmtop computers, and cloud servers. The devices on which the system can run may include, but are not limited to, a processor and a memory. Those skilled in the art will understand that the example is merely an example of the deep-learning-based business promotion information accurate recommendation system and does not constitute a limitation on it; the system may include more or fewer components than the example, combine certain components, or use different components. For example, the system may further include input/output devices, network access devices, buses, and the like.
The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the system on which the deep-learning-based business promotion information accurate recommendation system runs, and connects the various parts of the entire system through various interfaces and lines.
The memory can be used to store the computer program and/or modules; the processor implements the various functions of the deep-learning-based business promotion information accurate recommendation system by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area can store data created through use of a mobile phone (such as audio data or a phone book). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one disk storage device, a flash memory device, or another volatile solid-state storage device.
Although the description of the disclosure is quite detailed and several embodiments are described in particular, it is not intended to be limited to any of these details or embodiments or to any particular embodiment; rather, it should be considered, by reference to the appended claims, as providing the broadest possible interpretation of those claims in view of the prior art, so as to effectively cover the intended scope of the disclosure. Furthermore, the disclosure is described above in terms of embodiments foreseeable by the inventors for the purpose of providing a useful description, and insubstantial modifications of the disclosure that are not presently foreseen may nonetheless represent equivalents of the disclosure.

Claims (7)

1. A deep-learning-based business promotion information accurate recommendation method, characterized in that the method comprises the following steps:
Step 1, business promotion information data are acquired;
Step 2, the collected business promotion information data are preprocessed and cleaned to obtain a business promotion information data set;
Step 3, dimensionality reduction and feature selection are performed on the business promotion information data set;
Step 4, the business promotion information data set obtained by feature selection is divided into a training sample data set and a test sample data set;
Step 5, term vectors for training are obtained from the training sample data set and the test sample data set through the word2vec model;
Step 6, the LSTM neural network is trained with the training sample data set, the accuracy of the LSTM neural network is tested with the test sample data set, and the trained LSTM neural network is obtained as the recommender system.
2. The deep-learning-based business promotion information accurate recommendation method according to claim 1, characterized in that, in step 2, the method of preprocessing and cleaning the collected business promotion information data to obtain the business promotion information data set is as follows: because the acquired business promotion information data are extremely large, mixed, and in part useless, they need to be preprocessed; preprocessing is the necessary processing, such as auditing, screening, and sorting, performed on the collected business promotion information data before they are classified or grouped, namely auditing the data for completeness and accuracy, data screening, and data sorting, that is, data cleaning, data integration, data transformation, and data reduction; cleaning the collected business promotion information data refers to data cleansing, the last procedure for finding and correcting identifiable errors in data files, including checking data consistency and handling invalid values and missing values; through preprocessing and cleaning, the dirty data in the business promotion information data are converted, using mathematical statistics, data mining, or predefined cleaning rules, into data that meet the data quality requirements, that is, the cleaned business promotion information data set.
3. The deep-learning-based business promotion information accurate recommendation method according to claim 1, characterized in that, in step 3, the method of performing dimensionality reduction on the business promotion information data set includes but is not limited to any one of principal component analysis, independent component analysis, linear discriminant analysis, locally linear embedding, Laplacian eigenmaps, multidimensional scaling, and isometric mapping; feature selection uses a modified algorithm based on the individually optimal feature selection method, the features being the vectors obtained by text vectorization of information in the business promotion information data set such as the location, business scope, and registration information of a company; the individually optimal feature selection algorithm calculates the separability criterion value of each feature when used alone, sorts the features by separability criterion value in descending order, and takes the 30 features with the largest separability criterion values as the feature combination; however, in practice the collected business promotion information data set contains missing information, so the degree to which the information of a single feature is missing needs to be considered when selecting features; the modified algorithm based on the individually optimal feature selection method is defined by a formula in which X = (x(1), x(2), x(3), ..., x(n)), x(i) denotes the i-th feature, n is the number of features, J(X) denotes the separability criterion of the feature set, N(x(i)) denotes the number of data items in which the i-th feature is not missing, M denotes the total number of data items, and N(x(i))/M reflects the degree to which the i-th feature is missing in the data.
4. The deep-learning-based business promotion information accurate recommendation method according to claim 3, characterized in that, in step 4, the method of dividing the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set is any one of the hold-out method, the cross-validation method, and the bootstrap method.
5. The deep-learning-based business promotion information accurate recommendation method according to claim 4, characterized in that, in step 5, the method of obtaining term vectors for training from the training sample data set and the test sample data set through the word2vec model comprises the following steps:
Step 5.1, word segmentation: because of the particularities of Chinese, the sentences in the business promotion information are segmented by a word segmentation library to obtain a lexicon; word segmentation libraries include but are not limited to the Jieba, IK, mmseg, and word segmentation libraries;
Step 5.2, word frequency counting: the lexicon formed after segmentation in step 5.1 is traversed, the frequency of each word that occurs is counted, and the words are numbered;
Step 5.3, tree construction: a Huffman tree is constructed according to the occurrence probability of each word obtained in step 5.2;
Step 5.4, generating the binary code of each node: the occurrence probability of each word is converted into a binary code that represents the corresponding node of the Huffman tree from step 5.3;
Step 5.5, initializing the intermediate vector of each non-leaf node and the term vector of each leaf node: every node in the Huffman tree stores a vector of length m, but the vectors in leaf nodes and in non-leaf nodes have different meanings; a leaf node stores the term vector of a word, which serves as the input of the neural network, while a non-leaf node stores an intermediate vector, which corresponds to the parameters of the hidden layer of the neural network and determines the classification result together with the input;
Step 5.6, training the intermediate vectors and term vectors: the intermediate vectors and term vectors are trained using the CBOW model or the Skip-Gram model, finally yielding the term vectors corresponding to the business promotion information.
6. The deep-learning-based business promotion information accurate recommendation method according to claim 5, characterized in that, in step 6, the method of training the LSTM neural network with the training sample data set, testing the accuracy of the LSTM neural network with the test sample data set, and obtaining the trained LSTM neural network as the recommender system comprises the following steps:
Step 6.1, training the LSTM neural network with the term vectors: the training sample data set is passed through the forget gate of the LSTM neural network, which starts the information discarding action of the LSTM neural network; the information discarding action is implemented by the sigmoid layer in the forget gate, which looks at the previous output and the current term vector input and decides whether the information learned in the previous state is retained; the LSTM neural network comprises an input gate, a forget gate, and an output gate;
Step 6.2, the training sample data set after information discarding is passed through the input gate of the LSTM neural network, which starts the information update action of the LSTM neural network; the information update action is implemented by the sigmoid layer in the input gate, after which a tanh layer changes each cell state of the LSTM neural network so that new knowledge is learned;
Step 6.3, the training sample data set after the information update is passed through the output gate of the LSTM neural network, which outputs a vector that depends on the cell state from step 6.2; first, a sigmoid layer is run to obtain a vector that determines which part of the cell state is output; the cell state is then processed by a tanh layer and multiplied by the sigmoid output to obtain the output information of the LSTM network;
Step 6.4, the test data are input to the LSTM network obtained in step 6.3 to obtain output results, which are compared with the labels in the test data to verify the accuracy of the network; if the accuracy meets the requirement, training is complete, and the trained LSTM neural network is obtained and used as the recommender system.
7. A deep-learning-based business promotion information accurate recommendation system, characterized in that the system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor executing the computer program to run in the following units of the system:
Dataset acquisition unit, for acquiring business promotion information data;
Data preprocessing unit, for preprocessing and cleaning the collected business promotion information data to obtain the business promotion information data set;
Feature selection unit, for performing dimensionality reduction and feature selection on the business promotion information data set;
Training sample division unit, for dividing the business promotion information data set obtained by feature selection into a training sample data set and a test sample data set;
Vectorization unit, for obtaining term vectors for training from the training sample data set and the test sample data set through the word2vec model;
Recommender system obtaining unit, for training the LSTM neural network with the training sample data set, testing the accuracy of the LSTM neural network with the test sample data set, and obtaining the trained LSTM neural network as the recommender system.
CN201910461767.7A 2019-05-30 2019-05-30 Business promotion information accurate recommendation method and system based on deep learning Active CN110264311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910461767.7A CN110264311B (en) 2019-05-30 2019-05-30 Business promotion information accurate recommendation method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN110264311A true CN110264311A (en) 2019-09-20
CN110264311B CN110264311B (en) 2023-04-18

Family

ID=67915965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910461767.7A Active CN110264311B (en) 2019-05-30 2019-05-30 Business promotion information accurate recommendation method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN110264311B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860981A (en) * 2020-07-03 2020-10-30 航天信息(山东)科技有限公司 Enterprise national industry category prediction method and system based on LSTM deep learning
CN112465389A (en) * 2020-12-12 2021-03-09 广东电力信息科技有限公司 Word frequency-based similar provider recommendation method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130204729A1 (en) * 2012-02-08 2013-08-08 Ebay Inc. Systems and methods for reseller discovery and analysis
CN105869024A (en) * 2016-04-20 2016-08-17 北京小米移动软件有限公司 Commodity recommending method and device
CN107153642A (en) * 2017-05-16 2017-09-12 华北电力大学 A kind of analysis method based on neural network recognization text comments Sentiment orientation
CN107291693A (en) * 2017-06-15 2017-10-24 广州赫炎大数据科技有限公司 A kind of semantic computation method for improving term vector model
CN108256052A (en) * 2018-01-15 2018-07-06 成都初联创智软件有限公司 Automobile industry potential customers' recognition methods based on tri-training
CN108427670A (en) * 2018-04-08 2018-08-21 重庆邮电大学 A kind of sentiment analysis method based on context word vector sum deep learning
CN109635204A (en) * 2018-12-21 2019-04-16 上海交通大学 Online recommender system based on collaborative filtering and length memory network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李爱国、库向阳: "数据预处理", 《数据挖掘原理、算法及应用》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860981A (en) * 2020-07-03 2020-10-30 航天信息(山东)科技有限公司 Enterprise national industry category prediction method and system based on LSTM deep learning
CN111860981B (en) * 2020-07-03 2024-01-19 航天信息(山东)科技有限公司 Enterprise national industry category prediction method and system based on LSTM deep learning
CN112465389A (en) * 2020-12-12 2021-03-09 广东电力信息科技有限公司 Word frequency-based similar provider recommendation method and device

Also Published As

Publication number Publication date
CN110264311B (en) 2023-04-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant