CN110059851A

CN110059851A - Method, apparatus and computer device for predicting data changes based on deep learning

Info

Publication number: CN110059851A
Application number: CN201910175768.5A
Authority: CN
Inventors: Wu Zhuangwei (吴壮伟)
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-03-08
Filing date: 2019-03-08
Publication date: 2019-07-26

Abstract

This application discloses a method, apparatus and computer device for predicting data changes based on deep learning. The method includes: obtaining, according to a prediction request sent by a user terminal, the articles published on appointed websites; judging whether an article contains an area field corresponding to the specified region, the area field being an information field that indicates the geographical location of the specified region; if so, extracting keywords from the article by means of a TF-IDF matrix; inputting the keywords into a preset word vector model to obtain a word vector for each keyword; and inputting the word vectors into a trained target attribute value prediction model, which outputs a first increase coefficient of the target attribute value of the specified region. The application automatically reads articles on real-estate-related websites and objectively analyzes the upward trend of house prices from those articles.

Description

Method, apparatus and computer device for predicting data changes based on deep learning

Technical field

This application relates to the field of artificial intelligence, and in particular to a method, apparatus and computer device for predicting data changes based on deep learning.

Background technique

Real estate prices change continuously and may rise or fall in response to current events.

Media are now highly developed, and articles concerning real estate are published on many media platforms. The information expressed in such articles is closely related to real estate prices, and real estate experts can judge the general trend of house prices from relevant articles. However, it is difficult even for experts to judge specifically how much house prices will change.

Summary of the invention

The main purpose of this application is to provide a method, apparatus and computer device for predicting data changes based on deep learning, so as to solve the problem in the prior art that no specific judgment of the increase in house prices can be made from the content of articles.

To achieve the above object, this application proposes a method for predicting data changes based on deep learning, comprising:

obtaining, according to a prediction request sent by a user terminal, the articles published on appointed websites, the prediction request indicating that a change in the target attribute value of a specified region is to be predicted;

judging whether an article contains an area field corresponding to the specified region, the area field being an information field indicating the geographical location of the specified region;

if so, extracting keywords from the article by means of a TF-IDF matrix, the TF-IDF matrix being a term frequency-inverse document frequency matrix;

inputting the keywords into a preset word vector model to obtain the word vector corresponding to each keyword;

inputting the word vectors into a trained target attribute value prediction model, and outputting a first increase coefficient of the target attribute value of the specified region.

Further, after the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the method includes:

obtaining the number of reads, number of reposts and number of comments of the article;

inputting the number of reads, number of reposts and number of comments into preset formulas to calculate a weight coefficient of the article;

multiplying the first increase coefficient by the weight coefficient to obtain an updated second increase coefficient.

Further, the step of inputting the number of reads, number of reposts and number of comments into preset formulas to calculate the weight coefficient of the article includes:

inputting the number of reads into a preset first formula, the number of reposts into a preset second formula, and the number of comments into a preset third formula, to obtain respectively a read weight coefficient, a repost weight coefficient and a comment weight coefficient;

adding the read weight coefficient, the repost weight coefficient and the comment weight coefficient to obtain the weight coefficient of the article.

Further, before the step of judging whether the article contains an area field corresponding to the specified region, the method includes:

reading first location information of the specified region;

obtaining, from a preset address base, the administrative level corresponding to the first location information;

obtaining, from the preset address base, second location information corresponding to the level immediately above that administrative level, and third location information corresponding to the level immediately below it;

determining the first location information, the second location information and the third location information as the area fields corresponding to the specified region.

Further, before the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the method includes:

obtaining test word vectors and the increase coefficients corresponding to the test word vectors as test samples;

taking the test word vectors as the input layer of a preset deep neural network (DNN) model and the corresponding increase coefficients as the output results, and inputting them into the preset DNN model, the DNN model comprising one input layer, multiple hidden layers and one output layer;

setting the hidden layer formula as Y = a(W*X + b), where X denotes the test word vector, Y the output vector, b the bias vector, W the weight matrix of the hidden layer, and a the activation function;

setting the output layer formula as the softmax function;

initializing the parameters of the preset DNN model;

using stochastic gradient descent, computing the error between the last hidden layer and the output layer, then propagating backward layer by layer to obtain the error of each layer and adjusting the parameters accordingly, so as to obtain the trained target attribute value prediction model.

Further, after the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the method includes:

judging whether the first increase coefficient exceeds a preset threshold coefficient;

if so, marking the first increase coefficient in red.

This application also provides an apparatus for predicting data changes based on deep learning, comprising:

an article obtaining module, configured to obtain, according to a prediction request sent by a user terminal, the articles published on appointed websites, the prediction request indicating that a change in the target attribute value of a specified region is to be predicted;

a judgment module, configured to judge whether the article contains an area field corresponding to the specified region, the area field being an information field indicating the geographical location of the specified region;

an extraction module, configured to extract keywords from the article by means of a TF-IDF matrix if the article contains an area field corresponding to the specified region, the TF-IDF matrix being a term frequency-inverse document frequency matrix;

a word vector module, configured to input the keywords into a preset word vector model to obtain the word vector corresponding to each keyword;

an output module, configured to input the word vectors into the trained target attribute value prediction model and output the first increase coefficient of the target attribute value of the specified region.

Further, the apparatus for predicting data changes based on deep learning further comprises:

a quantity obtaining module, configured to obtain the number of reads, number of reposts and number of comments of the article;

a weight calculation module, configured to input the number of reads, number of reposts and number of comments into preset formulas to calculate the weight coefficient of the article;

an update module, configured to multiply the first increase coefficient by the weight coefficient to obtain an updated second increase coefficient.

This application also provides a computer device comprising a memory and a processor, the memory storing a computer program and the processor implementing the steps of any of the above methods when executing the computer program.

This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the above methods.

With the method, apparatus and computer device of this application, articles are read automatically from real-estate-related websites and the upward trend of house prices is analyzed objectively from those articles. The trend is adjusted according to the number of reads, reposts and comments of each article, which makes the predicted increase in house prices more accurate. When predicting the change in house prices, the method also first judges whether an article concerns house prices in the specified region, which further improves the accuracy of the prediction.

Detailed description of the invention

Fig. 1 is a flowchart of the method for predicting data changes based on deep learning according to an embodiment of this application;

Fig. 2 is a schematic structural block diagram of the apparatus for predicting data changes based on deep learning according to an embodiment of this application;

Fig. 3 is a schematic structural block diagram of the computer device according to an embodiment of this application.

The realization of the objects, the functional features and the advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Specific embodiment

To make the objects, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application, not to limit it.

Referring to Fig. 1, an embodiment of this application provides a method for predicting data changes based on deep learning, including the following steps:

S1, obtaining, according to a prediction request sent by a user terminal, the articles published on appointed websites, the prediction request indicating that a change in the target attribute value of a specified region is to be predicted;

S2, judging whether the article contains an area field corresponding to the specified region, the area field being an information field indicating the geographical location of the specified region;

S3, if so, extracting keywords from the article by means of a TF-IDF matrix, the TF-IDF matrix being a term frequency-inverse document frequency matrix;

S4, inputting the keywords into a preset word vector model to obtain the word vector corresponding to each keyword;

S5, inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region.

This embodiment is applied to predicting the increase in real estate prices in a specified region: the data change referred to above is the increase in house prices, and the house price is the target attribute value. As described in step S1, the appointed websites are websites that publish real-estate-related articles, including house price information, policies related to house prices, real estate industry conditions, and other articles that relate to or affect house price fluctuations. The appointed websites are configured in advance by staff and stored on the server, so that the server can access them automatically to obtain the articles they publish. When a user wants to predict house prices at a location, the user provides a specific location or region, which is packaged into a prediction request and sent to the server; according to the request, the server visits the stored appointed websites and obtains the articles published on them.
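The request handling in step S1 can be sketched as below, as a minimal illustration only. The `PredictionRequest` and `Article` types, their fields, and the injected `fetch` callable are hypothetical; the text does not specify the request format or how pages are downloaded.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PredictionRequest:
    """Hypothetical payload packaged by the user terminal."""
    user_id: str
    region: str  # location or region entered by the user


@dataclass
class Article:
    """Hypothetical article record returned by the crawling layer."""
    url: str
    text: str
    publish_address: str = ""  # where the publishing terminal was located


def handle_prediction_request(
    request: PredictionRequest,
    appointed_sites: List[str],
    fetch: Callable[[str], List[Article]],
) -> Tuple[str, List[Article]]:
    """Visit every stored appointed website and collect its published articles.

    `fetch` stands in for the real crawler; it is injected so the sketch does not
    depend on any particular HTTP library. The requested region is passed along
    for the area-field check in step S2.
    """
    articles: List[Article] = []
    for site in appointed_sites:
        articles.extend(fetch(site))
    return request.region, articles
```

Injecting `fetch` keeps the sketch testable: a deployment could plug in the distributed crawler of steps S11 to S13, while a test can pass a dictionary lookup.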

As described in step S2, an article carries several kinds of information, including its text and its publication address, i.e., the location of the terminal that published the article on the appointed website. The server retrieves the full content of the article and judges whether its information contains an area field corresponding to the specified region, that is, whether the text of the article contains the area field, or whether the article was published at the place the area field refers to. The area field identifies the specified region whose house prices the user wants to know, for example the name of a city such as Shenzhen, or the name of an administrative district such as Futian District. The area field is entered into the server by the user: the user inputs the geographical location of the place whose house prices are of interest, and the server generates the area field from that input.
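A minimal sketch of the check in step S2, assuming the area fields are plain strings and that a substring match against the article text and the publication address is sufficient. The function name `has_area_field` is hypothetical; the text does not say how the match is performed (for example, whether aliases or fuzzy matching are used).

```python
from typing import Iterable


def has_area_field(article_text: str, publish_address: str,
                   area_fields: Iterable[str]) -> bool:
    """Step S2 sketch: does any area field occur in the body or the publication address?"""
    for area in area_fields:
        if not area:  # ignore empty fields, which would otherwise match everything
            continue
        if area in article_text or area in publish_address:
            return True
    return False
```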

As described in step S3, TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining to assess how important a word is to an article. The importance of a word increases with the number of times it appears in the article and, through the IDF factor, decreases with the number of documents in the corpus that contain it. Using this technique, each word in the article is read and the keywords, which describe the general content of the article, are extracted. Specifically, words whose frequency exceeds a certain value are compared with the words in a preset keyword library; a word found in the library is determined to be a keyword, and any other word is not. The keyword library stores vocabulary related to house prices, for example nouns such as developer, flexible funding, leverage, real estate and increase, and modifiers related to house prices such as spring and restriction. Before keywords are extracted, the text of the article is segmented with Jieba word segmentation and stop words are removed, to avoid extracting meaningless keywords. Stop words include words such as "such" and "because".
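The keyword extraction in step S3 might look roughly like the sketch below. It assumes the article has already been segmented into tokens (for Chinese text, for example with Jieba), uses one common smoothed IDF variant, and treats `threshold`, `stop_words` and `keyword_library` as inputs; the exact TF-IDF variant and threshold used in the application are not specified.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tfidf_scores(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """TF-IDF of every term in `doc`; `corpus` is the document collection (including `doc`)."""
    if not doc:
        return {}
    corpus_sets = [set(d) for d in corpus]
    n_docs = len(corpus_sets)
    scores: Dict[str, float] = {}
    for term, count in Counter(doc).items():
        tf = count / len(doc)                           # term frequency in this article
        df = sum(1 for d in corpus_sets if term in d)   # documents containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1     # smoothed inverse document frequency
        scores[term] = tf * idf
    return scores


def extract_keywords(doc: List[str], corpus: List[List[str]], stop_words: Set[str],
                     keyword_library: Set[str], threshold: float) -> List[str]:
    """Drop stop words, score by TF-IDF, keep high-scoring terms that are in the library."""
    tokens = [t for t in doc if t not in stop_words]
    cleaned = [[t for t in d if t not in stop_words] for d in corpus]
    scores = tfidf_scores(tokens, cleaned)
    return sorted(t for t, s in scores.items() if s > threshold and t in keyword_library)
```

Filtering on both the TF-IDF score and the keyword library mirrors the two conditions in the text: a word must be frequent enough in the article and must also be a house-price-related term.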

As described in step S4, word vectors provide a mathematical way of converting the symbolic information of natural language into numerical information in vector form, which turns the problem of natural language understanding into a machine learning problem. The keywords are input into the word vector model for vectorization, yielding vectorized keywords. The word vector model is trained in advance, and a one-hot representation can be used when training it. A one-hot representation represents a word by a very long vector whose length is the size N of the dictionary; exactly one dimension of each vector is 1 and all others are 0, and the position of the 1 indicates the word's position in the dictionary. One-hot representations are stored sparsely, so the vectorization process is very simple.
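A one-hot representation stored sparsely, as described for step S4, can be illustrated with the small sketch below. The helper names are hypothetical, and assigning dictionary positions in first-seen order is an assumption; the sparse form stores only the index of the single 1.

```python
from typing import Dict, List


def build_vocabulary(words: List[str]) -> Dict[str, int]:
    """Give each distinct word a position in the dictionary (first-seen order)."""
    vocab: Dict[str, int] = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab


def one_hot(word: str, vocab: Dict[str, int]) -> Dict[int, int]:
    """Sparse one-hot vector: only the index holding the 1 is stored."""
    return {vocab[word]: 1}


def to_dense(sparse: Dict[int, int], size: int) -> List[int]:
    """Expand a sparse vector to its full length N (the dictionary size)."""
    vec = [0] * size
    for i, v in sparse.items():
        vec[i] = v
    return vec
```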

As described in step S5, after the word vectors obtained from the keywords are input into the preset house price prediction model (i.e., the target attribute value prediction model), the model predicts, according to the logic learned in training, the upward or downward trend of house prices in the region corresponding to the area field, and calculates the first increase coefficient of house prices.

In one embodiment, after the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the method includes:

S6, obtaining the number of reads, number of reposts and number of comments of the article;

S7, inputting the number of reads, number of reposts and number of comments into preset formulas to calculate the weight coefficient of the article;

S8, multiplying the first increase coefficient by the weight coefficient to obtain an updated second increase coefficient.

In a specific embodiment, the step of inputting the number of reads, number of reposts and number of comments into preset formulas to calculate the weight coefficient of the article includes:

S71, inputting the number of reads into a preset first formula, the number of reposts into a preset second formula, and the number of comments into a preset third formula, to obtain respectively a read weight coefficient, a repost weight coefficient and a comment weight coefficient;

S72, adding the read weight coefficient, the repost weight coefficient and the comment weight coefficient to obtain the weight coefficient of the article.

In this embodiment, once an article has been obtained, its numbers of reads, reposts and comments are taken from the information about the article on the website, and the three counts are input into preset formulas to calculate the weight coefficient of the article. In general, the three counts are positively correlated with the weight coefficient: the larger they are, the larger the weight coefficient and the greater the article's influence on house prices. The first increase coefficient is then multiplied by the weight coefficient to obtain the updated second increase coefficient, which is more objective and accurate than the first increase coefficient before the update. In one embodiment, weight coefficient = read weight coefficient + repost weight coefficient + comment weight coefficient, where the preset first formula (1) for calculating the read weight coefficient f(x) is as follows:

In formula (1), x denotes the number of reads of the article.

The preset second formula (2) for calculating the repost weight coefficient f(y) is as follows:

In formula (2), y denotes the number of reposts of the article.

The preset third formula (3) for calculating the comment weight coefficient f(z) is as follows:

In formula (3), z denotes the number of comments of the article.

After the information about the article is obtained, the numbers of reads, reposts and comments are input into the first, second and third formulas respectively to obtain the corresponding weight coefficients, which are then added to give the weight coefficient of the article. The first increase coefficient is then multiplied by this weight coefficient to obtain the updated second increase coefficient.
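Formulas (1) to (3) are not reproduced in this text, so the sketch below uses clearly hypothetical logarithmic stand-ins that only satisfy the stated property that each weight grows with its count. What it illustrates is the composition: the three partial weights are summed (step S72) and the sum multiplies the first increase coefficient (step S8).

```python
import math

# HYPOTHETICAL stand-ins for formulas (1)-(3), which are not reproduced in this
# text. They only satisfy the stated property that each weight grows with its count.


def read_weight(x: int) -> float:
    return math.log10(1 + x) / 10  # assumed form of f(x), x = number of reads


def repost_weight(y: int) -> float:
    return math.log10(1 + y) / 5  # assumed form of f(y), y = number of reposts


def comment_weight(z: int) -> float:
    return math.log10(1 + z) / 5  # assumed form of f(z), z = number of comments


def article_weight(reads: int, reposts: int, comments: int) -> float:
    """Step S72: the article weight is the sum of the three partial weights."""
    return read_weight(reads) + repost_weight(reposts) + comment_weight(comments)


def second_increase_coefficient(first: float, reads: int, reposts: int, comments: int) -> float:
    """Step S8: scale the model's first increase coefficient by the article weight."""
    return article_weight(reads, reposts, comments) * first
```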

In one embodiment, the step of article issued in above-mentioned acquisition appointed website, comprising:

S11, encapsulation appointed website to Docker container；

S12, pass through and dispose Docker container on different machines, build distributed reptile；

S13, the article is crawled in appointed website by distributed reptile.

In this embodiment, after staff enter multiple appointed websites into the server, the server encapsulates them into Docker containers and sends the containers to the machines on which crawlers are deployed, thereby building a distributed crawler. Once the crawler machines are started by code, they crawl the articles on the appointed websites and send the retrieved articles to the server. This lets the server quickly obtain the articles published on each appointed website.
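The text does not say how the appointed websites are divided among the crawler containers or how results from several machines are combined. As a hypothetical illustration only, the sketch below assigns websites to crawler nodes round-robin and merges the returned article URLs with duplicates removed; node names are made up, and the Docker packaging and deployment themselves are not shown.

```python
from typing import Dict, Iterable, List


def assign_sites(sites: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Spread the appointed websites over the crawler containers round-robin (hypothetical)."""
    if not nodes:
        raise ValueError("at least one crawler node is required")
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, site in enumerate(sites):
        plan[nodes[i % len(nodes)]].append(site)
    return plan


def merge_results(per_node: Iterable[List[str]]) -> List[str]:
    """Combine article URLs returned by all nodes, dropping duplicates, keeping order."""
    seen = set()
    merged: List[str] = []
    for urls in per_node:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```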

In one embodiment, before the step of judging whether the article contains an area field corresponding to the specified region, the method includes:

S201, reading first location information of the specified region;

S202, obtaining, from a preset address base, the administrative level corresponding to the first location information;

S203, obtaining, from the preset address base, second location information corresponding to the level immediately above that administrative level, and third location information corresponding to the level immediately below it;

S204, determining the first location information, the second location information and the third location information as the area fields corresponding to the specified region.

In this embodiment, a user who wants to predict house prices in a specified region enters the first location information of that region. The location information may be the name of a specific building or residential community, such as Ping An Finance Center, or a broader area such as the Science and Technology Park area. The server then calls the preset address base, which is built on standard administrative divisions. The address base contains many pieces of address information, each corresponding to at least one administrative level. The highest administrative level in the address base is the provincial administrative region, the second highest is the prefecture-level administrative region, and so on down to the fourth level, the subdistrict (street) administrative region; below that, staff define lower administrative levels themselves according to the specific rules of each subdistrict. The address base is therefore a set of location information spanning multiple levels of administrative regions. For Ping An Finance Center, for example, the highest administrative level is Guangdong Province, the second is Shenzhen, the third is Futian District, the fourth is Feitian Street, the fifth is Central City (a level defined by staff), and the sixth is Ping An Finance Center itself. When the server receives "Ping An Finance Center" from the user terminal as the first location information of the specified region, it finds this entry in the address base at the sixth administrative level and then finds the level immediately above it, the fifth level, which is Central City. Because the sixth level is the lowest, no lower level is searched. The second location information, Central City, and the first location information, Ping An Finance Center, are then used as the area fields corresponding to the specified region. Further, more specific location information contained in Central City, such as the other sixth-level entries under Central City, is also included in the area fields.
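The address lookup in steps S201 to S204 can be sketched with a toy tree in which each entry stores its parent and administrative level. The `AddressBase` class and its methods are hypothetical; the `include_siblings` flag models the final remark above that other sixth-level entries under Central City may also be added.

```python
from typing import Dict, List, Optional, Set


class AddressBase:
    """Toy address base: every entry has one parent and a level (1 = province)."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.level: Dict[str, int] = {}
        self.children: Dict[str, List[str]] = {}

    def add(self, name: str, parent: Optional[str] = None) -> None:
        self.parent[name] = parent
        self.level[name] = 1 if parent is None else self.level[parent] + 1
        self.children.setdefault(name, [])
        if parent is not None:
            self.children[parent].append(name)

    def area_fields(self, name: str, include_siblings: bool = False) -> Set[str]:
        """Steps S201-S204: the entry itself, its parent level and its child level."""
        fields = {name}
        fields.update(self.children[name])  # one level down (empty at the lowest level)
        parent = self.parent[name]
        if parent is not None:
            fields.add(parent)  # one level up
            if include_siblings:
                # other entries under the same parent, e.g. other
                # sixth-level entries under Central City
                fields.update(self.children[parent])
        return fields
```

For example, adding the chain Guangdong Province, Shenzhen, Futian District, Feitian Street, Central City, Ping An Finance Center (each entry the parent of the next) places Ping An Finance Center at level 6, and `area_fields("Ping An Finance Center")` returns that entry plus Central City, matching the example in the text.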

In one embodiment, before the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the method includes:

S501, obtaining test word vectors and the increase coefficients corresponding to the test word vectors as test samples;

S502, taking the test word vectors as the input layer of a preset deep neural network (DNN) model and the corresponding increase coefficients as the output results, and inputting them into the preset DNN model, the DNN model comprising one input layer, multiple hidden layers and one output layer;

S503, setting the hidden layer formula as Y = a(W*X + b), where X denotes the test word vector, Y the output vector, b the bias vector, W the weight matrix of the hidden layer, and a the activation function;

S504, setting the output layer formula as the softmax function;

S505, initializing the parameters of the preset DNN model;

S506, using stochastic gradient descent, computing the error between the last hidden layer and the output layer, then propagating backward layer by layer to obtain the error of each layer and adjusting the parameters accordingly, so as to obtain the trained target attribute value prediction model.
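Written out for a network with three hidden layers, steps S503 to S506 correspond to the equations below. This is a sketch under assumptions the text does not state: the target y is taken as a one-hot vector and the loss as cross-entropy (which gives the output error $\hat{y} - y$), and $\eta$ denotes an assumed learning rate. Each hidden layer applies the formula Y = a(W*X + b) to the output of the layer below.

```latex
\begin{aligned}
h^{(0)} &= x \\
h^{(l)} &= a\!\left(W^{(l)} h^{(l-1)} + b^{(l)}\right), \quad l = 1, 2, 3 \\
\hat{y} &= \operatorname{softmax}\!\left(W^{(4)} h^{(3)} + b^{(4)}\right), \qquad
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}} \\
\delta^{(4)} &= \hat{y} - y \\
\delta^{(l)} &= \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \odot a'\!\left(W^{(l)} h^{(l-1)} + b^{(l)}\right), \quad l = 3, 2, 1 \\
W^{(l)} &\leftarrow W^{(l)} - \eta\, \delta^{(l)} \left(h^{(l-1)}\right)^{\top}, \qquad
b^{(l)} \leftarrow b^{(l)} - \eta\, \delta^{(l)}, \quad l = 4, 3, 2, 1
\end{aligned}
```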

In this embodiment, the target attribute value prediction model is trained on the basis of a neural network model. First, for a randomly selected area field, articles are collected from the appointed websites and their keywords are extracted; the keywords are input into the word vector model to obtain the test word vectors of the test samples. Then the first house price at the location corresponding to the area field at the moment the article was published is compared with the second house price at the same location a preset period after publication, and the increase of the second price relative to the first gives the increase coefficient of the test sample. Each test word vector together with its increase coefficient forms one test sample, and multiple test samples are obtained in this way. A neural network model is then selected, specifically a deep neural network (DNN) model with five layers: the first layer is the input layer, which receives the word vectors; the second, third and fourth layers are hidden layers; and the fifth layer is the output layer, which holds the increase coefficient corresponding to the word vector as the reference. The word vectors are fed into the model through the input layer; the model computes a result from the input and its built-in formulas, and this result is compared with the increase coefficient at the output layer. Before computation, the model is initialized so that all its parameters are 0. Also before computation, the server sets the hidden layer formula to Y = a(W*X + b), where X denotes the test word vector, Y the output vector, b the bias vector, W the weight matrix of the hidden layer, and a the activation function, and sets the output layer formula to the softmax function. Once these are set, the server has the model read the test word vector from the input layer and feed it into the hidden layer formula: the first hidden layer computes a first result and passes it to the second hidden layer; the second hidden layer takes the first result as the input to the hidden layer formula and produces a second result; and the third hidden layer, the last one, takes the second result as input and produces a third result, which is sent to the output layer. The output layer feeds the third result into the softmax function to obtain the predicted increase coefficient, which is compared with the increase coefficient at the output layer. According to the resulting error, the parameters W and b of the hidden layer formula are adjusted, yielding the trained target attribute value prediction model. The model is optimized once for each test word vector used in training.
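The training procedure above can be sketched in NumPy as follows, under several assumptions the text leaves open: tanh as the activation a, cross-entropy as the loss, and, because the output layer is a softmax while the target is a single increase coefficient, a reading of the output as a distribution over hypothetical increase-coefficient bins (`BIN_EDGES`, `BIN_CENTRES`). The text initializes the parameters to 0; the sketch instead uses small random weights, because with all-zero weights every hidden unit in a layer computes the same value and receives the same update.

```python
import numpy as np

# ASSUMPTION: the softmax output is read as a distribution over increase-coefficient
# bins; these bin edges and centres are hypothetical.
BIN_EDGES = np.array([-0.05, 0.0, 0.05])  # 4 classes
BIN_CENTRES = np.array([-0.075, -0.025, 0.025, 0.075])


def increase_coefficient(price_before: float, price_after: float) -> float:
    """Training label: increase of the later price relative to the price at publication."""
    return (price_after - price_before) / price_before


def to_class(coef: float) -> int:
    """Map an increase coefficient to the index of its bin."""
    return int(np.digitize(coef, BIN_EDGES))


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


class DNN:
    """Input layer, hidden layers Y = a(W*X + b) with a = tanh, softmax output layer."""

    def __init__(self, sizes, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small random weights scaled by 1/sqrt(fan_in) instead of all zeros.
        self.W = [rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_out, n_in))
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n_out) for n_out in sizes[1:]]

    def forward(self, x: np.ndarray):
        acts = [x]
        for W, b in zip(self.W[:-1], self.b[:-1]):
            acts.append(np.tanh(W @ acts[-1] + b))
        acts.append(softmax(self.W[-1] @ acts[-1] + self.b[-1]))
        return acts

    def sgd_step(self, x: np.ndarray, target: int, lr: float = 0.05) -> float:
        """One stochastic gradient descent update on a single test sample."""
        acts = self.forward(x)
        delta = acts[-1].copy()
        delta[target] -= 1.0  # output error for softmax + cross-entropy (assumed loss)
        for l in range(len(self.W) - 1, -1, -1):
            grad_W, grad_b = np.outer(delta, acts[l]), delta
            if l > 0:
                # propagate the error one layer down before W[l] is changed
                delta = (self.W[l].T @ delta) * (1.0 - acts[l] ** 2)  # tanh derivative
            self.W[l] -= lr * grad_W
            self.b[l] -= lr * grad_b
        return float(-np.log(acts[-1][target] + 1e-12))

    def predict_coefficient(self, x: np.ndarray) -> float:
        """Expected increase coefficient under the predicted bin distribution."""
        return float(self.forward(x)[-1] @ BIN_CENTRES)
```

A five-layer network as described, for 100-dimensional word vectors, would be `DNN([100, 16, 16, 16, 4])`, with the hidden widths chosen arbitrarily here; calling `sgd_step` once per test sample matches the statement that the model is optimized once per test word vector.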

In one embodiment, after step S5 the method further includes:

S9, judging whether the first increase coefficient exceeds a preset threshold coefficient;

S10, if so, marking the first increase coefficient in red.

In this embodiment, the threshold coefficient is preset by staff. When the increase in house prices exceeds a certain value, the change is important information that should draw attention. Marking the first increase coefficient in red distinguishes it from other data, so that it catches the attention of staff when displayed on a display device. Therefore, when the first increase coefficient exceeds the threshold coefficient, it is marked in red.

In one embodiment, after the step of marking the first increase coefficient in red, the method includes:

S101, sending the first increase coefficient to a specified terminal.

In this embodiment, the specified terminal refers to the contact details, stored on the server, of clients who intend to buy property, such as mobile phone numbers, email addresses and user accounts on the server. When the first increase coefficient exceeds the threshold coefficient, it is also sent to these clients so that they quickly learn that house prices have risen substantially.
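Steps S9, S10 and S101 amount to a threshold check followed by a notification. A minimal sketch, assuming a strict comparison for "exceeds" and a caller-supplied `notify` callback standing in for SMS, e-mail or in-app delivery; `DisplayValue` and `mark_and_notify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DisplayValue:
    coefficient: float
    color: str = "default"


def mark_and_notify(coefficient: float, threshold: float,
                    notify: Optional[Callable[[float], None]] = None) -> DisplayValue:
    """Steps S9, S10 and S101: mark red above the threshold and push to the client."""
    if coefficient > threshold:
        if notify is not None:
            notify(coefficient)  # hypothetical delivery channel (SMS, e-mail, account message)
        return DisplayValue(coefficient, "red")
    return DisplayValue(coefficient)
```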

In conclusion the method for the prediction data variation based on deep learning of the application, automatically according to real estate correlation Website on read article, and the amount of increase trend of room rate is objectively analyzed very much according to article.According to the reading quantity of article, It forwards quantity, number of reviews to be adjusted amount of increase trend, keeps the amount of increase of the room rate predicted more accurate.In prediction room rate When, while also first judging in article whether to be the room rate for specifying region, further make the amount of increase of the room rate predicted more Accurately.

Referring to Fig. 2, an embodiment of this application further provides a device for predicting data change based on deep learning, comprising:

Article acquisition module 1, configured to obtain, according to a prediction request sent by a user terminal, the articles published on designated websites, where the prediction request indicates that the change of the target attribute value of a specified region is to be predicted;

Judgment module 2, configured to determine whether the article contains an area field corresponding to the specified region, where the area field is an information field indicating the geographical location of the specified region;

Extraction module 3, configured to extract the keywords in the article through a TF-IDF matrix if the article contains an area field corresponding to the specified region, where the TF-IDF matrix is a term frequency-inverse document frequency matrix;

Word vector module 4, configured to input the keywords into the word vector model to obtain the word vector corresponding to each keyword;

Output module 5, configured to input the word vectors into the target attribute value prediction model obtained after training, and output the first increase coefficient of the target attribute value of the specified region.

In this embodiment, the device is applied to predicting the growth rate of housing prices in a specified region: the data change refers to the increase in housing prices, and the housing price is the target attribute value. The designated websites are websites that publish real-estate-related articles, such as housing-price information, housing-price policies and real-estate industry overviews, or any other articles related to or influencing housing-price fluctuations. The designated websites are preset by staff and stored in the server, so that article acquisition module 1 can access them automatically and obtain their articles. When a user needs to predict the housing price at some location, the user supplies a specific location or region, which is packaged into a prediction request and sent to the server; article acquisition module 1 then accesses the prestored designated websites according to the prediction request and obtains the articles published there.

An article carries several pieces of information, including its text and its publication address, that is, the location of the terminal from which the article was published on the designated website. Judgment module 2 retrieves the full content of the article and determines whether its information contains an area field corresponding to the specified region: either the article text contains the area field, or the article was published at the place corresponding to the area field. The area field identifies the specified region whose housing price the user wants to know, for example the name of a city (Shenzhen) or of an administrative district (Futian District). The area field comes from user input: the user enters the geographical location of the place whose housing price they want to know, and judgment module 2 receives this information and generates the area field from it.
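The check performed by judgment module 2 can be sketched as a match over both the article text and its publication address. The Python sketch below assumes plain substring matching and a hypothetical `Article` record; the description does not say how the match is performed.

```python
from dataclasses import dataclass


@dataclass
class Article:
    text: str
    publish_location: str  # address of the terminal that published the article


def has_area_field(article, area_fields):
    """True if the text mentions any area field or the article was published there."""
    return any(field in article.text or field in article.publish_location
               for field in area_fields)


a = Article("New land supply in Futian District was announced today.", "Shenzhen, Nanshan District")
b = Article("National mortgage rates were cut this week.", "Beijing")
c = Article("Market commentary.", "Shenzhen, Futian District")
```

Only articles for which `has_area_field` returns True would go on to keyword extraction.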

TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining to evaluate how important a word is to an article. A word's importance increases in proportion to the number of times it appears in the article and decreases with how frequently it appears across the corpus. Using this technique, extraction module 3 reads each word in the article and extracts the keywords, which describe the general content of the article. Specifically, the words whose number of occurrences exceeds a certain value are counted and compared with the words in a preset keyword library: a word found in the library is extracted as a keyword; otherwise it is not. The keyword library stores vocabulary related to housing prices, such as developer, flexible funding, leverage, real estate and increase, as well as modifiers of housing-price nouns such as spring and restriction. Before extracting keywords, extraction module 3 first applies jieba word segmentation and stop-word removal to the article text, so that meaningless keywords are not extracted. Stop words include function words such as "such" and "because".
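The keyword step can be sketched in Python as below, as a minimal illustration rather than the applicant's code. It assumes the article text has already been segmented into words (the description uses jieba for this; the sketch takes token lists directly), uses the smoothed IDF variant log((1 + n) / (1 + df)) + 1, and treats `min_count` as the unspecified frequency threshold. All function names and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """docs: list of token lists. Returns one {term: tf-idf weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append({term: (c / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
                     for term, c in counts.items()})
    return rows


def extract_keywords(docs, index, stopwords, keyword_library, min_count=1, top_k=None):
    """Keywords of docs[index]: frequent enough, present in the keyword library, ranked by TF-IDF."""
    cleaned = [[t for t in doc if t not in stopwords] for doc in docs]  # stop-word removal
    weights = tfidf_rows(cleaned)[index]
    counts = Counter(cleaned[index])
    candidates = [t for t, c in counts.items() if c >= min_count and t in keyword_library]
    ranked = sorted(candidates, key=lambda t: (-weights[t], t))  # ties broken alphabetically
    return ranked if top_k is None else ranked[:top_k]


docs = [["housing", "price", "rise", "the", "developer", "price", "leverage"],
        ["weather", "is", "sunny", "the"]]
library = {"developer", "leverage", "price", "rise", "real estate"}
keywords = extract_keywords(docs, 0, stopwords={"the", "is"}, keyword_library=library)
```

Here "price" ranks first because it occurs twice, and "housing" is dropped because it is not in the keyword library.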

Word vectors provide a mathematical way to convert the symbolic information of natural language into numerical information in vector form, which turns natural language understanding into a machine learning problem. Word vector module 4 inputs the keywords into the word vector model for vectorization and obtains vectorized keywords. The word vector model is trained in advance, and one-hot representation may be used when training it. One-hot representation represents a word as a very long vector whose length is the dictionary size N. Exactly one dimension of each vector is 1 and all others are 0, and the position of the 1 indicates the word's position in the dictionary. One-hot vectors are stored sparsely, which makes vectorization very simple.
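A one-hot encoding and its sparse storage can be sketched as follows. The helper names are hypothetical; the sparse form stores only the dictionary size N and the position of the single 1.

```python
def build_vocab(words):
    """Dictionary of size N: word -> position."""
    return {w: i for i, w in enumerate(sorted(set(words)))}


def one_hot_sparse(word, vocab):
    """Sparse one-hot form: only the length N and the index of the single 1 are stored."""
    return len(vocab), vocab[word]


def to_dense(sparse):
    n, idx = sparse
    vec = [0] * n
    vec[idx] = 1
    return vec


vocab = build_vocab(["price", "developer", "leverage", "price"])
u = to_dense(one_hot_sparse("price", vocab))
v = to_dense(one_hot_sparse("leverage", vocab))
```

Because any two distinct one-hot vectors have a dot product of 0, this representation carries no notion of word similarity; that is one reason trained word-vector models map words to dense, lower-dimensional vectors.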

After the word vectors obtained from the keywords are input into the preset target attribute value prediction model (that is, the housing-price prediction model), the model applies the logic learned in training, and output module 5 outputs the appreciation or depreciation trend of housing prices in the region corresponding to the area field, that is, the first increase coefficient of the housing price.

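In one embodiment, the above device for predicting data change based on deep learning further includes:

Quantity acquisition module, configured to obtain the reading count, forwarding count and comment count of the article;

Weight calculation module, configured to input the reading count, forwarding count and comment count into preset formulas to calculate the weight coefficient of the article;

Update module, configured to multiply the weight coefficient by the first increase coefficient to obtain an updated second increase coefficient.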

In one embodiment, the weight calculation module includes:

First calculation unit, configured to input the reading count into a preset first formula, the forwarding count into a preset second formula, and the comment count into a preset third formula, to calculate a reading weight coefficient, a forwarding weight coefficient and a comment weight coefficient respectively;

Second calculation unit, configured to add the reading weight coefficient, the forwarding weight coefficient and the comment weight coefficient to obtain the weight coefficient of the article.

In this embodiment, after a given article is obtained, the quantity acquisition module obtains the reading count, forwarding count and comment count of the article from the article's information on the website. The weight calculation module then inputs the three counts into preset formulas to calculate the weight coefficient of the article. In general, the three counts are positively correlated with the weight coefficient: the larger their sum, the larger the weight coefficient and the greater the article's influence on housing prices. The update module then multiplies the weight coefficient by the first increase coefficient to obtain the updated second increase coefficient, which is more objective and accurate than the first increase coefficient before the update. In one embodiment, weight coefficient = reading weight coefficient + forwarding weight coefficient + comment weight coefficient, where the preset first formula (1) for calculating the reading weight coefficient f(x) is as follows:

The preset second formula (2) for calculating the forwarding weight coefficient f(y) is as follows:

The preset third formula (3) for calculating the comment weight coefficient f(z) is as follows:

After obtaining the article's information, the first calculation unit inputs the reading count, forwarding count and comment count into the first, second and third formulas respectively to obtain the corresponding weight coefficients, and the second calculation unit adds these three weight coefficients to obtain the weight coefficient of the article. The weight coefficient is then multiplied by the first increase coefficient to obtain the updated second increase coefficient.
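Formulas (1) to (3) are not reproduced in this text, so the Python sketch below substitutes a clearly hypothetical placeholder, f(n) = scale * log10(1 + n), only to show how the three weight coefficients are summed and then multiplied by the first increase coefficient. The scales 0.1 and 0.2 and all function names are invented for illustration and should be replaced by the actual formulas.

```python
import math


def placeholder_weight(count, scale):
    """Hypothetical stand-in for formulas (1)-(3): grows with the count but flattens out."""
    return scale * math.log10(1 + count)


def article_weight(reads, forwards, comments):
    f_x = placeholder_weight(reads, 0.1)     # reading weight coefficient f(x)
    f_y = placeholder_weight(forwards, 0.2)  # forwarding weight coefficient f(y)
    f_z = placeholder_weight(comments, 0.2)  # comment weight coefficient f(z)
    return f_x + f_y + f_z


def second_increase_coefficient(first, reads, forwards, comments):
    """Updated coefficient = article weight coefficient * first increase coefficient."""
    return article_weight(reads, forwards, comments) * first


w = article_weight(reads=99, forwards=9, comments=9)
```

With these placeholder scales, 99 reads, 9 forwards and 9 comments give a weight of about 0.6, so a first increase coefficient of 0.05 would be updated to about 0.03.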

In one embodiment, article acquisition module 1 includes:

Encapsulation unit, configured to encapsulate the designated websites into a Docker container;

Building unit, configured to build a distributed crawler by deploying the Docker container on different machines;

Crawling unit, configured to crawl the articles on the designated websites through the distributed crawler.

In this embodiment, after staff input multiple designated websites into the server, the encapsulation unit packages the designated websites into a Docker container, and the building unit sends the Docker container to the machines on which crawlers are deployed, building a distributed crawler. Once the crawler machines are started by code, the crawling unit crawls the articles on the designated websites and sends them to the server. This allows the server to quickly obtain the articles published on each designated website.
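The description does not say how work is divided among the Docker-deployed crawler machines. One common approach, shown here as an assumption rather than the described method, is to hash each designated website to a crawler node so that every site is crawled by exactly one node and the assignment stays stable between runs. The node names and URLs are hypothetical.

```python
import hashlib


def assign_sites(sites, nodes):
    """Map each designated website to exactly one crawler node; stable across runs."""
    plan = {node: [] for node in nodes}
    for site in sites:
        digest = int(hashlib.sha256(site.encode("utf-8")).hexdigest(), 16)
        plan[nodes[digest % len(nodes)]].append(site)
    return plan


sites = ["https://news.example.com", "https://house.example.org", "https://policy.example.net"]
plan = assign_sites(sites, ["crawler-1", "crawler-2"])
```

Each crawler container would then fetch only the sites in its own list and forward the articles to the server.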

In one embodiment, the device further includes:

Position reading module, configured to read the first location information of the specified region;

Level acquisition module, configured to obtain, from a preset address base, the administrative level corresponding to the first location information;

Position acquisition module, configured to obtain, from the preset address base, the second location information corresponding to the level above that administrative level and the third location information corresponding to the level below it;

Field determination module, configured to determine the first location information, the second location information and the third location information as the area fields corresponding to the specified region.

In this embodiment, a user who wants to predict the housing price in a specified region inputs the first location information of that region. The location information may be the name of a specific building or residential community, such as Ping An Finance Center, or a broader area, such as the Science and Technology Park area. The position reading module then calls the preset address base, which is built on the standard administrative divisions. The address base contains many pieces of address information, each corresponding to at least one administrative level. The highest level is the provincial administrative region, the second highest is the municipal administrative region, and so on; levels below the fourth highest are defined by staff according to the specific rules of each subdistrict. The address base is thus a set of location information across multiple administrative levels. For example, for Ping An Finance Center, the highest level is Guangdong Province, the second is Shenzhen, the third is Futian District, the fourth is Futian Subdistrict, the fifth is the staff-defined Central District, and the sixth is Ping An Finance Center itself. When the position reading module reads Ping An Finance Center as the first location information of the specified region entered at the user terminal, the level acquisition module finds it in the address base at the sixth level, and the position acquisition module finds the level above it, the fifth level, Central District. Because the sixth level is the lowest level, no lower level is searched. The field determination module then uses Central District (the second location information) together with Ping An Finance Center (the first location information) as the area fields of the specified region. In addition, the other, more specific sixth-level locations within Central District are also included as area fields.
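The lookup performed by the position reading, level acquisition, position acquisition and field determination modules can be sketched with a child-to-parent map of the address base. The Python below follows the Ping An Finance Center example from the text; the map layout, the function names and the extra entry "Example Tower" (an invented sixth-level sibling) are assumptions for illustration.

```python
# Hypothetical address base as a child -> parent map, following the example in the text.
# "Example Tower" is an invented sixth-level sibling used only for illustration.
PARENT = {
    "Shenzhen": "Guangdong Province",
    "Futian District": "Shenzhen",
    "Futian Subdistrict": "Futian District",
    "Central District": "Futian Subdistrict",
    "Ping An Finance Center": "Central District",
    "Example Tower": "Central District",
}


def level_of(place):
    """1 = provincial level; each step down the hierarchy adds one."""
    level = 1
    while place in PARENT:
        place = PARENT[place]
        level += 1
    return level


def children_of(place):
    return [child for child, parent in PARENT.items() if parent == place]


def area_fields(first_location):
    fields = [first_location]
    parent = PARENT.get(first_location)
    if parent:
        fields.append(parent)  # second location information (level above)
    lower = children_of(first_location)
    fields.extend(lower)  # third location information (level below)
    if not lower and parent:
        # Lowest level: also include the other entries under the same parent.
        fields.extend(c for c in children_of(parent) if c != first_location)
    return fields
```

For Ping An Finance Center this yields the building itself, Central District, and the other sixth-level entry under Central District, mirroring the worked example.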

In one embodiment, the device further includes:

Sample acquisition module, configured to obtain test word vectors and the increase coefficients corresponding to them, as test samples;

Input module, configured to input the test word vectors as the input layer of a preset deep neural network (DNN) model, with the corresponding increase coefficients as the output results, into the preset DNN model, where the DNN model includes one input layer, multiple hidden layers and one output layer;

First setup module, configured to set the hidden-layer formula to Y = a(W*X + b), where X is the test word vector, Y is the output vector, b is the offset vector, W is the weight matrix of the hidden layer, and a is the activation function;

Second setup module, configured to set the output-layer formula to the softmax function;

Initialization module, configured to initialize the parameters of the preset DNN model;

Training module, configured to use stochastic gradient descent: after calculating the error between the last hidden layer and the output layer, it propagates the error backward layer by layer to find the error of each layer and adjusts the parameters accordingly, obtaining the trained target attribute value prediction model.

In this embodiment, the target attribute value prediction model is trained based on a neural network model. First, for a given area field, the sample acquisition module obtains articles from the designated websites, extracts their keywords, and inputs the keywords into the word vector model to obtain the word vectors of a test sample, namely the test word vectors. Next, it obtains the first housing price at the location corresponding to the area field at the time the article was published, and the second housing price at the same location a preset time period after publication; the increase of the second price relative to the first gives the increase coefficient of the test sample. The test word vectors together with the corresponding increase coefficient form one test sample, and multiple test samples are obtained in this way. A neural network model is then selected, specifically a DNN model with five layers: the first layer is the input layer, which receives the word vectors; the second, third and fourth layers are hidden layers; and the fifth layer is the output layer, which holds the increase coefficient corresponding to the word vectors as the reference. The input module feeds the word vectors into the model through the input layer, the model computes a result from this input and its built-in formulas, and the result is compared with the increase coefficient at the output layer. Before computing, the initialization module first initializes the model, setting all of its parameters to 0, and the first setup module sets the calculation method of the hidden layers to Y = a(W*X + b), where X is the test word vector, Y is the output vector, b is the offset vector, W is the weight matrix of the hidden layer, and a is the activation function. The second setup module sets the output-layer formula to the softmax function. After these settings, the model reads the test word vector from the input layer and feeds it into the hidden-layer formula. The first hidden layer computes a first result and passes it to the second hidden layer, which uses it as the input to the same formula and produces a second result; the third and last hidden layer uses the second result as its input, produces a third result, and sends it to the output layer. The output layer feeds the third result into the softmax function to obtain the trained increase coefficient, and the training module compares it with the increase coefficient at the output layer and, according to the error, adjusts the parameters W and b of the hidden-layer formula, obtaining the trained target attribute value prediction model. Each test word vector used for training further optimizes the target attribute value prediction model.
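The labeling step above, which turns two housing prices into the increase coefficient of a test sample, can be sketched as follows. The `price_lookup` callable, the 90-day `horizon_days` value (standing in for the unspecified preset time period), the sample prices and the other names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class LabeledSample:
    word_vectors: list
    increase_coefficient: float


def increase_coefficient(first_price, second_price):
    """Relative rise of the second housing price over the first."""
    return (second_price - first_price) / first_price


def build_sample(word_vectors, price_lookup, location, published, horizon_days=90):
    """price_lookup(location, day) is a hypothetical price source; horizon_days stands in
    for the preset time period, whose actual length the description does not state."""
    first = price_lookup(location, published)
    second = price_lookup(location, published + timedelta(days=horizon_days))
    return LabeledSample(word_vectors, increase_coefficient(first, second))


prices = {("Futian District", date(2019, 1, 1)): 50000.0,
          ("Futian District", date(2019, 4, 1)): 52500.0}
sample = build_sample([[0.1, 0.3]], lambda loc, day: prices[(loc, day)],
                      "Futian District", date(2019, 1, 1))
```

A rise from 50,000 to 52,500 over the horizon gives an increase coefficient of 0.05 for this sample.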

In one embodiment, the device further includes:

Coefficient judgment module, configured to determine whether the first increase coefficient exceeds the preset threshold coefficient;

Marking module, configured to mark the first increase coefficient in red if it exceeds the preset threshold coefficient.

In this embodiment, the threshold coefficient is preset by staff. When the coefficient judgment module determines that the growth rate of housing prices exceeds a certain value, the price change is important news that should draw attention. The marking module marks the first increase coefficient in red to distinguish it from other data, so that it catches the staff's attention when displayed on a display device.

In one embodiment, the device further includes:

Sending module, configured to send the first increase coefficient to a specified terminal.

In this embodiment, the specified terminal refers to the contact information, reserved in the server, of a client who intends to buy property, such as a mobile phone number, an email address, or a user account on the server. When the first increase coefficient exceeds the threshold coefficient, the sending module likewise sends it to the client immediately, so that the client quickly learns that housing prices are rising sharply.

In conclusion the device of the prediction data variation based on deep learning of the application, automatically according to real estate correlation Website on read article, and the amount of increase trend of room rate is objectively analyzed very much according to article.According to the reading quantity of article, It forwards quantity, number of reviews to be adjusted amount of increase trend, keeps the amount of increase of the room rate predicted more accurate.In prediction room rate When, while also first judging in article whether to be the room rate for specifying region, further make the amount of increase of the room rate predicted more Accurately.

Referring to Fig. 3, an embodiment of this application further provides a computer device, which may be a server whose internal structure may be as shown in Fig. 3. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores data such as the preset area fields and the word vector model. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements a method for predicting data change based on deep learning.

The processor executes the steps of the above method for predicting data change based on deep learning: obtaining, according to a prediction request sent by a user terminal, the articles published on designated websites, where the prediction request indicates that the change of the target attribute value of a specified region is to be predicted; determining whether the article contains an area field corresponding to the specified region, where the area field is an information field indicating the geographical location of the specified region; if so, extracting the keywords in the article through a TF-IDF matrix, where the TF-IDF matrix is a term frequency-inverse document frequency matrix; inputting the keywords into a preset word vector model to obtain the word vector corresponding to each keyword; and inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region.

In one embodiment, after executing the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the processor performs: obtaining the reading count, forwarding count and comment count of the article; inputting the reading count, forwarding count and comment count into preset formulas to calculate the weight coefficient of the article; and multiplying the weight coefficient by the first increase coefficient to obtain an updated second increase coefficient.

In one embodiment, the step, executed by the processor, of inputting the reading count, forwarding count and comment count into preset formulas to calculate the weight coefficient of the article includes: inputting the reading count into a preset first formula, the forwarding count into a preset second formula, and the comment count into a preset third formula, to calculate a reading weight coefficient, a forwarding weight coefficient and a comment weight coefficient respectively; and adding the reading weight coefficient, the forwarding weight coefficient and the comment weight coefficient to obtain the weight coefficient of the article.

In one embodiment, before executing the step of determining whether the article contains an area field corresponding to the specified region, the processor performs: reading the first location information of the specified region; obtaining, from a preset address base, the administrative level corresponding to the first location information; obtaining, from the preset address base, the second location information corresponding to the level above that administrative level and the third location information corresponding to the level below it; and determining the first, second and third location information as the area fields corresponding to the specified region.

In one embodiment, before executing the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the processor performs: obtaining test word vectors and their corresponding increase coefficients as test samples; inputting the test word vectors as the input layer of a preset deep neural network (DNN) model, with the corresponding increase coefficients as the output results, into the preset DNN model, where the DNN model includes one input layer, multiple hidden layers and one output layer; setting the hidden-layer formula to Y = a(W*X + b), where X is the test word vector, Y is the output vector, b is the offset vector, W is the weight matrix of the hidden layer, and a is the activation function; setting the output-layer formula to the softmax function; initializing the parameters of the preset DNN model; and, using stochastic gradient descent, calculating the error between the last hidden layer and the output layer, then propagating it backward layer by layer to find the error of each layer and adjusting the parameters accordingly, to obtain the trained target attribute value prediction model.

In one embodiment, after executing the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the processor performs: determining whether the first increase coefficient exceeds a preset threshold coefficient; and if so, marking the first increase coefficient in red.

In conclusion the computer equipment of the application is automatically according to reading article on the relevant website of real estate, and according to Article objectively analyzes the amount of increase trend of room rate very much.According to the reading quantity of article, forwarding quantity, number of reviews come to rising Width trend is adjusted, and keeps the amount of increase of the room rate predicted more accurate.When predicting room rate, at the same also first judge be in article No is the room rate in specified region, further makes the amount of increase of the room rate predicted more accurate.

Those skilled in the art will understand that the structure shown in Fig. 3 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied.

An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a method for predicting data change based on deep learning, specifically: obtaining, according to a prediction request sent by a user terminal, the articles published on designated websites, where the prediction request indicates that the change of the target attribute value of a specified region is to be predicted; determining whether the article contains an area field corresponding to the specified region, where the area field is an information field indicating the geographical location of the specified region; if so, extracting the keywords in the article through a TF-IDF matrix, where the TF-IDF matrix is a term frequency-inverse document frequency matrix; inputting the keywords into a preset word vector model to obtain the word vector corresponding to each keyword; and inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region.

In conclusion the computer readable storage medium of the application is literary automatically according to reading on the relevant website of real estate Chapter, and objectively analyze very much according to article the amount of increase trend of room rate.According to the reading quantity of article, forwarding quantity, comment Quantity is adjusted amount of increase trend, keeps the amount of increase of the room rate predicted more accurate.When predicting room rate, while also first sentencing Whether it is the room rate for specifying region in disconnected article, further makes the amount of increase of the room rate predicted more accurate.

Those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, device, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, device, article or method that includes that element.

The foregoing are merely preferred embodiments of this application and are not intended to limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of this specification and the accompanying drawings, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of this application.

Claims

1. A method for predicting data changes based on deep learning, characterized by comprising:

Obtaining, according to a prediction request sent by a user terminal, articles published on a designated website, the prediction request being used to instruct prediction of a change in a target attribute value of a specified region;

Determining whether the article contains an area field corresponding to the specified region, the area field being an information field indicating the geographical location of the specified region;

If so, extracting keywords from the article by means of a TF-IDF matrix, the TF-IDF matrix being a term frequency-inverse document frequency matrix;

Inputting the keywords into a preset word vector model to obtain a word vector corresponding to each keyword;

Inputting the word vectors into a target attribute value prediction model obtained after training, and outputting a first increase coefficient of the target attribute value of the specified region.
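The region check and TF-IDF keyword extraction of claim 1 can be illustrated with a minimal sketch. It assumes the articles are already tokenized (Chinese text would first need word segmentation), treats area fields as plain substrings of the article text, and uses a smoothed IDF of the form log((1 + N) / (1 + df)) + 1. The function names and the `top_k` parameter are hypothetical and are not taken from the application.

```python
import math
from collections import Counter


def contains_area_field(text: str, area_fields: list[str]) -> bool:
    """Return True if any area field (place name) appears in the article text."""
    return any(field in text for field in area_fields)


def tfidf_keywords(docs: list[list[str]], doc_index: int, top_k: int = 3) -> list[str]:
    """Rank the tokens of docs[doc_index] by TF-IDF and return the top_k terms.

    docs is a list of already-tokenized articles; the whole list serves as the
    corpus for the inverse document frequency (an assumption of this sketch).
    """
    n_docs = len(docs)
    # Document frequency: the number of articles each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {
        # Term frequency times a smoothed inverse document frequency.
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for a deterministic result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

In this toy setting a term such as "house" that appears in every article gets the lowest IDF, so region-specific terms rank above it; a production system would more likely use a library vectorizer than this hand-rolled version.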

2. The method for predicting data changes based on deep learning according to claim 1, characterized in that, after the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the method comprises:

Inputting the reading count, forwarding count, and comment count of the article into a preset formula to calculate a weight coefficient of the article;

3. The method for predicting data changes based on deep learning according to claim 2, characterized in that the step of inputting the reading count, forwarding count, and comment count into the preset formula to calculate the weight coefficient of the article comprises:

Inputting the reading count into a preset first formula, the forwarding count into a preset second formula, and the comment count into a preset third formula, to calculate a reading weight coefficient, a forwarding weight coefficient, and a comment weight coefficient, respectively;

Adding the reading weight coefficient, the forwarding weight coefficient, and the comment weight coefficient to obtain the weight coefficient of the article.
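Claims 2 and 3 only state that the first, second, and third formulas are "preset"; their form is not disclosed. The sketch below is therefore a hypothetical instance that assumes log-scaled formulas with made-up scale factors, chosen so that a single viral article does not dominate. Only the final summation follows the claim directly.

```python
import math

# Hypothetical scale factors; the application does not disclose the preset formulas.
READ_SCALE = 0.2
FORWARD_SCALE = 0.5
COMMENT_SCALE = 0.3


def reading_weight(reads: int) -> float:
    # Assumed "first formula": a log-scaled reading count.
    return READ_SCALE * math.log10(1 + reads)


def forwarding_weight(forwards: int) -> float:
    # Assumed "second formula": a log-scaled forwarding count.
    return FORWARD_SCALE * math.log10(1 + forwards)


def comment_weight(comments: int) -> float:
    # Assumed "third formula": a log-scaled comment count.
    return COMMENT_SCALE * math.log10(1 + comments)


def article_weight(reads: int, forwards: int, comments: int) -> float:
    # Per claim 3, the article weight is the sum of the three partial weights.
    return reading_weight(reads) + forwarding_weight(forwards) + comment_weight(comments)
```

For example, 999 reads, 99 forwards, and 9 comments give 0.2·3 + 0.5·2 + 0.3·1 = 1.9 under these assumed formulas.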

4. The method for predicting data changes based on deep learning according to claim 1, characterized in that, before the step of determining whether the article contains an area field corresponding to the specified region, the method comprises:

Reading first location information of the specified region;

Obtaining, from the preset address library, second location information corresponding to the level immediately above the administrative level, and third location information corresponding to the level immediately below the administrative level;

Determining the first location information, the second location information, and the third location information as the area fields corresponding to the specified region.
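A minimal sketch of claim 4, assuming the preset address library can be modeled as a hypothetical parent-pointer dictionary (each place maps to the place one administrative level above it, or `None` at the top level). The library contents and the `area_fields` function are illustrative only.

```python
from typing import Optional

# Hypothetical address library: place -> parent place one administrative level up.
ADDRESS_LIBRARY: dict[str, Optional[str]] = {
    "Guangdong": None,
    "Shenzhen": "Guangdong",
    "Futian": "Shenzhen",
    "Nanshan": "Shenzhen",
}


def area_fields(region: str, library: dict[str, Optional[str]]) -> list[str]:
    """Return the region itself, its upper-level place, and its lower-level places."""
    if region not in library:
        raise KeyError(f"{region!r} is not in the address library")
    first = [region]                      # first location information
    parent = library[region]
    second = [parent] if parent is not None else []  # one level up
    # One level down: every place whose parent is the specified region.
    third = sorted(name for name, p in library.items() if p == region)
    return first + second + third
```

With this toy library, `area_fields("Shenzhen", ADDRESS_LIBRARY)` yields the city, its province, and its districts, so an article mentioning only "Nanshan" would still be treated as relevant to Shenzhen.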

5. The method for predicting data changes based on deep learning according to claim 1, characterized in that, before the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the method comprises:

Using the test word vectors as the input layer of a preset deep neural network (DNN) model and the increase coefficients corresponding to the test word vectors as the output results, and inputting them into the preset DNN model, the DNN model comprising one input layer, multiple hidden layers, and one output layer;

Setting the formula of the hidden layers as Y = a(W*X + b), where X denotes the test word vector, Y denotes the output vector, b denotes the bias vector, W denotes the weight matrix of the hidden layer, and a denotes the activation function;

Setting the formula of the output layer to the softmax function;

Initializing the parameters of the preset DNN model;

Using the method for stochastic gradient descent, after the error for calculating the last one hidden layer and output layer, successively reversely ask upwards The error of each layer out, the objective attribute target attribute value prediction model to be adjusted to parameter, after being trained.

6. The method for predicting data changes based on deep learning according to claim 1, characterized in that, after the step of inputting the word vectors into the trained target attribute value prediction model and outputting the first increase coefficient of the target attribute value of the specified region, the method comprises:

If so, marking the first increase coefficient in red.

7. An apparatus for predicting data changes based on deep learning, characterized by comprising:

An article acquisition module, configured to obtain, according to a prediction request sent by a user terminal, articles published on a designated website, the prediction request being used to instruct prediction of a change in a target attribute value of a specified region;

A judgment module, configured to determine whether the article contains an area field corresponding to the specified region, the area field being an information field indicating the geographical location of the specified region;

An extraction module, configured to extract keywords from the article by means of a TF-IDF matrix if the article contains an area field corresponding to the specified region, the TF-IDF matrix being a term frequency-inverse document frequency matrix;

A word vector module, configured to input the keywords into a preset word vector model to obtain a word vector corresponding to each keyword;

An output module, configured to input the word vectors into the target attribute value prediction model obtained after training, and to output a first increase coefficient of the target attribute value of the specified region.

8. The apparatus for predicting data changes based on deep learning according to claim 7, characterized by further comprising:

A weight calculation module, configured to input the reading count, forwarding count, and comment count of the article into a preset formula to calculate a weight coefficient of the article;

9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.