CN105808744A - Information prediction method and device - Google Patents

Information prediction method and device

Info

Publication number
CN105808744A
Authority
CN
China
Prior art keywords
information
condition
describes
characteristic vector
preset time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610141577.3A
Other languages
Chinese (zh)
Inventor
朱琛
祝恒书
丁鹏亮
熊辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610141577.3A
Publication of CN105808744A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The application discloses an information prediction method and device. One specific embodiment of the method comprises the following steps: acquiring a plurality of pieces of condition description information; grouping the condition description information according to preset time intervals, and dividing the condition description information of which publishing time belongs to a same time interval into one group; extracting feature vectors from the groups of condition description information respectively; analyzing the feature vectors through a machine learning method, and acquiring association relationships between the condition description information within adjacent preset time intervals; and predicting condition status information of a next preset time interval based on the association relationships. Through the embodiment, effective information prediction can be realized.

Description

Method and apparatus for information prediction
Technical field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and in particular to a method and apparatus for information prediction.
Background
With the development of Internet technology, more and more users publish condition description information over the network. Here, condition description information is information that a user describes through a number of condition items, so that after it is published other users can perform condition matching against those items; an example is the recruitment information that an enterprise publishes through a recruitment website (which may include recruitment condition items such as the position and the job requirements). At present, the publication of such condition description information tends to rely on the subjective experience of the publisher, and an objective, timely prediction of the overall situation of the condition description information is lacking. Taking recruitment information as an example, what kind of recruitment information to publish, and how much, is usually decided by the enterprise's human resources managers based on their subjective experience of the enterprise's needs and of personnel movement in the industry, so individual subjective factors have a large influence. In some cases this way of publishing is not conducive to publishing condition description information effectively. For example, when the talent an enterprise needs happens to be the most sought-after talent at the moment, such talent has more choice among enterprises than enterprises have among such talent, so the enterprise must spend more labor and money on recruitment, and recruitment information published at that time may raise the enterprise's recruitment cost.
Therefore, the current way of publishing condition information makes insufficient use of network data and cannot achieve effective information prediction.
Summary of the invention
The purpose of the present application is to propose a method and apparatus for information prediction that solve the technical problem mentioned in the Background section above.
In a first aspect, the present application provides an information prediction method, the method comprising: acquiring a plurality of pieces of condition description information; grouping the condition description information according to preset time periods, so that condition description information whose publishing time falls within the same preset time period is placed in one group; extracting a feature vector from each group of condition description information; analyzing the feature vectors by a machine learning method to obtain the association relationship between the condition description information of adjacent preset time periods; and predicting the condition status information of the next preset time period based on the association relationship.
In some embodiments, extracting a feature vector from each group of condition description information comprises: detecting formatted information in the condition description information that conforms to a preset format, and representing the formatted information by a word vector; for unformatted information in the condition description information that does not conform to the preset format, generating a text vector by a text analysis method; and forming the feature vector from the word vector together with the text vector.
In some embodiments, the method further comprises: deduplicating each group of condition description information.
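The deduplication step can be illustrated with a minimal Python sketch. The source does not say how duplicates are detected, so the normalization used here (lower-casing and collapsing whitespace) and the function name `deduplicate` are assumptions made only for illustration.

```python
def deduplicate(postings):
    """Keep the first occurrence of each posting within one group.

    Assumption: two postings are duplicates when their text is identical
    after lower-casing and collapsing whitespace.
    """
    seen = set()
    unique = []
    for text in postings:
        key = " ".join(text.lower().split())  # normalize case and spacing
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


group = ["Software engineer: C++", "software  engineer: c++", "Driver wanted"]
unique_group = deduplicate(group)
print(unique_group)
```

A real system would probably need a fuzzier notion of duplicate (for example, the same posting republished with a different date), which this sketch does not attempt.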
In some embodiments, extracting a feature vector from each group of condition description information comprises: extracting, as static features, the features of each group of condition description information that do not change within a preset time interval; extracting, as dynamic features, the features of each group of condition description information that change within a preset time interval; and forming the feature vector from the static features together with the dynamic features.
In some embodiments, the association relationship includes a transition probability relationship between the condition description information of adjacent preset time periods, where the condition description information is represented by the feature vector of each group of condition description information, and the transition probability relationship is represented by the transition probabilities between the features in the feature vectors.
In some embodiments, the condition status information includes a distribution probability of at least one feature, the distribution probability being determined by the feature vector of the current time period and the transition probability relationship.
In some embodiments, the method further comprises: based on the distribution probability, pushing at least one feature with a relatively large distribution probability.
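As a rough illustration of the pushing step, the sketch below ranks features by their predicted distribution probability and returns the largest ones. The function name `top_features`, the cutoff `k`, and the probabilities are hypothetical; the source only says that features with a larger distribution probability are pushed.

```python
def top_features(distribution, k=2):
    """Return the k feature names with the largest predicted probability."""
    ranked = sorted(distribution.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up predicted distribution for the next time period.
predicted = {"c++": 0.08, "python": 0.15, "driver": 0.02}
pushed = top_features(predicted, k=2)
print(pushed)
```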
In a second aspect, the present application provides an information prediction apparatus, the apparatus comprising: an acquisition module configured to acquire a plurality of pieces of condition description information; a grouping module configured to group the condition description information according to preset time periods, placing condition description information whose publishing time falls within the same preset time period in one group; an extraction module configured to extract a feature vector from each group of condition description information; an analysis module configured to analyze the feature vectors by a machine learning method to obtain the association relationship between the condition description information of adjacent preset time periods; and a prediction module configured to predict the condition status information of the next preset time period based on the association relationship.
In some embodiments, the extraction module comprises: a word vector generation unit configured to detect formatted information in the condition description information that conforms to a preset format and to represent the formatted information by a word vector; a text vector generation unit configured to generate, by a text analysis method, a text vector for unformatted information in the condition description information that does not conform to the preset format; and a feature vector generation unit configured to form the feature vector from the word vector together with the text vector.
In some embodiments, the apparatus further comprises: a deduplication module configured to deduplicate each group of condition description information.
In some embodiments, the extraction module comprises: a static feature extraction unit configured to extract, as static features, the features of each group of condition description information that do not change within a preset time interval; a dynamic feature extraction unit configured to extract, as dynamic features, the features of each group of condition description information that change within a preset time interval; and a synthesis unit configured to form the feature vector from the static features together with the dynamic features.
In some embodiments, the association relationship includes a transition probability relationship between the condition description information of adjacent preset time periods, where the condition description information is represented by the feature vector of each group of condition description information, and the transition probability relationship is represented by the transition probabilities between the features in the feature vectors.
In some embodiments, the condition status information includes a distribution probability of at least one feature, the distribution probability being determined by the feature vector of the current time period and the transition probability relationship.
In some embodiments, the apparatus further comprises: a pushing module configured to push, based on the distribution probability, at least one feature with a relatively large distribution probability.
With the information prediction method and apparatus provided by the present application, a plurality of pieces of condition description information are acquired; the condition description information is grouped according to preset time periods, with condition description information whose publishing time falls within the same preset time period placed in one group; a feature vector is extracted from each group of condition description information; the feature vectors are analyzed by a machine learning method to obtain the association relationship between the condition description information of adjacent preset time periods; and the condition status information of the next preset time period is predicted based on the obtained association relationship. The method and apparatus make full use of network data and introduce a machine learning method to analyze the feature vectors, so that effective information prediction can be achieved.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 shows an exemplary system architecture to which embodiments of the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the information prediction method according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the information prediction method according to the present application;
Fig. 4 is a flowchart of another embodiment of the information prediction method according to the present application;
Fig. 5 is a schematic structural diagram of one embodiment of the information prediction apparatus according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101 and 102, a network 103, and a server 104. The network 103 serves as the medium that provides communication links between the terminal devices 101, 102 and the server 104. The network 103 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
The terminal devices 101, 102 may interact with the server 104 through the network 103 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, for example browser applications, recruitment platform applications, positioning applications, map applications, financial management applications, search applications, shopping applications, social platform applications, mailbox clients, and instant messaging tools.
The terminal devices 101, 102 may be any of various electronic devices on which browser applications, recruitment platform applications, and the like can be installed, including but not limited to smartphones, smart watches, tablet computers, personal digital assistants, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, and desktop computers.
The server 104 may be a server that provides various services. For example, the server 104 may be a background server that supports the browser applications, recruitment platform applications, and the like on the terminal devices 101, 102. The server may store, generate, or otherwise process the received data and feed the processing result back to the terminal devices.
It should be noted that the information prediction method provided by the embodiments of the present application is generally performed by the server 104; in some implementations it may instead be performed by the terminal devices 101, 102, and in other implementations it may be performed jointly by the server 104 and the terminal devices 101, 102, which is not limited by the present application. Correspondingly, the information prediction apparatus provided by the embodiments of the present application may be arranged in the server 104, in the terminal devices 101, 102, or with some modules in the server 104 and the other modules in the terminal devices 101, 102.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
Refer to Fig. 2, which shows a flow 200 of one embodiment of the information prediction method. This embodiment is mainly illustrated by applying the method to an electronic device with a certain computing capability; the electronic device may be, for example, a background server (such as the server 104 shown in Fig. 1) that supports a recruitment platform application or a recruitment website accessible through a browser application, which is not limited by the present application. The information prediction method includes the following steps:
Step 201: acquire a plurality of pieces of condition description information.
In this embodiment, the electronic device may acquire a plurality of pieces of condition description information from a preset website or application, or from condition description information stored in advance.
The condition description information may include multiple condition items. Taking recruitment information as the condition description information, the recruitment information may include, without limitation, one or more of the following condition items: position title, job responsibilities, qualifications, number of people to be recruited, salary and benefits, work location, and so on. The electronic device may acquire all of the condition description information from the preset website, the application, or the pre-stored condition description information; it may acquire only the condition description information whose publishing time falls within a set time range (for example, from 2000 to the present); or it may acquire a preset number (for example, 100,000) of pieces from the condition description information whose publishing time falls within a preset time period (for example, one year); and so on.
The electronic device may acquire the condition description information locally or remotely. For example, when the electronic device is the background server supporting the preset website or application (such as a recruitment website or application), or when the condition description information is stored locally in advance, the electronic device may acquire the pieces of condition description information locally. When the electronic device is not the background server supporting the preset website or application, or when the condition description information to be acquired is stored in advance on another terminal device, the electronic device may acquire the pieces of condition description information from that background server or terminal device through a wired or wireless connection; for instance, the electronic device may crawl the recruitment information published on the preset website with an Internet crawler system (such as Nutch). The wireless connection includes but is not limited to 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra wideband) connections, as well as other wireless connections now known or developed in the future.
Step 202: group the condition description information according to preset time periods, placing condition description information whose publishing time falls within the same preset time period in one group.
In this embodiment, the electronic device may then group the condition description information according to a preset time period (for example, one month), placing the condition description information whose publishing time falls within the same preset time period in one group.
Condition information may be affected by macro-level demand or by the current level of technological development. Taking recruitment information as an example, an enterprise's demand for talent is affected by the macroeconomy and by the state of the industry. When the macroeconomy is developing well, the enterprise's demand for talent may become more varied and grow in quantity, and when the macroeconomy faces adverse conditions, the quantity of talent the enterprise demands decreases; likewise, when the industry is developing well, the enterprise's demand for talent in that industry increases, and when the industry faces adverse conditions, that demand decreases. The condition status represented by the acquired condition description information therefore differs between time periods. In this embodiment, the condition description information may be analyzed by time period. For example, taking one year as a time period allows the overall trend of the condition description information across years to be analyzed, and taking a month or a quarter as a time period allows periodic variation of the condition description information over multiple years to be analyzed, and so on.
In this embodiment, the electronic device may preset different time periods according to different analysis needs and group the condition description information accordingly, so that analyzing the condition description information of the different groups yields an analysis of the condition description information of each time period.
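The grouping in step 202 can be sketched in a few lines of Python. This is only an illustration under assumed inputs: each posting is taken to be a (publishing date, text) pair, and the function name `group_by_period` and the two supported period lengths are hypothetical choices, not part of the application.

```python
from collections import defaultdict
from datetime import date


def group_by_period(postings, period="month"):
    """Group (published_date, text) pairs by calendar month or year."""
    groups = defaultdict(list)
    for published, text in postings:
        if period == "year":
            key = (published.year,)
        elif period == "month":
            key = (published.year, published.month)
        else:
            raise ValueError("period must be 'month' or 'year'")
        groups[key].append(text)
    return dict(groups)


# Made-up sample postings.
postings = [
    (date(2015, 1, 5), "Recruitment post: software engineer"),
    (date(2015, 1, 20), "Recruit one driver"),
    (date(2015, 2, 3), "Recruitment post: teacher"),
]
groups = group_by_period(postings, period="month")
print(groups)
```

Switching `period` to `"year"` gives the coarser grouping that the text suggests for analyzing overall trends across years.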
Step 203: extract a feature vector from each group of condition description information.
In this embodiment, the electronic device may then extract a feature vector from each group of condition description information obtained by the grouping in step 202. The condition description information may include multiple condition items, and each condition item may serve as one feature of the condition description information. The feature vector may be a vector made up of the features that a group of condition description information has, or a vector of numerical values recording, for each of a set of candidate features, whether the group has it (for example, 1 when a group of recruitment information includes a feature and 0 when it does not).
In practice, the electronic device may generate a single feature vector for each piece of condition description information and then combine the single feature vectors of a group into the feature vector of that group of recruitment information; it may also merge identical features across the single feature vectors and represent the importance of each feature within the group by a weight coefficient or a probability. The weight coefficient or probability may be proportional to the number of times the feature appears in the single feature vectors of the group.
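The merging of single feature vectors into a weighted group vector might look like the following sketch. It assumes each single feature vector is given as a set of feature names and normalizes the counts so that the weights sum to 1; that normalization and the name `group_feature_weights` are assumptions, since the source only requires the weight to be proportional to the count.

```python
from collections import Counter


def group_feature_weights(single_vectors):
    """Merge per-posting feature sets into one weighted group vector.

    Each weight is proportional to the number of postings in the group
    that contain the feature; weights are normalized to sum to 1.
    """
    counts = Counter()
    for features in single_vectors:
        counts.update(set(features))  # count each feature once per posting
    total = sum(counts.values())
    return {feature: count / total for feature, count in counts.items()}


# Made-up single feature vectors for one group.
vectors = [{"c++", "python"}, {"c++"}, {"driving"}]
weights = group_feature_weights(vectors)
print(weights)
```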
In some optional implementations of this embodiment, the electronic device may extract the feature vector from the condition description information as follows: detect formatted information in the condition description information that conforms to a preset format and represent it by a word vector; for unformatted information that does not conform to the preset format, construct a text vector by a text analysis method; and form the feature vector from the word vector together with the text vector.
Taking recruitment information as an example, formatted information in the preset format may be, for example:
Recruitment post: software engineer;
In the above example, the preset format is an information format delimited by a colon ":": the feature item "recruitment post" comes before the ":", and the specific content of that feature item, "software engineer", comes after it. In a specific implementation, the electronic device may preset format keywords for recognizing the preset format; these may include words (such as "post") as well as symbols (such as ":"). For example, when the electronic device matches the symbol ":" in a piece of recruitment information, it may determine that the recruitment information includes formatted information conforming to the preset format; it may then match the word "post" before the ":" and extract the words "software engineer" after the ":" into the word vector corresponding to that recruitment information. The format keywords may be stored in the electronic device as a set or a dictionary; they may be obtained from a large amount of recruitment information by analysis methods such as word frequency analysis, or set manually. In some implementations, different format keywords may describe the same feature, for example "skill requirements: c++/python" and "position description: program in c++/python". Here "skill requirements" and "position description" both correspond to the skill feature, whose specific content is "c++/python". In this case the electronic device may preset "skill requirements" and "position description" to correspond to the same feature. Optionally, the electronic device may preset the form of the feature vector according to the format keywords of the formatted information, for example (teacher, software engineer, ...); when the corresponding format keyword is matched in the recruitment information, the corresponding feature takes the value 1, and otherwise it takes 0. For the formatted information in the preceding example, the word vector may be (0, 1, ...).
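The colon-delimited matching described above can be sketched as follows. The keyword list `FORMAT_KEYWORDS`, the vector layout `VOCABULARY`, and the function `word_vector` are hypothetical names chosen for this illustration, and only the two-entry layout (teacher, software engineer) from the example is modeled.

```python
FORMAT_KEYWORDS = ("post",)                    # hypothetical format keywords
VOCABULARY = ("teacher", "software engineer")  # hypothetical preset vector layout


def word_vector(posting):
    """Build a 0/1 word vector from colon-delimited lines of a posting."""
    vector = [0] * len(VOCABULARY)
    for line in posting.splitlines():
        if ":" not in line:
            continue  # not formatted information
        label, _, content = line.partition(":")
        if not any(keyword in label.lower() for keyword in FORMAT_KEYWORDS):
            continue  # no known format keyword before the colon
        value = content.strip().strip(";").strip().lower()
        if value in VOCABULARY:
            vector[VOCABULARY.index(value)] = 1
    return vector


formatted = word_vector("Recruitment post: software engineer;")
unformatted = word_vector("Recruit one driver")
print(formatted, unformatted)
```

Mapping several keywords such as "skill requirements" and "position description" to one feature would need an extra keyword-to-feature dictionary, which is left out here to keep the sketch short.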
For unformatted information in the condition description information that does not conform to the preset format, such as "recruit one driver" in a piece of recruitment information, the electronic device may construct the text vector by methods such as TF-IDF (term frequency-inverse document frequency) or the bag-of-words model. Taking the bag-of-words model as an example, the words that occur in the unformatted recruitment information are first collected into a vocabulary, for example (post, description, recruit, skill, driver, software engineer, one, ...); the unformatted information "recruit one driver" is then represented by the text vector (0, 0, 1, 0, 1, 0, 1, ...).
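A minimal bag-of-words sketch for the driver example follows. The vocabulary entries are English stand-ins for the words listed in the example, and the whole-word matching rule and the function name `text_vector` are assumptions; a production system would use a proper tokenizer, especially for Chinese text.

```python
# English stand-ins for the example vocabulary (an assumption for illustration).
VOCABULARY = ("post", "description", "recruit", "skill",
              "driver", "software engineer", "one")


def text_vector(text, vocabulary=VOCABULARY):
    """Return a 0/1 bag-of-words vector marking which vocabulary terms occur."""
    # Pad with spaces so that only whole words (or whole phrases) match.
    padded = " " + " ".join(text.lower().split()) + " "
    return [1 if f" {term} " in padded else 0 for term in vocabulary]


vec = text_vector("recruit one driver")
print(vec)
```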
The electronic device may combine or merge the word vector representing the formatted information of a group of condition description information with the text vector representing its unformatted information, which together form the feature vector of that group of recruitment information.
In some optional implementations of this embodiment, when extracting the feature vectors of the condition description information, the electronic device may compile overall statistics on the acquired condition description information of every time period, take the features that remain unchanged over multiple (for example, 10) consecutive time periods as static features, and take the features that change within a preset number (for example, 3) of time periods as dynamic features; when extracting a feature vector, it extracts a static feature vector and a dynamic feature vector separately, and the two together form the feature vector.
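One simple reading of the static/dynamic split is sketched below: a feature present in every observed time period is treated as static, and a feature present in only some periods is treated as dynamic. This presence-based criterion and the name `split_static_dynamic` are assumptions; the source does not fix how "change" is measured.

```python
def split_static_dynamic(period_features):
    """Split features into static (in every period) and dynamic (the rest).

    period_features: list of sets, one set of feature names per time period.
    """
    if not period_features:
        return set(), set()
    all_features = set().union(*period_features)
    static = set.intersection(*(set(p) for p in period_features))
    dynamic = all_features - static
    return static, dynamic


# Made-up features for three consecutive periods.
periods = [
    {"c++", "bachelor"},
    {"c++", "python", "bachelor"},
    {"bachelor", "python"},
]
static, dynamic = split_static_dynamic(periods)
print(static, dynamic)
```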
Step 204: analyze the feature vectors by a machine learning method to obtain the association relationship between the condition description information of adjacent preset time periods.
In this embodiment, the electronic device may then analyze, by a machine learning method, the feature vectors of the groups of condition description information extracted in step 203, thereby obtaining the association relationship between the condition description information of adjacent preset time periods.
It can be understood that, taken as a whole, change in the description information of similar things is a gradual process, so the condition description information published in adjacent time periods is likely to have some association relationship. If the condition status represented by the condition description information of each time period is regarded as a state, this association relationship can be used to express the dependence between the current state and the next state; for example, it may be the transition probability relationship for moving from the current state to the next state.
Machine learning (ML) studies how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to keep improving their own performance; it can be implemented with theories and methods from probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other fields. Using a machine learning method, the electronic device may regard the current state as the input and the next state as the output, with the association relationship between them as an intermediate parameter between input and output; it can thereby build a mathematical model and solve for the intermediate parameter when the input and output are known.
Exemplarily, during transition probability relation between the condition situation that above-mentioned incidence relation is adjacent preset time period, electronic equipment assume that current state ctObey preceding state ct-1Probability distribution θt, wherein θtRepresent ct-1With ctBetween transition probability relation, at known current state ctWith preceding state ct-1When, electronic equipment can be set up the mathematical modeies such as hidden variable model by the such as method of LatentSVM (potential support vector machine), PSLA (ProbabilisticLatentSemanticAnalysis, probability latent semantic analysis) etc and solve θt.For PSLA, its thought is to estimate, by document and vocabulary both presentations, the theme that implies, and each set condition can be described the feature in the characteristic vector of information as observable variable by electronic equipment, uses ziEach bar condition in express time section t describes the theme of information, ziA corresponding corresponding condition describes the document and vocabulary that are represented in information, condition situation c by single characteristic vectortWith all z in the corresponding time periodiRepresent, by condition situation ctDeng as hidden variable, set up the joint probability distribution P (z of observable variable and hidden variablei, ct...), wherein, ziMiddle i can have a series of value, and the different values of i the condition of corresponding current slot can describe the different themes of information.It is appreciated that joint probability distribution P (zi, ct, θt...) for the product of above-mentioned observable variable and the conditional probability distribution of hidden variable.
Wherein, ziConditional probability distribution can obtain by the following method: assume in units of year, the condition acquiring 10 years describes information data, for the feature that certain is concrete, such as, feature C++ in recruitment information, collect 10000 recruitment informations a certain year, wherein there are 800 recruitment informations containing feature C++, then in this year, the probability of feature C++ can be 800/10000=0.08, obtain such a probability every year, thus obtaining the probability distribution of feature C++ in the sample of 10 years, it is possible to obtain the conditional probability distribution of feature C++.
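The yearly frequency estimate described above can be illustrated with a minimal sketch. The function name `yearly_feature_probability` and the `(year, feature set)` data layout are assumptions made for this illustration only and do not come from the application.

```python
from collections import defaultdict

def yearly_feature_probability(postings, feature):
    """Return {year: share of that year's postings that contain `feature`}."""
    totals = defaultdict(int)  # postings seen per year
    hits = defaultdict(int)    # postings per year that contain the feature
    for year, features in postings:
        totals[year] += 1
        if feature in features:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in totals}

# Toy data mirroring the example in the text:
# 10,000 postings in one year, 800 of which mention C++.
postings = [(2015, {"C++"})] * 800 + [(2015, {"Java"})] * 9200
probabilities = yearly_feature_probability(postings, "C++")
print(probabilities)  # {2015: 0.08}
```

Repeating this for each of the 10 years gives one probability per year, which is the per-feature distribution the paragraph refers to.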
The electronic device can then use a method such as Gibbs sampling, a variational method, or the forward-backward algorithm of an HMM (Hidden Markov Model) to solve for the value of the latent variable c_t at which P(z_i, c_t, ...) reaches its maximum. Taking Gibbs sampling as an example, the electronic device can construct a transition probability matrix corresponding to P(z_i, c_t, ...). This transition probability matrix can be constructed, for example, as follows:
Suppose there are four states c_1, c_2, c_3, c_4, and let θ_jk denote the probability of a transition from c_j to c_k, where j and k each take the values 1, 2, 3, 4, as shown in the following table:
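| from \ to | c_1 | c_2 | c_3 | c_4 |
|---|---|---|---|---|
| c_1 | θ_11 | θ_12 | θ_13 | θ_14 |
| c_2 | θ_21 | θ_22 | θ_23 | θ_24 |
| c_3 | θ_31 | θ_32 | θ_33 | θ_34 |
| c_4 | θ_41 | θ_42 | θ_43 | θ_44 |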
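The constructed probability transition matrix can then be:

$$
\begin{pmatrix}
\theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} \\
\theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} \\
\theta_{31} & \theta_{32} & \theta_{33} & \theta_{34} \\
\theta_{41} & \theta_{42} & \theta_{43} & \theta_{44}
\end{pmatrix}
$$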
The electronic device can then, for each group of known z_i, repeatedly change the value of the latent variable c_t to obtain a series of variable groups, each of which contains the observable variables and the latent variable. Following the Gibbs sampling rule, it samples one variable group out of every preset number (for example 20) of variable groups from this series; the joint probability distribution satisfied by the sampled variable groups is taken as the joint probability distribution P(z_i, c_t, ...). The electronic device can then vary the value of each entry of the probability transition matrix; the entry values at which P(z_i, c_t, ...) reaches its maximum are the optimal solution of the matrix. The corresponding probability distribution θ_t can then be obtained from the value of the previous state c_{t-1}.
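To make the shape of such a matrix concrete, the sketch below fills a 4x4 transition matrix from an observed state sequence. Note that it uses plain frequency counting, not the Gibbs sampling procedure described above, and that the function name, the toy sequence, and the uniform fallback for unseen states are all assumptions made for illustration.

```python
def estimate_transition_matrix(sequence, states):
    """Estimate theta[j][k] = P(next state is k | current state is j) by counting."""
    index = {state: i for i, state in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    # Count each adjacent (previous, current) pair in the sequence.
    for prev, curr in zip(sequence, sequence[1:]):
        counts[index[prev]][index[curr]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a state never seen as a source gets a uniform row.
        matrix.append([c / total for c in row] if total else [1 / n] * n)
    return matrix

states = ["c1", "c2", "c3", "c4"]
observed = ["c1", "c2", "c2", "c3", "c4", "c1", "c2", "c3"]
theta = estimate_transition_matrix(observed, states)
for state, row in zip(states, theta):
    print(state, row)
```

Each row of the result sums to 1, which is the property any transition probability matrix of this kind has to satisfy regardless of how its entries are solved for.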
Step 205: predict the condition status information of the next preset time period based on the association relation.
In this embodiment, the electronic device can further predict the condition status information of the next preset time period based on the association relation. The condition status information can be, for example, the condition status c_{t+1}, the feature vector corresponding to the condition description information, the distribution probability of at least one feature, and so on.
It will be appreciated that the association relation can represent the correlation between the current state and the next state, so the electronic device can predict the condition status information of the next preset time period from the condition description information of the current time period according to the association relation. For example, when the association relation is the transition probability relation from the current state to the next state, suppose the current condition status is c_t and the corresponding transition probability is θ_{t+1}, and the condition status c_t is represented by the feature vector (z_i) of the current condition status, where the index i can take a series of values and different values of i correspond to different topics of the recruitment information of the current time period, so that (z_i) is equivalent to (z_1, z_2, z_3, ...). The probability distribution of the condition status at the next moment can then be the product of (z_i) and θ_{t+1}. θ_{t+1} is solved in a similar way to θ_t, which is not repeated here. In some implementations, the distribution probability of at least one feature can be obtained from the product of (z_i) and θ_{t+1}. Optionally, the electronic device can push one or more features with larger distribution probabilities according to the distribution probability of each feature. Further, the features recommended by the electronic device can serve as a reference for users issuing condition description information in the next time period.
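The product of the current feature vector and the transition probabilities can be sketched as a row vector multiplied by a matrix, followed by selecting the features with the largest predicted probability. The feature names, the numbers, and both function names below are hypothetical and serve only to illustrate the calculation.

```python
def predict_next_distribution(current, theta):
    """Multiply the row vector `current` by the transition matrix `theta`."""
    columns = len(theta[0])
    return [
        sum(current[j] * theta[j][k] for j in range(len(current)))
        for k in range(columns)
    ]

def top_features(names, distribution, count):
    """Return the `count` feature names with the largest predicted probability."""
    ranked = sorted(zip(names, distribution), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:count]]

names = ["C++", "Java", "Python"]   # hypothetical features
current = [0.5, 0.3, 0.2]           # assumed distribution for the current period
theta = [                           # assumed transition probabilities
    [0.6, 0.1, 0.3],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]
next_distribution = predict_next_distribution(current, theta)
print(top_features(names, next_distribution, 2))  # ['C++', 'Python']
```

Because every row of `theta` sums to 1, the predicted vector is again a probability distribution, so its entries can be compared directly when choosing which features to push.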
In some optional implementations of this embodiment, the features embodied in the feature vector include dynamic features and static features, and the feature vector of each group of condition description information includes a static feature vector and a dynamic feature vector. The electronic device can predict the static features or the dynamic features separately using a method similar to the prediction principle described above, which is not repeated here.
Referring to Fig. 3, Fig. 3 shows an application scenario of the above embodiment, in which the information prediction method of the present application is applied to a background server that supports a recruitment application. As shown in Fig. 3, the information prediction method of this embodiment can be performed by the background server 302. In step 3001, a user initiates a recruitment information prediction request to the background server 302 through the recruitment application installed on the client 301. In step 3002, the background server 302 obtains multiple pieces of recruitment information from itself and/or at least one other server 303, where the other servers 303 can be various servers that support other recruitment websites or recruitment applications (such as a server similar to the background server 302, or a cloud server). In step 3003, the background server 302 groups the acquired recruitment information by preset time period (for example one quarter), placing recruitment information whose issuing time falls in the same time period into one group. In step 3004, the background server 302 extracts a feature vector from each group of recruitment information. In step 3005, the background server 302 further analyzes the feature vectors of the groups by a machine learning method to obtain the association relation between the recruitment information of adjacent time periods. In step 3006, the background server 302 predicts the recruitment status information of the next time period according to the obtained association relation and the recruitment information of the current time period, and sends the prediction result to the client 301, which can present it to the user.
Because the information prediction method of this embodiment obtains the association relation between the recruitment information of adjacent preset time periods by a machine learning method and predicts the recruitment status information of the next preset time period based on that relation, effective information prediction can be achieved.
With further reference to Fig. 4, a flow 400 of another embodiment of the information prediction method of the present application is shown. The information prediction method 400 includes the following steps:
Step 401: obtain multiple pieces of condition description information.
In this embodiment, the electronic device can obtain multiple pieces of condition description information from a preset website or application, or from pre-stored condition description information. Each piece of condition description information can include multiple condition entries, and the electronic device can obtain the condition description information locally or remotely.
Step 402: group the condition description information by preset time period, placing condition description information whose issuing time falls in the same preset time period into one group.
In this embodiment, the electronic device can further group the condition description information by preset time period (for example one year), placing condition description information whose issuing time falls within the same preset time period into one group. Condition information may be affected by macro demand or by the current level of technological development, so in this embodiment the electronic device can analyze the condition description information by time period. For example, the electronic device can take one year as the time period to analyze the overall trend of the condition description information across years, or take a month or a quarter as the time period to analyze periodic changes in the condition description information over multiple years, and so on.
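The grouping in step 402 can be sketched as bucketing records by a key derived from the issuing time. The record layout `(issuing date, text)` and the names `period_key` and `group_by_period` are assumptions for this illustration.

```python
from collections import defaultdict
from datetime import date

def period_key(published, unit):
    """Map an issuing date to the key of its preset time period."""
    if unit == "year":
        return (published.year,)
    if unit == "quarter":
        return (published.year, (published.month - 1) // 3 + 1)
    if unit == "month":
        return (published.year, published.month)
    raise ValueError(f"unsupported unit: {unit}")

def group_by_period(items, unit="year"):
    """Group (issuing date, text) records that fall in the same period."""
    groups = defaultdict(list)
    for published, text in items:
        groups[period_key(published, unit)].append(text)
    return dict(groups)

items = [
    (date(2015, 2, 10), "posting A"),
    (date(2015, 11, 3), "posting B"),
    (date(2016, 1, 20), "posting C"),
]
by_year = group_by_period(items, "year")
by_quarter = group_by_period(items, "quarter")
print(by_year)
print(sorted(by_quarter))
```

Switching `unit` between `"year"` and `"quarter"` corresponds to the choice in the text between analyzing the overall yearly trend and analyzing periodic changes within years.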
Step 403: extract a feature vector from each group of condition description information.
In this embodiment, the electronic device can then extract a feature vector from each group of condition description information produced by the grouping in step 402.
Each condition entry included in a piece of condition description information can serve as one feature of that piece of information. The feature vector can be a vector composed of the multiple features of a group of condition description information, or a vector formed from numerical representations of the results of selecting among the multiple features of a group of condition description information. In practice, the electronic device can generate a single feature vector for each piece of condition description information and then combine the single feature vectors of a group into the feature vector of that group of recruitment information. It can also merge identical features across the single feature vectors and use a weight coefficient or probability for each feature to indicate its importance in the group of recruitment information. In some implementations, the electronic device can detect formatted information in the condition description information that conforms to a preset format and represent it by a word vector; for unformatted information that does not conform to the preset format, it can build a text vector by a text analysis method; the word vector and the text vector together then form the feature vector.
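Merging identical features and weighting each one by its importance in the group can be sketched as follows. Using the share of postings that contain a feature as its weight is only one possible choice, assumed here for illustration; the function name and the toy data are likewise hypothetical.

```python
from collections import Counter

def group_feature_vector(single_vectors):
    """Merge per-posting feature sets into {feature: share of postings containing it}."""
    counts = Counter()
    for features in single_vectors:
        counts.update(set(features))  # count each feature once per posting
    total = len(single_vectors)
    return {feature: count / total for feature, count in counts.items()}

# One feature set per piece of condition description information in the group.
group = [
    {"C++", "Linux"},
    {"C++", "Python"},
    {"Java"},
    {"C++"},
]
weights = group_feature_vector(group)
print(sorted(weights.items()))
```

Here the feature C++ appears in three of four postings, so it receives the weight 0.75, while each of the other features receives 0.25.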
Step 404: deduplicate each group of condition description information.
In this embodiment, the electronic device can then compare the pieces of condition description information within each group and deduplicate repeated condition description information by deleting, merging, or similar methods. Repeated condition description information can be condition description information with identical or similar content issued by the same publisher.
It will be appreciated that some users publish condition description information on only one website or application, while others publish the same condition description information on multiple websites or through different applications at the same time. If all of these were treated as independent pieces of condition description information, the repetition of part of the information would affect the accuracy of the analysis. Therefore, in some implementations, each group of condition description information can also be deduplicated. For example, the condition description information issued by the same publisher can be identified by matching information such as the publisher name and IP address. For the condition description information issued by the same publisher, a single feature vector is extracted from each piece and the single feature vectors are compared pairwise. The electronic device can determine that the condition description information corresponding to single feature vectors whose similarity exceeds a preset threshold is repeated; it can also determine that condition description information whose preset feature values in the feature vector are identical is repeated, or that condition description information whose number of identical feature values exceeds a preset number threshold is repeated. The present application does not limit this. For repeated condition description information, the electronic device can retain any one piece, or merge the pieces into one piece of condition description information.
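The similarity-threshold variant of deduplication can be sketched as below. Cosine similarity, the threshold of 0.9, keeping the first occurrence, and the publisher names are all assumptions chosen for illustration; the application leaves the similarity measure and the threshold open.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def deduplicate(records, threshold=0.9):
    """Keep the first of any same-publisher records whose vectors are too similar."""
    kept = []
    for publisher, vector in records:
        duplicate = any(
            publisher == kept_publisher
            and cosine_similarity(vector, kept_vector) > threshold
            for kept_publisher, kept_vector in kept
        )
        if not duplicate:
            kept.append((publisher, vector))
    return kept

records = [
    ("acme", [1, 1, 0, 1]),
    ("acme", [1, 1, 0, 1]),    # exact repeat from the same publisher
    ("acme", [0, 0, 1, 0]),    # different posting, same publisher
    ("globex", [1, 1, 0, 1]),  # same content, different publisher
]
unique = deduplicate(records)
print(len(unique))  # 3
```

Only the second record is dropped: the third differs in content, and the fourth comes from a different publisher, which matches the definition of repeated information as similar content issued by the same publisher.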
Step 405: analyze the feature vectors by a machine learning method to obtain the association relation between the condition description information of adjacent preset time periods.
In this embodiment, the electronic device can then analyze, by a machine learning method, the feature vectors extracted in step 403 from each group of condition description information, to obtain the association relation between the condition description information of adjacent preset time periods. If the condition status represented by the condition description information of each time period is regarded as a state, the association relation can represent the correlation between the current state and the next state; for example, it can be the transition probability relation from the current state to the next state.
In some implementations, the electronic device can build a mathematical model such as a latent variable model, using a method such as Latent SVM (latent support vector machine) or PLSA (Probabilistic Latent Semantic Analysis), and solve for the association relation.
Step 406: predict the condition status information of the next preset time period based on the association relation.
In this embodiment, the electronic device can further predict the condition status information of the next preset time period based on the association relation. The condition status information can be, for example, the condition status c_{t+1}, the feature vector corresponding to the condition description information, the distribution probability of at least one feature, and so on.
In this embodiment, steps 401, 402, 403, 405, and 406 of the above flow are substantially the same as steps 201, 202, 203, 204, and 205 of the foregoing embodiment, respectively, and are not repeated here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the information prediction method in this embodiment adds step 404, in which each group of condition description information is deduplicated. With the added step 404, this embodiment can deduplicate repeated condition description information by deleting, merging, or similar methods, and thereby increase the accuracy of information prediction.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an information prediction device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied in an electronic device.
As shown in Fig. 5, the information prediction device 500 of this embodiment includes an acquisition module 501, a grouping module 502, an extraction module 503, an analysis module 504, and a prediction module 505. The acquisition module 501 can be configured to obtain multiple pieces of condition description information; the grouping module 502 can be configured to group the condition description information by preset time period, placing condition description information whose issuing time falls in the same preset time period into one group; the extraction module 503 can be configured to extract a feature vector from each group of condition description information; the analysis module 504 can be configured to analyze the feature vectors by a machine learning method to obtain the association relation between the condition description information of adjacent preset time periods; and the prediction module 505 can be configured to predict the condition status information of the next preset time period based on the association relation.
It is worth noting that the modules or units recorded in the information prediction device 500 correspond to the steps of the method described with reference to Fig. 2. The operations and features described above for the method therefore apply equally to the information prediction device 500 and the modules or units it contains, and are not repeated here.
Those skilled in the art will understand that the information prediction device 500 also includes some other well-known structures, such as a processor and a memory. In order not to unnecessarily obscure the embodiments of the present disclosure, these well-known structures are not shown in Fig. 5.
Referring now to Fig. 6, a schematic structural diagram is shown of a computer system 600 suitable for implementing the electronic device of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 602 or a program loaded from the storage portion 608 into random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
In particular, according to embodiments of the present application, the process described above with reference to the flowchart can be implemented as a computer software program. For example, embodiments of the present application include a computer program product that includes a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such embodiments, the computer program can be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611.
The units involved in the embodiments of the present application can be implemented in software or in hardware. The described modules can also be provided in a processor; for example, this can be described as: a processor including an acquisition module, a grouping module, an extraction module, an analysis module, and a prediction module. The names of these modules do not, under certain circumstances, limit the modules themselves; for example, the acquisition module can also be described as "a module configured to obtain multiple pieces of condition description information".
In another aspect, the present application also provides a computer-readable storage medium. This can be the computer-readable storage medium included in the device described in the above embodiment, or a computer-readable storage medium that exists separately and is not assembled into a terminal. The computer-readable storage medium stores one or more programs which, when executed by one or more processors, cause the device to: obtain multiple pieces of condition description information; group the condition description information by preset time period, placing condition description information whose issuing time falls in the same preset time period into one group; extract a feature vector from each group of condition description information; analyze the feature vectors by a machine learning method to obtain the association relation between the condition description information of adjacent preset time periods; and predict the condition status information of the next preset time period based on the association relation.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (14)

1. the method for an information prediction, it is characterised in that described method includes:
Obtain a plurality of condition and describe information;
According to preset time period, described condition is described information to be grouped, issuing time is belonged to the condition of same preset time period and describes information and be divided into one group;
Each set condition is described information and extracts characteristic vector respectively;
By machine learning method, described characteristic vector is analyzed, obtains the incidence relation that the condition in adjacent preset time period describes between information;
Condition situation information based on the next preset time period of described incidence relation prediction.
2. method according to claim 1, it is characterised in that described information that each set condition is described is extracted characteristic vector respectively and included:
Detect described condition and describe the formatted message meeting preset format in information, described formatted message is represented by term vector;
Described condition is described to the unformatted message not meeting preset format in information, generate text vector by text analyzing method;
Described term vector is constituted together with described text vector described characteristic vector.
3. method according to claim 1, it is characterised in that described method also includes:
Each set condition is described information and carries out duplicate removal process.
4. method according to claim 1, it is characterised in that described information that each set condition is described is extracted characteristic vector respectively and included:
Extract each set condition and describe the feature not changed within a preset time interval in information as static nature;
Extract each set condition and describe the feature changed within a preset time interval in information as behavioral characteristics;
Described static nature is constituted together with described behavioral characteristics described characteristic vector.
5. method according to claim 1, it is characterized in that, described incidence relation includes the transition probability relation that the condition of adjacent preset time period describes between information, wherein, described condition describes the characteristic vector that information describes information by each set condition and represents, described transition probability relation is represented by the transition probability between feature each in characteristic vector.
6. method according to claim 5, it is characterised in that described condition situation information includes the distribution probability of at least one feature, described distribution probability is determined by the characteristic vector of current slot and described transition probability relation.
7. method according to claim 6, it is characterised in that described method also includes:
Based on described distribution probability, push at least one feature that distribution probability is bigger.
8. An information prediction device, characterized in that the device comprises:
an acquisition module, configured to obtain multiple pieces of condition description information;
a grouping module, configured to group the condition description information by preset time period, placing condition description information whose issuing time falls in the same preset time period into one group;
an extraction module, configured to extract a feature vector from each group of condition description information;
an analysis module, configured to analyze the feature vectors by a machine learning method to obtain an association relation between the condition description information of adjacent preset time periods; and
a prediction module, configured to predict condition status information of the next preset time period based on the association relation.
9. The device according to claim 8, characterized in that the extraction module comprises:
a word vector generation unit, configured to detect formatted information in the condition description information that conforms to a preset format and represent the formatted information by a word vector;
a text vector generation unit, configured to generate, for unformatted information in the condition description information that does not conform to the preset format, a text vector by a text analysis method; and
a feature vector generation unit, configured to form the feature vector from the word vector together with the text vector.
10. The device according to claim 8, characterized in that the device further comprises:
a deduplication module, configured to deduplicate each group of condition description information.
11. The device according to claim 8, characterized in that the extraction module comprises:
a static feature extraction unit, configured to extract features in each group of condition description information that do not change within a preset time interval as static features;
a dynamic feature extraction unit, configured to extract features in each group of condition description information that change within the preset time interval as dynamic features; and
a synthesis unit, configured to form the feature vector from the static features together with the dynamic features.
12. The device according to claim 8, characterized in that the association relation comprises a transition probability relation between the condition description information of adjacent preset time periods, wherein the condition description information is represented by the feature vector of each group of condition description information, and the transition probability relation is represented by the transition probabilities between the features of the feature vectors.
13. The device according to claim 12, characterized in that the condition status information comprises a distribution probability of at least one feature, the distribution probability being determined by the feature vector of the current time period and the transition probability relation.
14. The device according to claim 13, characterized in that the device further comprises:
a pushing module, configured to push, based on the distribution probability, at least one feature with a larger distribution probability.
CN201610141577.3A 2016-03-11 2016-03-11 Information prediction method and device Pending CN105808744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610141577.3A CN105808744A (en) 2016-03-11 2016-03-11 Information prediction method and device


Publications (1)

Publication Number Publication Date
CN105808744A true CN105808744A (en) 2016-07-27

Family

ID=56468221


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377907A (en) * 2019-07-18 2019-10-25 中科鼎富(北京)科技发展有限公司 A kind of recruitment information standardized method and device
CN110852846A (en) * 2019-11-11 2020-02-28 京东数字科技控股有限公司 Processing method and device for recommended object, electronic equipment and storage medium
CN111126103A (en) * 2018-10-30 2020-05-08 百度在线网络技术(北京)有限公司 Method and device for judging life stage state of user
CN114169869A (en) * 2022-02-14 2022-03-11 北京大学 Attention mechanism-based post recommendation method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793465A (en) * 2013-12-20 2014-05-14 武汉理工大学 Cloud computing based real-time mass user behavior analyzing method and system
CN104091054A (en) * 2014-06-26 2014-10-08 中国科学院自动化研究所 Mass disturbance warning method and system applied to short texts
CN104503959A (en) * 2014-12-12 2015-04-08 北京智谷睿拓技术服务有限公司 Method and equipment for predicting user emotion tendency

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793465A (en) * 2013-12-20 2014-05-14 武汉理工大学 Cloud computing based real-time mass user behavior analyzing method and system
CN104091054A (en) * 2014-06-26 2014-10-08 中国科学院自动化研究所 Mass disturbance warning method and system applied to short texts
CN104503959A (en) * 2014-12-12 2015-04-08 北京智谷睿拓技术服务有限公司 Method and equipment for predicting user emotion tendency

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈队永: "《系统工程原理及应用》", 28 February 2014, 北京:中国铁道出版社 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126103A (en) * 2018-10-30 2020-05-08 百度在线网络技术(北京)有限公司 Method and device for judging life stage state of user
CN111126103B (en) * 2018-10-30 2023-09-26 百度在线网络技术(北京)有限公司 Method and device for judging life stage state of user
CN110377907A (en) * 2019-07-18 2019-10-25 中科鼎富(北京)科技发展有限公司 Recruitment information standardization method and device
CN110377907B (en) * 2019-07-18 2023-09-08 鼎富智能科技有限公司 Recruitment information standardization method and device
CN110852846A (en) * 2019-11-11 2020-02-28 京东数字科技控股有限公司 Processing method and device for recommended object, electronic equipment and storage medium
CN114169869A (en) * 2022-02-14 2022-03-11 北京大学 Attention mechanism-based post recommendation method and device

Similar Documents

Publication Title
Pournarakis et al. A computational model for mining consumer perceptions in social media
Sun et al. Embracing textual data analytics in auditing with deep learning.
US11604980B2 (en) Targeted crowd sourcing for metadata management across data sets
US11636341B2 (en) Processing sequential interaction data
US9449283B1 (en) Selecting a training strategy for training a machine learning model
CN105550173A (en) Text correction method and device
CN115002200B (en) Message pushing method, device, equipment and storage medium based on user portrait
CN104364781A (en) Systems and methods for calculating category proportions
Schetgen et al. Predicting donation behavior: Acquisition modeling in the nonprofit sector using Facebook data
US20190080352A1 (en) Segment Extension Based on Lookalike Selection
Burhanuddin et al. Analysis of mobile service providers performance using naive bayes data mining technique
CN105808744A (en) Information prediction method and device
CN107346344A (en) Method and apparatus for text matching
CN112487109A (en) Entity relationship extraction method, terminal and computer readable storage medium
CN111429161A (en) Feature extraction method, feature extraction device, storage medium, and electronic apparatus
CN105488161A (en) Information pushing method and apparatus
Samah Naïve Bayes Twitter sentiment analysis in visualizing the reputation of communication service providers: During Covid-19 pandemic
CN104077288B (en) Web page content recommendation method and web page content recommendation apparatus
CN114780600A (en) Flight searching method, system, equipment and storage medium
CN111210332A (en) Method and device for generating post-loan management strategy and electronic equipment
CN111930944B (en) File label classification method and device
Vaganov et al. Forecasting purchase categories with transition graphs using financial and social data
CN113159934A (en) Method and system for predicting passenger flow of network, electronic equipment and storage medium
CN114036921A (en) Policy information matching method and device
CN114742645B (en) User security level identification method and device based on multi-stage time sequence multitask

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160727