CN111340669A

CN111340669A - Crowd funding project initial stage financing performance prediction system

Info

Publication number: CN111340669A
Application number: CN202010107299.6A
Authority: CN
Inventors: 陈恩红; 刘淇; 吴李康; 李徵; 张凯
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2020-02-21
Filing date: 2020-02-21
Publication date: 2020-06-26

Abstract

The invention discloses a crowd funding project initial stage financing performance prediction system. A graph neural network structure is used to model the competitive influence among projects and the evolution of the market environment, so that the model captures environmental factors of the crowd funding market and thereby improves prediction accuracy. The system can also visually display the various information used in the prediction process together with the final prediction result, which improves the user experience and makes it easy for users to understand the status of a crowd funding project.

Description

Crowd funding project initial stage financing performance prediction system

Technical Field

The invention relates to the fields of graph neural networks and online crowd funding, and in particular to a crowd funding project initial stage financing performance prediction system.

Background

The rise of online crowd funding in recent years has produced many valuable research problems, such as project success rate prediction, recommendation systems for crowd funding platforms, and dynamic tracking of crowd funding campaigns. Most existing research concerns the financing process after a project has been launched, yet in the crowd funding market the initial financing performance of a project is of great concern to both the initiator and the platform.

Evaluating the initial financing performance of a project before it is launched can create a great deal of value. However, this prediction is more difficult and remains largely unexplored, because the market environment at the time a project is released has a great impact on its initial investment.

At present, the crowd funding field has no dedicated equipment that both predicts this information accurately and visually displays the information and prediction results involved in the prediction process, so improvement is needed.

Disclosure of Invention

The invention aims to provide a crowd funding project initial stage financing performance prediction system which can visually display various information and prediction results in a prediction process.

The purpose of the invention is realized by the following technical scheme:

a crowd funding project initial stage financing performance prediction system comprises:

the static data preprocessing unit, used for processing the content information of the target project and of other projects published before the target project's pre-publishing time, to obtain corresponding feature vectors;

the dynamic data acquisition unit, used for acquiring the financing time sequences of the other projects published before the target project's pre-publishing time, and processing the financing time sequences through an embedding layer to obtain corresponding time series vectors;

the modeling and predicting unit, used for: obtaining the competition pressure state vector of the target project according to the feature vector of the target project and the feature vectors and time series vectors of the other published projects, in combination with a long short-term memory network and an attention network that model the project competition relationship; obtaining the environmental state vector of the target project according to the feature vector of the target project and the feature vectors and financing time sequences of the other published projects, in combination with a propagation tree structure that models the historical market environment; and predicting the initial financing result of the target project from its competition pressure state vector and its environmental state vector, the initial stage being the first 24 hours;

the display unit, used for displaying, in separate display areas, the content information of the target project and of the other projects published before the target project's pre-publishing time, the processing results of the static data preprocessing unit, the financing time sequences acquired by the dynamic data acquisition unit, and the initial financing result of the target project obtained by the modeling and predicting unit.

According to the technical scheme provided by the invention, the competition influence among projects and the evolution of the market environment are modeled by using the graph neural network structure, so that the model can model the environmental factors of the crowd funding market and further improve the accuracy of prediction; meanwhile, the system can also visually display various information and the final prediction result in the prediction process, so that the user experience is greatly improved, and the user can conveniently know the relevant conditions of the crowd funding project.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.

Fig. 1 is a schematic diagram of a crowd funding project initial stage financing performance prediction system according to an embodiment of the present invention;

fig. 2 is a schematic diagram of a crowd funding project initial stage financing performance prediction system according to an embodiment of the present invention;

fig. 3 is a schematic diagram of a propagation tree structure according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a crowd funding project initial stage financing performance prediction system, as shown in fig. 1, which mainly comprises:

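the static data preprocessing unit, used for processing the content information of the target project and of other projects published before the target project's pre-publishing time, to obtain corresponding feature vectors;

the dynamic data acquisition unit, used for acquiring the financing time sequences of the other projects published before the target project's pre-publishing time, and processing the financing time sequences through an embedding layer to obtain corresponding time series vectors;

the modeling and predicting unit, used for: obtaining the competition pressure state vector of the target project according to the feature vector of the target project and the feature vectors and time series vectors of the other published projects, in combination with a long short-term memory network and an attention network that model the project competition relationship; obtaining the environmental state vector of the target project according to the feature vector of the target project and the feature vectors and financing time sequences of the other published projects, in combination with a propagation tree structure that models the historical market environment; and predicting the initial financing result of the target project from its competition pressure state vector and its environmental state vector;

the display unit, used for displaying, in separate display areas, the content information of the target project and of the other published projects, the processing results of the static data preprocessing unit, the financing time sequences acquired by the dynamic data acquisition unit, and the initial financing result of the target project obtained by the modeling and predicting unit.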

The system can be implemented by matching with related hardware, for example, the display unit can be implemented by matching with a display screen. The static data preprocessing unit, the dynamic data acquiring unit and the modeling and predicting unit can be implemented by cooperating with the processor, and at the same time, the static data preprocessing unit, the dynamic data acquiring unit and the modeling and predicting unit also include some necessary hardware devices, such as a storage device (providing a system operating space and a data space), a communication device (enabling the system to interact with the outside to acquire related information), and the like.

For ease of understanding, the following detailed description is directed to the above-described system.

In the embodiment of the invention, the initial financing performance of a crowd funding project to be predicted by the system mainly refers to its financing performance within 24 hours after the project is released. The financing amount itself cannot be used directly as the prediction target, because the same amount represents different performance for projects with different financing goals. The ratio of the amount raised to the goal is therefore used as the prediction target, and to reduce the gap between the smallest and largest values, the invention applies the $\log_2(\cdot)$ function to this ratio to make model prediction easier:

$$y_i = \log_2\left(\frac{\alpha_i}{g_i}\right)$$

In the above formula, $\alpha_i$ denotes the amount raised by project i within the first 24 hours and $g_i$ denotes the financing goal of the pre-release project i, so $\alpha_i / g_i$ is the ratio of the initial financing amount of the project to its goal.
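As a minimal sketch of this target transformation, the following Python function computes the base-2 logarithm of the ratio between the amount raised in the first 24 hours and the financing goal. The function name and the rejection of non-positive inputs are assumptions made for illustration; the text does not say how a project with no early funding is handled.

```python
import math


def initial_financing_target(raised_24h: float, goal: float) -> float:
    """Return log2(raised_24h / goal), the log-scaled ratio used as the target."""
    # Assumption: non-positive inputs are rejected because log2 is undefined there.
    if raised_24h <= 0 or goal <= 0:
        raise ValueError("both amounts must be positive")
    return math.log2(raised_24h / goal)


# A project that raises a quarter of its goal on the first day.
example_target = initial_financing_target(2500.0, 10000.0)
```

Under this scaling, raising exactly the goal gives 0, and each halving of the ratio lowers the target by 1.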

Fig. 2 is a schematic diagram of the above system provided by the present invention.

First, static data preprocessing unit.

In the embodiment of the invention, the content information obtained from the crowd funding platform mainly comprises: the project description, project category, initiator type, current exchange rate, target financing period, and target financing amount.

Because the content information must be converted into vector form, the numerical fields are discretized and encoded as one-hot vectors; the text fields are processed with the document-to-vector (doc2vec) method from natural language processing to obtain corresponding vectors; and the vectors of all field types are concatenated to obtain the corresponding feature vector (the static content feature vector).

Preferably, before the doc2vec method is applied, the text data is first segmented into words, all punctuation is removed, all words are converted to lower case, and only words that occur more than 5 times are retained.
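The text cleaning steps described above can be sketched as follows. This is only an illustration: the regular-expression tokenizer stands in for whatever word segmentation tool is actually used, and the function name and parameter are hypothetical.

```python
import re
from collections import Counter


def keep_frequent_tokens(descriptions, min_exclusive=5):
    """Tokenize, drop punctuation, lowercase, and keep words seen more than 5 times."""
    # Assumption: a simple alphanumeric regex replaces a real word segmenter.
    tokenized = [re.findall(r"[a-z0-9]+", text.lower()) for text in descriptions]
    counts = Counter(token for doc in tokenized for token in doc)
    return [[tok for tok in doc if counts[tok] > min_exclusive] for doc in tokenized]


docs = ["Solar lamp! Solar lamp."] * 3 + ["Rare word"]
cleaned = keep_frequent_tokens(docs)
```

The cleaned token lists would then be passed to a doc2vec implementation to obtain the text vectors.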

In this manner, corresponding feature vectors can be obtained for all projects. In the embodiment of the invention, the target project is denoted g, and because the invention involves model training, a target project set G is also constructed.

During training, all results for the target projects are known. During testing, the content information of a target project is known, but because the project has not yet been published, its financing outcomes are unknown and must be predicted by the modeling and prediction unit described later. The set of other projects published before the target project's pre-publishing time is denoted Ψ; project i and project j referred to later are published projects. The feature vectors of these projects are denoted $x_g$, $x_i$ and $x_j$, respectively.

Second, dynamic data acquisition unit

For a given target project g with pre-release time $T_g$, the corresponding environmental factor, i.e. its contextual feature, is the set of funding sequences contributed by the crowd to the other published projects in the market before time $T_g$.

For project i, the financing time sequence is:

$$S_i = \{(v_1, t_1), (v_2, t_2), \ldots, (v_{|S_i|}, t_{|S_i|})\}$$

In the above formula, v denotes an investment amount, t denotes the timestamp of the investment, the subscript is the index of the investment, and $|S_i|$ denotes the total number of investments.

The financing time sequence $S_i$ is processed by an embedding layer to obtain the corresponding time series vector $TS_i = [\xi_0, \xi_1, \ldots, \xi_{23}]$, which represents the time series of project i over the past 24 hours:

$$\xi_k = \log_2\Big(\sum_{l} v_l\Big)$$

In the above formula, $v_l \in S_i$ with $T_g - (k+1)\Delta \le t_l < T_g - k\Delta$, $k = 0, 1, \ldots, 23$, and $\Delta$ denotes a time interval of 1 hour. This formula gives the amount raised by project i in each hour of the last 24 hours.
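The hourly bucketing can be sketched in Python as below. The function name, the use of seconds for timestamps, and the value 0.0 for an hour with no investment are assumptions; the text does not state how an empty hour is handled, and log2 of zero is undefined.

```python
import math

HOUR = 3600  # one hour (the interval Δ), assuming timestamps are in seconds


def hourly_series(investments, t_g, hours=24):
    """Return [xi_0, ..., xi_23] from (amount, timestamp) pairs observed before t_g."""
    totals = [0.0] * hours
    for amount, timestamp in investments:
        age = t_g - timestamp
        # Bucket k covers T_g - (k+1)Δ <= t < T_g - kΔ, i.e. kΔ < age <= (k+1)Δ.
        if 0 < age <= hours * HOUR:
            k = math.ceil(age / HOUR) - 1
            totals[k] += amount
    # Assumption: an hour without investments maps to 0.0.
    return [math.log2(total) if total > 0 else 0.0 for total in totals]


t_g = 1_000_000
series = hourly_series(
    [(4.0, t_g - 10), (4.0, t_g - 20), (16.0, t_g - 3601), (5.0, t_g)], t_g
)
```

Index 0 holds the most recent hour before the pre-release time, and an investment made exactly at the pre-release time is excluded by the strict upper bound.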

Third, modeling and predicting unit.

1. Project competition modeling Part (PCM).

Once a project is released, it faces competition from the market. To model the competition between projects, for a project g to be released with pre-release time $T_g$, edges are established between g and the other projects running at time $T_g$. Projects with different content and different competitive strength are considered to influence the target project differently, so a graph attention network (GAT) is used to aggregate the competitiveness information of the other projects and express the competition pressure on the target project. The competitive strength of the other projects over a coming period is predicted by modeling their historical time series with a long short-term memory network (LSTM). The specific implementation process is as follows.

First, to quantify the competitiveness of each competitor over a coming period, a long short-term memory network (LSTM) takes the time series vector $TS_i$ of each published project i in Ψ and predicts its initial financing state (i.e., within 24 hours), which expresses its competitiveness. Here Ψ denotes the set of published projects running on the market at time $T_g$.

To limit the computational load on the platform, multiple target projects are trained in the model at the same time. To this end, the invention divides a day into 6 periods according to typical human daily routines, namely "8:00-12:00", "12:00-14:00", "14:00-17:00", "17:00-20:00", "20:00-24:00" and "0:00-8:00", and then defines the target set G as the unpublished target projects whose pre-release times fall in the same period of the same day. To prevent the information leakage that commonly occurs in time series tasks, when the set Ψ is obtained, the pre-release times of the projects in the defined set are unified into a single value computed from the release times $T_i$ of the individual projects i.

Time series modeling with LSTM is time-consuming when Ψ is large. To address this, a pruning method selects from Ψ the published projects most likely to compete with the target project in its early stage, namely the projects that at time $T_g$ appear on the crowd funding platform in the just-funded section (containing projects newly created within the last three days) and in the category section of the target project (containing projects of the same category). The result is represented by an adjacency matrix:

In this adjacency matrix, an entry of 1 indicates that project i and project j are connected by an edge, and an entry of 0 indicates that they are not. An index mapping assigns the id of each project in the published set Ψ to a column of the adjacency matrix; $C_i$ and $C_j$ denote the categories of project i and project j, and $T_i$ and $T_j$ denote their pre-release times.
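A minimal sketch of the pruning rule follows, assuming that a published project is kept as a competitor when it is either in the same category as the target or was released within the three days before the target's pre-release time. The exact edge condition is not recoverable from the text, so the `or` combination, the `Project` class, and the function name are all illustrative assumptions.

```python
from dataclasses import dataclass

DAY = 24 * 3600  # seconds; timestamps are assumed to be in seconds


@dataclass
class Project:
    pid: str
    category: str
    release_time: int


def competitor_row(target, published, recent_days=3):
    """Return one 0/1 adjacency row marking the published projects kept as competitors."""
    row = []
    for p in published:
        gap = target.release_time - p.release_time
        is_recent = 0 <= gap <= recent_days * DAY      # "just funded" section
        same_category = p.category == target.category  # category section
        row.append(1 if (is_recent or same_category) else 0)
    return row


target = Project("g", "tech", 10 * DAY)
published = [
    Project("a", "tech", 1 * DAY),
    Project("b", "art", 9 * DAY),
    Project("c", "art", 2 * DAY),
]
row = competitor_row(target, published)
```

Only the projects marked 1 would need their time series modeled by the LSTM, which is where the computational saving comes from.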

Pruning reduces the number of time series to be modeled, reduces the amount of computation, and reduces noise in information aggregation. Because projects that are strongly competitive or whose content is similar to that of the target project have a larger influence on it, the graph attention network is used to aggregate neighbor information for the target project g:

$$e_{gi} = V^{T}\,[\,W x_g \,\|\, W x_i\,]$$

In the above formula, $x_g$ and $x_i$ denote the feature vectors of the target project g and the published project i; V and W denote the mapping parameter matrices used in the attention mechanism, whose values are learned and optimized during model training; the superscript T denotes matrix transposition; and $\|$ denotes vector concatenation. The attention weight $\alpha_{gi}$ is obtained from $e_{gi}$ over the set of neighbor nodes of the target project, which consists of published projects.

Finally, the competition pressure state vector of the target project is obtained. In this process, $\alpha_{gi}$ is computed from the static content feature vectors, $W_h$ denotes a mapping parameter matrix learned during training, and the attention weights $\alpha_{gi}$ are combined with the predicted financing states output by the LSTM. In this way, the invention considers both the financing capacity of a project and its content.
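The aggregation step can be sketched in plain Python as below. The score follows the formula for e_gi given above. The softmax normalization and the attention-weighted sum of the projected LSTM states are the standard graph attention choices and are assumptions here, as are the function names and the treatment of V as a vector.

```python
import math


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def competition_pressure(x_g, neighbors, W, V, W_h):
    """neighbors is a list of (x_i, h_i) pairs; returns (attention weights, pressure)."""
    wx_g = matvec(W, x_g)
    scores = []
    for x_i, _ in neighbors:
        concat = wx_g + matvec(W, x_i)                         # [W x_g || W x_i]
        scores.append(sum(v * c for v, c in zip(V, concat)))   # e_gi
    # Assumption: weights are a softmax of the scores over the neighbor set.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Assumption: pressure is the attention-weighted sum of W_h h_i.
    projected = [matvec(W_h, h_i) for _, h_i in neighbors]
    dims = range(len(projected[0]))
    pressure = [sum(a * p[d] for a, p in zip(alphas, projected)) for d in dims]
    return alphas, pressure


identity = [[1.0, 0.0], [0.0, 1.0]]
pairs = [([1.0, 0.0], [2.0, 0.0]), ([0.0, 1.0], [0.0, 4.0])]
alphas, pressure = competition_pressure([1.0, 0.0], pairs, identity, [0.0] * 4, identity)
```

With an all-zero V every neighbor receives the same weight, so the result reduces to a plain average of the projected financing states.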

2. Market environmental evolution modeling section (MET).

The market environment is in effect the context of a project, so it is necessary to refer to the initial financing of other projects in the historical market environment of the target project and to find how the financing states of projects change as the market evolves. Because hundreds or thousands of projects can be released in a market within just a few days, a traditional chain-structured model for time series modeling is unsuitable for this scenario: its effectiveness drops significantly as the time series grows. Moreover, directly aggregating the financing states of the other projects in the historical data onto the target project would place projects from different points in time on a single level, which is unreasonable for time series modeling. The invention therefore constructs a graph neural network that passes information along a propagation tree structure, in order to model the entire historical market environment.

When modeling the historical market environment, the published projects are defined as nodes of a propagation tree, and the state of a published project is defined as:

$$h_j = [\,x_j \,\|\, r_j\,]$$

In the above formula, $x_j$ denotes the feature vector of project j and $r_j$ denotes the initial financing amount of project j (within its first 24 hours), which is computed from its financing time sequence. Here $T_j$ denotes the pre-release time of project j; $S_j$ denotes the financing time sequence of project j; $v_l$ denotes the amount of the l-th investment; $t_l$ denotes the timestamp of the l-th investment; and $n_h$ denotes the 24 hours of a day. The following constraint applies: $T_i - T_j > n_h \Delta$, where $T_i$ denotes the pre-release time of project i and $\Delta$ denotes a time interval of 1 hour. Under this constraint project i can observe the initial financing state of project j: when project i is released, project j has already been released for more than 1 day, so its initial financing state is observable at that time, and j is defined as an observable node of i. If the number of historical days is $t_h$, the set of observable nodes is:

$$\Phi_i = \{\, j \mid n_h \Delta < T_i - T_j < n_h\, t_h\, \Delta \,\}$$

The propagation tree is built as shown in part (a) of Fig. 3, which contains three nodes and their respective observable nodes. There are three connecting edges, <a, g>, <b, g> and <b, a>, and each edge spans more than 24 hours. If <b, g> is deleted, the depths of nodes a and b in the tree rooted at g are 1 and 2, respectively. The depth reflects how far each node lies from the pre-release time $T_g$ of the target project g, and the passing of information from node b to a to g is similar to the passing of information across time steps in an LSTM network.
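The observable node set can be computed directly from the release times. The sketch below assumes timestamps in seconds and uses a hypothetical dictionary of release times keyed by project id.

```python
HOUR = 3600  # the interval Δ, assuming timestamps are in seconds
N_H = 24     # n_h: hours in a day


def observable_nodes(t_i, release_times, t_h):
    """Return the ids j with n_h*Δ < T_i - T_j < n_h*t_h*Δ."""
    lower = N_H * HOUR
    upper = N_H * t_h * HOUR
    return {j for j, t_j in release_times.items() if lower < t_i - t_j < upper}


t_i = 10 * 24 * HOUR
release_times = {
    "a": t_i - 12 * HOUR,       # released too recently to be observed
    "b": t_i - 48 * HOUR,       # observable
    "c": t_i - 24 * HOUR,       # exactly one day: excluded by the strict bound
    "d": t_i - 8 * 24 * HOUR,   # older than the history window
}
visible = observable_nodes(t_i, release_times, t_h=7)
```

Both bounds are strict, so a project released exactly 24 hours earlier is not yet observable.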

The more complex case shown in part (b) of Fig. 3 is modeled as a tree structure in the same way. In addition, because the market environment is the context of the target project prediction task, and because equal-interval sampling effectively prevents the model's performance from decaying when time series information is passed over long periods, the invention constructs the propagation tree so that the time intervals between the layers of each subtree are kept as close as possible; that is, the propagation path from each leaf node to the root node approximates a time series sampled at equal intervals. When building the propagation tree structure, the projects newly released on each of the $t_h$ days are placed in the same layer of the tree. The financing states of all projects released on the day closest to the expected release time serve as tree nodes, are placed in the first layer below the root node of the propagation tree, and are connected to the root node. The financing states of all projects released on the next closest day form the second layer, and each of these nodes is connected to the node in the first layer that is closest to it. The finally generated propagation tree structure is represented by an adjacency matrix Γ.
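The layer-by-layer construction can be sketched as follows. The text does not say how "closest" is measured when linking a node to the previous layer, so closeness in release time is assumed here; the root id "g" and the function name are hypothetical.

```python
DAY = 24 * 3600  # seconds; timestamps are assumed to be in seconds


def build_propagation_tree(t_g, release_times, t_h, root="g"):
    """Return a {child: parent} map for the propagation tree rooted at `root`."""
    layers = {}
    for pid, t_j in release_times.items():
        gap = t_g - t_j
        if DAY < gap < t_h * DAY:  # observable: released 1 to t_h days earlier
            layers.setdefault(gap // DAY, []).append(pid)

    times = dict(release_times)
    times[root] = t_g
    parent = {}
    previous_layer = [root]
    for depth in sorted(layers):  # nearest day first
        for pid in layers[depth]:
            # Assumption: "closest" means closest in release time.
            parent[pid] = min(previous_layer, key=lambda q: abs(times[q] - times[pid]))
        previous_layer = layers[depth]
    return parent


t_g = 10 * DAY
tree = build_propagation_tree(
    t_g,
    {"a": t_g - 36 * 3600, "b": t_g - 43 * 3600, "c": t_g - 58 * 3600, "d": t_g - 12 * 3600},
    t_h=7,
)
```

The parent map is equivalent to the adjacency matrix Γ: each entry marks one edge between a node and the layer above it. A day with no releases is skipped, so the next populated layer attaches to the last non-empty one.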

To prevent information from attenuating when it is propagated over greater depths, the invention uses a gated recurrent unit (GRU).

Before information propagation, the states of all nodes in the propagation tree are initialized. In this initialization, $x_g$ and $x_i$ denote the feature vectors of the target project g and of project i, and $r_i$ denotes the initial financing amount of project i. Because nodes are not treated differently during information propagation, v is used below to refer to any node.

In each subsequent propagation step, information is aggregated between nodes. For a node v, the neighbor nodes are read from the adjacency matrix Γ; $|G \cup \Phi|$ is the size of the set of state vectors of all nodes in the propagation tree, where G denotes the set of nodes to be predicted and Φ denotes the set of all observable nodes; $h^{(t-1)T}$ denotes the hidden state of a node at step (t-1), with the subscript giving the node index; and b is a bias vector. The state of each node is then updated with a gated recurrent unit (GRU), in which $W_z$ and $U_z$ denote the trainable parameter matrices of the update gate, $W_r$ and $U_r$ denote the trainable parameter matrices of the reset gate, and $W_1$ and $U_1$ denote the corresponding parameter matrices of the output layer.
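One propagation step can be sketched in plain Python as below. The update is written in the standard GRU form, which is an assumption: the parameter names match the text, but the exact equations, the summation used for aggregation, and the function names are illustrative only.

```python
import math


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate(v, gamma, states, bias):
    """Sum the previous states of the neighbors of node v (row v of Γ), plus the bias b."""
    total = list(bias)
    for u, connected in enumerate(gamma[v]):
        if connected:
            total = [t + s for t, s in zip(total, states[u])]
    return total


def gru_update(a, h, W_z, U_z, W_r, U_r, W_1, U_1):
    """One GRU step: a is the aggregated message, h the previous state of the node."""
    z = [sigmoid(p + q) for p, q in zip(matvec(W_z, a), matvec(U_z, h))]  # update gate
    r = [sigmoid(p + q) for p, q in zip(matvec(W_r, a), matvec(U_r, h))]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = [math.tanh(p + q) for p, q in zip(matvec(W_1, a), matvec(U_1, reset_h))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


zero = [[0.0, 0.0], [0.0, 0.0]]
message = aggregate(0, [[0, 1], [1, 0]], [[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])
new_state = gru_update([1.0, 1.0], [2.0, -4.0], zero, zero, zero, zero, zero, zero)
```

Repeating the aggregate and update pair once per tree layer carries information from the leaves toward the root, and the gates let a node keep part of its previous state rather than overwrite it.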

After $t = t_h$ propagation steps, the final state of the target project g is obtained; this final state is the environmental state vector of the environment in which the target project is located.

3. Prediction part.

The competition pressure state vector of the target project and the environmental state vector of the target project are combined and passed through a fully connected layer whose activation function is the ReLU function, to predict the initial financing result of the target project.
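A minimal sketch of this prediction step is given below. It assumes that "combining" means concatenation and that the fully connected layer has a single output unit; the function name, weights and bias are hypothetical.

```python
def predict_initial_financing(pressure, environment, weights, bias):
    """Concatenate the two state vectors and apply one ReLU-activated linear unit."""
    features = list(pressure) + list(environment)  # assumption: combine by concatenation
    linear = sum(w * f for w, f in zip(weights, features)) + bias
    return max(0.0, linear)                        # ReLU


prediction = predict_initial_financing([1.0, 2.0], [3.0], [0.5, 0.25, 1.0], -1.0)
```

In practice the weights and bias would be learned jointly with the other parameters of the modeling and prediction unit.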

4. Joint training part.

In this embodiment, the parameters in the modeling and prediction unit are trained jointly, taking two losses into account.

The first loss, denoted $Loss_p$, is the mean absolute error (MAE) of the prediction, computed over the target project set, where $y_g$ is the real initial financing result of the target project g.

The second loss, denoted $Loss_l$, is the loss of the long short-term memory network in the computation of the competition pressure state vector, i.e. the MAE loss of the LSTM output that predicts competitor competitiveness in the project competition modeling part (PCM). Its formula has the same form as that of $Loss_p$: the initial financing state of each published project i computed by the LSTM is mapped to a one-dimensional value $y'_i$ and compared with $y_i$, the real initial financing result of the published project i.

$Loss_p$ and $Loss_l$ are combined for joint training with corresponding weight coefficients. In the training loss function, Θ denotes the set of model parameters to be trained and η denotes a weight coefficient. The model parameters are updated using the stochastic gradient descent (SGD) algorithm, with the initial learning rate set to 0.02.
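The joint objective can be sketched as below. The mean absolute error and the learning rate of 0.02 come from the text; the way the two losses are weighted (here Loss_p plus η times Loss_l) is an assumption, because the combining formula is not recoverable, and the function names are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def joint_loss(y_g, y_g_pred, y_i, y_i_pred, eta=0.5):
    """Combine the prediction loss and the LSTM loss with weight eta."""
    loss_p = mae(y_g, y_g_pred)  # targets in G against the final predictions
    loss_l = mae(y_i, y_i_pred)  # published projects against the mapped LSTM outputs
    # Assumption: the weighting form is Loss_p + eta * Loss_l.
    return loss_p + eta * loss_l


def sgd_step(params, grads, learning_rate=0.02):
    """One plain stochastic gradient descent update."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


total = joint_loss([1.0], [2.0], [0.0], [4.0], eta=0.25)
updated = sgd_step([1.0], [10.0])
```

Training the LSTM output against the observed early financing of published projects gives the competition module a direct supervision signal in addition to the final prediction error.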

According to the scheme of the embodiment of the invention, multiple kinds of metadata are fused and the modeling focuses on the market environment, in order to evaluate the initial stage financing performance of crowd funding projects that have not yet been launched, to judge whether the pre-release time of a project is appropriate, and to help the project achieve a better launch. The system is also built on related hardware to form a complete product, through which the various information in the prediction process and the final prediction result are displayed visually, which improves the user experience and makes it easy for users to understand the status of a crowd funding project.

Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A crowd funding project initial stage financing performance prediction system is characterized by comprising:

2. The crowd funding project initial financing performance prediction system as claimed in claim 1, wherein the processing of the content information of the target project and other published projects before the target project pre-publishing time to obtain the corresponding feature vector comprises:

discretizing the numerical fields in the content information to obtain one-hot encoded vectors; processing the text fields with a text-to-vector method from natural language processing to obtain corresponding vectors; and concatenating the vectors of all field types to obtain the corresponding feature vector;

the content information includes: a project description, a project category, an initiator type, a current exchange rate, a target financing period, and a target financing amount.

3. The crowd funding project initial financing performance prediction system as claimed in claim 1, wherein the financing time sequence of a project i published before the target project pre-publishing time is:

$$S_i = \{(v_1, t_1), (v_2, t_2), \ldots, (v_{|S_i|}, t_{|S_i|})\}$$

in the above formula, v denotes an investment amount, t denotes the timestamp of the investment, the subscript is the index of the investment, and $|S_i|$ denotes the total number of investments;

the financing time sequence $S_i$ is processed by the embedding layer to obtain the corresponding time series vector $TS_i = [\xi_0, \xi_1, \ldots, \xi_{23}]$, which represents the time series of project i over the past 24 hours;

$$\xi_k = \log_2\Big(\sum_{l} v_l\Big)$$

in the above formula, $v_l \in S_i$ with $T_g - (k+1)\Delta \le t_l < T_g - k\Delta$, $k = 0, 1, \ldots, 23$, and $\Delta$ denotes a time interval of 1 hour.

4. The crowd funding project initial financing performance prediction system as claimed in claim 1, wherein obtaining the competition pressure state vector of the target project according to the feature vector of the target project and the feature vectors and time series vectors of other published projects, in combination with the long short-term memory network and the attention network to model the project competition relationship, comprises:

predicting, with a long short-term memory network, the initial financing state of each published project i in Ψ from its time series vector $TS_i$, where Ψ denotes the set of published projects running on the market at time $T_g$, and $T_g$ denotes the pre-release time of the target project g;

using a pruning method to select from the set Ψ the published projects most likely to compete with the target project in its early stage, namely the projects that at time $T_g$ are in the just-funded section and in the category section of the target project, and representing them with an adjacency matrix, in which an entry of 1 indicates that project i and project j are connected by an edge, an entry of 0 indicates that they are not, an index mapping assigns the id of each project in the published set Ψ to a column of the adjacency matrix, $C_i$ and $C_j$ denote the categories of project i and project j, and $T_i$ and $T_j$ denote their pre-release times;

aggregating neighbor information for the target project g using the graph attention network:

$$e_{gi} = V^{T}\,[\,W x_g \,\|\, W x_i\,]$$

wherein $W_h$, used in computing the competition pressure state vector, denotes a mapping parameter matrix learned and optimized during training.

5. The crowd funding project initial financing performance prediction system according to claim 1, wherein the obtaining of the environmental state vector of the target project according to the feature vector of the target project, the feature vector of other published projects and the financing time sequence in combination with the propagation tree structure modeling of the historical market environment comprises:

h_j＝[x_j||r_j]

in the above formula, $x_j$ denotes the feature vector of project j, and $r_j$ denotes the initial financing amount of project j, which is computed from its financing time sequence, where $T_j$ denotes the pre-release time of project j; $S_j$ denotes the financing time sequence of project j; $v_l$ denotes the amount of the l-th investment; $t_l$ denotes the timestamp of the l-th investment; and $n_h$ denotes the 24 hours of a day; subject to the constraint $T_i - T_j > n_h \Delta$, where $T_i$ denotes the pre-release time of project i and $\Delta$ denotes a time interval of 1 hour, under which, when project i is released, project j has already been released for more than 1 day, so that project i can observe the initial financing state of project j, and j is defined as an observable node of i;

if the number of historical days is $t_h$, the set of observable nodes is: $\Phi_i = \{\, j \mid n_h \Delta < T_i - T_j < n_h\, t_h\, \Delta \,\}$;

building the propagation tree structure, wherein the projects newly released on each of the $t_h$ days are placed in the same layer of the tree; the financing states of all projects released on the day closest to the expected release time serve as tree nodes, are placed in the first layer below the root node of the propagation tree, and are connected to the root node; the financing states of all projects released on the next closest day form the second layer, and each of these nodes is connected to the node in the first layer that is closest to it; the finally generated propagation tree structure is represented by an adjacency matrix Γ;

initializing the states of all nodes in the propagation tree, wherein $x_g$ and $x_i$ denote the feature vectors of the target project g and of project i, and $r_i$ denotes the initial financing amount of project i;

in each subsequent propagation step, aggregating information at each node, wherein the neighbor nodes of a node v are read from the adjacency matrix Γ; $|G \cup \Phi|$ is the size of the set of state vectors of all nodes in the propagation tree, G denotes the set of nodes to be predicted, and Φ denotes the set of all observable nodes; $h^{(t-1)T}$ denotes the hidden state of a node at step (t-1), with the subscript giving the node index; and b is a bias vector;

then updating the state of each node using a gated recurrent unit (GRU), wherein $W_z$ and $U_z$ denote the trainable parameter matrices of the update gate, $W_r$ and $U_r$ denote the trainable parameter matrices of the reset gate, and $W_1$ and $U_1$ denote the corresponding parameter matrices of the output layer;

after $t = t_h$ propagation steps, obtaining the final state of the target project g, which is the environmental state vector of the environment in which the target project is located.

6. The crowd-funding project initial-stage financing performance prediction system as claimed in claim 1, wherein the prediction of the initial financing result of the target project using the competition pressure state vector of the target project and the environmental state vector of the target project comprises:

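combining the competition pressure state vector of the target project and the environmental state vector of the target project, and passing the result through a fully connected layer whose activation function is the ReLU function, to predict the initial financing result of the target project.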

7. The crowd funding project initial financing performance prediction system as claimed in claim 1 or 6, further comprising: training the parameters in the modeling and prediction unit, wherein in the training loss function Θ denotes the set of model parameters to be trained and η denotes a weight coefficient;

the loss term $Loss_p$ is computed over the target project set used in the training process, where g is a target project and $y_g$ is the real initial financing result of the target project g;

the loss term $Loss_l$ is the loss of the long short-term memory network in the computation of the competition pressure state vector; the initial financing state of each published project i predicted by the long short-term memory network is mapped to a one-dimensional value $y'_i$ and compared with $y_i$, the real initial financing result of the published project i.