CN112750042A - Data processing method and device and electronic equipment - Google Patents


Info

Publication number
CN112750042A
CN112750042A (application CN201911044056.6A)
Authority
CN
China
Prior art keywords
network model
word vector
neural network
dynamic
determining
Prior art date
Legal status
Pending
Application number
CN201911044056.6A
Other languages
Chinese (zh)
Inventor
陈尧
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911044056.6A priority Critical patent/CN112750042A/en
Publication of CN112750042A publication Critical patent/CN112750042A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q 40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q 40/06 Asset management; Financial planning or analysis
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data processing method and device and electronic equipment. The data processing method comprises the following steps: acquiring text data including an object; extracting a plurality of word vector groups including a correspondence of objects and categories for the text data; acquiring a dynamic index of the object; and determining the dynamic indexes of the categories according to the plurality of word vector groups and the dynamic indexes of the objects. The data processing method provided by the disclosure can automatically acquire dynamic indexes of a plurality of categories according to text data related to the object and the dynamic indexes of the object.

Description

Data processing method and device and electronic equipment
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a data processing method and apparatus capable of automatically determining a stock index change rate corresponding to a stock concept, and an electronic device.
Background
In the field of data mining, data for a number of categories often must be derived from the data of a large number of individual objects; for example, the index change rate of a stock concept is derived from the price change rates of individual stocks. When an individual object clearly belongs to a single category, this data processing is simple; but when an individual object belongs to several categories at once, determining how the object's data influences each category's data can be very complicated.
In the related art, the categories that an individual object involves are often labeled manually, sometimes even down to the proportion that the object contributes to each category. In complex scenarios such as stock index compilation, however, a listed company often operates multiple businesses, so its data can affect multiple stock concepts; manual labeling struggles to cover listed companies' complex business portfolios and cannot promptly recognize newly emerging businesses, new stock concepts, and the like.
Therefore, there is a need for a data mining technique that can automatically identify the category to which an individual object relates and automatically determine the influence of data changes of the individual object on data changes of the category.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a data processing method, apparatus, and electronic device for overcoming, at least to some extent, the problem of difficulty in accurately determining a relationship between individual object data and category data in a data mining process due to limitations and disadvantages of the related art.
According to a first aspect of the embodiments of the present disclosure, there is provided a data processing method, including: acquiring text data including an object; extracting a plurality of word vector groups including a correspondence of objects and categories for the text data; acquiring a dynamic index of the object; and determining the dynamic indexes of the categories according to the plurality of word vector groups and the dynamic indexes of the objects.
In an exemplary embodiment of the present disclosure, the determining the dynamic index of the category according to the plurality of word vector groups and the dynamic index of the object includes:
determining weights of a plurality of objects to a target category according to the plurality of word vector groups;
determining the product of the weight of each object to the target category and the dynamic index of the object;
taking the sum of a plurality of products corresponding to a plurality of objects as a numerator;
taking the sum of the weights of the plurality of objects to the target category as a denominator;
and determining the dynamic index corresponding to the target category according to the ratio of the numerator to the denominator.
In an exemplary embodiment of the present disclosure, the determining weights of the plurality of objects to the target category according to the plurality of word vector groups comprises:
inputting the word vector groups into a preset neural network model to obtain a plurality of output values;
determining m output values corresponding to the word vector group which simultaneously comprises the target object and the target category in the plurality of output values, wherein m is more than or equal to 1;
determining n output values corresponding to the word vector group comprising the target object in the plurality of output values, wherein n is more than or equal to 1;
and determining the weight of the target object to the target class according to the ratio of the sum of the m output values to the sum of the n output values.
In an exemplary embodiment of the present disclosure, the loss function of the preset neural network model includes:
$$L=\sum_{j}\sum_{i_1,i_2}\left(w_{i_1 j}\,x_{i_1}-w_{i_2 j}\,x_{i_2}\right)^2$$
where L is the loss function, j is the serial number of the object category, i_1 and i_2 are the serial numbers of any two target objects, w is the weight of a target object with respect to the target category, and x is the dynamic indicator of a target object.
In an exemplary embodiment of the present disclosure, the training process of the preset neural network model includes:
initializing network model parameters of the preset neural network model, or setting the network model parameters of the previous preset time length as the current network model parameters of the preset neural network model;
acquiring a training data set comprising the text data, and extracting the text data to acquire a plurality of word vector groups;
inputting a word vector group corresponding to a preset time length into the preset neural network model, and enabling the preset neural network model to adjust network model parameters so as to minimize the value of the loss function;
and training the preset neural network model by using a plurality of word vector groups corresponding to preset time lengths, and determining final network model parameters of the preset neural network model after training is finished.
In an exemplary embodiment of the present disclosure, the object is a listed company name, the category is a stock concept, and the dynamic index is a stock price change rate.
In an exemplary embodiment of the present disclosure, the word vector group includes a correspondence of a listed company name, a behavior, and a stock concept.
According to a second aspect of the embodiments of the present disclosure, there is provided a data processing apparatus including:
a text data acquisition module configured to acquire text data including an object;
a relational data extraction module configured to extract a plurality of word vector groups including a correspondence of objects and categories for the text data;
an object dynamic index obtaining module configured to obtain a dynamic index of the object;
and the category dynamic index determining module is configured to determine the dynamic indexes of the categories according to the plurality of word vector groups and the dynamic indexes of the objects.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method of any of the above based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements a data processing method as recited in any of the above.
According to the present disclosure, correspondences between objects and categories are extracted from a large amount of text data, and the correspondence data together with the objects' dynamic indicators are input into a trained data processing model. In this way, the relationships between objects and categories, and between object data and category data, can be determined accurately, new categories can be recognized in time, the data corresponding to each category can be determined more precisely, and the accuracy of data mining is improved by means of machine learning.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 is a flowchart of a data processing method in an exemplary embodiment of the present disclosure.
Fig. 2 is a flow chart of sub-steps of step S4 in an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart of sub-steps of step S41 in an exemplary embodiment of the present disclosure.
FIG. 4 is a flowchart of a training process for a neural network model in an exemplary embodiment of the present disclosure.
FIG. 5 is a schematic diagram of one application scenario of an embodiment of the present disclosure.
Fig. 6 is a block diagram of a data processing apparatus in an exemplary embodiment of the present disclosure.
FIG. 7 is a block diagram of an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Further, the drawings are merely schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus, a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different network models and/or processor means and/or microcontroller means.
The following detailed description of exemplary embodiments of the disclosure refers to the accompanying drawings.
Fig. 1 schematically shows a flow chart of a data processing method in an exemplary embodiment of the present disclosure. Referring to fig. 1, a data processing method 100 may include:
step S1, acquiring text data including an object;
step S2, extracting a plurality of word vector groups including the correspondence between objects and categories for the text data;
step S3, acquiring the dynamic index of the object;
step S4, determining the dynamic index of the category according to the plurality of word vector groups and the dynamic index of the object.
According to the present disclosure, correspondences between objects and categories are extracted from a large amount of text data, and the correspondence data together with the objects' dynamic indicators are input into a trained data processing model. In this way, the relationships between objects and categories, and between object data and category data, can be determined accurately, new categories can be recognized in time, the data corresponding to each category can be determined more precisely, and the accuracy of data mining is improved by means of machine learning.
In step S1, text data including the object is acquired.
Text data can be obtained from a variety of sources, depending on the kind of object. For example, when the object is a listed company name, text data related to that name may be gathered from sources such as WeChat official accounts or the web pages of self-selected-stock financial information services, for instance by means of a web crawler. In some embodiments, which objects (listed companies) to retrieve text data for may be specified on the basis of data such as a list of listed companies. Of course, these sources are only examples; those skilled in the art can select further sources to improve data recognition accuracy.
In step S2, a plurality of word vector groups including the correspondence of objects and categories are extracted for the text data.
In the embodiment of the present disclosure, data extraction can be performed on the text data by a graph network model. Graph networks (GNs) are collections of functions organized in a graph structure within a topological space for relational reasoning; in deep learning theory they generalize both graph neural networks (GNN) and probabilistic graphical models (PGM). A graph network model is composed of graph network blocks (GN blocks), has a flexible topology, and can be customized into various connectionist models, including feedforward neural network models, recurrent neural network models, and the like. More generally, graph network models are suited to processing data with graph structure, such as knowledge graphs, social networks, and molecular networks. In the embodiment of the present disclosure, the applicable graph network models include, but are not limited to, probabilistic graphical models, graph convolution models, deep probabilistic graphical models, hidden Markov models, and the like.
In one embodiment of the present disclosure, knowledge graph extraction may be achieved by a neural network model trained based on a deep learning mechanism. The structure of the neural network model may include various structures such as a feedforward neural network model, a recurrent neural network model, and the like. The graph network model may be coupled to a neural network model that identifies dynamic indicators for the classes to form a data processing model.
The basis on which the graph network model extracts word vector groups may be determined by the kind of object being processed. For example, when the data processing task is stock index compilation, the object is a listed company name, and the category is a stock concept, the text content can be extracted according to a "financial knowledge graph" into triples such as (XX Corporation, invests in, bitcoin concept), i.e. of the general form (company i, action a, concept j).
It is worth mentioning that, for the task of compiling stock indices, the word vector groups may take the triple form shown above. A simple company-to-concept correspondence is not enough to describe in what respect a company influences a concept: a company that invests in concept A, a company that develops concept A, and a listed company that operates concept A contribute differently to the change of concept A's stock index. Therefore, in the embodiment of the present disclosure, a reference dimension for the listed company's "action" is added, so that each listed company's degree of influence on a stock concept's data can be determined more accurately. In other data processing tasks, those skilled in the art may define further correspondences beyond the object-category correspondence to improve accuracy; the present disclosure is not limited in this respect.
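As an illustration, such triples might be represented as follows. The company names, actions, and concepts are hypothetical examples, not data from the disclosure:

```python
# Hypothetical (company, action, concept) triples as described above.
triples = [
    ("Company A", "invests in", "bitcoin concept"),
    ("Company A", "develops", "mobile game concept"),
    ("Company B", "operates", "mobile game concept"),
]

def groups_for_company(word_vector_groups, company):
    """Return every word vector group that mentions the given company."""
    return [t for t in word_vector_groups if t[0] == company]

print(groups_for_company(triples, "Company A"))
```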
In step S3, a dynamic index of the object is acquired.
In one embodiment, the dynamic indicator is, for example, a price change rate of a stock.
Taking one day as the time period, the text data for the current day can be acquired, and the current-day stock price change rates of all listed companies can be obtained from public market information. The price change rate x of company i is calculated by formula (1):
$$x_i=\frac{P_t-P_{t-1}}{P_{t-1}}\qquad(1)$$
where P_t is the closing price of company i's stock on day t, and P_{t-1} is the closing price of company i's stock on day t-1.
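The price change rate described above is a one-line computation; a minimal sketch with made-up closing prices:

```python
def price_change_rate(p_t, p_t_minus_1):
    """Daily price change rate x_i = (P_t - P_{t-1}) / P_{t-1}."""
    return (p_t - p_t_minus_1) / p_t_minus_1

# e.g. a close of 11.0 today after 10.0 yesterday is a 10% change
print(price_change_rate(11.0, 10.0))
```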
In step S4, a dynamic index of the category is determined according to the plurality of word vector groups and the dynamic index of the object.
In the embodiment of the present disclosure, the data processing task is completed through the trained neural network model, and the dynamic indicators of multiple categories are determined according to the dynamic indicators of multiple word vector groups and objects.
Fig. 2 is a flowchart of sub-steps of step S4.
Referring to fig. 2, step S4 may include:
step S41, determining the weight of a plurality of objects to the target category according to the plurality of word vector groups;
step S42, determining the product of the weight of each object to the target category and the dynamic index of the object;
step S43, defining the sum of the products corresponding to the objects as a numerator;
a step S44 of setting the sum of the weights of the plurality of objects for the target category as a denominator;
and step S45, determining the dynamic index corresponding to the target category according to the ratio of the numerator to the denominator.
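Steps S41 to S45 amount to a weighted average of the objects' dynamic indicators; a minimal sketch, in which the weights and change rates are made-up illustrative values:

```python
def category_dynamic_index(weights, indicators):
    """Steps S42-S45: numerator = sum of (weight * dynamic indicator)
    over the objects involved in the target category; denominator =
    sum of those weights; the category index is their ratio."""
    numerator = sum(w * x for w, x in zip(weights, indicators))
    denominator = sum(weights)
    return numerator / denominator

# two objects with weights 0.7 and 0.3 toward the target category
# and daily change rates of +10% and -2%
print(category_dynamic_index([0.7, 0.3], [0.10, -0.02]))
```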
Taking the task of compiling stock indexes as an example, for a trained preset neural network model (with network structure f and network parameters W), every company i, action a, and concept j can be initialized as word vectors c_i, v_a, h_j, and each triple is written as (c_i, v_a, h_j). A plurality of such triples are input into the preset neural network model, either as three separate word vectors in parallel or concatenated into a single word vector; the present disclosure is not limited in this respect.
The main process by which the preset neural network model determines the dynamic indicator of concept j can be described by formula (2):
$$z_j=\frac{\sum_{(i,j)}w_{ij}\,x_i}{\sum_{(i,j)}w_{ij}}\qquad(2)$$
where j is the serial number of concept j, i is the serial number of company i, w_{ij} is the weight of company i with respect to concept j, (i, j) denotes the set of word vector groups involving both company i and concept j, and z_j is the change rate of the stock index corresponding to concept j.
The weight of company i with respect to concept j is the proportion of company i's business that involves concept j. The weight w_{ij} can first be calculated by formula (3):
$$w_{ij}=\frac{\sum_{(i,\cdot,j)}f(c_i,v_a,h_j\mid W)}{\sum_{(i,\cdot,\cdot)}f(c_i,v_a,h_j\mid W)}\qquad(3)$$
where f is the neural network model, (i, ·, j) denotes the set of word vector groups involving both company i and concept j, and (i, ·, ·) denotes the entire set of word vector groups involving company i.
The meaning of formula (3) can be represented by the flow chart shown in fig. 3.
Referring to fig. 3, step S41 may include:
step S411, inputting the word vector groups into a preset neural network model to obtain a plurality of output values;
step S412, m output values corresponding to the word vector group which simultaneously comprises the target object and the target category are determined in the output values, and m is larger than or equal to 1;
step S413, determining n output values corresponding to the word vector group comprising the target object in the output values, wherein n is larger than or equal to 1;
and step S414, determining the weight of the target object to the target category according to the ratio of the sum of the m output values to the sum of the n output values.
In formula (3), $\sum_{(i,\cdot,j)}f(c_i,v_a,h_j\mid W)$ corresponds to the sum of the m output values in the description above, and $\sum_{(i,\cdot,\cdot)}f(c_i,v_a,h_j\mid W)$ corresponds to the sum of the n output values.
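A minimal sketch of this weight computation, with the model's per-triple output values replaced by made-up numbers (in practice they would come from the preset neural network f):

```python
def company_concept_weight(outputs, company, concept):
    """Ratio from steps S411-S414: sum of outputs for word vector groups
    containing both the company and the concept (the m values), divided
    by the sum for all groups containing the company (the n values).
    `outputs` maps (company, action, concept) triples to output values."""
    m_sum = sum(v for (c, _a, h), v in outputs.items()
                if c == company and h == concept)
    n_sum = sum(v for (c, _a, h), v in outputs.items() if c == company)
    return m_sum / n_sum

# Hypothetical model outputs for Company A's word vector groups.
outputs = {
    ("Company A", "invests in", "game concept"): 0.6,
    ("Company A", "operates", "game concept"): 0.2,
    ("Company A", "develops", "cloud concept"): 0.2,
}
print(company_concept_weight(outputs, "Company A", "game concept"))
```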
The neural network model may be structured in various ways; for example, it may include at least three layers that process c_i, v_a, and h_j respectively.
FIG. 4 is a flow chart of a training process for a neural network model.
Referring to fig. 4, the training process of the neural network model may include:
step S401, initializing network model parameters of the preset neural network model, or setting the network model parameters of the previous preset time length as the current network model parameters of the preset neural network model;
step S402, acquiring a training data set including the text data, and extracting the text data to acquire a plurality of word vector groups;
step S403, inputting a word vector group corresponding to a preset time length into the preset neural network model, and enabling the preset neural network model to adjust network model parameters so as to minimize the value of the loss function;
step S404, training the preset neural network model by using a plurality of word vector groups corresponding to preset time lengths, and determining final network model parameters of the preset neural network model after training.
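The training loop of steps S401 to S404 can be sketched as follows, under simplifying assumptions: the "network parameters" optimized here are per-company weights for a single concept rather than the full network f, plain gradient descent stands in for the unspecified optimizer, and the price change rates are made-up example values.

```python
import numpy as np

def pairwise_loss(w, x):
    """A loss in the spirit of formula (4) for one concept: the sum over
    company pairs of (w[i1]*x[i1] - w[i2]*x[i2])^2, each pair once."""
    p = w * x
    return np.sum((p[:, None] - p[None, :]) ** 2) / 2.0

def loss_grad(w, x):
    """Analytic gradient of pairwise_loss with respect to w."""
    p = w * x
    return 2.0 * x * (len(p) * p - p.sum())

def train_one_day(w, x, lr=0.05, steps=200):
    """Step S403: adjust parameters to reduce the loss for one day."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * loss_grad(w, x)
    return w

x = np.array([0.10, -0.02, 0.05])  # per-company daily price change rates
w = np.array([0.5, 0.5, 0.5])      # step S401: initialize parameters
initial = pairwise_loss(w, x)
for day in range(3):               # step S404 / formula (5): each day
    w = train_one_day(w, x)        # starts from the previous parameters
final = pairwise_loss(w, x)
print(final < initial)
```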
In the task of compiling stock indexes, a neural network model can be trained by using text data and stock price information corresponding to a plurality of dates.
First, the parameters of the neural network model can be initialized as D_0, and the model can be trained for the first time with the text data and stock price information for the first of those dates, the training target being to minimize the value of the loss function L in formula (4):
$$L=\sum_{j}\sum_{i_1,i_2}\left(w_{i_1 j}\,x_{i_1}-w_{i_2 j}\,x_{i_2}\right)^2\qquad(4)$$
where i_1 and i_2 are the serial numbers of any two companies. The derivative of the loss function with respect to the network parameters (c, v, h) can be used, for example via Newton's method, to adjust the network parameters so as to minimize the value of the loss function.
The loss function uses the stock price change rate as supervision information: for the same concept j, company i_1's weight for concept j multiplied by company i_1's price change rate for the day should be as close as possible to company i_2's weight for concept j multiplied by company i_2's price change rate for the day. That is, once the weights of different companies are adjusted and determined on the basis of the knowledge graph, their influences on concept j should be consistent.
For example, suppose the statistics show that three companies, company A, company B, and company C, are involved in the game concept, with the game concept accounting for 70% of company A's business, 50% of company B's business, and 30% of company C's business. The training target of the neural network model is then to adjust the network parameters D so that company A's weight times company A's price change rate, company B's weight times company B's price change rate, and company C's weight times company C's price change rate are as close to one another as possible, and then to fix the trained network parameters.
The network parameters of the neural network model trained with the data of day 1 may be named D_1, those trained with the data of day 2 may be named D_2, and so on. Before training on each day's data, the network parameters may be copied from the parameters determined after training on the previous day's data, as shown in formula (5):
D_t = D_{t-1}    (5)
where t is the number of the date corresponding to the training data.
During training, a Yard server can be used to train the model parameters, and the sample data can be stored in an HDFS distributed storage environment.
After training on the entire training data set is complete, the model parameters D_T obtained after training with the data of the T-th day (where T is the largest date number in the training data set) are fixed as the parameters of the neural network model, forming a data processing model that can carry out the data processing process of steps S1 to S4. It is worth mentioning that the network parameters D_t include at least the parameters of the three layers that process the word vectors c_i, v_a, and h_j.
In some embodiments, the knowledge graph structure may be complex, so the neural network model and the loss function described above may become very complex. For example, company A's business may involve hundreds of concepts, such as social networking, internet advertising, mobile games, cloud, and content.
In one embodiment of the present disclosure, the network structure may be simplified using a random deactivation (dropout) structure or adding a penalty directly in the loss function.
Random deactivation (dropout) is a method for optimizing artificial neural networks with deep structures: during learning, some weights or outputs of the hidden layers are randomly zeroed, which reduces the interdependency (co-dependency) among nodes, regularizes the neural network, and lowers its structural risk. How random deactivation is implemented varies with the structure of the network. For a multi-layer perceptron (MLP), random deactivation usually zeroes the outputs of the selected nodes; for a convolutional neural network (CNN), it can randomly zero some elements of a convolution kernel (drop connect), or, in the multi-channel case, zero entire channels of a feature map (spatial dropout); for a recurrent neural network (RNN), it can be applied to the input and state matrices at each time step according to the topology of the network.
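A minimal sketch of random deactivation for the MLP case described above, assuming the common "inverted dropout" variant in which survivors are rescaled so their expected value is unchanged:

```python
import numpy as np

def dropout(activations, p, rng):
    """Zero each hidden output with probability p; rescale survivors
    by 1/(1-p) so the expected activation is preserved (inverted
    dropout, an assumption here, as the text does not specify)."""
    keep = rng.random(activations.shape) >= p
    return activations * keep / (1.0 - p)

rng = np.random.default_rng(0)
hidden = np.ones(8)
print(dropout(hidden, p=0.5, rng=rng))  # each entry is either 0.0 or 2.0
```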
A penalty term may be added to the loss function, for example by taking the sum of the squares of all weights multiplied by a decay coefficient; this penalizes the weights so that concepts with very small weight values effectively drop out of the calculation. Of course, there are many ways to construct the penalty term, and those skilled in the art can choose its form according to the network structure and the actual situation; the present disclosure is not limited in this respect.
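A minimal sketch of such a penalty term, assuming a simple L2 (weight-decay) form; the function name and the decay value are illustrative assumptions, not the patent's exact formula:

```python
import numpy as np

def l2_penalty(weights, decay=1e-3):
    # Sum of squared weights scaled by a decay coefficient; adding this
    # to the data loss penalizes the weights and pushes unimportant ones
    # toward zero, so low-weight concepts drop out of the calculation.
    return decay * sum(float(np.sum(w ** 2)) for w in weights)

W1 = np.array([[0.5, -1.0], [2.0, 0.0]])
W2 = np.array([0.1, -0.1])
base_loss = 0.42                                   # stand-in data loss
total_loss = base_loss + l2_penalty([W1, W2])      # 0.42 + 0.00527
```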
After the network structure has been simplified by dropout, penalty terms, or similar techniques, computational efficiency improves; for example, only the few concepts with the largest weights (the head) in company A's business need be considered, such as the two concepts of internet advertising and mobile games.
In yet another embodiment of the present disclosure, a single neural network model may perform the whole series of tasks: text data capture, object dynamic indicator capture, word vector group extraction, weight calculation, category dynamic indicator extraction, and so on. In this case, the neural network model is formed by concatenating the aforementioned graph network model and data processing model in sequence; the input layer receives the text data and the dynamic indicators of the objects, and the output layer outputs the dynamic indicators of the categories.
Since the neural network model also needs to perform word vector group extraction, the loss function in the training process can be expressed as the sum of two terms:

L = Σ(i,a,j) l(c_i, v_a, h_j) + L_index

wherein Σ(i,a,j) l(c_i, v_a, h_j) is the loss function of the conventional knowledge-graph embedding method (extracting word vectors from the text data), i.e., the loss function of the graph network model, and L_index is the loss function, given in equation (4), for computing the category dynamic indicators from the word vectors.
By using one data processing model to automatically capture and process the text data, dynamic indexes of a plurality of categories can be automatically obtained. The universality of the text data helps to identify new classes, and the analysis of the text data by the data processing model helps to improve the accuracy of determining the weight of the object to the target class.
In summary, according to the embodiments of the present disclosure, a large amount of text data is processed with a graph network model to extract the correspondence between objects and categories; the weight of each object toward each category is then calculated, and the dynamic indicators of a plurality of categories are determined from these weights and the objects' dynamic indicators. New categories can thus be identified in time, the accuracy of the object-to-category weights is improved, and the problems of stale information and inaccurate labeling that arise in the related art from relying on manually labeled object categories are avoided, effectively improving the precision of data mining.
FIG. 5 is a schematic diagram of one application scenario of an embodiment of the present disclosure.
Referring to fig. 5, in a stock index compilation scenario, the following process may be performed by a trained neural network model to determine a real-time conceptual stock index change rate:
step S51, capturing real-time text data containing listed company names from sources such as WeChat official accounts and self-selected stock news feeds;
step S52, obtaining real-time stock price information of a plurality of listed companies from the exchange platform, and calculating their stock price change rates from the previous day's closing prices according to formula (1);
step S53, using the neural network layers of the graph network model part to process the text data into triple word vector groups of the form (company i, behavior a, concept j) according to the financial knowledge graph;
step S54, determining the weight of company i toward concept j according to the triple word vector groups and formula (3);
step S55, determining the stock price change rate of concept j according to the above weights, the stock price change rates, and formula (2).
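The last two steps can be sketched as follows; this is an illustrative reconstruction under the assumption that formula (2) is the weighted-average change rate described earlier, with made-up weights and change rates:

```python
def concept_change_rate(weights, change_rates):
    # Weighted average of the member companies' price change rates:
    # numerator = sum(w_i * x_i), denominator = sum(w_i).
    numerator = sum(w * x for w, x in zip(weights, change_rates))
    denominator = sum(weights)
    return numerator / denominator

# Hypothetical companies tied to the concept "internet advertising"
w = [0.6, 0.3, 0.1]       # weights of companies toward the concept
x = [0.02, -0.01, 0.05]   # real-time stock price change rates
rate = concept_change_rate(w, x)   # ≈ 0.014
```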
The method can be used to compile indexes for stock concepts and for financial products such as stocks, futures, options, bonds, exchange rates, and interest rate derivatives. Of course, the method of the disclosed embodiments is not limited to index compilation for financial products; it may also be applied to various other scenarios, and the disclosure is not limited thereto.
Corresponding to the above method embodiment, the present disclosure also provides a data processing apparatus, which may be used to execute the above method embodiment.
Fig. 6 schematically shows a block diagram of a data processing apparatus in an exemplary embodiment of the present disclosure.
Referring to fig. 6, the data processing apparatus 600 may include:
a text data obtaining module 602 configured to obtain text data including an object.
A relation data extraction module 604 configured to extract, from the text data, a plurality of word vector groups including correspondences of objects and categories.
An object dynamic indicator obtaining module 606 is configured to obtain a dynamic indicator of the object.
A category dynamic indicator determining module 608 configured to determine a dynamic indicator of the category according to the plurality of word vector groups and the dynamic indicator of the object.
In an exemplary embodiment of the disclosure, the category dynamic index determination module 608 includes:
a weight determination unit 6081 configured to determine weights of the plurality of objects to the target category from the plurality of word vector groups;
a comprehensive calculation unit 6082 arranged to determine a product of the weight of each object to the target category and the dynamic index of the object; taking the sum of a plurality of products corresponding to a plurality of objects as a numerator; taking the sum of the weights of the plurality of objects to the target category as a denominator; and determining the dynamic index corresponding to the target category according to the ratio of the numerator to the denominator.
In an exemplary embodiment of the present disclosure, the weight determination unit 6081 is configured to:
inputting the word vector groups into a preset neural network model to obtain a plurality of output values;
determining m output values corresponding to the word vector group which simultaneously comprises the target object and the target category in the plurality of output values, wherein m is more than or equal to 1;
determining n output values corresponding to the word vector group comprising the target object in the plurality of output values, wherein n is more than or equal to 1;
and determining the weight of the target object to the target class according to the ratio of the sum of the m output values to the sum of the n output values.
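A hedged sketch of this weight computation; the triple contents, output values, and names are illustrative assumptions rather than the patent's actual data:

```python
def object_category_weight(groups, outputs, target_obj, target_cat):
    # m output values: triples containing both the target object and the
    # target category; n output values: triples containing the target
    # object. The weight is the ratio of the two sums.
    m_sum = sum(o for (i, a, j), o in zip(groups, outputs)
                if i == target_obj and j == target_cat)
    n_sum = sum(o for (i, a, j), o in zip(groups, outputs)
                if i == target_obj)
    return m_sum / n_sum

groups = [("A", "launch", "ads"), ("A", "invest", "games"),
          ("A", "expand", "ads")]
outputs = [0.8, 0.2, 0.4]   # hypothetical neural-network output values
weight = object_category_weight(groups, outputs, "A", "ads")  # 1.2 / 1.4
```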
In an exemplary embodiment of the present disclosure, the loss function of the preset neural network model includes:
(loss function equation, presented as an image in the original publication)

where L is the loss function, j is the number of the object category, i1 and i2 are the numbers of the target objects, w is the weight of the target object toward the target category, and x is the dynamic indicator of the target object.
In an exemplary embodiment of the present disclosure, the system further includes a model training module 610 configured to:
initializing network model parameters of the preset neural network model, or setting the network model parameters of the previous preset time length as the current network model parameters of the preset neural network model;
acquiring a training data set comprising the text data, and extracting the text data to acquire a plurality of word vector groups;
inputting a word vector group corresponding to a preset time length into the preset neural network model, and enabling the preset neural network model to adjust network model parameters so as to minimize the value of the loss function;
and training the preset neural network model by using a plurality of word vector groups corresponding to preset time lengths, and determining final network model parameters of the preset neural network model after training is finished.
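The day-by-day training scheme above can be sketched with a toy one-parameter model; the optimizer, loss, and data are stand-ins chosen for brevity, not the patent's actual network:

```python
def train_over_days(days, init_params, fit_one_day):
    # Parameters trained on day t seed the training on day t+1; the
    # parameters left after the last day are frozen as the final model.
    params = init_params
    for day_data in days:          # days are ordered by date
        params = fit_one_day(params, day_data)
    return params                  # D_T in the patent's notation

def fit_one_day(theta, samples, lr=0.1, steps=50):
    # Toy stand-in: gradient descent on the squared loss against the
    # day's samples, driving theta toward the day's sample mean.
    for _ in range(steps):
        grad = sum(2 * (theta - s) for s in samples) / len(samples)
        theta -= lr * grad
    return theta

days = [[1.0, 1.2], [0.9, 1.1], [1.0, 1.0]]   # three toy "days" of data
final = train_over_days(days, init_params=0.0, fit_one_day=fit_one_day)
```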
In an exemplary embodiment of the present disclosure, the object is a listed company name, the category is a stock concept, and the dynamic index is a stock price change rate.
In an exemplary embodiment of the present disclosure, the word vector group includes a correspondence of a listed company name, a behavior, and a stock concept.
Since the functions of the apparatus 600 have been described in detail in the corresponding method embodiments, the disclosure is not repeated herein.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, a method, or a program product. Thus, various aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 700 according to this embodiment of the invention is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: the at least one processing unit 710, the at least one memory unit 720, and a bus 730 that couples various system components including the memory unit 720 and the processing unit 710.
Wherein the storage unit stores program code that is executable by the processing unit 710 such that the processing unit 710 performs the steps according to various exemplary embodiments of the present invention as described in the above section "exemplary method" of the present specification. For example, the processing unit 710 may execute step S1 as shown in fig. 1: acquiring text data including an object; step S2: extracting a plurality of word vector groups including a correspondence of objects and categories for the text data; step S3: acquiring a dynamic index of the object; step S4: and determining the dynamic indexes of the categories according to the plurality of word vector groups and the dynamic indexes of the objects.
The storage unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)7201 and/or a cache memory unit 7202, and may further include a read only memory unit (ROM) 7203.
The storage unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a networked environment.
Bus 730 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 700 may also communicate with one or more external devices 800 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 750. Also, the electronic device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 760. As shown, the network adapter 760 communicates with the other modules of the electronic device 700 via the bus 730. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions to make a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
The program product for implementing the above method according to an embodiment of the present invention may employ a portable compact disc read only memory (CD-ROM) and include program codes, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A data processing method, comprising:
acquiring text data including an object;
extracting a plurality of word vector groups including a correspondence of objects and categories for the text data;
acquiring a dynamic index of the object;
and determining the dynamic indexes of the categories according to the plurality of word vector groups and the dynamic indexes of the objects.
2. The data processing method of claim 1, wherein the determining the dynamic indicator for the category based on the plurality of word vector groups and the dynamic indicator for the object comprises:
determining weights of a plurality of objects to a target category according to the plurality of word vector groups;
determining the product of the weight of each object to the target category and the dynamic index of the object;
taking the sum of a plurality of products corresponding to a plurality of objects as a numerator;
taking the sum of the weights of the plurality of objects to the target category as a denominator;
and determining the dynamic index corresponding to the target category according to the ratio of the numerator to the denominator.
3. The data processing method of claim 2, wherein said determining weights of a plurality of objects to a target class from the plurality of word vector groups comprises:
inputting the word vector groups into a preset neural network model to obtain a plurality of output values;
determining m output values corresponding to the word vector group which simultaneously comprises the target object and the target category in the plurality of output values, wherein m is more than or equal to 1;
determining n output values corresponding to the word vector group comprising the target object in the plurality of output values, wherein n is more than or equal to 1;
and determining the weight of the target object to the target class according to the ratio of the sum of the m output values to the sum of the n output values.
4. The data processing method of claim 3, wherein the loss function of the predetermined neural network model comprises:
(loss function equation, presented as an image in the original publication)

where L is the loss function, j is the number of the object category, i1 and i2 are the numbers of the target objects, w is the weight of the target object toward the target category, and x is the dynamic indicator of the target object.
5. The data processing method of claim 4, wherein the training process of the pre-set neural network model comprises:
initializing network model parameters of the preset neural network model, or setting the network model parameters of the previous preset time length as the current network model parameters of the preset neural network model;
acquiring a training data set comprising the text data, and extracting the text data to acquire a plurality of word vector groups;
inputting a word vector group corresponding to a preset time length into the preset neural network model, and enabling the preset neural network model to adjust network model parameters so as to minimize the value of the loss function;
and training the preset neural network model by using a plurality of word vector groups corresponding to preset time lengths, and determining final network model parameters of the preset neural network model after training is finished.
6. The data processing method according to any one of claims 1 to 5, wherein the object is a name of a listed company, the category is a stock concept, and the dynamic index is a stock price change rate.
7. The data processing method of claim 6, wherein the word vector group includes a correspondence of a listed company name, a behavior, and a stock concept.
8. A data processing apparatus, comprising:
a text data acquisition module configured to acquire text data including an object;
a relational data extraction module configured to extract a plurality of word vector groups including a correspondence of objects and categories for the text data;
an object dynamic index obtaining module configured to obtain a dynamic index of the object;
and the category dynamic index determining module is configured to determine the dynamic indexes of the categories according to the plurality of word vector groups and the dynamic indexes of the objects.
9. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the processor being configured to perform the data processing method of any of claims 1-7 based on instructions stored in the memory.
10. A computer-readable storage medium on which a program is stored, which program, when executed by a processor, implements a data processing method according to any one of claims 1 to 7.
CN201911044056.6A 2019-10-30 2019-10-30 Data processing method and device and electronic equipment Pending CN112750042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911044056.6A CN112750042A (en) 2019-10-30 2019-10-30 Data processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112750042A true CN112750042A (en) 2021-05-04

Family

ID=75641778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911044056.6A Pending CN112750042A (en) 2019-10-30 2019-10-30 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112750042A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40048713

Country of ref document: HK