CN111160484A - Data processing method and device, computer readable storage medium and electronic equipment


Info

Publication number
CN111160484A
Authority
CN
China
Prior art keywords
time sequence
user time
sample
data
target user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911419795.9A
Other languages
Chinese (zh)
Other versions
CN111160484B (en)
Inventor
缪畅宇 (Miao Changyu)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911419795.9A
Publication of CN111160484A
Application granted
Publication of CN111160484B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of this application disclose a data processing method and apparatus, a computer-readable storage medium, and an electronic device. A user time sequence sample set is acquired, and initial positive sample data and initial negative sample data are generated from it. The initial positive and negative sample data are input into a preset neural network model for a first training pass, yielding a first-trained preset neural network model. The user time sequence sample set is then identified with the first-trained model to determine target positive sample data and mutation negative sample data. Finally, the target positive sample data and mutation negative sample data are input into a re-initialized preset neural network model for a second training pass, yielding a second-trained preset neural network model. Through this secondary training, the second-trained model acquires a sequence-division capability, so that user time sequence data can be divided automatically, greatly improving the efficiency and accuracy of data processing.

Description

Data processing method and device, computer readable storage medium and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, a computer-readable storage medium, and an electronic device.
Background
With the wide application of networks and the rapid development of terminal technology, human life and work have become increasingly tied to terminals. To provide more intelligent services to users, it is often necessary to acquire a series of behavior feature data from a user's interactions on a terminal and then analyze the user's preferences and habits.
In the prior art, a terminal can collect multiple pieces of behavior feature data from a user's continuous interactions with it, for example data describing the user's successive purchases of several commodities, and these data are then divided manually to obtain several categories of behavior feature data.
In the course of research and practice on the prior art, the inventors of the present application found that, although the prior art provides a means of manually classifying multiple pieces of behavior feature data, this manual classification greatly reduces the efficiency of data processing.
Disclosure of Invention
The embodiment of the application provides a data processing method and device, a computer readable storage medium and an electronic device, which can improve the efficiency and accuracy of data processing.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
a method of data processing, comprising:
acquiring a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set;
inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a first trained preset neural network model;
identifying the user time sequence sample set according to a first trained preset neural network model, and determining target positive sample data and mutation negative sample data;
and inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for carrying out sequence division on the user time sequence data set to be divided.
Correspondingly, an embodiment of the present application further provides a data processing apparatus, including:
the generating unit is used for acquiring a user time sequence sample set and generating initial positive sample data and initial negative sample data from the user time sequence sample set;
the first training unit is used for inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a first trained preset neural network model;
the determining unit is used for identifying the user time sequence sample set according to the first trained preset neural network model and determining target positive sample data and mutation negative sample data;
and the second training unit is used for inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, and the second trained preset neural network model is used for performing sequence division on the user time sequence data set to be divided.
In some embodiments, the generating unit includes:
the acquisition subunit is used for acquiring a user time sequence sample set;
a selecting subunit, configured to select a first target user timing sample and a corresponding first target user timing sample sequence from the user timing sample set in sequence;
the positive labeling subunit is used for performing positive labeling on the first target user time sequence sample, and determining the first target user time sequence sample after positive labeling and a corresponding first target user time sequence sample sequence as initial positive sample data;
the setting subunit is configured to set a first preset number of second target user time sequence samples, where a category of the second target user time sequence samples is different from a category of the user time sequence sample set;
and the negative labeling subunit is used for performing negative labeling on the second target user time sequence sample, and determining the second target user time sequence sample and the first target user time sequence sample sequence after the negative labeling as initial negative sample data.
In some embodiments, the selecting subunit is configured to:
sequentially selecting a first target user time sequence sample from the user time sequence sample set according to the time sequence;
a first target user timing sample sequence generated from a plurality of user timing samples that are consecutive in time in a user timing sample set with the first target user timing sample is obtained.
In some embodiments, the selecting subunit is further configured to:
sequentially selecting a first target user time sequence sample from the user time sequence sample set according to the time sequence;
selecting a first target user time sequence sample sequence formed by a preset number of user time sequence samples which are continuous in time based on the first target user time sequence sample; or
Determining a user timing sample of the set of user timing samples that is left excluding the first target user timing sample as a first target user timing sample sequence.
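The two context-selection strategies above (a fixed-length window of temporally consecutive samples, or the whole remainder of the set excluding the target) can be sketched as follows. Function names and the list-based representation are illustrative assumptions, not terms from the patent:

```python
# Hedged sketch of the two context-selection strategies described above.
# `timeline` is a time-ordered list of user timing samples; `i` is the
# index of the first target user timing sample.

def fixed_window_context(timeline, i, k):
    """The k user timing samples that follow position i, consecutive in time."""
    return timeline[i + 1:i + 1 + k]

def remainder_context(timeline, i):
    """All user timing samples in the set except the target sample itself."""
    return timeline[:i] + timeline[i + 1:]
```

For example, with a timeline `["t1", "t2", "t3", "t4"]`, `fixed_window_context(timeline, 0, 2)` yields `["t2", "t3"]`, while `remainder_context(timeline, 1)` yields `["t1", "t3", "t4"]`.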
In some embodiments, the preset neural network model is a recurrent neural network model, and the first training unit is configured to:
modeling the time sequence sample sequence of the first target user through a recurrent neural network model to obtain a corresponding sequence vector;
inputting the first target user time sequence sample, positive labeling information and a corresponding sequence vector into an activation function layer of the recurrent neural network model for first training;
and inputting the second target user time sequence sample, the negative marking information and the corresponding sequence vector into an activation function layer of the recurrent neural network model for first training to obtain the recurrent neural network model after first training.
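A minimal, hedged sketch of the two-part scoring just described: fold the sample sequence into a sequence vector with a tanh recurrent cell, then pass the candidate sample together with that vector through a sigmoid "activation function layer". The scalar features, fixed toy weights, and function names are all assumptions for illustration; a real recurrent neural network would learn vector-valued weights during training.

```python
import math

def rnn_sequence_vector(sequence, w_in=0.5, w_rec=0.5):
    """Fold scalar sequence features into one hidden state (toy tanh RNN cell)."""
    h = 0.0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def activation_layer(sample, seq_vector, w=1.0, b=0.0):
    """Sigmoid output: estimated probability that the sample fits the sequence."""
    z = w * sample * seq_vector + b
    return 1.0 / (1.0 + math.exp(-z))
```

During training, this output would be compared against the positive (1) or negative (0) labeling information and the weights updated accordingly.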
In some embodiments, the determining unit is configured to:
sequentially identifying each user time sequence sample in the user time sequence sample set through a first trained preset neural network model to obtain a corresponding output value of each user time sequence sample;
determining a user time sequence sample with an output value larger than a preset positive sample threshold value as a third target user time sequence sample, and determining a user time sequence sample with an output value larger than a preset negative sample threshold value and smaller than a preset positive sample threshold value as a fourth target user time sequence sample;
positively labeling the third target user time sequence sample, and determining the positively labeled third target user time sequence sample and the corresponding first target user time sequence sample sequence as target positive sample data;
and performing negative labeling on the fourth target user time sequence sample, and determining the fourth target user time sequence sample subjected to negative labeling and the corresponding first target user time sequence sample sequence as mutation negative sample data.
In some embodiments, the preset neural network model is a recurrent neural network model, and the second training unit is configured to:
modeling the first target user time sequence sample sequence through the initialized recurrent neural network model to obtain a corresponding sequence vector;
inputting the third target user time sequence sample, the positive marking information and the corresponding sequence vector into an activation function layer of the initialized recurrent neural network model for second training;
and inputting the fourth target user time sequence sample, the negative marking information and the corresponding sequence vector into an activation function layer of the initialized recurrent neural network model for second training to obtain the second trained recurrent neural network model.
In some embodiments, the data processing apparatus further comprises:
the classification unit is used for classifying the user time sequence data in the user time sequence data set to be divided based on the second trained recurrent neural network model;
and the dividing unit is used for dividing the user time sequence data of the same type into the same user time sequence to obtain a plurality of sections of user time sequence sequences.
In some embodiments, the classification unit is configured to:
determining the first user time sequence data in the user time sequence data set as a target user time sequence according to the time sequence;
acquiring target user time sequence data after the target user time sequence;
inputting the target user time sequence data and the current target user time sequence into a second trained recurrent neural network model, and outputting a classification value corresponding to the target user time sequence data;
when the classification value is greater than the preset confidence coefficient, merging the target user time sequence data into a target user time sequence, and returning to execute the step of acquiring target user time sequence data after the target user time sequence until the user time sequence data classification is finished;
and when the classification value is not greater than the preset confidence, storing and ending the current target user time sequence, generating a new target user time sequence based on the target user time sequence data, and returning to execute the step of acquiring the target user time sequence data after the target user time sequence until the classification of the user time sequence data is ended.
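The classification-and-division loop above can be sketched as a single pass over the timeline: extend the current target user time sequence while the classification value exceeds the preset confidence, otherwise close it and open a new sequence at the mutation point. Here `score_fn` stands in for the second-trained recurrent neural network, and the proximity rule in the usage example is a toy assumption, not the patent's model:

```python
def divide_sessions(events, score_fn, confidence=0.5):
    """Walk the user time sequence data in order; merge each event into the
    current target sequence while score_fn exceeds the confidence, otherwise
    store the sequence and start a new one at the mutation point."""
    sessions = [[events[0]]]              # first event opens the first sequence
    for event in events[1:]:
        if score_fn(event, sessions[-1]) > confidence:
            sessions[-1].append(event)    # same category: merge into sequence
        else:
            sessions.append([event])      # mutation point: start a new sequence
    return sessions
```

With a toy proximity score `lambda e, s: 1.0 if abs(e - s[-1]) < 2 else 0.0`, the timeline `[1, 2, 3, 10, 11]` splits into `[[1, 2, 3], [10, 11]]`.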
Correspondingly, an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to perform the steps in the data processing method.
Correspondingly, an embodiment of the present application further provides an electronic device, which includes a processor and a memory, where the memory stores a plurality of computer instructions, and the processor loads the computer instructions to execute the steps in any one of the data processing methods provided in the embodiments of the present application.
In the embodiments of this application, a user time sequence sample set is collected, and initial positive sample data and initial negative sample data are generated from it. The initial positive and negative sample data are input into a preset neural network model for first training, yielding a first-trained preset neural network model. The user time sequence sample set is identified according to the first-trained model, and target positive sample data and mutation negative sample data are determined. The target positive sample data and mutation negative sample data are then input into a re-initialized preset neural network model for second training, yielding a second-trained preset neural network model. In this way, the model obtained from the first training on the initial samples identifies the corresponding target positive sample data and mutation negative sample data, and the re-initialized network is trained a second time on these; the second-trained network can therefore recognize mutated user time sequence data within a sequence, enabling automatic division of user time sequence data and greatly improving the efficiency and accuracy of data processing.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2a is a schematic flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 2b is a schematic structural diagram of a recurrent neural network model provided in an embodiment of the present application;
FIG. 3 is another schematic flow chart diagram of a data processing method according to an embodiment of the present application;
fig. 4 is a schematic view of an application scenario of a data processing method according to an embodiment of the present application;
FIG. 5a is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 5b is a schematic diagram of another structure of a data processing apparatus according to an embodiment of the present application;
FIG. 5c is a schematic diagram of another structure of a data processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a data processing method, a data processing device and a computer readable storage medium.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of a data processing system according to an embodiment of the present application. The system includes a terminal A and a server (it may also include other terminals besides terminal A; the specific number of terminals is not limited here). Terminal A and the server may be connected through a communication network, which may include wireless and wired networks; the wireless network includes one or more of a wireless wide area network, a wireless local area network, a wireless metropolitan area network, and a wireless personal area network. The network includes entities such as routers and gateways, which are not shown in the figure. Terminal A may exchange information with the server through the communication network; for example, terminal A may collect multiple pieces of behavior feature data from a user's continuous interactions on the terminal, generate a user time sequence sample set, and send it to the server.
The data processing system may include a data processing apparatus, which may be integrated in a server; in some embodiments it may instead be integrated in a terminal with sufficient computing capability. In this embodiment, the apparatus is described as integrated in the server. As shown in fig. 1, the server acquires the user time sequence sample set sent by terminal A, where the set contains multiple user time sequence samples sorted in time order, and generates initial positive sample data and initial negative sample data from it. The initial positive and negative sample data are input into a preset neural network model for first training, yielding the first-trained preset neural network model. The user time sequence sample set is identified according to the first-trained model, and target positive sample data and mutation negative sample data are determined, the mutation negative sample data being mutation points within the overall user time sequence sample set. The target positive sample data and mutation negative sample data are then input into a re-initialized preset neural network model for second training, yielding a second-trained preset neural network model capable of identifying mutation points. Based on this model, mutation points in a user time sequence data set to be divided can be identified, and the set can be divided into multiple user time sequence data sequences, giving the sequence division result.
The data processing system may also include the terminal A, on which various applications desired by the user may be installed, such as instant messaging applications, e-commerce applications, and multimedia applications.
It should be noted that the scenario diagram of the data processing system shown in fig. 1 is only an example. The data processing system and scenario described in this embodiment are intended to illustrate the technical solution of the embodiment more clearly and do not limit it; as those of ordinary skill in the art will appreciate, the technical solution provided in the embodiments of the present application is equally applicable to similar technical problems as data processing systems evolve and new service scenarios emerge.
The following are detailed below. The numbers in the following examples are not intended to limit the order of preference of the examples.
Embodiment One
This embodiment is described from the viewpoint of a data processing apparatus, which may be integrated in a server equipped with a storage unit and a microprocessor with computing capability.
A method of data processing, comprising: acquiring a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set; inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a first-trained preset neural network model; identifying the user time sequence sample set according to the first-trained preset neural network model, and determining target positive sample data and mutation negative sample data; and inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second-trained preset neural network model, wherein the second-trained preset neural network model is used for performing sequence division on a user time sequence data set to be divided.
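The four steps just restated can be sketched end to end. `ToyScorer` is a deliberately trivial stand-in for the preset neural network (it scores a sample by its distance to the mean of the positive training samples); the thresholds and every name here are illustrative assumptions, not the patent's actual implementation:

```python
class ToyScorer:
    """Trivial stand-in model: higher score means closer to the positive centre."""
    def fit(self, labelled):
        pos = [x for x, y in labelled if y == 1]
        self.center = sum(pos) / len(pos)
    def predict(self, x):
        return 1.0 / (1.0 + abs(x - self.center))

def two_stage_train(samples, make_model, make_negatives,
                    pos_thresh=0.5, neg_thresh=0.3):
    # First training: real samples as initial positives, synthetic as negatives.
    model1 = make_model()
    model1.fit([(x, 1) for x in samples] +
               [(x, 0) for x in make_negatives(len(samples))])
    # Identify target positives and mutation negatives from the model's output.
    scores = [(x, model1.predict(x)) for x in samples]
    target_pos = [(x, 1) for x, s in scores if s > pos_thresh]
    mutation_neg = [(x, 0) for x, s in scores if neg_thresh < s <= pos_thresh]
    # Second training: a freshly initialized model on the relabelled data.
    model2 = make_model()
    model2.fit(target_pos + mutation_neg)
    return model2
```

For example, `two_stage_train([1.0, 1.1, 0.9, 5.0], ToyScorer, lambda n: [100.0] * n)` relabels the middle-scoring samples as mutation negatives, while the outlier 5.0 falls below both thresholds and is excluded from the second pass.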
Referring to fig. 2a, fig. 2a is a schematic flow chart of a data processing method according to an embodiment of the present disclosure. The data processing method comprises the following steps:
in step 101, a user time series sample set is collected, and initial positive sample data and initial negative sample data are generated from the user time series sample set.
It can be understood that, in order to provide more intelligent recommendation services, after obtaining multiple pieces of behavior feature data (i.e., user time sequence data) from a user's continuous interactions with a terminal, a background server needs to divide the user time sequence data by category into user time sequence data sequences (sessions). A session is a continuous interaction window corresponding to the user's operation behavior on the terminal, and it can reflect a potential continuous behavior of the user: for example, a user in a bad mood may listen to several sad songs in a row, or may continuously purchase low-priced goods during a promotion. Accurately divided behavior feature data sequences allow the user's behavior characteristics to be analyzed better, and thus allow accurate recommendation services to be provided more precisely.
In some embodiments, the step of acquiring a user time series sample set may include: the method comprises the steps of obtaining a plurality of behavior characteristic data of a user continuously interacting on a terminal, wherein the behavior characteristic data are composed of multidimensional characteristics, and combining the behavior characteristic data according to a time sequence to generate a user time sequence sample set.
In this embodiment of the present application, a user time sequence sample set is first acquired, containing multiple user time sequence samples distributed in time order. For example, the set may be represented as {t1, t2, t3, …, tk}, where t1, t2, t3, …, tk are user time sequence samples representing, in time order, the user's interaction behavior with corresponding items; t1 is the sample whose interaction time is closest to the current time, and tk is the sample farthest from it. Each user time sequence sample is the behavior feature data of one interaction of the user on the terminal and may be composed of multidimensional feature data; for example, when the behavior is a purchase, the multidimensional features may include a commodity identification feature, a commodity information feature, a purchase time feature, and so on.
Furthermore, the server may mine multiple pieces of initial positive sample data from the user time sequence samples in the set. Each piece of initial positive sample data may consist of a first target user time sequence sample, selected from the set as the interacted item to be judged, together with a first target user time sequence sample sequence from the set that is temporally continuous with it; the labels of the initial positive sample data are all 1, indicating that the first target user time sequence sample and its corresponding sample sequence are correlated. Because all samples in the user time sequence sample set are positive, initial negative sample data can be mined by randomly sampling from items the user has not interacted with: each piece may consist of a pseudo user time sequence sample generated from a non-interacted item together with a randomly acquired first target user time sequence sample sequence. The category of the initial negative sample data differs from that of the initial positive sample data; their labels are all 0, indicating that the pseudo user time sequence sample and the first target user time sequence sample sequence are uncorrelated.
In some embodiments, the step of generating initial positive sample data and initial negative sample data from the user time series sample set may include:
(1) sequentially selecting a first target user time sequence sample and a corresponding first target user time sequence sample sequence from the user time sequence sample set;
(2) performing positive labeling on the first target user time sequence sample, and determining the first target user time sequence sample after positive labeling and a corresponding first target user time sequence sample sequence as initial positive sample data;
(3) setting a first preset number of second target user time sequence samples, wherein the category of the second target user time sequence samples is different from that of the user time sequence sample set;
(4) and performing negative labeling on the second target user time sequence sample, and determining the second target user time sequence sample and the first target user time sequence sample sequence after the negative labeling as initial negative sample data.
The corresponding first target user time sequence sample may be selected from the user time sequence sample set in chronological order; for example, t1 is selected from the set {t1, t2, t3, …, tk}. Accordingly, a first target user time sequence sample sequence related to the first target user time sequence sample is obtained, which may be a sequence related to it in time; for example, the sequence corresponding to t1 may be chosen as {t2}, {t2, t3}, {t2, …, tk}, and so on, with the specific sequence length selectable according to user settings and not limited here. Since the first target user time sequence samples are all positive samples, they may be positively labeled; in this embodiment, 1 is used as the positive label and 0 as the negative label. The positively labeled first target user time sequence sample and its corresponding sample sequence are determined as initial positive sample data, for example {x = {t2, …, tk}, x' = t1, y = 1}; by analogy, multiple pieces of initial positive sample data can be obtained.
Further, a first preset number of second target user time sequence samples may be set. In one embodiment, to prevent the negative samples from affecting the training effect of the positive samples, the first preset number may be limited to be smaller than the number of samples in the user time sequence sample set. The category of the second target user time sequence samples is different from the category of the samples in the user time sequence sample set, so a second target user time sequence sample is not correlated with any first target user time sequence sample and is therefore a negative sample. Accordingly, the second target user time sequence sample may be negatively labeled, and the negatively labeled second target user time sequence sample together with a randomly selected first target user time sequence sample sequence is determined as initial negative sample data; for example, {x = {t2, ..., tk}, x' = t0, y = 0} is determined as one piece of initial negative sample data.
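The pairing logic of steps (1) through (4) above can be sketched as follows. This is an illustrative sketch only: the helper name `build_initial_samples` and the dictionary layout `{x, x_prime, y}` are assumptions mirroring the {x, x', y} notation in the text, not part of the patent.

```python
# Hypothetical sketch: building initial positive and negative sample data from
# a user time sequence sample set. All names here are illustrative.
import random

def build_initial_samples(user_samples, negative_samples, seq_len=None):
    """user_samples: time-ordered list [t1, t2, ..., tk] (all positive);
    negative_samples: samples of a different category, fewer than user_samples
    (the first preset number limit)."""
    assert len(negative_samples) < len(user_samples)
    data = []
    # Steps (1)-(2): each sample paired with its temporally consecutive sequence.
    for i, target in enumerate(user_samples):
        sequence = user_samples[i + 1:]          # e.g. t1 -> [t2, ..., tk]
        if seq_len is not None:
            sequence = sequence[:seq_len]
        if sequence:
            data.append({"x": sequence, "x_prime": target, "y": 1})
    # Steps (3)-(4): negative samples paired with a randomly chosen sequence.
    for target in negative_samples:
        donor = random.choice(data)
        data.append({"x": donor["x"], "x_prime": target, "y": 0})
    return data

samples = ["t1", "t2", "t3", "t4", "t5"]
data = build_initial_samples(samples, ["t0"])
print(data[0])   # {'x': ['t2', 't3', 't4', 't5'], 'x_prime': 't1', 'y': 1}
print(data[-1])  # the negative piece, with x_prime 't0' and y = 0
```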
In some embodiments, the step of sequentially selecting a first target user time series sample and a corresponding first target user time series sample sequence from the set of user time series samples may include:
(1.1) sequentially selecting a first target user time sequence sample from the user time sequence sample set according to the time sequence;
(1.2) obtaining a first target user timing sample sequence generated by a plurality of user timing samples in the user timing sample set that are consecutive in time with the first target user timing sample.
The first target user time sequence sample may be selected from the user time sequence sample set sequentially in chronological order. For example, t1 is first selected from the user time sequence sample set {t1, t2, t3, ..., tk} as the first target user time sequence sample, then t2 is selected from the set as the first target user time sequence sample, and so on.
Further, a first target user time sequence sample sequence is generated from a plurality of user time sequence samples in the user time sequence sample set that are consecutive in time with the first target user time sequence sample; the number of samples in the sequence may be any value from 1 up to the number of samples in the user time sequence sample set minus 1.
For example, when t1 is selected as the first target user time sequence sample, the first target user time sequence sample sequence may be generated from 3, 4, or all user time sequence samples other than t1 that are consecutive in time with t1.
In some embodiments, the step of obtaining a first target user timing sample sequence generated by a plurality of user timing samples that are consecutive in time in the user timing sample set with the first target user timing sample may comprise:
(2.1) selecting a first target user time sequence sample sequence formed by a preset number of user time sequence samples which are continuous in time based on the first target user time sequence sample; or
(2.2) determining the user time sequence samples remaining in the user time sequence sample set besides the first target user time sequence sample as the first target user time sequence sample sequence.
For example, when t1 is selected as the first target user timing sample, t2, t3, …, and tk are used as the first target user timing sample sequence.
In one embodiment, since the computing capability of an actual processor is limited, a preset number may be set to improve computation efficiency. The preset number is smaller than the number of samples in the user time sequence sample set minus 1, and it determines the length of the first target user time sequence sample sequence. For example, the preset number may be 4: when t1 is selected as the first target user time sequence sample, the 4 temporally consecutive user time sequence samples t2, t3, t4, and t5 are selected to form the first target user time sequence sample sequence.
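The fixed-length selection with a preset number might be sketched as below; the function name `sample_windows` is hypothetical, and the preset number 4 follows the example above.

```python
# Illustrative sketch: pair each first target user time sequence sample with
# the n temporally consecutive samples that follow it (preset number n).
def sample_windows(samples, n=4):
    """Return (target, sequence) pairs; n must be below the set size."""
    assert n < len(samples), "preset number must be below the set size"
    return [(samples[i], samples[i + 1:i + 1 + n])
            for i in range(len(samples) - n)]

pairs = sample_windows(["t1", "t2", "t3", "t4", "t5", "t6"], n=4)
print(pairs[0])  # ('t1', ['t2', 't3', 't4', 't5'])
print(pairs[1])  # ('t2', ['t3', 't4', 't5', 't6'])
```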
In step 102, inputting initial positive sample data and initial negative sample data to a preset neural network model for first training, so as to obtain a first trained preset neural network model.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specially studies how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.
The scheme provided by the embodiment of the application relates to artificial intelligence technologies such as machine learning, and is specifically explained by the following embodiments:
the embodiment of the present application borrows the idea of a language model in natural language processing: a sequence of consecutive items (the user time sequence sample sequence) is used to predict the next item. Item prediction is a multi-classification problem, but the object of this embodiment is not to predict items; it is to find the "mutation point". The embodiment therefore applies a binary classification problem: predicting whether a given user time sequence sample is associated with the user's historical user time sequence sample sequence. The preset neural network can thus be any of several models, such as a conventional statistical language model or a neural network language model. In one embodiment, the preset neural network model is a recurrent neural network model, as shown in fig. 2b, which is a schematic structural diagram of the recurrent neural network model provided in the embodiment of the present application. The recurrent neural network model 10 may include an input layer, a hidden layer, a recurrent layer, and an output layer. U is the weight matrix from the input layer to the hidden layer, and V is the weight matrix from the hidden layer to the output layer. x is a vector representing the value of the input layer; s is a vector representing the value of the hidden layer, which is composed of a number of nodes equal to the dimension of s; W is the weight matrix through which the previous hidden-layer value is fed back in as part of the current input; and o is a vector representing the value of the output layer. Because the hidden layer cyclically refers to its previous value, the recurrent neural network model is well suited to processing user time sequence information.
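The recurrent computation described above (U from input to hidden, W feeding the previous hidden state back in, V from hidden to output) can be sketched in a few lines. This is a minimal pure-Python sketch; the dimensions and random weight values are illustrative assumptions, not the model of the patent.

```python
# Minimal sketch of the recurrent layer: s = tanh(U x + W s_prev), o = V s.
import math
import random

random.seed(0)
dim_x, dim_s, dim_o = 4, 6, 2  # illustrative input / hidden / output sizes

def mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

U, W, V = mat(dim_s, dim_x), mat(dim_s, dim_s), mat(dim_o, dim_s)

def rnn_forward(xs):
    s = [0.0] * dim_s                    # hidden state carried across steps
    for x in xs:
        z = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(zi) for zi in z]  # hidden layer reuses its last value
    return matvec(V, s), s               # output-layer value o and final s

o, s = rnn_forward([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
print(len(o), len(s))  # 2 6
```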
Further, the initial positive sample data and the initial negative sample data are input into the preset neural network model for first training. The preset neural network model models the first target user time sequence sample sequence in the initial positive sample data to obtain a corresponding sequence vector, which reflects the characteristics of the user time sequence samples in the sequence; it continuously learns the relationship among the first target user time sequence sample, the positive labeling information, and the corresponding sequence vector, and adjusts the network parameters of the preset neural network model. It likewise models the first target user time sequence sample sequence corresponding to the second target user time sequence sample in the initial negative sample data to obtain a corresponding sequence vector, continuously learns the relationship among the second target user time sequence sample, the negative labeling information, and the corresponding sequence vector, and adjusts the network parameters accordingly. The result is the first trained preset neural network model, which has an initial classification capability and can identify initial positive sample data and initial negative sample data in user time sequence data.
In some embodiments, the step of inputting the initial positive sample data and the initial negative sample data to the preset neural network model for a first training to obtain a first trained preset neural network model may include:
(1) modeling the first target user time sequence sample sequence through a recurrent neural network model to obtain a corresponding sequence vector;
(2) inputting the first target user time sequence sample, positive labeling information and a corresponding sequence vector into an activation function layer of the recurrent neural network model for first training;
(3) inputting the second target user time sequence sample, the negative labeling information, and the corresponding sequence vector into the activation function layer of the recurrent neural network model for first training, to obtain the first trained recurrent neural network model.
In one embodiment, the user time sequence samples may also be represented in One-Hot form. One-hot encoding, also called one-bit effective encoding, uses an N-bit status register to encode N states; each state has its own independent register bit, and only one bit is effective at any time. Representing samples in one-hot form allows the distances between features to be computed more reasonably.
Further, the first target user time sequence sample sequence may be modeled by the recurrent neural network model to obtain a sequence vector, which represents the characteristics of the historical first target user time sequence sample sequence. To implement the classification problem, the recurrent neural network model further includes an activation function layer connected to the output layer of the recurrent neural network. The activation function layer may be a softmax layer, formed by the softmax function and used in the classification process; it maps the output of the recurrent neural network model into the (0, 1) interval, which can be understood as a probability, and classification is performed on that basis.
In the embodiment of the application, the relationship between the first target user time sequence sample and the corresponding sequence vector is learned through the softmax layer, with the positive labeling information as the training target, and the network parameters in the softmax layer are continuously adjusted through this learning process. Likewise, the relationship between the second target user time sequence sample and the corresponding sequence vector is learned with the negative labeling information as the training target, the network parameters in the softmax layer are continuously adjusted, and the first trained recurrent neural network model is obtained.
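The softmax mapping of the activation function layer, which turns model outputs into values in (0, 1) readable as probabilities, might look like this; the two logit values are illustrative.

```python
# Sketch of the activation-function layer: softmax over two logits gives the
# probability that the target sample is associated with the sequence (label 1)
# versus not associated (label 0).
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.5])  # hypothetical model outputs
print(probs[0])  # probability of the "associated" class
```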
In step 103, a user time sequence sample set is identified according to the first trained preset neural network model, and target positive sample data and mutant negative sample data are determined.
In the embodiment of the application, the time series data of users needs to be divided into user behavior characteristic sequences of the same category. However, the first training of the preset neural network model does not consider the influence of mutation points. For example, the user time sequence sample set may contain samples of a user continuously purchasing household appliance goods, followed by a sample of the user purchasing pet supplies; the pet-supply purchase clearly differs from the preceding purchases, so that sample is the mutation point. But because all user time sequence samples in the user time sequence sample set are positive samples, the first trained preset neural network model can only recognize samples of entirely different user behavior characteristics, such as the behavior of listening to songs rather than purchasing articles; it cannot identify the user time sequence sample corresponding to the mutation point.
Further, the number of samples corresponding to mutation points can be assumed to account for only a very low proportion of the user time sequence sample set. When the user time sequence sample set is identified by the first trained preset neural network model, the user time sequence samples that do not correspond to mutation points are positive sample data, so their output scores are very high, all close to 1. The user time sequence samples corresponding to mutation points resemble the positive sample data but are not negative sample data either; the first trained preset neural network model can still score them, but the scores are not high, because the model is confused by them, and their output scores fall in the middle of the 0-to-1 range, around 0.5. Therefore, the user time sequence samples with scores close to 1, together with their temporally consecutive first target user time sequence sample sequences, can be determined as target positive sample data and labeled 1, while the user time sequence samples with scores around 0.5, together with their corresponding temporally consecutive first target user time sequence sample sequences, are determined as mutation negative sample data. This realizes automatic identification of mutation negative sample data, saves manual labeling time, and improves data processing efficiency.
In some embodiments, the identifying the user time-series sample set according to the first trained preset neural network model, and the determining the target positive sample data and the mutant negative sample data may include:
(1) sequentially identifying each user time sequence sample in the user time sequence sample set through a first trained preset neural network model to obtain a corresponding output value of each user time sequence sample;
(2) determining a user time sequence sample with an output value larger than a preset positive sample threshold value as a third target user time sequence sample, and determining a user time sequence sample with an output value larger than a preset negative sample threshold value and smaller than a preset positive sample threshold value as a fourth target user time sequence sample;
(3) performing positive labeling on the third target user time sequence sample, and determining the labeled third target user time sequence sample and the corresponding first target user time sequence sample sequence as target positive sample data;
(4) performing negative labeling on the fourth target user time sequence sample, and determining the negatively labeled fourth target user time sequence sample and the corresponding first target user time sequence sample sequence as mutation negative sample data.
Each user time sequence sample and its corresponding first target user time sequence sample sequence are identified by the first trained preset neural network model to obtain an output value for each user time sequence sample; an output value close to 1 indicates that the user time sequence sample is a positive sample, and an output value close to 0 indicates that it is a negative sample.
The preset positive sample threshold is the critical value defining whether a user time sequence sample is a positive sample, for example 0.8; the preset negative sample threshold is the critical value defining whether a user time sequence sample is a negative sample, for example 0.2. A user time sequence sample whose output value is greater than the preset positive sample threshold is a positive sample, and one whose output value is less than the preset negative sample threshold is a negative sample. For the user time sequence samples corresponding to mutation points, the output value is neither high nor low, so user time sequence samples whose output values lie between the preset negative sample threshold and the preset positive sample threshold can be determined as mutation negative samples.
Further, the user time sequence samples with output values greater than the preset positive sample threshold are determined as third target user time sequence samples, which are positive samples, and the user time sequence samples with output values greater than the preset negative sample threshold and smaller than the preset positive sample threshold are determined as fourth target user time sequence samples. The third target user time sequence samples are positively labeled, for example with 1, and each positively labeled third target user time sequence sample together with its corresponding first target user time sequence sample sequence is determined as target positive sample data. The fourth target user time sequence samples are negatively labeled, for example with 0, and each negatively labeled fourth target user time sequence sample together with its corresponding first target user time sequence sample sequence is determined as mutation negative sample data.
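The threshold-based split of steps (1) through (4) might be sketched as follows; the threshold values 0.8 and 0.2 follow the examples in the text, while the function name and the toy scores are illustrative assumptions.

```python
# Illustrative sketch: scores near 1 become target positive samples (label 1),
# scores between the two thresholds become mutation negative samples (label 0).
def split_by_score(scored, pos_thresh=0.8, neg_thresh=0.2):
    """scored: list of (sample, model_output_value) pairs."""
    positives = [(s, 1) for s, v in scored if v > pos_thresh]
    mutations = [(s, 0) for s, v in scored if neg_thresh < v <= pos_thresh]
    return positives, mutations

scored = [("t1", 0.97), ("t2", 0.95), ("t3", 0.52), ("t4", 0.93)]
pos, mut = split_by_score(scored)
print(pos)  # [('t1', 1), ('t2', 1), ('t4', 1)]
print(mut)  # [('t3', 0)]  t3 behaves like a mutation point
```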
In step 104, inputting the target positive sample data and the mutation negative sample data to the initialized preset neural network model for second training, so as to obtain a second trained preset neural network model.
The target positive sample data and the mutation negative sample data identified by the first trained preset neural network model are input into a reinitialized preset neural network model for second training. The reinitialized preset neural network model models the first target user time sequence sample sequence corresponding to each user time sequence sample in the target positive sample data to obtain a corresponding sequence vector, continuously learns the relationship among the user time sequence sample, the positive labeling information, and the corresponding sequence vector, and adjusts its network parameters. It likewise models the first target user time sequence sample sequence corresponding to the user time sequence data in the mutation negative sample data to obtain a corresponding sequence vector, continuously learns the relationship among the user time sequence sample, the negative labeling information, and the corresponding sequence vector, and adjusts its network parameters, yielding the second trained preset neural network model. The second trained preset neural network has a complete binary classification capability: it can identify target positive sample data and mutation negative samples in user time sequence data, and can therefore divide user time sequence data into sequences according to the mutation negative samples.
In some embodiments, the step of inputting the target positive sample data and the mutant negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model may include:
(1) modeling the first target user time sequence sample sequence through the initialized recurrent neural network to obtain a corresponding sequence vector;
(2) inputting the third target user time sequence sample, the positive marking information and the corresponding sequence vector into an activation function layer of the initialized recurrent neural network model for second training;
(3) inputting the fourth target user time sequence sample, the negative labeling information, and the corresponding sequence vector into the activation function layer of the initialized recurrent neural network for second training, to obtain the second trained recurrent neural network model.
The first target user time sequence sample sequences corresponding to the third target user time sequence samples and the fourth target user time sequence samples are modeled by the initialized recurrent neural network model to obtain sequence vectors.
Further, the relationship between the third target user time sequence sample and the corresponding sequence vector is learned through the softmax layer of the recurrent neural network model, with the positive label information 1 as the training target, and the network parameters in the softmax layer are continuously adjusted through this learning process. The relationship between the fourth target user time sequence sample and the corresponding sequence vector is learned with the negative label information as the training target, the network parameters in the softmax layer are continuously adjusted, and the second trained recurrent neural network model is obtained. The second trained recurrent neural network model is able to divide a user time sequence data set to be divided into sequences.
In some embodiments, the step of obtaining the second trained neural network model may further include:
(1.1) classifying the user time sequence data in the user time sequence data set to be divided based on the second trained recurrent neural network model;
(1.2) dividing user time sequence data of the same type into the same user time sequence to obtain multiple segments of user time sequence sequences.
The first piece of user time sequence data in the user time sequence data set may be determined, in chronological order, as the start of a target user time sequence. A piece of target user time sequence data following the target user time sequence is taken as test data, and the target user time sequence data together with the corresponding target user time sequence are input into the second trained recurrent neural network model for identification. When the output classification value is close to 1, the target user time sequence data and the target user time sequence belong to the same category, and the target user time sequence data is merged into the target user time sequence. When the output classification value is close to 0, they do not belong to the same category; the current target user time sequence is saved and ended, and a new target user time sequence is started from the target user time sequence data. By analogy, user time sequence data of the same category is divided into the same user time sequence, and multiple segments of user time sequence sequences are obtained.
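The division procedure just described can be sketched as a simple walk over the data; `classify` stands in for the second trained model, and the toy classifier used below (grouping items by letter prefix) is purely an illustrative assumption.

```python
# Illustrative sketch: extend the current segment while the model says the next
# item belongs to it (output near 1); start a new segment otherwise.
def segment(series, classify, threshold=0.5):
    """classify(current_segment, item) -> value near 1 (same category) or 0."""
    segments = [[series[0]]]
    for item in series[1:]:
        if classify(segments[-1], item) > threshold:
            segments[-1].append(item)      # same category: merge
        else:
            segments.append([item])        # mutation point: new segment
    return segments

# Toy stand-in classifier: items with the same letter prefix belong together.
same_prefix = lambda seg, item: 1.0 if item[0] == seg[-1][0] else 0.0
result = segment(["a1", "a2", "b1", "b2", "b3"], same_prefix)
print(result)  # [['a1', 'a2'], ['b1', 'b2', 'b3']]
```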
As can be seen from the above, the method collects a user time sequence sample set and generates initial positive sample data and initial negative sample data from it; inputs the initial positive sample data and initial negative sample data into a preset neural network model for first training to obtain a first trained preset neural network model; identifies the user time sequence sample set according to the first trained preset neural network model to determine target positive sample data and mutation negative sample data; and inputs the target positive sample data and the mutation negative sample data into an initialized preset neural network model for second training to obtain a second trained preset neural network model. In this way, the preset neural network model first trained on the initial positive and negative sample data identifies the corresponding target positive sample data and mutation negative sample data, and the initialized preset neural network is trained a second time on those. The second trained preset neural network can then identify mutated user time sequence data within a sequence, so that automatic division of user time sequence data is realized and the efficiency and accuracy of data processing are greatly improved.
Embodiment II
The method described in the first embodiment is further illustrated by way of example.
In the present embodiment, the data processing apparatus will be described by taking an example in which the data processing apparatus is specifically integrated in a server, and specific reference will be made to the following description.
Referring to fig. 3, fig. 3 is another schematic flow chart of a data processing method according to an embodiment of the present disclosure.
The method flow can comprise the following steps:
in step 201, the server sequentially selects a first target user time sequence sample from the user time sequence sample set according to a time sequence order, and selects a first target user time sequence sample sequence formed by a preset number of user time sequence samples that are continuous in time based on the first target user time sequence sample.
The user time sequence sample set can be represented as {t1, t2, t3, ..., tk}, assumed here to be a set of behavior feature data (user time sequence samples) generated by a user purchasing commodities on a terminal. t1 through tk are different user time sequence samples, where t1 has the interaction time closest to the current time and tk the farthest. A user time sequence sample can include multidimensional feature data such as a purchased-commodity identification feature, a commodity information feature, and a purchase time feature. For example, t1 can include 3-dimensional feature data consisting of 001, 010, and 15-52, where 001 is the purchased-commodity identification feature (for example, the identification feature corresponding to a refrigerator), 010 is the commodity information feature (for example, a double-door type), and 15-52 is the purchase time feature (a purchase time of 15:52). It should be noted that the user time sequence samples in the user time sequence sample set are all generated by interaction behaviors that actually occurred, so they are all positive samples.
Further, the server sequentially selects the first target user time sequence samples t1, t2, t3 through tk in chronological order from the user time sequence sample set {t1, t2, t3, ..., tk}, and selects a preset number of temporally consecutive user time sequence samples based on each first target user time sequence sample. The preset number may be 4; that is, when the first target user time sequence sample is t1, the 4 temporally consecutive user time sequence samples selected based on t1 may be {t2, t3, t4, t5}, and so on. In this way each target user time sequence sample and its corresponding first target user time sequence sample sequence are obtained.
In an embodiment, when the samples toward the end of the set no longer leave enough following samples to form a sequence, the preceding user time sequence samples may be taken instead to form the first target user time sequence sample sequence; for example, the first target user time sequence sample sequence for tk may be {tk-1, tk-2, tk-3, tk-4}.
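That end-of-set fallback might be sketched as follows; the exact fallback rule for partially filled windows is not specified in the text, so the behavior below is an assumption.

```python
# Hypothetical sketch: take the n samples after samples[i]; when the end of the
# set leaves fewer than n, fall back to the n samples preceding samples[i].
def window_with_fallback(samples, i, n=4):
    seq = samples[i + 1:i + 1 + n]
    if len(seq) < n:
        seq = samples[max(0, i - n):i][::-1]  # e.g. tk -> [tk-1, tk-2, ...]
    return seq

ts = ["t1", "t2", "t3", "t4", "t5", "t6"]
print(window_with_fallback(ts, 0))  # ['t2', 't3', 't4', 't5']
print(window_with_fallback(ts, 5))  # ['t5', 't4', 't3', 't2']
```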
In step 202, the server performs positive labeling on the first target user time sequence sample, and determines the first target user time sequence sample after positive labeling and a corresponding first target user time sequence sample sequence as initial positive sample data.
In the embodiment of the present application, the positive label is 1 and the negative label is 0. That is, the server labels the first target user time sequence sample with 1, and determines the positively labeled first target user time sequence sample and the corresponding first target user time sequence sample sequence as initial positive sample data, for example {x = {t2, t3, t4, t5}, x' = t1, y = 1}, where x represents the first target user time sequence sample sequence, x' is the first target user time sequence sample, and y is the labeling information.
In step 203, the server sets a first preset number of second target user time sequence samples, performs negative labeling on the second target user time sequence samples, and determines the second target user time sequence samples and the first target user time sequence sample sequence after negative labeling as initial negative sample data.
The server sets a first preset number of second target user time sequence samples, whose category differs from that of the first target user time sequence samples; for example, the second target user time sequence samples may include t0, t01, t02, and so on. The second target user time sequence samples may also include 3-dimensional feature data, but their category is completely different: where the first target user time sequence samples reflect the behavior of the user purchasing goods on the terminal, the second target user time sequence samples may reflect the behavior of the user listening to songs on the terminal. For example, t0 may consist of 5000, 0500, and 16-05, where 5000 is the identification feature of the music listened to, 0500 is the music information feature (for example, a jazz type), and 16-05 is the time feature (a time of 16:05). The first preset number is smaller than the number of samples in the user time sequence sample set. The second target user time sequence sample is labeled 0, and the negatively labeled second target user time sequence sample and a randomly selected first target user time sequence sample sequence are determined as initial negative sample data, for example {x = {t2, t3, t4, t5}, x' = t0, y = 0}.
In step 204, the server models the first target user time sequence sample sequence through the recurrent neural network model to obtain a corresponding sequence vector, and inputs the first target user time sequence sample, the positive labeling information and the corresponding sequence vector into an activation function layer of the recurrent neural network model for first training.
The server models the first target user time sequence sample sequence 12 through a recurrent neural network 11 to obtain a corresponding sequence vector 13, which represents the characteristics of the first target user time sequence sample sequence 12. The relation between the first target user time sequence sample 14 and the corresponding sequence vector 13 is learned through an activation function layer 15, with the positive labeling information 1 as the training target, so that the network parameters in the softmax layer are continuously adjusted.
In step 205, the server inputs the second target user timing sample, the negative labeling information, and the corresponding sequence vector into an activation function layer of the recurrent neural network model for first training, so as to obtain a first trained recurrent neural network model.
The relation between the second target user time sequence sample and the corresponding sequence vector is likewise learned through the activation function layer 15, with the negative labeling information 0 as the training target, so that the network parameters in the softmax layer are continuously adjusted and the first trained recurrent neural network model is obtained.
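The first training of steps 204–205 can be illustrated with a deliberately tiny stand-in model. The sketch below is a hypothetical simplification, not the patent's implementation: it uses a one-dimensional recurrent encoder to produce the sequence vector, a sigmoid output in place of the two-way softmax of the activation function layer, and for brevity only the output-layer parameters are adjusted during training.

```python
import math

def rnn_encode(seq, w_in, w_rec):
    """Tiny 1-D recurrent encoder: h_t = tanh(w_in*x_t + w_rec*h_{t-1})."""
    h = 0.0
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
    return h                                    # the "sequence vector" (a scalar here)

def predict(sample, seq, params):
    w_in, w_rec, w_s, w_v, b = params
    v = rnn_encode(seq, w_in, w_rec)
    z = w_s * sample + w_v * v + b              # input to the activation function layer
    return 1.0 / (1.0 + math.exp(-z))           # sigmoid standing in for softmax

def train_step(sample, seq, label, params, lr=0.1):
    """One gradient step on the output-layer weights (encoder weights kept fixed)."""
    w_in, w_rec, w_s, w_v, b = params
    p = predict(sample, seq, params)
    err = p - label                             # dLoss/dz for sigmoid + cross-entropy
    v = rnn_encode(seq, w_in, w_rec)
    return (w_in, w_rec,
            w_s - lr * err * sample,
            w_v - lr * err * v,
            b - lr * err)

# Alternate positive pairs (label 1) and negative pairs (label 0) over the same
# sequence, as in steps 204-205. Feature values here are illustrative scalars.
params = (0.5, 0.3, 0.0, 0.0, 0.0)
for _ in range(100):
    params = train_step(1.0, [0.2, 0.4], 1.0, params)
    params = train_step(-1.0, [0.2, 0.4], 0.0, params)
```

After this loop the stand-in model scores positive-style samples near 1 and negative-style samples near 0, mirroring how the network parameters are "continuously adjusted" toward the label information.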
In step 206, the server sequentially identifies each user time sequence sample in the user time sequence sample set through the first trained preset neural network model, and obtains a corresponding output value of each user time sequence sample.
Although the user time sequence samples all belong to purchasing behaviors, purchasing household appliance commodities differs from purchasing pet products; however, the first trained recurrent neural network model cannot identify a change in the type of purchased article within the purchasing behavior, so its classification granularity is poor.
In the embodiment of the application, the server obtains each user time sequence sample in the user time sequence sample set and the corresponding first target user time sequence sample sequence, and inputs them again into the first trained preset neural network model to obtain a corresponding output value for each user time sequence sample. The first trained preset neural network model cannot clearly distinguish "false" user time sequence sample data: the output value of true sample data is close to 1, while the output value of "false" user time sequence sample data is neither high nor low, lying around 0.5.
In step 207, the server determines the user time sequence sample with the output value greater than the preset positive sample threshold as a third target user time sequence sample, and determines the user time sequence sample with the output value greater than the preset negative sample threshold and less than the preset positive sample threshold as a fourth target user time sequence sample.
The preset positive sample threshold may be 0.85 and the preset negative sample threshold may be 0.25. Thus, the server determines a user time sequence sample with an output value greater than 0.85 as a third target user time sequence sample, which is a positive sample, that is, user time sequence sample data of the behavior of purchasing household appliance commodities. A user time sequence sample with an output value greater than 0.25 and less than 0.85 is determined as a fourth target user time sequence sample, that is, a user time sequence sample of the behavior of purchasing non-household-appliance commodities.
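The partitioning in step 207 can be sketched as follows. This is a minimal illustration under the example thresholds 0.85 and 0.25 given in the text; the function name and the sample identifiers are illustrative assumptions.

```python
def partition_samples(scored_samples, pos_threshold=0.85, neg_threshold=0.25):
    """Split (sample, output_value) pairs into third / fourth target samples."""
    third, fourth = [], []
    for sample, score in scored_samples:
        if score > pos_threshold:
            third.append(sample)      # positive: e.g. household-appliance purchases
        elif score > neg_threshold:
            fourth.append(sample)     # "false" samples near 0.5: other purchases
        # scores at or below neg_threshold are left out in this sketch
    return third, fourth

scores = [("tv", 0.97), ("dog_food", 0.52), ("song", 0.10), ("fridge", 0.91)]
third, fourth = partition_samples(scores)
```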
In step 208, the server performs positive labeling on the third target user time sequence sample, and determines the labeled third target user time sequence sample and the corresponding first target user time sequence sample sequence as target positive sample data.
The server positively labels the third target user time sequence sample with 1, that is, labels the user time sequence sample of the behavior of purchasing household appliance commodities with 1, and determines the positively labeled third target user time sequence sample and the corresponding first target user time sequence sample sequence as the target positive sample data. For the selection of this and subsequent first target user time sequence sample sequences, refer to the selection method described above, which is not repeated here.
In step 209, the server performs negative labeling on the fourth target user time sequence sample, and determines the negatively labeled fourth target user time sequence sample and the corresponding first target user time sequence sample sequence as the mutation negative sample data.
The server negatively labels the fourth target user time sequence sample with 0, that is, labels the user time sequence sample of the behavior of purchasing non-household-appliance commodities with 0, and determines the negatively labeled fourth target user time sequence sample and the corresponding first target user time sequence sample sequence as the mutation negative sample data.
In step 210, the server models the first target user timing sequence sample sequence through the initialized recurrent neural network model to obtain a corresponding sequence vector, and inputs the third target user timing sequence sample, the positive label information, and the corresponding sequence vector to the activation function layer of the initialized recurrent neural network model for the second training.
The server performs modeling calculation on the first target user time sequence sample sequences corresponding to the third target user time sequence sample and the fourth target user time sequence sample through a recurrent neural network whose network parameters have been re-initialized, to obtain the corresponding sequence vectors.
Furthermore, the relation between the third target user time sequence sample and the corresponding sequence vector is learned through the activation function layer of the initialized recurrent neural network model, with the positive labeling information 1 as the training target. Through this learning process the network parameters in the activation function layer are continuously adjusted; that is, the initialized recurrent neural network model learns the ability to recognize user time sequence samples of the behavior of purchasing household appliance commodities.
In step 211, the server inputs the fourth target user timing sequence sample, the negative labeling information, and the corresponding sequence vector into the activation function layer of the initialized recurrent neural network model for the second training, so as to obtain the second trained recurrent neural network model.
Through the activation function layer of the initialized recurrent neural network model, the server continuously learns the "false" user time sequence samples (namely, user time sequence samples of the behavior of purchasing non-household-appliance commodities), that is, the relation between the fourth target user time sequence samples and the corresponding sequence vectors, with the negative labeling information 0 as the training target. Through this learning process the network parameters in the activation function layer are continuously adjusted, and the second trained recurrent neural network model is obtained; thus the second trained recurrent neural network model learns the ability to identify user time sequence samples of the behavior of purchasing non-household-appliance commodities.
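The essential point of steps 210–211 is that the second training starts from freshly initialized network parameters and fits the target positive sample data against the mutation negative sample data. The sketch below is a hypothetical scalar stand-in, not the patent's recurrent model: each sample is reduced to a single illustrative feature value, and a logistic output layer replaces the activation function layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def second_training(positives, negatives, epochs=200, lr=0.5):
    """Fit a logistic layer from scratch on target positive / mutation negative data."""
    w, b = 0.0, 0.0                            # fresh network-parameter initialisation
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(w * x + b) - y       # gradient of binary cross-entropy w.r.t. z
            w -= lr * err * x
            b -= lr * err
    return w, b

# Illustrative scalar features: target positives cluster near +1 (household-appliance
# purchases), mutation negatives near -1 (other purchases).
w, b = second_training([0.9, 1.1, 1.0], [-1.0, -0.8, -1.2])
```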
In step 212, the server determines the first user time series data in the user time series data set as a target user time series sequence according to the time sequence.
The user time sequence data set to be divided includes a plurality of user time sequence data arranged in time order, for example, {s1, s2, …, sk}, where the occurrence time of user time sequence data s1 precedes that of s2, and the occurrence time of sk is the latest. The server determines {s1} as the target user time sequence.
In step 213, the server obtains a target user timing sequence data after the target user timing sequence, inputs the target user timing sequence data and the current target user timing sequence into the second trained recurrent neural network model, and outputs a classification value corresponding to the target user timing sequence data.
The server obtains a piece of target user time sequence data s2 following the target user time sequence {s1}, determines s2 as the target user time sequence data, and inputs the target user time sequence data s2 and the corresponding first user time sequence {s1} into the second trained recurrent neural network model. The second trained recurrent neural network model outputs a classification value corresponding to the target user time sequence data according to the relation between the sequence vector of the target user time sequence {s1} and the target user time sequence data s2. Assume that the target user time sequence s1 is a user time sequence sample of the behavior of purchasing household appliance commodities, and that the target user time sequence data s2 is also such a sample. After the target user time sequence data and the corresponding target user time sequence are input into the second trained recurrent neural network model, the model performs modeling calculation on the target user time sequence to obtain the corresponding sequence vector, then evaluates the target user time sequence and the target user time sequence data through its activation function, and outputs the classification value corresponding to the target user time sequence data. The closer the classification value is to 1, the more similar the category of the target user time sequence data is to that of the target user time sequence; the closer the classification value is to 0, the more dissimilar the categories are, and the target user time sequence data is a mutation point.
In step 214, it is checked whether the classification value is greater than a preset confidence level.
The preset confidence is a threshold defining whether the target user time sequence data belongs to the target user time sequence and may be set as required: the higher the preset confidence, the stricter the criterion; the lower the preset confidence, the looser the criterion. When the classification value is detected to be greater than the preset confidence, step 215 is executed; when the classification value is detected to be not greater than the preset confidence, step 216 is executed.
In step 215, the server merges the target user timing data into the target user timing sequence.
When the server detects that the classification value is greater than the preset confidence, for example a preset confidence of 0.75 and a classification value of 0.88, this indicates that the target user time sequence data belongs to the target user time sequence, and the server merges the target user time sequence data {s2} of the same category into the target user time sequence {s1} to obtain the merged target user time sequence {s1, s2}.
In step 216, the server saves and ends the current target user time series sequence, and generates a new target user time series sequence based on the target user time series data.
When the server detects that the classification value is not greater than the preset confidence, for example a classification value of 0.22, which is smaller than the preset confidence, this indicates that the target user time sequence data does not belong to the target user time sequence. If s3 is a user time sequence sample of the behavior of purchasing pet supplies, it obviously does not belong to the same category as the user time sequence {s1, s2} of the behavior of purchasing household appliances. The server therefore stores and ends the current target user time sequence {s1, s2}, and generates a new target user time sequence {s3} based on the target user time sequence data s3 of the other category.
In step 217, it is detected whether the user time series data classification is finished.
After merging the target user time sequence data or generating a new target user time sequence, the server correspondingly detects whether the classification of the user time sequence data is finished. The judgment condition is whether there remain user time sequence data in the user time sequence data set that have not been traversed. When all the user time sequence data in the set {s1, s2, …, sk} to be divided have been traversed, the classification is judged to be finished and step 218 is executed; when the traversal has not yet reached sk, the classification is judged to be unfinished, execution returns to step 213, and the division processing continues until the traversal is finished.
In step 218, the server determines that the user time series data classification is finished.
When it is detected that the classification of the user time sequence data is finished, multiple segments of user time sequence sequences divided according to the mutation points, that is, multiple target user time sequence sequences, are obtained. In this way, user time sequence data can be classified accurately and automatically, improving the accuracy and efficiency of data processing.
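The traversal of steps 212–218 can be sketched as a single loop. In the sketch below, `classify` is a hypothetical stand-in for the second trained recurrent neural network model (any function returning a classification value for a datum against the current sequence), and the toy category-prefix scorer is purely illustrative.

```python
def segment(data, classify, confidence=0.75):
    """Divide time-ordered data into sequences, splitting at mutation points."""
    sequences = []
    current = [data[0]]                        # step 212: first datum starts a sequence
    for item in data[1:]:                      # step 213: next target datum
        if classify(item, current) > confidence:
            current.append(item)               # step 215: same category, merge
        else:
            sequences.append(current)          # step 216: mutation point, close sequence
            current = [item]                   # ... and start a new target sequence
    sequences.append(current)                  # step 218: traversal finished
    return sequences

# Toy scorer: items sharing a category prefix with the current sequence score high.
toy = ["buy:tv", "buy:fridge", "pet:leash", "pet:food"]
segs = segment(toy, lambda item, seq:
               0.9 if item.split(":")[0] == seq[-1].split(":")[0] else 0.2)
```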
According to the method, a user time sequence sample set is collected, and initial positive sample data and initial negative sample data are generated from the user time sequence sample set; the initial positive sample data and the initial negative sample data are input into a preset neural network model for first training to obtain a first trained preset neural network model; the user time sequence sample set is identified according to the first trained preset neural network model, and target positive sample data and mutation negative sample data are determined; and the target positive sample data and the mutation negative sample data are input into the initialized preset neural network model for second training to obtain the second trained preset neural network model. In this way, the preset neural network model first trained on the initial positive and negative sample data identifies the corresponding target positive sample data and mutation negative sample data, and the initialized preset neural network is trained a second time on these, so that the second trained preset neural network can identify mutated user time sequence data in a sequence, the user time sequence data can be divided automatically, and the efficiency and accuracy of data processing are greatly improved.
Example III
In order to better implement the data processing method provided by the embodiments of the present application, an embodiment of the present application further provides a device based on the data processing method. The meanings of the terms are the same as those in the data processing method, and for implementation details reference may be made to the description in the method embodiments.
Referring to fig. 5a, fig. 5a is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure, wherein the data processing apparatus may include a generating unit 301, a first training unit 302, a determining unit 303, and a second training unit 304.
The generating unit 301 is configured to collect a user time sequence sample set, and generate initial positive sample data and initial negative sample data from the user time sequence sample set.
In some embodiments, as shown in fig. 5b, the generating unit 301 includes:
an acquisition subunit 3011, configured to acquire a user time sequence sample set;
a selecting subunit 3012, configured to sequentially select a first target user timing sample and a corresponding first target user timing sample sequence from the user timing sample set;
a positive labeling subunit 3013, configured to perform positive labeling on the first target user timing sample, and determine the first target user timing sample and a corresponding first target user timing sample sequence after the positive labeling as initial positive sample data;
a setting subunit 3014, configured to set a first preset number of second target user time-series samples, where a category of the second target user time-series samples is different from a category of the user time-series sample set;
and the negative labeling subunit 3015 is configured to perform negative labeling on the second target user time sequence sample, and determine the negatively labeled second target user time sequence sample and the first target user time sequence sample sequence as initial negative sample data.
In some embodiments, the selecting subunit 3012 is configured to: sequentially selecting a first target user time sequence sample from the user time sequence sample set according to the time sequence; a first target user timing sample sequence generated from a plurality of user timing samples that are consecutive in time in the user timing sample set with the first target user timing sample is obtained.
In some embodiments, the selecting subunit 3012 is further configured to: sequentially selecting a first target user time sequence sample from the user time sequence sample set according to the time sequence; selecting a first target user time sequence sample sequence formed by a preset number of user time sequence samples which are continuous in time based on the first target user time sequence sample; or determining the user timing samples in the user timing sample set except the first target user timing sample as a first target user timing sample sequence.
The first training unit 302 is configured to input the initial positive sample data and the initial negative sample data to a preset neural network model for first training, so as to obtain a first trained preset neural network model.
In some embodiments, the preset neural network model is a recurrent neural network model, and the first training unit 302 is configured to: modeling the first target user time sequence sample sequence through a recurrent neural network model to obtain a corresponding sequence vector; inputting the first target user time sequence sample, positive labeling information and a corresponding sequence vector into an activation function layer of the recurrent neural network model for first training; and inputting the second target user time sequence sample, the negative marking information and the corresponding sequence vector into an activation function layer of the recurrent neural network model for first training to obtain the recurrent neural network model after first training.
The determining unit 303 is configured to identify the user time sequence sample set according to the first trained preset neural network model, and determine target positive sample data and mutant negative sample data.
In some embodiments, the determining unit 303 is configured to: sequentially identifying each user time sequence sample in the user time sequence sample set through a first trained preset neural network model to obtain a corresponding output value of each user time sequence sample; determining a user time sequence sample with an output value larger than a preset positive sample threshold value as a third target user time sequence sample, and determining a user time sequence sample with an output value larger than a preset negative sample threshold value and smaller than a preset positive sample threshold value as a fourth target user time sequence sample; performing positive labeling on the third target user time sequence sample, and determining the labeled third target user time sequence sample and the corresponding first target user time sequence sample sequence as target positive sample data; and performing negative labeling on the fourth target user time sequence sample, and determining the fourth target user time sequence sample subjected to negative labeling and the corresponding first target user time sequence sample sequence as mutation negative sample data.
The second training unit 304 is configured to input the target positive sample data and the mutant negative sample data to the initialized preset neural network model for second training, so as to obtain a second trained preset neural network model.
In some embodiments, the preset neural network model is a recurrent neural network model, and the second training unit 304 is configured to: modeling the first target user time sequence sample sequence through the initialized recurrent neural network model to obtain a corresponding sequence vector; inputting the third target user time sequence sample, the positive marking information and the corresponding sequence vector into an activation function layer of the initialized recurrent neural network model for second training; and inputting the fourth target user time sequence sample, the negative marking information and the corresponding sequence vector into an activation function layer of the initialized recurrent neural network model for second training to obtain the second trained recurrent neural network model.
In some embodiments, as shown in fig. 5c, the data processing apparatus further includes:
a classification unit 305, configured to classify, based on the second trained recurrent neural network model, user time series data in the user time series data set to be divided;
the dividing unit 306 is configured to divide the user time sequence data of the same type into the same user time sequence, so as to obtain multiple user time sequence sequences.
In some embodiments, the classification unit 305 is configured to: determining the first user time sequence data in the user time sequence data set as a target user time sequence according to the time sequence; acquiring target user time sequence data after the target user time sequence; inputting the target user time sequence data and the corresponding first user time sequence into a second trained recurrent neural network model, and outputting a classification value corresponding to the target user time sequence data; when the classification value is larger than the preset confidence level, merging the target user time sequence data into a target user time sequence, and returning to execute the step of acquiring target user time sequence data after the target user time sequence until the classification of the user time sequence data is finished; and when the classification value is not greater than the preset confidence coefficient, storing and ending the current target user time sequence, generating a new target user time sequence based on the target user time sequence data, and returning to execute the step of acquiring the target user time sequence data after the target user time sequence until the classification of the user time sequence data is ended.
The specific implementation of each unit can refer to the previous embodiment, and is not described herein again.
As can be seen from the above, in the embodiment of the present application, the generating unit 301 acquires a user time sequence sample set and generates initial positive sample data and initial negative sample data from it; the first training unit 302 inputs the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a first trained preset neural network model; the determining unit 303 identifies the user time sequence sample set according to the first trained preset neural network model and determines target positive sample data and mutation negative sample data; and the second training unit 304 inputs the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model. In this way, the preset neural network model first trained on the initial positive and negative sample data identifies the corresponding target positive sample data and mutation negative sample data, and the initialized preset neural network is trained a second time on these, so that the second trained preset neural network can identify mutated user time sequence data in a sequence, the user time sequence data can be divided automatically, and the efficiency and accuracy of data processing are greatly improved.
Example IV
An embodiment of the present application further provides an electronic device; the embodiment is described by taking a server as an example of the electronic device. Fig. 6 shows a schematic structural diagram of a server according to an embodiment of the present application. Specifically:
the server may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the server architecture shown in FIG. 6 is not meant to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
Wherein:
the processor 401 is a control center of the server, connects various parts of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the server. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The server further includes a power supply 403 for supplying power to each component, and preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The server may also include an input unit 404, the input unit 404 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the server may further include a display unit and the like, which will not be described in detail herein. Specifically, in this embodiment, the processor 401 in the server loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
acquiring a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set; inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a first trained preset neural network model; identifying the user time sequence sample set according to the first trained preset neural network model, and determining target positive sample data and mutant negative sample data; and inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for carrying out sequence division on the user time sequence data set to be divided.
In the above embodiments, the descriptions of the embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed description of the data processing method, and are not described herein again.
As can be seen from the above, the server in the embodiment of the present application may collect a user time sequence sample set and generate initial positive sample data and initial negative sample data from it; input the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a first trained preset neural network model; identify the user time sequence sample set according to the first trained preset neural network model and determine target positive sample data and mutation negative sample data; and input the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain the second trained preset neural network model. In this way, the preset neural network model first trained on the initial positive and negative sample data identifies the corresponding target positive sample data and mutation negative sample data, and the initialized preset neural network is trained a second time on these, so that the second trained preset neural network can identify mutated user time sequence data in a sequence, the user time sequence data can be divided automatically, and the efficiency and accuracy of data processing are greatly improved.
Embodiment V
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be completed by instructions, or by related hardware controlled by instructions, and the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer-readable storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any data processing method provided by the embodiments of the present application. For example, the instructions may perform the steps of:
acquiring a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set; inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a first trained preset neural network model; identifying the user time sequence sample set according to the first trained preset neural network model, and determining target positive sample data and mutant negative sample data; and inputting the target positive sample data and the mutant negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for performing sequence division on a user time sequence data set to be divided.
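Read as pseudocode, the two training passes above can be sketched in a few lines of Python. Everything here is an illustrative stand-in, not the patent's actual implementation: `TinyScorer` replaces the preset (recurrent) neural network with a toy mean-based scorer, the noise-drawn initial negatives and the threshold values are assumptions, and the helper names are invented.

```python
import numpy as np

class TinyScorer:
    """Toy stand-in for the patent's preset neural network: it scores a
    sample by closeness to the mean of the positive training samples.
    (A real implementation would be a trained recurrent network.)"""
    def fit(self, labeled):
        pos = np.array([s for s, y in labeled if y == 1])
        self.center = pos.mean(axis=0)
        self.scale = np.abs(pos - self.center).mean() + 1e-9
    def predict(self, sample):
        dist = np.abs(np.asarray(sample) - self.center).mean()
        return float(np.exp(-dist / self.scale))   # in (0, 1]; high = typical

def two_stage_training(sample_set, pos_threshold=0.8, neg_threshold=0.2):
    rng = np.random.default_rng(0)
    # Initial labels: the set's own samples are positive; noise drawn from a
    # different distribution ("different category") gives initial negatives.
    positives = [(s, 1) for s in sample_set]
    negatives = [(rng.normal(10.0, 1.0, size=len(s)), 0) for s in sample_set]

    first = TinyScorer()
    first.fit(positives + negatives)      # first training

    # Re-score the sample set with the first-trained model: confidently
    # typical samples become target positives; borderline samples become
    # "mutant" negatives; everything below the lower threshold is dropped.
    target_pos, mutant_neg = [], []
    for s in sample_set:
        score = first.predict(s)
        if score > pos_threshold:
            target_pos.append((s, 1))
        elif neg_threshold < score <= pos_threshold:
            mutant_neg.append((s, 0))

    second = TinyScorer()                 # re-initialised model
    second.fit(target_pos + mutant_neg)   # second training
    return second, len(target_pos), len(mutant_neg)
```

With a real recurrent model, `fit` and `predict` would be gradient training and a forward pass; the control flow is what the paragraph above describes.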
For specific implementation of the above operations, refer to the foregoing embodiments; details are not repeated here.
The computer-readable storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Since the instructions stored in the computer-readable storage medium can execute the steps of any data processing method provided in the embodiments of the present application, they can achieve the beneficial effects of any such method; for details, refer to the foregoing embodiments, which are not repeated here.
The data processing method, apparatus, computer-readable storage medium, and electronic device provided in the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application; the descriptions of the above embodiments are only intended to help understand the method and its core ideas. Meanwhile, those skilled in the art may make changes to the specific implementations and application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (12)

1. A data processing method, comprising:
acquiring a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set;
inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a first trained preset neural network model;
identifying the user time sequence sample set according to the first trained preset neural network model, and determining target positive sample data and mutant negative sample data;
and inputting the target positive sample data and the mutant negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for performing sequence division on a user time sequence data set to be divided.
2. The data processing method of claim 1, wherein the step of generating initial positive sample data and initial negative sample data from the user time series sample set comprises:
sequentially selecting a first target user time sequence sample and a corresponding first target user time sequence sample sequence from the user time sequence sample set;
performing positive labeling on the first target user time sequence sample, and determining the first target user time sequence sample after positive labeling and a corresponding first target user time sequence sample sequence as initial positive sample data;
setting a first preset number of second target user time sequence samples, wherein the categories of the second target user time sequence samples are different from the categories of the user time sequence sample set;
and performing negative labeling on the second target user time sequence sample, and determining the second target user time sequence sample and the first target user time sequence sample sequence after the negative labeling as initial negative sample data.
3. The data processing method of claim 2, wherein said step of sequentially selecting a first target user time series sample and a corresponding first target user time series sample sequence from said set of user time series samples comprises:
sequentially selecting a first target user time sequence sample from the user time sequence sample set according to the time sequence;
and acquiring a first target user time sequence sample sequence generated from a plurality of user time sequence samples in the user time sequence sample set that are temporally consecutive with the first target user time sequence sample.
4. The data processing method of claim 3, wherein the step of acquiring a first target user time sequence sample sequence generated from a plurality of user time sequence samples in the user time sequence sample set that are temporally consecutive with the first target user time sequence sample comprises:
selecting, based on the first target user time sequence sample, a first target user time sequence sample sequence formed by a preset number of temporally consecutive user time sequence samples; or
determining the user time sequence samples remaining in the user time sequence sample set after excluding the first target user time sequence sample as the first target user time sequence sample sequence.
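By way of illustration only, the initial sample generation of claims 2 to 4 might be sketched as follows. The helper name, the window size, and the use of uniform noise as the "different category" for the second target samples are assumptions introduced for this sketch, not taken from the patent:

```python
import numpy as np

def generate_initial_samples(user_series, n_negatives=2, window=3, seed=0):
    """For each sample in the time-ordered series, pair it with a window of
    temporally adjacent samples and label it positive; pair out-of-category
    noise with the same window and label it negative."""
    rng = np.random.default_rng(seed)
    positives, negatives = [], []
    for i, target in enumerate(user_series):
        # First target user time sequence sample sequence: up to `window`
        # temporally consecutive samples on each side of the target.
        context = user_series[max(0, i - window):i] + user_series[i + 1:i + 1 + window]
        positives.append((target, context, 1))          # positively labeled
        # Second target samples: a preset number drawn from a different
        # category (here: uniform noise) paired with the same context.
        for _ in range(n_negatives):
            fake = rng.uniform(-10, 10, size=np.shape(target))
            negatives.append((fake, context, 0))        # negatively labeled
    return positives, negatives
```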
5. The data processing method according to any one of claims 2 to 4, wherein the preset neural network model is a recurrent neural network model, and the step of inputting the initial positive sample data and the initial negative sample data into the preset neural network model for first training to obtain a first trained preset neural network model comprises:
modeling the time sequence sample sequence of the first target user through a recurrent neural network model to obtain a corresponding sequence vector;
inputting the first target user time sequence sample, positive labeling information and a corresponding sequence vector into an activation function layer of the recurrent neural network model for first training;
and inputting the second target user time sequence sample, the negative marking information and the corresponding sequence vector into an activation function layer of the recurrent neural network model for first training to obtain the recurrent neural network model after first training.
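A numerical toy version of this modelling step is given below: a single recurrent cell with fixed random weights stands in for the trained recurrent neural network, and a sigmoid stands in for the activation function layer. The function names and all weight values are illustrative assumptions:

```python
import numpy as np

def sequence_vector(sequence, hidden=4, seed=0):
    """Run a one-layer Elman-style recurrence over a user sample sequence,
    returning the final hidden state as the sequence vector."""
    rng = np.random.default_rng(seed)
    d = np.shape(sequence[0])[0] if np.ndim(sequence[0]) else 1
    W_in = rng.standard_normal((hidden, d)) * 0.1   # fixed toy input weights
    W_h = rng.standard_normal((hidden, hidden)) * 0.1  # fixed toy recurrent weights
    h = np.zeros(hidden)
    for x in sequence:
        h = np.tanh(W_in @ np.atleast_1d(x) + W_h @ h)
    return h

def activation_layer(sample, seq_vec, seed=0):
    """Score a (sample, sequence vector) pair with a sigmoid output."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(len(seq_vec) + 1) * 0.1
    z = w[:-1] @ seq_vec + w[-1] * float(np.mean(sample))
    return 1.0 / (1.0 + np.exp(-z))   # strictly in (0, 1)
```

In the claimed method these weights would be learned during the first training rather than fixed; only the data flow (sequence → vector → activation layer) follows the claim.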
6. The data processing method according to any one of claims 1 to 4, wherein the step of identifying the user time series sample set according to the first trained preset neural network model and determining target positive sample data and mutant negative sample data comprises:
sequentially identifying each user time sequence sample in the user time sequence sample set through a first trained preset neural network model to obtain a corresponding output value of each user time sequence sample;
determining a user time sequence sample with an output value larger than a preset positive sample threshold value as a third target user time sequence sample, and determining a user time sequence sample with an output value larger than a preset negative sample threshold value and smaller than a preset positive sample threshold value as a fourth target user time sequence sample;
positively labeling the third target user time sequence sample, and determining the positively labeled third target user time sequence sample and the corresponding first target user time sequence sample sequence as target positive sample data;
and performing negative labeling on the fourth target user time sequence sample, and determining the fourth target user time sequence sample subjected to negative labeling and the corresponding first target user time sequence sample sequence as mutation negative sample data.
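The two-threshold rule of claim 6 can be sketched directly; `scores` stands in for the first-trained model's output values, and the threshold defaults are illustrative:

```python
def relabel(samples, scores, pos_threshold=0.8, neg_threshold=0.3):
    """Split samples into target positives (confidently in-category) and
    mutant negatives (in the ambiguous band between the two thresholds)."""
    target_pos, mutant_neg = [], []
    for sample, score in zip(samples, scores):
        if score > pos_threshold:
            target_pos.append(sample)     # third target samples -> positive
        elif neg_threshold < score < pos_threshold:
            mutant_neg.append(sample)     # fourth target samples -> mutant negative
        # scores at or below neg_threshold are discarded as plain noise
    return target_pos, mutant_neg
```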
7. The data processing method according to claim 6, wherein the preset neural network model is a recurrent neural network model, and the step of inputting the target positive sample data and the mutant negative sample data into the initialized preset neural network model for second training to obtain the second trained preset neural network model comprises:
modeling the first target user time sequence sample sequence through the initialized recurrent neural network model to obtain a corresponding sequence vector;
inputting the third target user time sequence sample, the positive marking information and the corresponding sequence vector into an activation function layer of the initialized recurrent neural network model for second training;
and inputting the fourth target user time sequence sample, the negative marking information and the corresponding sequence vector into an activation function layer of the initialized recurrent neural network model for second training to obtain the second trained recurrent neural network model.
8. The data processing method of claim 7, further comprising, after the step of obtaining the second trained recurrent neural network model:
classifying the user time sequence data in the user time sequence data set to be divided based on the second trained recurrent neural network model;
and dividing the user time sequence data of the same type into the same user time sequence to obtain a plurality of sections of user time sequence sequences.
9. The data processing method according to claim 8, wherein the step of classifying the user time series data in the user time series data set to be divided based on the second trained recurrent neural network model comprises:
determining the first user time sequence data in the user time sequence data set as a target user time sequence according to the time sequence;
acquiring target user time sequence data after the target user time sequence;
inputting the target user time sequence data and the current target user time sequence into a second trained recurrent neural network model, and outputting a classification value corresponding to the target user time sequence data;
when the classification value is greater than a preset confidence threshold, merging the target user time sequence data into the target user time sequence, and returning to the step of acquiring target user time sequence data after the target user time sequence, until classification of the user time sequence data is finished;
and when the classification value is not greater than the preset confidence threshold, storing and ending the current target user time sequence, generating a new target user time sequence based on the target user time sequence data, and returning to the step of acquiring target user time sequence data after the target user time sequence, until classification of the user time sequence data is finished.
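The segmentation loop of claim 9 is a greedy grow-or-split scan. In this sketch, `classify` stands in for the second-trained recurrent neural network's classification value, and the confidence default is an assumption:

```python
def segment_series(data, classify, confidence=0.5):
    """Greedily grow the current target sequence while the model says the
    next datum belongs to it; otherwise store the sequence and start anew."""
    if not data:
        return []
    segments = []
    current = [data[0]]                    # first datum seeds the target sequence
    for datum in data[1:]:
        if classify(datum, current) > confidence:
            current.append(datum)          # same class: merge into the sequence
        else:
            segments.append(current)       # store and end the current sequence
            current = [datum]              # new target sequence from this datum
    segments.append(current)               # flush the final sequence
    return segments
```

For example, with a toy classifier that accepts a datum only if it is close to the end of the current segment, the scan yields multi-segment output: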
10. A data processing apparatus, comprising:
the generating unit is used for acquiring a user time sequence sample set and generating initial positive sample data and initial negative sample data from the user time sequence sample set;
the first training unit is used for inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a first trained preset neural network model;
the determining unit is used for identifying the user time sequence sample set according to the first trained preset neural network model and determining target positive sample data and mutant negative sample data;
and the second training unit is used for inputting the target positive sample data and the mutant negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for performing sequence division on a user time sequence data set to be divided.
11. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the data processing method according to any one of claims 1 to 9.
12. An electronic device comprising a processor and a memory, said memory storing a plurality of computer instructions, wherein said processor loads said computer instructions to perform the steps of the data processing method of any of claims 1 to 9.
CN201911419795.9A 2019-12-31 2019-12-31 Data processing method, data processing device, computer readable storage medium and electronic equipment Active CN111160484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911419795.9A CN111160484B (en) 2019-12-31 2019-12-31 Data processing method, data processing device, computer readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111160484A 2020-05-15
CN111160484B CN111160484B (en) 2023-08-29

Family

ID=70560397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911419795.9A Active CN111160484B (en) 2019-12-31 2019-12-31 Data processing method, data processing device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111160484B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030207278A1 (en) * 2002-04-25 2003-11-06 Javed Khan Methods for analyzing high dimensional data for classifying, diagnosing, prognosticating, and/or predicting diseases and other biological states
CN107316106A (en) * 2017-06-12 2017-11-03 华南理工大学 The Neural Network Time Series method of embedded dimension is determined based on dynamic threshold
CN108030494A (en) * 2017-11-08 2018-05-15 华南理工大学 Electrocardiosignal error flag training sample recognition methods based on cross validation
US20180253645A1 (en) * 2017-03-03 2018-09-06 International Business Machines Corporation Triage of training data for acceleration of large-scale machine learning
US20180253648A1 (en) * 2017-03-01 2018-09-06 Synaptics Inc Connectionist temporal classification using segmented labeled sequence data
US20180260705A1 (en) * 2017-03-05 2018-09-13 Verint Systems Ltd. System and method for applying transfer learning to identification of user actions
CN108897829A (en) * 2018-06-22 2018-11-27 广州多益网络股份有限公司 Modification method, device and the storage medium of data label
CN109271877A (en) * 2018-08-24 2019-01-25 北京智芯原动科技有限公司 A kind of human figure identification method and device
CN109814936A (en) * 2017-11-20 2019-05-28 广东欧珀移动通信有限公司 Application program prediction model is established, preloads method, apparatus, medium and terminal
CN110110372A (en) * 2019-04-09 2019-08-09 华东师范大学 A kind of user's timing behavior automatic segmentation prediction technique
CN110263916A (en) * 2019-05-31 2019-09-20 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
US10504023B1 (en) * 2015-06-05 2019-12-10 Google Llc Training recurrent neural networks to generate sequences
DE102018113621A1 (en) * 2018-06-07 2019-12-12 Connaught Electronics Ltd. A method of training a convolutional neural network for processing image data for use in a driving support system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738422A (en) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Data processing method, device and medium based on recurrent neural network
CN111931912A (en) * 2020-08-07 2020-11-13 北京推想科技有限公司 Network model training method and device, electronic equipment and storage medium
CN112069279A (en) * 2020-09-04 2020-12-11 北京百度网讯科技有限公司 Map data updating method, device, equipment and readable storage medium
CN112069279B (en) * 2020-09-04 2022-11-08 北京百度网讯科技有限公司 Map data updating method, device, equipment and readable storage medium
US11859998B2 (en) 2020-09-04 2024-01-02 Beijing Baidu Netcom Science Technology Co., Ltd. Map data updating method, apparatus, device, and readable storage medium
CN112800053A (en) * 2021-01-05 2021-05-14 深圳索信达数据技术有限公司 Data model generation method, data model calling device, data model equipment and storage medium
CN112800053B (en) * 2021-01-05 2021-12-24 深圳索信达数据技术有限公司 Data model generation method, data model calling device, data model equipment and storage medium
CN113138977A (en) * 2021-04-22 2021-07-20 康键信息技术(深圳)有限公司 Transaction conversion analysis method, device, equipment and storage medium
CN113656699A (en) * 2021-08-25 2021-11-16 平安科技(深圳)有限公司 User feature vector determination method, related device and medium
CN113656699B (en) * 2021-08-25 2024-02-13 平安科技(深圳)有限公司 User feature vector determining method, related equipment and medium
CN114357037A (en) * 2022-03-22 2022-04-15 苏州浪潮智能科技有限公司 Time sequence data analysis method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111160484B (en) 2023-08-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant