CN111160484B - Data processing method, data processing device, computer readable storage medium and electronic equipment - Google Patents


Info

Publication number
CN111160484B
CN111160484B
Authority
CN
China
Prior art keywords
time sequence
user time
sample
data
target user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911419795.9A
Other languages
Chinese (zh)
Other versions
CN111160484A (en)
Inventor
缪畅宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911419795.9A priority Critical patent/CN111160484B/en
Publication of CN111160484A publication Critical patent/CN111160484A/en
Application granted granted Critical
Publication of CN111160484B publication Critical patent/CN111160484B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the application discloses a data processing method, a data processing device, a computer readable storage medium and an electronic device. A user time sequence sample set is collected, and initial positive sample data and initial negative sample data are generated from it. The initial positive and negative sample data are input into a preset neural network model for a first training, yielding a first-trained preset neural network model. The user time sequence sample set is then identified according to the first-trained model to determine target positive sample data and mutation negative sample data. Finally, the target positive sample data and mutation negative sample data are input into the initialized preset neural network model for a second training, yielding a second-trained preset neural network model. Through this secondary training, a second-trained model with a sequence-division capability is obtained, automatic division of a user's time sequence data is achieved, and the efficiency and accuracy of data processing are greatly improved.

Description

Data processing method, data processing device, computer readable storage medium and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, a data processing device, a computer readable storage medium, and an electronic device.
Background
With the wide application of networks and the rapid development of terminal technologies, daily life and work are ever more closely tied to terminals. To provide more intelligent services, a series of behavior feature data generated by a user's interactions on a terminal often needs to be collected so that the user's preferences and habits can be analyzed.
In the prior art, a terminal may collect a plurality of behavior feature data from a user's continuous interactions with it, for example the behavior feature data generated by the user continuously purchasing a number of commodities, and these behavior feature data are then divided manually to obtain a plurality of categories of behavior feature data.
In the course of research and practice on the prior art, the inventors of the present application found that although the prior art provides a means of manually classifying a plurality of behavior feature data, manual classification greatly reduces the efficiency of data processing.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, a computer readable storage medium and electronic equipment, which can improve the efficiency and accuracy of data processing.
In order to solve the technical problems, the embodiment of the application provides the following technical scheme:
a data processing method, comprising:
collecting a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set;
inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a preset neural network model after the first training;
identifying the user time sequence sample set according to a first trained preset neural network model, and determining target positive sample data and mutation negative sample data;
and inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for carrying out sequence division on a user time sequence data set to be divided.
Correspondingly, the embodiment of the application also provides a data processing device, which comprises:
the generating unit is used for collecting a user time sequence sample set and generating initial positive sample data and initial negative sample data from the user time sequence sample set;
The first training unit is used for inputting the initial positive sample data and the initial negative sample data into a preset neural network model to perform first training, so as to obtain the preset neural network model after the first training;
the determining unit is used for identifying the user time sequence sample set according to a first trained preset neural network model and determining target positive sample data and mutation negative sample data;
the second training unit is used for inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model to perform second training to obtain a second trained preset neural network model, and the second trained preset neural network model is used for performing sequence division on a user time sequence data set to be divided.
In some embodiments, the generating unit includes:
the acquisition subunit is used for acquiring a user time sequence sample set;
the selecting subunit is used for sequentially selecting a first target user time sequence sample and a corresponding first target user time sequence sample sequence from the user time sequence sample set;
the positive annotation subunit is used for carrying out positive annotation on the first target user time sequence sample, and determining the positively annotated first target user time sequence sample and the corresponding first target user time sequence sample sequence as initial positive sample data;
the setting subunit is used for setting a first preset number of second target user time sequence samples, where the class of the second target user time sequence samples is different from the class of the user time sequence sample set;
and the negative annotation subunit is used for carrying out negative annotation on the second target user time sequence sample, and determining the negatively annotated second target user time sequence sample and the first target user time sequence sample sequence as initial negative sample data.
In some embodiments, the selecting subunit is configured to:
sequentially selecting a first target user time sequence sample from the user time sequence sample set in time order;
and obtaining a first target user time sequence sample sequence generated from a plurality of user time sequence samples in the user time sequence sample set that are temporally consecutive to the first target user time sequence sample.
In some embodiments, the selecting subunit is further configured to:
sequentially selecting a first target user time sequence sample from the user time sequence sample set in time order;
selecting a first target user time sequence sample sequence formed by a preset number of temporally consecutive user time sequence samples following the first target user time sequence sample; or
determining the user time sequence samples remaining in the user time sequence sample set, other than the first target user time sequence sample, as the first target user time sequence sample sequence.
In some embodiments, the preset neural network model is a recurrent neural network model, and the first training unit is configured to:
modeling the first target user time sequence sample sequence through the recurrent neural network model to obtain a corresponding sequence vector;
inputting the first target user time sequence sample, the positive annotation information and the corresponding sequence vector into an activation function layer of the recurrent neural network model for the first training;
and inputting the second target user time sequence sample, the negative annotation information and the corresponding sequence vector into the activation function layer of the recurrent neural network model for the first training, obtaining the first-trained recurrent neural network model.
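As a rough illustration of the first-training step above, the sketch below models a sequence as a single recurrent state, scores a target sample against that state through a sigmoid "activation layer", and takes gradient steps on the logistic loss. The scalar feature encoding, the fixed weights, and all function names are illustrative assumptions, not the patent's actual model.

```python
import math

def rnn_encode(sequence, w_in=0.5, w_rec=0.9):
    # Minimal recurrent encoder (assumed form): h_t = tanh(w_in*x_t + w_rec*h_{t-1}).
    h = 0.0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def score(target, seq_vec, w):
    # "Activation function layer": sigmoid over the target/sequence interaction.
    return 1.0 / (1.0 + math.exp(-w * target * seq_vec))

def train_step(w, target, seq_vec, label, lr=0.1):
    # One gradient step on the cross-entropy loss; label is 1 for the
    # positive annotation, 0 for the negative annotation.
    p = score(target, seq_vec, w)
    return w - lr * (p - label) * target * seq_vec

# First training: a positive pair (real sample vs. its sequence) against a
# fake negative sample, alternated for a few steps.
seq_vec = rnn_encode([0.2, 0.4, 0.6])
w = 1.0
for _ in range(50):
    w = train_step(w, 0.5, seq_vec, 1)    # positive sample, y = 1
    w = train_step(w, -0.5, seq_vec, 0)   # fake negative sample, y = 0
```

After training, the model assigns a higher score to the positively labeled sample than to the fake one, which is the relatedness signal the later identification step relies on.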
In some embodiments, the determining unit is configured to:
sequentially identifying each user time sequence sample in the user time sequence sample set through a first trained preset neural network model to obtain a corresponding output value of each user time sequence sample;
determining a user time sequence sample with an output value larger than a preset positive sample threshold value as a third target user time sequence sample, and determining a user time sequence sample with an output value larger than a preset negative sample threshold value and smaller than the preset positive sample threshold value as a fourth target user time sequence sample;
Performing positive annotation on the third target user time sequence sample, and determining the third target user time sequence sample after positive annotation and a corresponding first target user time sequence sample sequence as target positive sample data;
and negative labeling is carried out on the fourth target user time sequence sample, and the fourth target user time sequence sample after negative labeling and the corresponding first target user time sequence sample sequence are determined to be mutation negative sample data.
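The threshold-based selection above can be sketched as follows. The threshold values are hypothetical; samples falling below the negative-sample threshold are simply left out, since the text only assigns labels to the two bands above it.

```python
def select_second_round_samples(samples, outputs, pos_thr=0.8, neg_thr=0.3):
    # Samples whose output value exceeds the positive-sample threshold become
    # third target (positive) samples; those between the negative- and
    # positive-sample thresholds become fourth target (mutation negative) samples.
    positives, mutations = [], []
    for sample, value in zip(samples, outputs):
        if value > pos_thr:
            positives.append((sample, 1))   # positive annotation
        elif neg_thr < value < pos_thr:
            mutations.append((sample, 0))   # negative (mutation) annotation
    return positives, mutations
```

For example, outputs of 0.9, 0.5 and 0.1 over three samples yield one target positive sample, one mutation negative sample, and one discarded sample.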
In some embodiments, the preset neural network model is a recurrent neural network model, and the second training unit is configured to:
modeling the first target user time sequence sample sequence through the initialized recurrent neural network model to obtain a corresponding sequence vector;
inputting the third target user time sequence sample, the positive annotation information and the corresponding sequence vector into an activation function layer of the initialized recurrent neural network model for the second training;
and inputting the fourth target user time sequence sample, the negative annotation information and the corresponding sequence vector into the activation function layer of the initialized recurrent neural network model for the second training, obtaining the second-trained recurrent neural network model.
In some embodiments, the data processing apparatus further comprises:
the classification unit is used for classifying the user time sequence data in the user time sequence data set to be divided based on the second-trained recurrent neural network model;
the dividing unit is used for dividing user time sequence data of the same class into the same user time sequence, obtaining multiple segments of user time sequence sequences.
In some embodiments, the classification unit is configured to:
determining, in time order, the first user time sequence data in the user time sequence data set as the target user time sequence;
acquiring the target user time sequence data following the target user time sequence;
inputting the target user time sequence data and the current target user time sequence into the second-trained recurrent neural network model, and outputting a classification value corresponding to the target user time sequence data;
when the classification value is greater than a preset confidence, merging the target user time sequence data into the target user time sequence, and returning to the step of acquiring the target user time sequence data following the target user time sequence, until the classification of the user time sequence data is finished;
and when the classification value is not greater than the preset confidence, storing and closing the current target user time sequence, generating a new target user time sequence based on the target user time sequence data, and returning to the step of acquiring the target user time sequence data following the target user time sequence, until the classification of the user time sequence data is finished.
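The classification-and-division loop described by the two branches above can be sketched as follows; `classify` is a stand-in for the second-trained model, the confidence value is an assumed constant, and the toy category rule exists only to make the example concrete.

```python
def divide_into_sessions(timeline, classify, confidence=0.5):
    # The first datum starts the first target user time sequence; each later
    # datum is merged into the current sequence when the model's classification
    # value exceeds the confidence, otherwise the current sequence is stored
    # and a new one is started from that datum (a mutation point).
    if not timeline:
        return []
    sessions = [[timeline[0]]]
    for item in timeline[1:]:
        if classify(item, sessions[-1]) > confidence:
            sessions[-1].append(item)
        else:
            sessions.append([item])
    return sessions

def same_category(item, session):
    # Toy stand-in classifier: high value when the item shares the "category"
    # (here, the tens digit) of the last datum in the current sequence.
    return 0.9 if item // 10 == session[-1] // 10 else 0.1
```

With the toy classifier, a timeline of items 11, 12, 25, 26, 27 is divided into two segments at the category change, mirroring how a mutation point closes one session and opens the next.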
Accordingly, embodiments of the present application also provide a computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the data processing method described above.
Correspondingly, the embodiment of the application also provides electronic equipment, which comprises a processor and a memory, wherein the memory stores a plurality of computer instructions, and the processor loads the computer instructions to execute the steps in any one of the data processing methods provided by the embodiment of the application.
The embodiment of the application collects a user time sequence sample set and generates initial positive sample data and initial negative sample data from it. The initial positive and negative sample data are input into a preset neural network model for a first training, yielding the first-trained preset neural network model. The user time sequence sample set is then identified according to the first-trained model to determine target positive sample data and mutation negative sample data. Finally, the target positive sample data and mutation negative sample data are input into the initialized preset neural network model for a second training, yielding the second-trained preset neural network model. In this scheme, the model obtained from the first training on the initial positive and negative sample data identifies the corresponding target positive sample data and mutation negative sample data, and the second training of the initialized network with these data enables the second-trained network to recognize the mutated user time sequence data within a sequence. Automatic division of the user time sequence data can therefore be achieved, and the efficiency and accuracy of data processing are greatly improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a scenario of a data processing system provided by an embodiment of the present application;
FIG. 2a is a schematic flow chart of a data processing method according to an embodiment of the present application;
fig. 2b is a schematic structural diagram of a recurrent neural network model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another flow chart of a data processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present application;
FIG. 5a is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 5b is a schematic diagram of another structure of a data processing apparatus according to an embodiment of the present application;
FIG. 5c is a schematic diagram of another structure of a data processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiment of the application provides a data processing method, a data processing device, a computer readable storage medium and an electronic device.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of a data processing system according to an embodiment of the present application. The system includes terminal A and a server (the data processing system may further include other terminals besides terminal A; the specific number is not limited here). Terminal A and the server may be connected through a communication network, which may include wireless and wired networks; the wireless network includes one or more of a wireless wide area network, a wireless local area network, a wireless metropolitan area network, and a wireless personal area network. The network includes network entities such as routers and gateways, which are not shown in the figure. Terminal A may interact with the server via the communication network; for example, terminal A may collect a plurality of behavior feature data from a user's continuous interactions on the terminal, generate a user time sequence sample set, and transmit it to the server.
The data processing system may include a data processing device, which may be integrated in the server; in some embodiments the device may also be integrated in a terminal with sufficient computing capability. In this embodiment the device is integrated in the server for illustration. As shown in fig. 1, the server collects the user time sequence sample set sent by terminal A, where the set includes a plurality of user time sequence samples ordered in time, and generates initial positive sample data and initial negative sample data from it. The initial positive and negative sample data are input into a preset neural network model for a first training, yielding the first-trained preset neural network model. The user time sequence sample set is then identified according to the first-trained model to determine target positive sample data and mutation negative sample data within it, the mutation negative sample data being the mutation points in the whole set. The target positive sample data and mutation negative sample data are input into the initialized preset neural network model for a second training, yielding the second-trained preset neural network model with the capability of identifying mutation points. Based on this model, the mutation points in a user time sequence data set to be divided can be identified, and the set can be divided into multiple segments of user time sequence data sequences, obtaining a sequence division result.
The data processing system further includes terminal A, on which various applications required by the user may be installed, such as instant messaging applications, e-commerce applications, and multimedia applications.
It should be noted that, the schematic view of the scenario of the data processing system shown in fig. 1 is only an example, and the data processing system and scenario described in the embodiment of the present application are for more clearly describing the technical solution of the embodiment of the present application, and do not constitute a limitation on the technical solution provided by the embodiment of the present application, and those skilled in the art can know that, with the evolution of the data processing system and the appearance of a new service scenario, the technical solution provided by the embodiment of the present application is equally applicable to similar technical problems.
The following will describe in detail. The numbers of the following examples are not intended to limit the preferred order of the examples.
Embodiment 1,
In this embodiment, the description is given from the viewpoint of a data processing apparatus, which may be integrated in a server that is equipped with a storage unit and a microprocessor and has computing capability.
A data processing method, comprising: collecting a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set; inputting initial positive sample data and initial negative sample data into a preset neural network model for first training to obtain a preset neural network model after first training; identifying a user time sequence sample set according to a first trained preset neural network model, and determining target positive sample data and mutation negative sample data; and inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for carrying out sequence division on the user time sequence data set to be divided.
Referring to fig. 2a, fig. 2a is a flow chart of a data processing method according to an embodiment of the application. The data processing method comprises the following steps:
in step 101, a user time series sample set is acquired and initial positive sample data and initial negative sample data are generated from the user time series sample set.
It can be understood that, in order to provide a more intelligent recommendation service for a user, after the background server obtains a plurality of behavior feature data (i.e., user time sequence data) from the user's continuous interactions on the terminal, the user time sequence data (session) needs to be divided into a plurality of user time sequence data sequences by category. A user time sequence data sequence is a continuous interaction window corresponding to the user's operation behaviors on the terminal, and can reflect the user's latent continuous behaviors; for example, a user in a bad mood may listen to several sad songs in a row, and may continuously purchase low-priced commodities during a promotion. An accurate behavior feature data sequence allows the user's behavior characteristics to be analyzed better, so that an accurate recommendation service can be provided more precisely.
In some embodiments, the step of collecting a user time series sample set may include: and acquiring a plurality of behavior feature data of continuous interaction of the user on the terminal, wherein the behavior feature data consists of multidimensional features, and combining the behavior feature data according to time sequence to generate a user time sequence sample set.
In the embodiment of the present application, a user time sequence sample set is collected first. The set includes a plurality of user time sequence samples arranged in time order; for example, it may be represented as {t1, t2, t3, …, tk}. Each user time sequence sample is the behavior feature data corresponding to one interaction performed by the user on the terminal, and may be formed by combining multi-dimensional feature data; for example, when the behavior feature data describes a purchasing behavior, the multi-dimensional feature data may include a commodity identification feature, a commodity information feature, a purchasing time feature, and so on. It should be noted in particular that every user time sequence sample in the set is generated by the user interacting with a corresponding object on the terminal, so each user time sequence sample in the user time sequence sample set is necessarily a positive sample.
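As an illustration of what such a sample set might look like, the sketch below orders multi-dimensional behavior feature records by timestamp; the field names are invented for the example and are not from the patent.

```python
def build_user_timing_sample_set(behavior_records):
    # Each record is one interaction's behavior feature data, combining
    # multi-dimensional features (commodity id, commodity info, purchase time);
    # the sample set is the records arranged in time order.
    ordered = sorted(behavior_records, key=lambda r: r["purchase_time"])
    return [(r["commodity_id"], r["commodity_info"], r["purchase_time"])
            for r in ordered]

records = [
    {"commodity_id": "c2", "commodity_info": "book", "purchase_time": 20},
    {"commodity_id": "c1", "commodity_info": "pen", "purchase_time": 10},
]
sample_set = build_user_timing_sample_set(records)
```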
Further, the server may mine a plurality of initial positive sample data from the user time sequence samples in the set. Each initial positive sample datum may consist of a first target user time sequence sample, selected from the set as the item whose interaction is to be judged, together with its corresponding first target user time sequence sample sequence in the set; the labels of the initial positive sample data are all 1, indicating that the first target user time sequence sample and its sample sequence are correlated. Since every sample in the set is positive, random sampling can be carried out over articles the user has not interacted with, mining a plurality of initial negative sample data. Each initial negative sample datum may consist of a false user time sequence sample generated from such an article together with a randomly acquired first target user time sequence sample sequence; the categories of the initial negative sample data differ from those of the initial positive sample data, their labels are all 0, and the false user time sequence sample and the first target user time sequence sample sequence are not associated with each other.
In some embodiments, the step of generating initial positive sample data and initial negative sample data from the user time series sample set may comprise:
(1) Sequentially selecting a first target user time sequence sample and a corresponding first target user time sequence sample sequence from the user time sequence sample set;
(2) The first target user time sequence sample is marked positively, and the first target user time sequence sample after being marked positively and the corresponding first target user time sequence sample sequence are determined to be initial positive sample data;
(3) Setting a first preset number of second target user time sequence samples, wherein the categories of the second target user time sequence samples are different from the categories of the user time sequence sample sets;
(4) Negative labeling is carried out on the second target user time sequence sample, and the negatively labeled second target user time sequence sample together with the first target user time sequence sample sequence are determined as initial negative sample data.
The corresponding first target user time sequence sample may be selected from the user time sequence sample set in time order; for example, t1 may be selected from the user time sequence sample set {t1, t2, t3, …, tk} as the first target user time sequence sample. Accordingly, a first target user time sequence sample sequence related to it in time is obtained; for the selected t1 this may be {t2}, {t2, t3}, or {t2, …, tk}, and the specific sequence length may be set by the user and is not limited here. Since the first target user time sequence samples are positive samples, they can be labeled positively; in the embodiment of the application, 1 is the positive label and 0 the negative label. The positively labeled first target user time sequence sample and its corresponding first target user time sequence sample sequence are then determined as initial positive sample data, for example {x = {t2, …, tk}, x' = t1, y = 1}.
Further, a first preset number of second target user time sequence samples may be set. In an embodiment, in order that the negative samples do not affect the training effect of the positive samples, the first preset number may be limited to be smaller than the number of samples in the user time sequence sample set. The category of the second target user time sequence samples is different from the category of the user time sequence samples in the user time sequence sample set, so the second target user time sequence samples are not related to the first target user time sequence samples and are all negative samples. Negative labeling can therefore be performed on the second target user time sequence samples, and each negatively labeled second target user time sequence sample together with a randomly acquired first target user time sequence sample sequence is determined as initial negative sample data; for example, { x = { t2, ..., tk }, x' = t0, y = 0 } is determined as one piece of initial negative sample data.
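As an illustrative aid (not part of the claimed method), the construction of initial positive and negative sample data described above can be sketched roughly as follows; the sample identifiers and function names are hypothetical:

```python
# Illustrative sketch of the initial sample construction described above.
# The identifiers (t1..t5, t0..) and the function name are hypothetical.
import random

def build_initial_samples(positive_set, negative_pool, first_preset_number):
    """Build initial positive and negative sample data.

    positive_set        -- user time sequence samples of one category (all positive)
    negative_pool       -- samples of a different category (all negative)
    first_preset_number -- kept smaller than len(positive_set), as in the text
    """
    assert first_preset_number < len(positive_set)
    data = []
    # Positive data: each sample paired with its temporally related sequence, y = 1.
    for i, sample in enumerate(positive_set):
        sequence = positive_set[i + 1:]
        if sequence:
            data.append({"x": sequence, "x'": sample, "y": 1})
    # Negative data: a different-category sample with a randomly acquired sequence, y = 0.
    sequences = [d["x"] for d in data]
    for sample in random.sample(negative_pool, first_preset_number):
        data.append({"x": random.choice(sequences), "x'": sample, "y": 0})
    return data

samples = build_initial_samples(["t1", "t2", "t3", "t4", "t5"], ["t0", "t01", "t02"], 2)
```

Here each dictionary corresponds to one piece of sample data in the { x, x', y } form used in the text.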
In some embodiments, the step of sequentially selecting the first target user timing sample and the corresponding first target user timing sample sequence from the set of user timing samples may include:
(1.1) sequentially selecting a first target user time sequence sample from the user time sequence sample set according to time sequence;
(1.2) obtaining a first target user-timing sample sequence generated from a plurality of user-timing samples in the set of user-timing samples that are temporally consecutive to the first target user-timing sample.
The first target user timing sample may be sequentially selected from the user timing sample set according to time order; for example, t1 is first selected from the user timing sample set { t1, t2, t3, ..., tk } as the first target user timing sample, then t2 is selected as the first target user timing sample, and so on.
Further, a plurality of user timing samples in the user timing sample set that are temporally consecutive to the first target user timing sample are obtained to generate the first target user timing sample sequence; the number of user timing samples in the sequence may be any value from 1 to the number of samples in the user timing sample set minus 1.
For example, when t1 is selected as the first target user timing sample, 3, 4, or all user timing samples of the user timing sample set except t1 that are consecutive in time may be acquired to generate the first target user timing sample sequence.
In some embodiments, the step of obtaining a first target user-timing sample sequence generated from a plurality of user-timing samples in the user-timing sample set that are temporally consecutive to the first target user-timing sample may include:
(2.1) selecting a first target user timing sample sequence formed by a predetermined number of user timing samples that are consecutive in time based on the first target user timing sample; or (b)
(2.2) determining the user timing samples in the user timing sample set other than the first target user timing sample as the first target user timing sample sequence.
All user time sequence samples in the user time sequence sample set other than the selected first target user time sequence sample may be determined as the first target user time sequence sample sequence; for example, when t1 is selected as the first target user time sequence sample, t2, t3, ..., tk are selected as the first target user time sequence sample sequence.
In an embodiment, since the computing power of an actual processor is limited, a preset number may be set in order to improve computing efficiency, where the preset number is at most the number of samples in the user time sequence sample set minus 1. A first target user time sequence sample sequence is then formed by selecting that preset number of user time sequence samples. For example, the preset number may be 4: when t1 is selected as the first target user time sequence sample, the 4 temporally consecutive user time sequence samples t2, t3, t4 and t5 are selected to form the first target user time sequence sample sequence.
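A minimal sketch of the two sequence-selection strategies above (preset-number and all-remaining), with illustrative names and data:

```python
# Sketch of the two selection strategies described above; sample names are illustrative.

def select_sequence(timing_samples, index, preset_number=None):
    """Return the first target user timing sample sequence for timing_samples[index].

    With preset_number given, take that many temporally consecutive samples;
    otherwise take all remaining samples in the set.
    """
    remaining = timing_samples[index + 1:]        # samples after the target, in time order
    return remaining[:preset_number] if preset_number else remaining

timing_set = ["t1", "t2", "t3", "t4", "t5", "t6"]
seq_fixed = select_sequence(timing_set, 0, preset_number=4)   # preset number 4
seq_all = select_sequence(timing_set, 0)                      # all remaining samples
```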
In step 102, the initial positive sample data and the initial negative sample data are input to a preset neural network model for performing a first training, so as to obtain the preset neural network model after the first training.
Artificial intelligence (Artificial Intelligence, AI) comprises the theories, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is thus the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include directions such as computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The scheme provided by the embodiment of the application relates to the technology of artificial intelligence such as machine learning, and the like, and is specifically described by the following embodiments:
the embodiment of the application borrows the idea of the language model in natural language processing, which uses a sequence of continuous items (the user time sequence sample sequence) to predict the next item. However, the aim of the embodiment of the application is not item prediction but finding the "mutation points" in such prediction, so the embodiment of the application formulates a binary classification problem: predicting whether a certain user time sequence sample is related to the user's historical user time sequence sample sequence. Accordingly, the preset neural network may be any of various neural network models, such as a conventional statistical language model or a neural network language model. In an embodiment, the preset neural network model may be a cyclic neural network model. As shown in fig. 2b, which is a schematic structural diagram of the cyclic neural network model provided by the embodiment of the present application, the cyclic neural network model 10 may include an input layer, a hidden layer, a cyclic layer and an output layer, where U is the weight matrix from the input layer to the hidden layer, V is the weight matrix from the hidden layer to the output layer, x is a vector representing the value of the input layer, s is a vector representing the value of the hidden layer (the hidden layer is composed of a plurality of nodes, the number of which is the same as the dimension of the vector s), W is the weight matrix by which the previous value of the hidden layer is used as input, and o is also a vector, representing the value of the output layer. Since the hidden layer can cyclically refer to its previous value, the cyclic neural network model can better process user time sequence information.
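For illustration only, the cyclic update described above (the hidden value s fed back through the weight W, and the output o obtained through V) can be sketched in plain Python with toy, untrained weight values; none of these numbers come from the embodiment itself:

```python
import math

# Toy illustration of the cyclic update described above:
# s_t = tanh(U*x_t + W*s_{t-1}), o = V*s_t. The weights are arbitrary, untrained values.

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_forward(inputs, U, W, V):
    """Run the cyclic layer over a sequence of input vectors and return o."""
    s = [0.0] * len(U)                            # hidden state starts at zero
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(p) for p in pre]           # hidden layer refers to its last value
    return matvec(V, s)                           # output layer value o = V*s

U = [[0.5, -0.2], [0.1, 0.3]]                     # input -> hidden weights
W = [[0.1, 0.0], [0.0, 0.1]]                      # hidden -> hidden (cyclic) weights
V = [[1.0, -1.0]]                                 # hidden -> output weights
out = rnn_forward([[1.0, 0.0], [0.0, 1.0]], U, W, V)
```

Because s is carried across steps, the output after the second input depends on the first input as well, which is the property that lets the model exploit the order of the user time sequence.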
Further, the initial positive sample data and the initial negative sample data are input together into the preset neural network model for the first training. The preset neural network model can model the first target user time sequence sample sequence in the initial positive sample data to obtain a corresponding sequence vector, which reflects the characteristics of the user time sequence samples in the sequence; the model continuously learns the relation among the first target user time sequence sample, the positive labeling information and the corresponding sequence vector, and adjusts the network parameters of the preset neural network model. Likewise, it models the first target user time sequence sample sequence corresponding to the second target user time sequence sample in the initial negative sample data to obtain a corresponding sequence vector, continuously learns the relation among the second target user time sequence sample, the negative labeling information and the corresponding sequence vector, and adjusts the network parameters of the preset neural network model to obtain the preset neural network model after the first training. The first-trained cyclic neural network model has an initial binary classification capability and can distinguish the initial positive sample data and the initial negative sample data in the user time sequence data.
In some embodiments, the step of inputting the initial positive sample data and the initial negative sample data into the preset neural network model to perform the first training, and obtaining the preset neural network model after the first training may include:
(1) Modeling the first target user time sequence sample sequence through a cyclic neural network model to obtain a corresponding sequence vector;
(2) Inputting the first target user time sequence sample, the positive annotation information and the corresponding sequence vector into an activation function layer of the cyclic neural network model for first training;
(3) And inputting the second target user time sequence sample, the negative annotation information and the corresponding sequence vector into an activation function layer of the cyclic neural network model for first training to obtain the cyclic neural network model after the first training.
The user timing samples may be represented in a multi-dimensional manner. In one embodiment, the user timing samples may be represented in One-Hot form, also referred to as one-bit-effective encoding, in which N states are encoded using an N-bit state register, each state having its own register bit, and only one bit being active at any time. When features are represented in one-hot form, the distance between them can be calculated more reasonably.
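A minimal sketch of the one-hot encoding just described; the identifier vocabulary here is hypothetical:

```python
# Minimal sketch of one-hot (one-bit-effective) encoding; the vocabulary is hypothetical.

def one_hot(value, vocabulary):
    """N states -> N-bit vector with exactly one active bit."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1
    return vector

commodity_ids = ["001", "010", "011"]        # hypothetical identifier vocabulary
encoded = one_hot("010", commodity_ids)
```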
Further, modeling calculation can be performed on the first target user time sequence sample sequence through the cyclic neural network model to obtain a sequence vector, which represents the characteristics of the historical first target user time sequence sample sequence. In order to solve the binary classification problem, the cyclic neural network model further comprises an activation function layer: the output layer of the cyclic neural network can be connected with a corresponding activation function layer, which may be a softmax layer composed of softmax functions. The softmax layer is used in the classification process to map the output of the cyclic neural network model into the (0, 1) interval, which can be understood as a probability used for classification.
In the embodiment of the application, the relation between the first target user time sequence sample and the corresponding sequence vector is learned through the softmax layer with the positive annotation information as the training target, and the relation between the second target user time sequence sample and the corresponding sequence vector is learned with the negative annotation information as the training target. The network parameters in the softmax layer are continuously adjusted during this learning process to obtain the first-trained cyclic neural network model, which has an initial classification capability and can identify initial positive sample data and initial negative sample data in the user time sequence data.
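For illustration, a standard softmax computation of the kind the activation function layer above would perform, mapping model outputs into the (0, 1) interval; the logit values are purely illustrative:

```python
import math

# Standard softmax, as used by the activation function layer described above to
# map model outputs into the (0, 1) interval; the logit values are illustrative.

def softmax(logits):
    exps = [math.exp(z - max(logits)) for z in logits]   # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, -1.0])        # one logit per class (positive / negative)
positive_probability = probs[0]     # falls in (0, 1) and can be read as a probability
```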
In step 103, the user time sequence sample set is identified according to the first trained preset neural network model, and target positive sample data and abrupt negative sample data are determined.
In the embodiment of the application, the user time sequence data of the same category need to be divided into one user behavior feature sequence; however, the training of the first-trained preset neural network model does not consider the influence of mutation points. For example, the user time sequence sample set may include samples of a user continuously purchasing household appliance commodities, immediately followed by a sample of purchasing a pet commodity. The pet commodity purchasing behavior sample is clearly different from the preceding commodity purchasing behavior samples, so it is a mutation point. But because the user time sequence samples in the user time sequence sample set are all positive samples, the first-trained preset neural network model can only identify samples of user behavior features of an entirely different category, for example, the behavior features of a user listening to songs, and cannot identify the user time sequence sample corresponding to the mutation point.
Further, the samples corresponding to mutation points can be assumed to occupy a very low proportion of the user time sequence sample set. When the user time sequence sample set is identified by the first-trained preset neural network model, the output corresponding to a user time sequence sample at a non-mutation point is very high (close to 1) because it is positive sample data. A user time sequence sample corresponding to a mutation point, however, resembles negative sample data without actually being negative sample data, so although the first-trained preset neural network model can still make a prediction, the score is not high: the model is confused, and its output, which ranges from 0 to 1, falls near the middle value of 0.5. Therefore, the user time sequence samples whose outputs are close to 1 and their corresponding first target user time sequence sample sequences are determined as target positive sample data, and the user time sequence samples whose outputs fall in the middle range and their corresponding first target user time sequence sample sequences are determined as mutation negative sample data. In this way the identification of mutation negative sample data is realized, which improves the efficiency of data processing and saves time.
In some embodiments, the step of identifying the user time-series sample set according to the first trained preset neural network model and determining the target positive sample data and the abrupt negative sample data may include:
(1) Sequentially identifying each user time sequence sample in the user time sequence sample set through a first trained preset neural network model to obtain a corresponding output value of each user time sequence sample;
(2) Determining a user time sequence sample with an output value larger than a preset positive sample threshold value as a third target user time sequence sample, and determining a user time sequence sample with an output value larger than a preset negative sample threshold value and smaller than the preset positive sample threshold value as a fourth target user time sequence sample;
(3) The third target user time sequence sample is positively marked, and the third target user time sequence sample after being positively marked and the corresponding first target user time sequence sample sequence are determined to be target positive sample data;
(4) And negative labeling is carried out on the fourth target user time sequence sample, and the fourth target user time sequence sample after negative labeling and the corresponding first target user time sequence sample sequence are determined to be mutation negative sample data.
Each user time sequence sample in the user time sequence sample set and its corresponding first target user time sequence sample sequence are sequentially obtained and identified through the first-trained preset neural network model, so as to obtain the corresponding output value of each user time sequence sample, where an output value close to 1 indicates that the user time sequence sample is a positive sample and an output value close to 0 indicates that it is a negative sample.
The preset positive sample threshold is a critical value defining whether the user time sequence sample is a positive sample, for example, 0.8, the preset negative sample threshold is a critical value defining whether the user time sequence sample is a negative sample, for example, 0.2, the user time sequence sample with the output value being greater than the preset positive sample threshold is a positive sample, the user time sequence sample with the output value being less than the preset negative sample threshold is a negative sample, and for the user time sequence sample corresponding to the mutation point, the output value is not too high or too low, so that the user time sequence sample with the output value being greater than the preset negative sample threshold and less than the preset positive sample threshold can be determined as the mutation negative sample.
Further, determining a third target user time sequence sample from the user time sequence samples with output values larger than a preset positive sample threshold, wherein the third target user time sequence sample is a positive sample, determining a user time sequence sample with output values larger than a preset negative sample threshold and smaller than the preset positive sample threshold as a fourth target user time sequence sample, performing positive annotation on the third target user time sequence sample, for example, marking 1, and determining the third target user time sequence sample after positive annotation and the corresponding first target user time sequence sample sequence as target positive sample data. And carrying out negative annotation on the fourth target user time sequence sample, for example, annotating 0, and determining the fourth target user time sequence sample after negative annotation and the corresponding first target user time sequence sample sequence as abrupt negative sample data.
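The thresholding rule in the steps above can be sketched as follows, using the example thresholds 0.8 and 0.2 mentioned in the text; the output values themselves are hypothetical:

```python
# Sketch of the output-value thresholding described above, using the example
# thresholds 0.8 (positive) and 0.2 (negative); the output values are illustrative.

def split_by_output(output_values, positive_threshold=0.8, negative_threshold=0.2):
    """Return indices of target positive samples and mutation negative samples."""
    target_positive, mutation_negative = [], []
    for i, value in enumerate(output_values):
        if value > positive_threshold:
            target_positive.append(i)            # third target samples, relabeled 1
        elif value > negative_threshold:
            mutation_negative.append(i)          # fourth target samples, relabeled 0
    return target_positive, mutation_negative

outputs = [0.95, 0.91, 0.55, 0.88, 0.15]         # hypothetical model output values
pos_idx, mut_idx = split_by_output(outputs)
```

Here the sample with output 0.55 is neither clearly positive nor clearly negative, so it is treated as a mutation negative sample, while 0.15 falls below the negative threshold and belongs to neither group.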
In step 104, the target positive sample data and the mutation negative sample data are input to the initialized preset neural network model for the second training, so as to obtain the second trained preset neural network model.
The target positive sample data and the mutation negative sample data identified by the first-trained preset neural network model are input into a reinitialized preset neural network model for the second training. The reinitialized preset neural network model models the first target user time sequence sample sequence corresponding to each user time sequence sample in the target positive sample data to obtain a corresponding sequence vector, continuously learns the relation among the user time sequence sample in the target positive sample data, the positive labeling information and the corresponding sequence vector, and adjusts the network parameters of the preset neural network model. Likewise, it models the first target user time sequence sample sequence corresponding to the user time sequence data in the mutation negative sample data to obtain a corresponding sequence vector, continuously learns the relation among the user time sequence sample in the mutation negative sample data, the negative labeling information and the corresponding sequence vector, and adjusts the network parameters of the preset neural network model to obtain the second-trained preset neural network model. The second-trained preset neural network has a refined binary classification capability, can identify the target positive sample data and the mutation negative sample data in the user time sequence data, and enables the subsequent division of the user time sequence data according to the mutation negative samples.
In some embodiments, the step of inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model to perform the second training, and obtaining the second trained preset neural network model may include:
(1) Modeling the first target user time sequence sample sequence through the initialized cyclic neural network to obtain a corresponding sequence vector;
(2) Inputting the third target user time sequence sample, the positive annotation information and the corresponding sequence vector into the activation function layer of the initialized cyclic neural network model for second training;
(3) And inputting the fourth target user time sequence sample, the negative annotation information and the corresponding sequence vector into an initialized activation function layer of the cyclic neural network for second training, and obtaining a cyclic neural network model after the second training.
The first target user time sequence sample sequence corresponding to the third target user time sequence sample and the fourth target user time sequence sample can be modeled and calculated through the initialized cyclic neural network model to obtain a sequence vector.
Further, the relation between the third target user time sequence sample and the corresponding sequence vector is learned through the softmax layer of the cyclic neural network model with the positive annotation information 1 as the training target, and the relation between the fourth target user time sequence sample and the corresponding sequence vector is learned with the negative annotation information as the training target. The network parameters in the softmax layer are continuously adjusted during this learning process to obtain the second-trained cyclic neural network model, so that the second-trained cyclic neural network model has the capability of dividing the user time sequence data set to be divided into sequences.
In some embodiments, after the step of obtaining the second trained preset neural network model, the method may further include:
(1.1) classifying user time sequence data in the user time sequence data set to be divided based on the second trained cyclic neural network model;
and (1.2) dividing the same type of user time sequence data into the same user time sequence to obtain a plurality of sections of user time sequence sequences.
The first user time sequence data in the user time sequence data set may be taken, in time order, as the start of a target user time sequence. The next user time sequence data after the target user time sequence is taken as test user time sequence data, and the target user time sequence and the test user time sequence data are input together into the second-trained cyclic neural network model for identification. When the output classification value is close to 1, the test user time sequence data belongs to the same class as the target user time sequence and is merged into the target user time sequence; when the output classification value is close to 0, the test user time sequence data does not belong to the same class, so the current target user time sequence is saved and ended, and a new target user time sequence is started from the test user time sequence data. By analogy, user time sequence data of the same type are divided into the same user time sequence, obtaining a plurality of user time sequence sequences.
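The division loop above can be sketched as follows; `classify` here is a hypothetical stand-in for the second-trained model, stubbed with a toy rule rather than a real network:

```python
# Illustrative sketch of dividing user time sequence data into sequences with a
# trained classifier; classify() is a hypothetical stand-in for the second-trained model.

def divide_sequences(timing_data, classify, threshold=0.5):
    """Split timing data into multi-segment sequences at mutation points."""
    sequences = [[timing_data[0]]]                # first data starts the first sequence
    for sample in timing_data[1:]:
        if classify(sequences[-1], sample) > threshold:
            sequences[-1].append(sample)          # same category: merge into the sequence
        else:
            sequences.append([sample])            # mutation point: start a new sequence
    return sequences

def fake_classify(sequence, sample):
    # Toy rule standing in for the model: same leading character = same category.
    return 1.0 if sample[0] == sequence[-1][0] else 0.0

segments = divide_sequences(["a1", "a2", "b1", "b2", "b3"], fake_classify)
```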
From the above, the embodiment of the present application collects a user time sequence sample set and generates initial positive sample data and initial negative sample data from it; inputs the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain the first-trained preset neural network model; identifies the user time sequence sample set according to the first-trained preset neural network model to determine target positive sample data and mutation negative sample data; and inputs the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain the second-trained preset neural network model. In this way, the corresponding target positive sample data and mutation negative sample data are identified by the preset neural network model first trained on the initial positive sample data and the initial negative sample data, and the initialized preset neural network is trained a second time on the target positive sample data and the mutation negative sample data, so that the second-trained preset neural network can identify mutated user time sequence data in a sequence. Automatic division of the user time sequence data can thereby be achieved, and the efficiency and accuracy of data processing are greatly improved.
Embodiment II
The method described in Embodiment I is described in further detail below by way of example.
In this embodiment, description will be given by taking an example in which the data processing apparatus is specifically integrated in a server, with specific reference to the following description.
Referring to fig. 3, fig. 3 is a schematic flow chart of a data processing method according to an embodiment of the application.
The method flow may include:
in step 201, the server sequentially selects a first target user timing sample from the user timing sample set according to the time sequence, and selects a first target user timing sample sequence formed by a preset number of user timing samples that are consecutive in time based on the first target user timing sample.
The user time sequence sample set may be expressed as { t1, t2, t3, ..., tk }. Assuming that the user time sequence sample set is a plurality of sets of behavior feature data (user time sequence samples) generated by a user purchasing commodities on a terminal, t1, t2, t3 through tk are all different user time sequence samples, where t1 is the user time sequence sample whose interaction time is closest to the current time and tk is the one whose interaction time is farthest from the current time. A user time sequence sample may include multi-dimensional feature data such as a purchased commodity identification feature, a commodity information feature and a purchase time feature. For example, t1 may be a user time sequence sample composed of 3-dimensional feature data such as 001, 010 and 15-52, where 001 represents the purchased commodity identification feature (for example, the commodity identifier corresponding to a refrigerator), 010 represents the commodity information feature (for example, a two-door model), and 15-52 represents the purchase time feature (a purchase made at 15:52). It should be noted that the user time sequence samples in the user time sequence sample set are all samples generated by actually occurring interaction behaviors, so the user time sequence samples in the user time sequence sample set are all positive samples.
Further, the server sequentially selects the first target user time sequence samples t1, t2, t3 through tk according to time order from the user time sequence sample set { t1, t2, t3, ..., tk }, and selects a preset number of temporally consecutive user time sequence samples based on each first target user time sequence sample. The preset number may be 4: when the first target user time sequence sample is t1, the 4 temporally consecutive user time sequence samples selected based on t1 are { t2, t3, t4, t5 }, and so on. In this way, each first target user time sequence sample and its corresponding first target user time sequence sample sequence are obtained.
In an embodiment, when the user time sequence samples taken backward from a first target user time sequence sample reach the end of the set, the preceding user time sequence samples may be taken instead to form the first target user time sequence sample sequence; for example, the first target user time sequence sample sequence for tk may be { tk-1, tk-2, tk-3, tk-4 }.
In step 202, the server performs positive annotation on the first target user time sequence sample, and determines the first target user time sequence sample after positive annotation and the corresponding first target user time sequence sample sequence as initial positive sample data.
In the embodiment of the present application, the server performs positive labeling on the first target user time sequence sample, with 1 as the positive label and 0 as the negative label; that is, the first target user time sequence sample is labeled 1, and the positively labeled first target user time sequence sample and the corresponding first target user time sequence sample sequence are determined as initial positive sample data, for example the initial positive sample data { x = { t2, t3, t4, t5 }, x' = t1, y = 1 }, where x represents the first target user time sequence sample sequence, x' is the first target user time sequence sample, and y is the labeling information.
In step 203, the server sets a first preset number of second target user time sequence samples, performs negative annotation on the second target user time sequence samples, and determines the second target user time sequence samples and the first target user time sequence sample sequence after negative annotation as initial negative sample data.
The server sets a first preset number of second target user time sequence samples, where the second target user time sequence samples are of a different category from the first target user time sequence samples; for example, the second target user time sequence samples may include t0, t01, t02 and so on. The second target user time sequence samples may also include 3-dimensional feature data, but their category is completely different from that of the first target user time sequence samples: while the first target user time sequence samples are of the type of the user purchasing commodities on the terminal, the second target user time sequence samples may be of the type of the user listening to songs on the terminal. For example, t0 may be a second target user time sequence sample composed of 5000, 0500 and 16-05, where 5000 represents the identification feature of the music listened to (for example, the identifier corresponding to a piece of music), 0500 represents the music information feature (for example, jazz-type music), and 16-05 represents the listening time feature (listening at 16:05). The first preset number is smaller than the number of samples in the user time sequence sample set. The second target user time sequence samples are labeled 0, and each negatively labeled second target user time sequence sample and a randomly selected first target user time sequence sample sequence are determined as initial negative sample data, for example the initial negative sample data { x = { t2, t3, t4, t5 }, x' = t0, y = 0 }.
In step 204, the server models the first target user time sequence sample sequence through the recurrent neural network model to obtain a corresponding sequence vector, and inputs the first target user time sequence sample, the positive labeling information and the corresponding sequence vector into an activation function layer of the recurrent neural network model to perform first training.
Referring to fig. 4, the server models the first target user time sequence sample sequence 12 through the recurrent neural network 11 to obtain a corresponding sequence vector 13, where the sequence vector 13 represents the characteristics of the first target user time sequence sample sequence 12. The relationship between the first target user time sequence sample 14 and the corresponding sequence vector 13 is learned through the activation function layer 15, with the positive annotation information 1 as the training target, so that the network parameters in the softmax layer are continuously adjusted.
In step 205, the server inputs the second target user time sequence sample, the negative annotation information and the corresponding sequence vector into the activation function layer of the recurrent neural network model to perform the first training, so as to obtain the recurrent neural network model after the first training.
The relationship between the second target user time sequence sample and the corresponding sequence vector is learned through the activation function layer 15, with the negative annotation information 0 as the training target, so that the network parameters in the softmax layer are continuously adjusted to obtain the first trained recurrent neural network model. The first trained recurrent neural network model has an initial classification capability: it can distinguish the first target user time sequence samples corresponding to purchasing behavior from the second target user time sequence samples corresponding to song-listening behavior.
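A minimal sketch of the modeling-and-scoring structure of steps 204 and 205, assuming a plain tanh recurrence in place of the patent's unspecified recurrent cell and a single sigmoid output in place of the softmax layer; all weight matrices, dimensions, and the embedding of the candidate sample are illustrative assumptions.

```python
import numpy as np

def rnn_encode(seq, W_h, W_x):
    # Encode a sequence of feature vectors into one sequence vector
    # with a minimal vanilla RNN (tanh recurrence); stands in for the
    # recurrent neural network modeling the sample sequence.
    h = np.zeros(W_h.shape[0])
    for x in seq:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

def score(seq_vec, sample_vec, w):
    # Sigmoid score of the relation between the sequence vector and a
    # candidate sample vector; training would push this score toward
    # the annotation (1 for positive, 0 for negative samples).
    z = w @ np.concatenate([seq_vec, sample_vec])
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_h, W_x = rng.standard_normal((8, 8)), rng.standard_normal((8, 3))
sequence = [rng.standard_normal(3) for _ in range(4)]  # four 3-dimensional samples
seq_vec = rnn_encode(sequence, W_h, W_x)               # sequence vector
candidate = rng.standard_normal(8)                     # candidate sample, embedded
p = score(seq_vec, candidate, rng.standard_normal(16))
```

Adjusting `w` (and the recurrence weights) by gradient descent against labels 1 and 0 would correspond to the "continuously adjust the network parameters" step described above.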
In step 206, the server sequentially identifies each user time sequence sample in the user time sequence sample set through the first trained preset neural network model, and obtains a corresponding output value of each user time sequence sample.
Because the user time sequence sample set contains some corresponding "fake" user time sequence sample data, for example, user time sequence samples of household-appliance purchasing behavior mixed with user time sequence samples of pet-supply purchasing behavior, and purchasing household appliances differs from purchasing pet supplies, the first trained recurrent neural network model cannot identify the change of commodity category within the purchasing behavior, so its classification granularity is coarse.
In the embodiment of the application, the server acquires each user time sequence sample and the corresponding first target user time sequence sample sequence in the user time sequence sample set, and re-inputs them into the first trained preset neural network model to obtain a corresponding output value for each user time sequence sample. The first trained preset neural network model cannot clearly distinguish the "fake" user time sequence sample data. For example, if user time sequence samples of household-appliance purchasing behavior account for the main proportion of the user time sequence sample set, such as ninety-five percent, and the remaining five percent are user time sequence samples of non-household-appliance purchasing behavior, then those five percent are the "fake" user time sequence sample data. The output value of true sample data is close to 1, while the output value of "fake" user time sequence sample data is neither too high nor too low, falling in the vicinity of 0.5.
In step 207, the server determines a user time sequence sample whose output value is greater than a preset positive sample threshold as a third target user time sequence sample, and a user time sequence sample whose output value is greater than a preset negative sample threshold and less than the preset positive sample threshold as a fourth target user time sequence sample.
The preset positive sample threshold may be 0.85 and the preset negative sample threshold may be 0.25, so the server determines a user time sequence sample with an output value greater than 0.85 as a third target user time sequence sample, where the third target user time sequence sample is a positive sample, namely user time sequence sample data of household-appliance purchasing behavior. A user time sequence sample with an output value greater than 0.25 and less than 0.85 is determined as a fourth target user time sequence sample, namely a user time sequence sample of non-household-appliance purchasing behavior.
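The thresholding in step 207 amounts to a simple partition of the identified samples by output value. The sketch below is illustrative, with the thresholds 0.85 and 0.25 taken from the example above.

```python
def partition_by_output(scored_samples, pos_thr=0.85, neg_thr=0.25):
    # Split identified user time sequence samples into third target
    # (positive) and fourth target ("fake"/mutation) samples by the
    # output value of the first trained model.
    third, fourth = [], []
    for sample, output in scored_samples:
        if output > pos_thr:
            third.append(sample)       # e.g. household-appliance purchase
        elif output > neg_thr:         # neg_thr < output <= pos_thr
            fourth.append(sample)      # e.g. non-household-appliance purchase
    return third, fourth

third, fourth = partition_by_output([("t2", 0.9), ("t7", 0.5), ("t9", 0.1)])
```

Samples with an output value at or below the negative sample threshold fall into neither group and are discarded.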
In step 208, the server performs positive annotation on the third target user time sequence sample, and determines the third target user time sequence sample after positive annotation and the corresponding first target user time sequence sample sequence as target positive sample data.
The server performs positive annotation 1 on the third target user time sequence sample, namely, annotates the user time sequence sample of household-appliance purchasing behavior with 1, and determines the positively annotated third target user time sequence sample and the corresponding first target user time sequence sample sequence as target positive sample data. The first target user time sequence sample sequence here and in the subsequent steps is selected by the selection methods described above, which are not repeated herein.
In step 209, the server performs negative annotation on the fourth target user time sequence sample, and determines the negatively annotated fourth target user time sequence sample and the corresponding first target user time sequence sample sequence as mutation negative sample data.
The server performs negative annotation 0 on the fourth target user time sequence sample, namely, annotates the user time sequence sample of non-household-appliance purchasing behavior with 0, and determines the negatively annotated fourth target user time sequence sample and the corresponding first target user time sequence sample sequence as mutation negative sample data.
In step 210, the server models the first target user time sequence sample sequence through the initialized recurrent neural network model to obtain a corresponding sequence vector, and inputs the third target user time sequence sample, the positive annotation information and the corresponding sequence vector into the activation function layer of the initialized recurrent neural network model to perform the second training.
The server performs modeling calculation on the first target user time sequence sample sequences corresponding to the third target user time sequence sample and the fourth target user time sequence sample through the recurrent neural network model with initialized network parameters, to obtain the corresponding sequence vectors.
Further, the relationship between the third target user time sequence sample and the corresponding sequence vector is learned through the activation function layer of the initialized recurrent neural network model, with the positive annotation information 1 as the training target. The network parameters in the activation function layer are continuously adjusted through the learning process, so that the initialized recurrent neural network model learns the capability of identifying user time sequence samples of household-appliance purchasing behavior.
In step 211, the server inputs the fourth target user time sequence sample, the negative annotation information and the corresponding sequence vector into the activation function layer of the initialized recurrent neural network model to perform the second training, so as to obtain the second trained recurrent neural network model.
The server continuously learns the "fake" user time sequence samples (namely, the user time sequence samples of non-household-appliance purchasing behavior) through the activation function layer of the initialized recurrent neural network model, that is, learns the relationship between the fourth target user time sequence sample and the corresponding sequence vector, with the negative annotation information 0 as the training target. The network parameters in the activation function layer are continuously adjusted through the learning process to obtain the second trained recurrent neural network model, which has learned the capability of identifying user time sequence samples of non-household-appliance purchasing behavior.
In step 212, the server determines the first user timing data in the set of user timing data as a target user timing sequence in chronological order.
The user time sequence data set includes a plurality of user time sequence data to be divided, distributed in chronological order. For example, the user time sequence data set to be divided is { s1, s2, …, sk }, where the occurrence time of the user time sequence data s1 is earlier than that of the user time sequence data s2. The server first determines { s1 } as the target user time sequence.
In step 213, the server acquires the next target user time sequence data after the target user time sequence, inputs the target user time sequence data and the current target user time sequence data into the second trained recurrent neural network model, and outputs the classification value corresponding to the target user time sequence data.
After acquiring the target user time sequence { s1 }, the server acquires the next user time sequence data s2 and determines s2 as the target user time sequence data. The server inputs the target user time sequence data s2 and the corresponding first user time sequence { s1 } into the second trained recurrent neural network model. The second trained recurrent neural network model first performs modeling calculation on the target user time sequence { s1 } to obtain the corresponding sequence vector, and then calculates, through its activation function layer, the relationship between that sequence vector and the target user time sequence data s2, outputting the classification value corresponding to the target user time sequence data. Assuming that the user time sequence data s1 is a user time sequence sample of household-appliance purchasing behavior and the target user time sequence data s2 is also a user time sequence sample of household-appliance purchasing behavior, the two are similar, so the output classification value is high; if they belonged to different categories, the output classification value would be low.
In step 214, it is detected whether the classification value is greater than a preset confidence level.
The preset confidence is a threshold defining whether the target user time sequence data belongs to the target user time sequence, and it may be set as needed: the higher the preset confidence, the stricter the criterion for belonging. When the classification value is detected to be greater than the preset confidence, step 215 is executed; when the classification value is detected to be not greater than the preset confidence, step 216 is executed.
In step 215, the server merges the target user timing data into a target user timing sequence.
When the server detects that the classification value is greater than the preset confidence, for example, the preset confidence is 0.75 and the classification value is 0.88, the classification value exceeds the preset confidence, indicating that the target user time sequence data belongs to the target user time sequence. The server then merges the same-category target user time sequence data s2 into the target user time sequence { s1 }, obtaining the merged target user time sequence { s1, s2 }.
In step 216, the server saves and ends the current target user timing sequence, generating a new target user timing sequence based on the target user timing data.
When the server detects that the classification value is not greater than the preset confidence, for example, the classification value is 0.22, the classification value is smaller than the preset confidence, indicating that the target user time sequence data does not belong to the target user time sequence. Supposing that s3 is a user time sequence sample of pet-supply purchasing behavior, it obviously does not belong to the same category as the user time sequence { s1, s2 } of household-appliance purchasing behavior, so the server saves and ends the current target user time sequence { s1, s2 }, and generates a new target user time sequence { s3 } based on the target user time sequence data s3 of the other category.
In step 217, it is detected whether the user time series data classification is ended.
After the server merges the target user time sequence data or generates a new target user time sequence, it correspondingly detects whether the classification of the user time sequence data is finished. The condition for finishing is whether any user time sequence data in the user time sequence data set remains untraversed. When all data in the user time sequence data set { s1, s2, …, sk } to be divided have been traversed, it is judged that the classification of the user time sequence data is finished, and step 218 is executed. When the user time sequence data in the set have not all been traversed, it is judged that the classification is not finished, and step 213 is executed to continue the dividing process until all user time sequence data in the set have been traversed.
In step 218, the server determines that the user time series data classification is ended.
When the user time sequence data classification is detected to be finished, multiple sections of user time sequence sequences after division according to the mutation points can be obtained, namely multiple sections of target user time sequence sequences are obtained, accurate and automatic division and classification of the user time sequence data are realized, and the accuracy and the efficiency of data processing are improved.
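Steps 212 to 218 together form a sequential division loop. The sketch below is a minimal illustration under the assumption that the second trained model is available as a `classify(current_sequence, next_item)` function returning the classification value; the stub classifier and the sample identifiers are hypothetical.

```python
def divide_sequences(series, classify, confidence=0.75):
    # Walk the time-ordered user data, merging each item into the current
    # target user time sequence when its classification value exceeds the
    # preset confidence, otherwise saving the sequence and starting a new one.
    segments, current = [], [series[0]]   # step 212: {s1} starts the first sequence
    for item in series[1:]:               # steps 213-217: traverse remaining data
        if classify(current, item) > confidence:
            current.append(item)          # step 215: merge into current sequence
        else:
            segments.append(current)      # step 216: save, start a new sequence
            current = [item]
    segments.append(current)              # step 218: traversal complete
    return segments

# Hypothetical stand-in for the trained model: items of the same category
# (same leading letter here) score high, others score low.
toy_classify = lambda cur, item: 0.9 if item[0] == cur[-1][0] else 0.1
segments = divide_sequences(["s1", "s2", "p3"], toy_classify)
```

With the stub above, the mutation point between s2 and p3 ends the first sequence, yielding the divided multi-segment result described in step 218.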
From the above, the embodiment of the present application collects the user time sequence sample set and generates initial positive sample data and initial negative sample data from the user time sequence sample set; inputting initial positive sample data and initial negative sample data into a preset neural network model for first training to obtain a preset neural network model after first training; identifying a user time sequence sample set according to a first trained preset neural network model, and determining target positive sample data and mutation negative sample data; and inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training, and obtaining the second trained preset neural network model. According to the method, the corresponding target positive sample data and the corresponding mutation negative sample data are identified through the preset neural network model after the first training is carried out on the initial positive sample data and the initial negative sample data, and the second training is carried out on the initialized preset neural network according to the target positive sample data and the mutation negative sample data, so that the mutation user time sequence data in the sequence can be identified by the second trained preset neural network, further automatic division of the user time sequence data can be achieved, and the efficiency and the accuracy of data processing are greatly improved.
Third embodiment,
In order to facilitate better implementation of the data processing method provided by the embodiment of the application, the embodiment of the application also provides a device based on the data processing method. Where the meaning of a noun is the same as in the data processing method described above, specific implementation details may be referred to in the description of the method embodiments.
Referring to fig. 5a, fig. 5a is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, where the data processing apparatus may include a generating unit 301, a first training unit 302, a determining unit 303, and a second training unit 304.
A generating unit 301 for acquiring a user time series sample set and generating initial positive sample data and initial negative sample data from the user time series sample set.
In some embodiments, as shown in fig. 5b, the generating unit 301 includes:
an acquisition subunit 3011, configured to acquire a user time-sequence sample set;
a selecting subunit 3012, configured to sequentially select a first target user timing sample and a corresponding first target user timing sample sequence from the user timing sample set;
the positive labeling subunit 3013 is configured to perform positive labeling on the first target user time sequence sample, and determine the first target user time sequence sample after positive labeling and the corresponding first target user time sequence sample sequence as initial positive sample data;
A setting subunit 3014, configured to set a first preset number of second target user timing samples, where a class of the second target user timing samples is different from a class of the user timing sample set;
the negative annotation subunit 3015 is configured to perform negative annotation on the second target user time sequence sample, and determine the negatively annotated second target user time sequence sample and the first target user time sequence sample sequence as initial negative sample data.
In some embodiments, the selecting subunit 3012 is configured to: sequentially selecting a first target user time sequence sample from the user time sequence sample set according to the time sequence; a first target user-timing sample sequence generated from a plurality of user-timing samples in a set of user-timing samples that are temporally consecutive to the first target user-timing sample is obtained.
In some embodiments, the selecting subunit 3012 is further configured to: sequentially selecting a first target user time sequence sample from the user time sequence sample set according to the time sequence; selecting a first target user time sequence sample sequence formed by a preset number of user time sequence samples which are continuous in time based on the first target user time sequence samples; or determining the user time sequence samples remaining from the user time sequence sample set divided by the first target user time sequence sample as a first target user time sequence sample sequence.
The first training unit 302 is configured to input the initial positive sample data and the initial negative sample data to a preset neural network model for performing a first training, so as to obtain the preset neural network model after the first training.
In some embodiments, the preset neural network model is a recurrent neural network model, and the first training unit 302 is configured to: modeling the first target user time sequence sample sequence through a cyclic neural network model to obtain a corresponding sequence vector; inputting the first target user time sequence sample, the positive annotation information and the corresponding sequence vector into an activation function layer of the cyclic neural network model for first training; and inputting the second target user time sequence sample, the negative annotation information and the corresponding sequence vector into an activation function layer of the cyclic neural network model for first training to obtain the cyclic neural network model after the first training.
And the determining unit 303 is configured to identify the user time sequence sample set according to the first trained preset neural network model, and determine target positive sample data and mutation negative sample data.
In some embodiments, the determining unit 303 is configured to: sequentially identifying each user time sequence sample in the user time sequence sample set through a first trained preset neural network model to obtain a corresponding output value of each user time sequence sample; determining a user time sequence sample with an output value larger than a preset positive sample threshold value as a third target user time sequence sample, and determining a user time sequence sample with an output value larger than a preset negative sample threshold value and smaller than the preset positive sample threshold value as a fourth target user time sequence sample; the third target user time sequence sample is positively marked, and the third target user time sequence sample after being positively marked and the corresponding first target user time sequence sample sequence are determined to be target positive sample data; and negative labeling is carried out on the fourth target user time sequence sample, and the fourth target user time sequence sample after negative labeling and the corresponding first target user time sequence sample sequence are determined to be mutation negative sample data.
The second training unit 304 is configured to input the target positive sample data and the mutation negative sample data to the initialized preset neural network model for performing a second training, so as to obtain a second trained preset neural network model.
In some embodiments, the preset neural network model is a recurrent neural network model, and the second training unit 304 is configured to: modeling the first target user time sequence sample sequence through the initialized recurrent neural network model to obtain a corresponding sequence vector; inputting the third target user time sequence sample, the positive annotation information and the corresponding sequence vector into the activation function layer of the initialized recurrent neural network model for second training; and inputting the fourth target user time sequence sample, the negative annotation information and the corresponding sequence vector into the activation function layer of the initialized recurrent neural network model for second training, to obtain the second trained recurrent neural network model.
In some embodiments, as shown in fig. 5c, the data processing apparatus further comprises:
a classification unit 305, configured to classify user time sequence data in the user time sequence data set to be divided based on the second trained recurrent neural network model;
The dividing unit 306 is configured to divide the user time sequence data of the same type into the same user time sequence, so as to obtain a plurality of segments of user time sequence.
In some embodiments, the classifying unit 305 is configured to: determining first user time sequence data in the user time sequence data set as a target user time sequence according to the time sequence; acquiring target user time sequence data after the target user time sequence; inputting the target user time sequence data and the corresponding first user time sequence into a second trained cyclic neural network model, and outputting a classification value corresponding to the target user time sequence data; when the classification value is greater than the preset confidence, merging the target user time sequence data into a target user time sequence, and returning to execute the step of acquiring the target user time sequence data after the target user time sequence until the user time sequence data classification is finished; when the classification value is not greater than the preset confidence, the current target user time sequence is saved and ended, a new target user time sequence is generated based on the target user time sequence data, and the step of acquiring the target user time sequence data after the target user time sequence is executed is returned until the classification of the user time sequence data is ended.
The specific implementation of each unit can be referred to the previous embodiments, and will not be repeated here.
As can be seen from the foregoing, in the embodiment of the present application, the generating unit 301 collects the user time-series sample set, and generates initial positive sample data and initial negative sample data from the user time-series sample set; the first training unit 302 inputs the initial positive sample data and the initial negative sample data to a preset neural network model to perform first training, so as to obtain a preset neural network model after the first training; the determining unit 303 identifies the time sequence sample set of the user according to the first trained preset neural network model, and determines target positive sample data and mutation negative sample data; the second training unit 304 inputs the target positive sample data and the mutation negative sample data to the initialized preset neural network model for second training, and obtains the second trained preset neural network model. According to the method, the corresponding target positive sample data and the corresponding mutation negative sample data are identified through the preset neural network model after the first training is carried out on the initial positive sample data and the initial negative sample data, and the second training is carried out on the initialized preset neural network according to the target positive sample data and the mutation negative sample data, so that the mutation user time sequence data in the sequence can be identified by the second trained preset neural network, further automatic division of the user time sequence data can be achieved, and the efficiency and the accuracy of data processing are greatly improved.
Fourth embodiment,
The embodiment of the present application further provides an electronic device, and the embodiment of the present application is described by taking the electronic device as a server as an example, as shown in fig. 6, which shows a schematic structural diagram of the server according to the embodiment of the present application, specifically:
the server may include one or more processors 401 with one or more processing cores, a memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404, among other components. Those skilled in the art will appreciate that the server structure shown in fig. 6 does not limit the server, and the server may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
Wherein:
the processor 401 is a control center of the server, connects respective portions of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the server. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, etc. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The server also includes a power supply 403 for powering the various components, and preferably, the power supply 403 may be logically connected to the processor 401 by a power management system so as to implement functions such as charge, discharge, and power consumption management by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The server may also include an input unit 404, which input unit 404 may be used to receive entered numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the server may further include a display unit or the like, which is not described herein. In this embodiment, the processor 401 in the server loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
collecting a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set; inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a preset neural network model after the first training; identifying the user time sequence sample set according to a first trained preset neural network model, and determining target positive sample data and mutation negative sample data; and inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for carrying out sequence division on a user time sequence data set to be divided.
In the foregoing embodiments, each embodiment is described with its own emphasis. For the parts of an embodiment that are not described in detail, reference may be made to the detailed description of the data processing method above, which is not repeated herein.
As can be seen from the foregoing, the server according to the embodiment of the present application may collect a user time sequence sample set and generate initial positive sample data and initial negative sample data from it; input the initial positive sample data and the initial negative sample data into a preset neural network model for first training, to obtain a first trained preset neural network model; identify the user time sequence sample set according to the first trained preset neural network model, to determine target positive sample data and mutation negative sample data; and input the target positive sample data and the mutation negative sample data into an initialized preset neural network model for second training, to obtain a second trained preset neural network model. In this scheme, the model first trained on the initial positive and negative sample data is used to identify the corresponding target positive sample data and mutation negative sample data, and the initialized preset neural network is then trained a second time on these relabeled samples. The second trained preset neural network can therefore identify mutation user time sequence data within a sequence, so that automatic division of user time sequence data can be achieved, which greatly improves the efficiency and accuracy of data processing.
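The two-stage training flow described above can be sketched as follows. This is a minimal illustration only: a trivial one-parameter scoring model stands in for the preset neural network, and the names (`train`, `score`) and threshold values are assumptions, not the patented implementation.

```python
# Minimal sketch of the two-stage training, assuming a trivial
# one-parameter model in place of the preset neural network.
POS_THRESHOLD = 0.8   # output above this -> target positive sample
NEG_THRESHOLD = 0.2   # output in (NEG, POS] -> mutation negative sample

def train(samples):
    """Stand-in for model training: 'learns' the mean of the
    positively labelled feature values as a one-parameter model."""
    pos = [x for x, label in samples if label == 1]
    return sum(pos) / len(pos)

def score(model, x):
    """Stand-in for inference: closeness of x to the learned mean,
    squashed into (0, 1]."""
    return 1.0 / (1.0 + abs(x - model))

# Initial positive samples (real interactions, label 1) and initial
# negative samples (randomly generated fake interactions, label 0).
initial_pos = [(1.0, 1), (1.1, 1), (0.9, 1)]
initial_neg = [(5.0, 0), (6.0, 0)]

# First training on the initial sample data.
model_1 = train(initial_pos + initial_neg)

# Identify the user time sequence sample set with the first-trained
# model: high scores become target positive samples, mid-range scores
# become mutation negative samples (abrupt behaviour changes).
sample_set = [1.0, 1.05, 0.95, 2.5, 1.02]
target_pos = [(x, 1) for x in sample_set if score(model_1, x) > POS_THRESHOLD]
mutation_neg = [(x, 0) for x in sample_set
                if NEG_THRESHOLD < score(model_1, x) <= POS_THRESHOLD]

# Second training on a re-initialized model with the relabelled data.
model_2 = train(target_pos + mutation_neg)
print(mutation_neg)  # → [(2.5, 0)]
```

The key point of the scheme is that the second training starts from a re-initialized model and uses only the relabeled samples, so mistakes in the crude initial labels do not carry over.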
V. Fifth Embodiment
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any of the data processing methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
collecting a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set; inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a preset neural network model after the first training; identifying the user time sequence sample set according to a first trained preset neural network model, and determining target positive sample data and mutation negative sample data; and inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for carrying out sequence division on a user time sequence data set to be divided.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the computer-readable storage medium may comprise: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Because the instructions stored in the computer-readable storage medium can execute the steps in any data processing method provided by the embodiments of the present application, they can achieve the beneficial effects that can be achieved by any data processing method provided by the embodiments of the present application, as detailed in the previous embodiments and not repeated herein.
The data processing method, apparatus, computer-readable storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application; the above description of the embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In summary, the contents of this specification shall not be construed as limiting the present application.
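Where the preset neural network model is a cyclic (recurrent) neural network, the scoring of a candidate sample against a sequence vector can be illustrated as follows. The recurrence, the fixed weights, and the sigmoid activation head are all illustrative assumptions rather than the patented model.

```python
import math

def encode_sequence(sequence, w_in=0.5, w_rec=0.5):
    """Fold a sequence of scalar features into a single sequence vector
    (here a scalar hidden state) with a minimal tanh recurrence."""
    h = 0.0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def score_pair(sample, sequence, w_sample=1.0, w_seq=1.0):
    """Activation-function layer: combine the candidate sample with the
    sequence vector and squash to (0, 1); values near 1 mean the sample
    continues the sequence, values near 0 suggest an abrupt change."""
    z = w_sample * sample + w_seq * encode_sequence(sequence)
    return 1.0 / (1.0 + math.exp(-z))
```

During training, the (sample, annotation, sequence vector) triples described in the claims would be fed through such a head, with the weights learned rather than fixed as here.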

Claims (12)

1. A method of data processing, comprising:
collecting a user time sequence sample set, and generating initial positive sample data and initial negative sample data from the user time sequence sample set, wherein the user time sequence sample set comprises a plurality of pieces of behavior characteristic data of continuous interactions of a user on a terminal; the initial positive sample data comprises a first target user time sequence sample corresponding to an interacted object selected from the user time sequence sample set and a first target user time sequence sample sequence corresponding to the first target user time sequence sample in the user time sequence sample set, and the user time sequence samples in the user time sequence sample set are all positive samples; the initial negative sample data comprises a second target user time sequence sample and a randomly acquired first target user time sequence sample sequence, the category of the initial negative sample data is inconsistent with the category of the initial positive sample data, and the second target user time sequence sample is a false user time sequence sample generated for an object with which the user has not interacted;
inputting the initial positive sample data and the initial negative sample data into a preset neural network model for first training to obtain a preset neural network model after the first training;
identifying the user time sequence sample set according to a first trained preset neural network model, and determining target positive sample data and mutation negative sample data, wherein the mutation negative sample data is mutation points of user behaviors in the user time sequence sample set, and the target positive sample data is non-mutation points of the user behaviors in the user time sequence sample set;
and inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model, wherein the second trained preset neural network model is used for carrying out sequence division on a user time sequence data set to be divided.
2. The data processing method of claim 1, wherein the step of generating initial positive sample data and initial negative sample data from the user time series sample set comprises:
sequentially selecting a first target user time sequence sample and a corresponding first target user time sequence sample sequence from the user time sequence sample set;
performing positive annotation on the first target user time sequence sample, and determining the first target user time sequence sample after positive annotation and a corresponding first target user time sequence sample sequence as initial positive sample data;
setting a first preset number of second target user time sequence samples, wherein the category of the second target user time sequence samples is different from the category of the user time sequence sample set;
and performing negative annotation on the second target user time sequence sample, and determining the second target user time sequence sample after negative annotation and the first target user time sequence sample sequence as initial negative sample data.
3. The data processing method according to claim 2, wherein the step of sequentially selecting a first target user time sequence sample and a corresponding first target user time sequence sample sequence from the user time sequence sample set comprises:
sequentially selecting a first target user time sequence sample from the user time sequence sample set according to time sequence;
and acquiring a first target user time sequence sample sequence generated from a plurality of user time sequence samples in the user time sequence sample set that are temporally consecutive to the first target user time sequence sample.
4. The data processing method according to claim 3, wherein the step of acquiring a first target user time sequence sample sequence generated from a plurality of user time sequence samples in the user time sequence sample set that are temporally consecutive to the first target user time sequence sample comprises:
selecting, based on the first target user time sequence sample, a first target user time sequence sample sequence formed by a preset number of temporally consecutive user time sequence samples; or
determining the user time sequence samples remaining in the user time sequence sample set other than the first target user time sequence sample as the first target user time sequence sample sequence.
5. The method for processing data according to any one of claims 2 to 4, wherein the predetermined neural network model is a cyclic neural network model, and the step of inputting the initial positive sample data and the initial negative sample data into the predetermined neural network model to perform the first training, to obtain the predetermined neural network model after the first training includes:
modeling the first target user time sequence sample sequence through a cyclic neural network model to obtain a corresponding sequence vector;
inputting the first target user time sequence sample, the positive annotation information and the corresponding sequence vector into an activation function layer of the cyclic neural network model for first training to obtain a first model;
and inputting the second target user time sequence sample, the negative annotation information and the corresponding sequence vector into an activation function layer of the first model to perform first training, so as to obtain a first trained cyclic neural network model.
6. The data processing method according to any one of claims 1 to 4, wherein the step of identifying the user time sequence sample set according to a first trained preset neural network model, and determining target positive sample data and mutation negative sample data comprises:
sequentially identifying each user time sequence sample in the user time sequence sample set through a first trained preset neural network model to obtain a corresponding output value of each user time sequence sample;
determining a user time sequence sample with an output value larger than a preset positive sample threshold value as a third target user time sequence sample, and determining a user time sequence sample with an output value larger than a preset negative sample threshold value and smaller than the preset positive sample threshold value as a fourth target user time sequence sample;
performing positive annotation on the third target user time sequence sample, and determining the third target user time sequence sample after positive annotation and a corresponding first target user time sequence sample sequence as target positive sample data;
and performing negative annotation on the fourth target user time sequence sample, and determining the fourth target user time sequence sample after negative annotation and the corresponding first target user time sequence sample sequence as mutation negative sample data.
7. The data processing method according to claim 6, wherein the preset neural network model is a cyclic neural network model, and the step of inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model for second training to obtain a second trained preset neural network model comprises:
modeling the first target user time sequence sample sequence through the initialized cyclic neural network model to obtain a corresponding sequence vector;
inputting the third target user time sequence sample, the positive annotation information and the corresponding sequence vector into an activation function layer of the initialized cyclic neural network model for second training to obtain a second model;
and inputting the fourth target user time sequence sample, the negative annotation information and the corresponding sequence vector into an activation function layer of the second model to perform second training, and obtaining a second trained cyclic neural network model.
8. The data processing method according to claim 7, further comprising, after the step of obtaining the second trained preset neural network model:
classifying the user time sequence data in the user time sequence data set to be divided based on the second trained cyclic neural network model;
and dividing user time sequence data of the same type into the same user time sequence, to obtain a plurality of user time sequence segments.
9. The data processing method according to claim 8, wherein the step of classifying the user time series data in the user time series data set to be divided based on the second trained recurrent neural network model comprises:
determining first user time sequence data in the user time sequence data set as a target user time sequence according to time sequence;
acquiring target user time sequence data after the target user time sequence;
inputting the target user time sequence data and the current target user time sequence into a second trained cyclic neural network model, and outputting a classification value corresponding to the target user time sequence data;
when the classification value is greater than the preset confidence, merging the target user time sequence data into the target user time sequence, and returning to the step of acquiring target user time sequence data after the target user time sequence until the classification of the user time sequence data is ended;
and when the classification value is not greater than the preset confidence, storing and ending the current target user time sequence, generating a new target user time sequence based on the target user time sequence data, and returning to the step of acquiring target user time sequence data after the target user time sequence until the classification of the user time sequence data is ended.
10. A data processing apparatus, comprising:
the generating unit is used for collecting a user time sequence sample set and generating initial positive sample data and initial negative sample data from the user time sequence sample set, wherein the user time sequence sample set comprises a plurality of pieces of behavior characteristic data of continuous interactions of a user on a terminal; the initial positive sample data comprises a first target user time sequence sample corresponding to an interacted object selected from the user time sequence sample set and a first target user time sequence sample sequence corresponding to the first target user time sequence sample in the user time sequence sample set, and the user time sequence samples in the user time sequence sample set are all positive samples; the initial negative sample data comprises a second target user time sequence sample and a randomly acquired first target user time sequence sample sequence, the category of the initial negative sample data is inconsistent with the category of the initial positive sample data, and the second target user time sequence sample is a false user time sequence sample generated for an object with which the user has not interacted;
the first training unit is used for inputting the initial positive sample data and the initial negative sample data into a preset neural network model to perform first training, so as to obtain the preset neural network model after the first training;
the determining unit is used for identifying the user time sequence sample set according to a first trained preset neural network model, and determining target positive sample data and mutation negative sample data, wherein the mutation negative sample data is mutation points of user behaviors in the user time sequence sample set, and the target positive sample data is non-mutation points of the user behaviors in the user time sequence sample set;
the second training unit is used for inputting the target positive sample data and the mutation negative sample data into the initialized preset neural network model to perform second training to obtain a second trained preset neural network model, and the second trained preset neural network model is used for performing sequence division on a user time sequence data set to be divided.
11. A computer readable storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor for performing the steps in the data processing method according to any of claims 1 to 9.
12. An electronic device comprising a processor and a memory, the memory storing a plurality of computer instructions, wherein the processor loads the computer instructions to perform the steps in the data processing method of any of claims 1 to 9.
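The sequence-division procedure of the method claims above — merging each new piece of user time sequence data into the current target user time sequence when the classification value exceeds the preset confidence, and otherwise closing the current sequence and seeding a new one — can be sketched as follows. The distance-based `classify` function and the `CONFIDENCE` value are assumed stand-ins for the second trained cyclic neural network.

```python
CONFIDENCE = 0.5  # stand-in for the preset confidence threshold

def classify(sequence, item):
    """Stand-in for the trained model: high value when `item` is close
    to the last element of the current sequence, low on abrupt changes."""
    return 1.0 / (1.0 + abs(item - sequence[-1]))

def divide(data):
    """Split `data` into segments, starting a new segment whenever the
    classification value drops to the confidence threshold or below."""
    segments = []
    current = [data[0]]           # first item seeds the first sequence
    for item in data[1:]:
        if classify(current, item) > CONFIDENCE:
            current.append(item)  # same behaviour: merge into sequence
        else:
            segments.append(current)  # abrupt change: close sequence
            current = [item]          # and seed a new one from this item
    segments.append(current)
    return segments

print(divide([1.0, 1.2, 1.1, 5.0, 5.1, 5.3]))
# → [[1.0, 1.2, 1.1], [5.0, 5.1, 5.3]]
```

Each segment then corresponds to one stretch of consistent user behaviour, with segment boundaries at the identified mutation points.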
CN201911419795.9A 2019-12-31 2019-12-31 Data processing method, data processing device, computer readable storage medium and electronic equipment Active CN111160484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911419795.9A CN111160484B (en) 2019-12-31 2019-12-31 Data processing method, data processing device, computer readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911419795.9A CN111160484B (en) 2019-12-31 2019-12-31 Data processing method, data processing device, computer readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111160484A CN111160484A (en) 2020-05-15
CN111160484B true CN111160484B (en) 2023-08-29

Family

ID=70560397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911419795.9A Active CN111160484B (en) 2019-12-31 2019-12-31 Data processing method, data processing device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111160484B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738422A (en) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Data processing method, device and medium based on recurrent neural network
CN111931912A (en) * 2020-08-07 2020-11-13 北京推想科技有限公司 Network model training method and device, electronic equipment and storage medium
CN112069279B (en) * 2020-09-04 2022-11-08 北京百度网讯科技有限公司 Map data updating method, device, equipment and readable storage medium
CN112800053B (en) * 2021-01-05 2021-12-24 深圳索信达数据技术有限公司 Data model generation method, data model calling device, data model equipment and storage medium
CN113138977A (en) * 2021-04-22 2021-07-20 康键信息技术(深圳)有限公司 Transaction conversion analysis method, device, equipment and storage medium
CN113656699B (en) * 2021-08-25 2024-02-13 平安科技(深圳)有限公司 User feature vector determining method, related equipment and medium
CN114357037A (en) * 2022-03-22 2022-04-15 苏州浪潮智能科技有限公司 Time sequence data analysis method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316106A (en) * 2017-06-12 2017-11-03 华南理工大学 The Neural Network Time Series method of embedded dimension is determined based on dynamic threshold
CN108030494A (en) * 2017-11-08 2018-05-15 华南理工大学 Electrocardiosignal error flag training sample recognition methods based on cross validation
CN108897829A (en) * 2018-06-22 2018-11-27 广州多益网络股份有限公司 Modification method, device and the storage medium of data label
CN109271877A (en) * 2018-08-24 2019-01-25 北京智芯原动科技有限公司 A kind of human figure identification method and device
CN109814936A (en) * 2017-11-20 2019-05-28 广东欧珀移动通信有限公司 Application program prediction model is established, preloads method, apparatus, medium and terminal
CN110110372A (en) * 2019-04-09 2019-08-09 华东师范大学 A kind of user's timing behavior automatic segmentation prediction technique
CN110263916A (en) * 2019-05-31 2019-09-20 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
US10504023B1 (en) * 2015-06-05 2019-12-10 Google Llc Training recurrent neural networks to generate sequences
DE102018113621A1 (en) * 2018-06-07 2019-12-12 Connaught Electronics Ltd. A method of training a convolutional neural network for processing image data for use in a driving support system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7774143B2 (en) * 2002-04-25 2010-08-10 The United States Of America As Represented By The Secretary, Department Of Health And Human Services Methods for analyzing high dimensional data for classifying, diagnosing, prognosticating, and/or predicting diseases and other biological states
US10762427B2 (en) * 2017-03-01 2020-09-01 Synaptics Incorporated Connectionist temporal classification using segmented labeled sequence data
US10896370B2 (en) * 2017-03-03 2021-01-19 International Business Machines Corporation Triage of training data for acceleration of large-scale machine learning
IL250948B (en) * 2017-03-05 2021-04-29 Verint Systems Ltd System and method for applying transfer learning to identification of user actions


Also Published As

Publication number Publication date
CN111160484A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111160484B (en) Data processing method, data processing device, computer readable storage medium and electronic equipment
Ma et al. Sf-net: Single-frame supervision for temporal action localization
Liang et al. Interpretable structure-evolving LSTM
Ditzler et al. Learning in nonstationary environments: A survey
CN106951925B (en) Data processing method, device, server and system
CN111046286B (en) Object recommendation method and device and computer storage medium
Yang et al. Parallel recurrent convolutional neural networks-based music genre classification method for mobile devices
Okeyo et al. Ontology-based learning framework for activity assistance in an adaptive smart home
CN110162698B (en) User portrait data processing method, device and storage medium
CN111079015B (en) Recommendation method and device, computer equipment and storage medium
Nikolić et al. Simple algorithm portfolio for SAT
CN110971659A (en) Recommendation message pushing method and device and storage medium
Janssen et al. Process model discovery from sensor event data
Song et al. Temporal action localization in untrimmed videos using action pattern trees
Liu et al. Effectively predicting whether and when a topic will become prevalent in a social network
CN113360763A (en) Service attention tendency prediction method based on artificial intelligence and artificial intelligence cloud system
CN113360762A (en) Artificial intelligence based content recommendation method and artificial intelligence content recommendation system
CN111291618A (en) Labeling method, device, server and storage medium
CN109918574A (en) Item recommendation method, device, equipment and storage medium
CN111737479A (en) Data acquisition method and device, electronic equipment and storage medium
Kumar et al. Association learning based hybrid model for cloud workload prediction
CN112148994A (en) Information push effect evaluation method and device, electronic equipment and storage medium
Guo A bayesian approach for automatic algorithm selection
CN111046655A (en) Data processing method and device and computer readable storage medium
Ullah et al. Adaptive data balancing method using stacking ensemble model and its application to non-technical loss detection in smart grids

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant