CN112929751B - System, method and terminal for determining action execution - Google Patents

System, method and terminal for determining action execution

Info

Publication number
CN112929751B
CN112929751B
Authority
CN
China
Prior art keywords
action
training
information
state transition
determining
Prior art date
Legal status
Active
Application number
CN201911244065.XA
Other languages
Chinese (zh)
Other versions
CN112929751A (en)
Inventor
姜飞
韩帅
卞俊杰
王天驹
杨乃君
叶璨
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201911244065.XA
Publication of CN112929751A
Application granted
Publication of CN112929751B
Status: Active
Anticipated expiration


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/443OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44222Analytics of user selections, e.g. selection of programs or purchase activity
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/4508Management of client data or end-user data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4667Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content

Abstract

Embodiments of the invention provide a system, a method and a terminal for determining action execution. The system comprises a feature calling unit and an action determination model. The feature calling unit is configured to acquire, in response to a received access request carrying account information, feature information of a target service corresponding to the account information. The action determination model is configured to receive the feature information of the target service sent by the feature calling unit and to generate, based on that feature information, an action instruction corresponding to the target service, where the action instruction indicates whether to execute the service action corresponding to the target service. The feature calling unit is further configured to return the action instruction received from the action determination model to the sender of the access request. The scheme provided by the embodiments of the invention solves the problem in the prior art that the accuracy of judging whether to execute the service action corresponding to a service at a specific time is low.

Description

System, method and terminal for determining action execution
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a system, a method, and a terminal for determining execution of an action.
Background
In many application scenarios, a computer is often required to generate an action instruction at a specific time. For example, in an application program, when the user's network switches from a Wi-Fi network to a 4G network, it must be decided whether to pop up a window reminding the user to switch to low-resolution video playback. Popping up the window interrupts the user's viewing and is disturbing, while not popping it up tends to make the user consume a large amount of mobile data. In such a scenario, the computer therefore needs an algorithm that generates the action instruction according to the current state, so as to give the user the best possible experience.
In addition, each generated action instruction may change the user's state and thereby affect the user's long-term experience. For example, if the message notification function of a mobile phone pushes a message that disturbs the user, the annoyed user may give negative feedback such as turning off system notifications, even if better content becomes available in the following period.
At present, the scenarios that require generating action instructions mostly rely on simple fixed strategies, for example popping up a window or showing a card at fixed intervals, or recording the last page closed and restoring it when the application starts. Such strategies are clearly not optimal.
In summary, in the prior art, the accuracy of determining whether to execute the service action corresponding to a service at a specific time is low.
Disclosure of Invention
Embodiments of the invention provide a system, a method and a terminal for determining action execution, so as to solve the problem in the prior art that the accuracy of judging whether to execute the service action corresponding to a service at a specific time is low.
According to a first aspect of embodiments of the present invention, there is provided a system for determining an action execution, comprising:
a feature calling unit, configured to acquire, in response to a received access request carrying account information, feature information of a target service corresponding to the account information;
an action determination model, configured to receive the feature information of the target service sent by the feature calling unit and to generate, based on that feature information, an action instruction corresponding to the target service, wherein the action instruction is used to indicate whether to execute the service action corresponding to the target service;
the feature calling unit is further configured to return the action instruction received from the action determination model to the sender of the access request;
wherein the action determination model is obtained by training on training samples that contain feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions.
Optionally, the system further comprises a sample splicing unit, a data stream unit and a model training unit;
the sample splicing unit is configured to acquire feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions, splice the feature information, the action instructions and the feedback information into training samples, and send the training samples to the data stream unit;
the data stream unit is used for splicing the training samples into a state transition sample stream;
the model training unit is configured to receive the state transition sample stream sent by the data stream unit, train the state transition sample stream, and obtain the action determination model.
Optionally, the process of splicing the training samples into the state transition sample stream by the data stream unit includes:
splicing two training samples that belong to the same user and are adjacent in time into one state transition sample stream.
Optionally, the process of training the state transition sample stream by the model training unit includes:
training the state transition sample stream by using a temporal-difference method.
Optionally, the process in which the model training unit trains the state transition sample stream by using the temporal-difference method includes:
storing the state transition sample stream into a data warehouse (Hive);
converting the data stored in the data warehouse into a data format matched with the action determination model to obtain samples to be trained;
storing the samples to be trained into a distributed file system (HDFS); and
reading the data in the distributed file system and training with the temporal-difference method.
Optionally, the system further comprises:
a data queue, configured to receive the feature information of the multiple services sent by the feature calling unit and the action instructions, corresponding to the feature information, sent by the action determination model, store the feature information and the action instructions according to the time interval between the feedback information of two adjacent action instructions, and send them to the sample splicing unit;
the process of splicing the feature information, the action instructions and the feedback information into training samples by the sample splicing unit then includes:
splicing the feature information, the action instructions and the feedback information into training samples according to the time interval between two adjacent pieces of feedback information and the delay time between two adjacent pieces of feature information in the data queue.
According to a second aspect of embodiments of the present invention, there is provided a method for determining execution of an action, comprising:
responding to a received access request carrying account information, and acquiring characteristic information of a target service corresponding to the account information;
inputting the characteristic information of the target service into a predetermined action determination model, and outputting an action instruction corresponding to the target service;
sending the action instruction to a sender of the access request;
wherein the action determination model is obtained by training on training samples that contain feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions.
Optionally, the action determination model is obtained by training through the following process:
acquiring feature information of multiple services, action instructions corresponding to the feature information and feedback information of corresponding accounts on execution of the action instructions;
splicing the characteristic information, the action instruction and the feedback information into a training sample;
splicing the training samples into a state transition sample stream;
and training the state transition sample stream to obtain the action determination model.
Optionally, splicing the training samples into a state transition sample stream includes:
splicing two training samples that belong to the same user and are adjacent in time into one state transition sample stream.
Optionally, training the state transition sample stream includes:
training the state transition sample stream by using a temporal-difference method.
Optionally, the training of the state transition sample stream by using the temporal-difference method includes:
storing the state transition sample stream into a data warehouse (Hive);
converting the data stored in the data warehouse into a data format matched with the action determination model to obtain samples to be trained;
storing the samples to be trained into a distributed file system (HDFS); and
reading the data in the distributed file system and training with the temporal-difference method.
Optionally, the obtaining of the action instructions corresponding to the feature information includes:
acquiring the action instructions, corresponding to the feature information, that are output by the action determination model;
and the splicing of the feature information, the action instructions and the feedback information into training samples includes:
storing the feature information and the action instructions according to the time interval between the feedback information of two adjacent action instructions; and
splicing the feature information, the action instructions and the feedback information into training samples according to the time interval between two adjacent pieces of feedback information and the stored delay time between two adjacent pieces of feature information.
According to a third aspect of embodiments of the present invention, there is provided a terminal, including:
a processor;
a memory configured to store the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the operations performed by the method for determining action execution provided by the present invention.
According to a fourth aspect of the embodiments of the present invention, there is provided a terminal, comprising: a memory, a processor, and a program for determining action execution that is stored on the memory and executable on the processor, wherein the program for determining action execution, when executed by the processor, implements the steps of any of the methods for determining action execution in the present invention.
According to a fifth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a program for determining execution of an action, which when executed by a processor implements the steps of any one of the methods for determining execution of an action described in the present invention.
Compared with the prior art, the invention has the following advantages:
In the system for determining action execution provided by the embodiments of the present invention, the feature calling unit responds to a received access request carrying account information, acquires the feature information of the target service corresponding to the account information, and sends that feature information to the action determination model. The action determination model outputs an action instruction corresponding to the target service and returns it to the feature calling unit, which in turn returns the action instruction to the sender of the access request. The action determination model is obtained by training on training samples that contain feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions. Because the model has been trained on the feature information and action instructions of many services together with the users' feedback on the functions triggered by those action instructions, the action instruction it generates to indicate whether to execute the service action corresponding to the target service better matches the feature information of each service. The generated action instructions are therefore more accurate and better meet the users' needs, which improves the user experience.
The foregoing is only an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be understood more clearly, and that the above and other objects, features and advantages of the present invention may become more apparent, embodiments of the invention are described below.
Drawings
Various advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings.
In the drawings:
FIG. 1 is a schematic diagram of a system for determining the performance of an action according to a first embodiment of the invention;
FIG. 2 is a schematic diagram of a system for determining action execution according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a particular implementation of a system for determining performance of an action in an embodiment of the invention;
FIG. 4 is a flow chart of steps of a method for determining execution of an action according to a third embodiment of the present invention;
fig. 5 is a block diagram of a terminal according to a fourth embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
Referring to fig. 1, a system for determining execution of an action according to a first embodiment of the present invention is shown, and the system for determining execution of an action may include:
a feature calling unit, configured to acquire, in response to a received access request carrying account information, feature information of a target service corresponding to the account information;
an action determination model, configured to receive the feature information of the target service sent by the feature calling unit and to generate, based on that feature information, an action instruction corresponding to the target service, wherein the action instruction is used to indicate whether to execute the service action corresponding to the target service;
the feature calling unit is further configured to return the action instruction received from the action determination model to the sender of the access request;
wherein the action determination model is obtained by training on training samples that contain feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions.
In the embodiment of the present invention, the target service is a service that can be implemented by the terminal, for example, a service for watching a video using a video application, a service for searching information using a browser, a service for switching networks by the terminal, and the like. It should be understood that the target service is only illustrated here, and the content included in the target service is not limited to the description herein.
The multiple services for obtaining the action determination model are services that can be implemented by the terminal, such as a service for watching a video using a video application program, a service for searching information using a browser, and a service for switching networks by the terminal. It should be understood that the service is only illustrated here, and the content included in the service is not limited to the description here. The action instruction corresponding to the characteristic information is a control instruction for controlling the sender to execute the service corresponding to the characteristic information, so that the corresponding function is realized. The feedback information is feedback of the function realized by the sender after the action instruction is executed.
In addition, the feature information includes at least one of a user portrait, scene features, historical behavior, and content features. These items are described as follows, and a hypothetical example in code follows the list:
The user portrait includes attribute information of the user, such as age, sex, hobbies, and the historical browsing records on the terminal.
Scene features: used to describe the application scenario of the target service, such as the time at which the target service is used and the type of network in use.
Historical behavior: includes the user's operation behavior with respect to the target service, such as the number of clicks, likes, etc. within a predetermined period of time.
Content features: include the characteristics of the content displayed on the display interface of the target service, such as the attributes of the pushed content when information is pushed, or the value of the bid item when an advertisement is bid on.
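For illustration only, the feature information for a video-push scenario could be organized as a dictionary such as the sketch below; every field name and value is hypothetical and not taken from this embodiment.

# Hypothetical example of the feature information for one request in a
# video-push scenario; every field name and value below is illustrative only.
feature_info = {
    "user_profile": {"age": 28, "gender": "female", "interests": ["music", "travel"]},
    "scene": {"request_time": "2019-12-06T20:15:00", "network_type": "4G"},
    "history": {"clicks_last_7d": 35, "likes_last_7d": 4},
    "content": {"push_category": "short_video"},
}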
In addition, executing the service action according to the action instruction may change the running state of the sender. For example, if the target service is watching videos with a video application and the action instruction is to push a video, then after receiving the action instruction the terminal may display the pushed video list when the page is refreshed.
As can be seen from the above, in the system for determining action execution according to the embodiment of the present invention, the feature calling unit responds to a received access request carrying account information, acquires the feature information of the target service corresponding to the account information, and sends that feature information to the action determination model. The action determination model outputs an action instruction corresponding to the target service and returns it to the feature calling unit, which in turn returns the action instruction to the sender of the access request. The action determination model is obtained by training on training samples that contain feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions. Because the model has been trained on the feature information and action instructions of many services together with the users' feedback on the functions triggered by those action instructions, the action instruction it generates to indicate whether to execute the service action corresponding to the target service better matches the feature information of each service. The generated action instructions are therefore more accurate and better meet the users' needs, which improves the user experience.
Example two
Referring to fig. 2, a system for determining execution of an action according to a second embodiment of the present invention is shown, where the system for determining execution of an action may include:
a feature calling unit, configured to acquire, in response to a received access request carrying account information, feature information of a target service corresponding to the account information;
an action determination model, configured to receive the feature information of the target service sent by the feature calling unit and to generate, based on that feature information, an action instruction corresponding to the target service, wherein the action instruction is used to indicate whether to execute the service action corresponding to the target service;
the feature calling unit is further configured to return the action instruction received from the action determination model to the sender of the access request;
wherein the action determination model is obtained by training on training samples that contain feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions;
the system further comprises a sample splicing unit, a data stream unit and a model training unit;
the sample splicing unit is configured to acquire feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions, splice the feature information, the action instructions and the feedback information into training samples, and send the training samples to the data stream unit;
the data stream unit is used for splicing the training samples into a state transition sample stream;
the model training unit is used for receiving the state transition sample stream sent by the data stream unit, and training the state transition sample stream to obtain the action determination model.
As noted above, the action determination model is obtained by training on training samples that contain feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions.
In the embodiment of the present invention, the target service is a service that can be implemented by the terminal, for example, a service for watching a video using a video application, a service for searching information using a browser, a service for switching networks by the terminal, and the like. It should be understood that the target service is only illustrated here, and the content included in the target service is not limited to the description here.
The multiple services for obtaining the action determination model are services that can be implemented by the terminal, such as a service for watching a video using a video application program, a service for searching information using a browser, a service for switching networks by the terminal, and the like. It should be understood that the service is only illustrated here, and the content included in the service is not limited to the description here. The action instruction corresponding to the characteristic information is a control instruction for controlling the sender to execute the service corresponding to the characteristic information, so that the corresponding function is realized. The feedback information is feedback of the function realized by the sender after the action instruction is executed.
In addition, the feature information includes at least one of a user portrait, scene features, historical behavior, and content features. These items are described as follows:
The user portrait includes attribute information of the user, such as age, sex, hobbies, and the historical browsing records on the terminal.
Scene features: used to describe the application scenario of the target service, such as the time at which the target service is used and the type of network in use.
Historical behavior: includes the user's operation behavior with respect to the target service, such as the number of clicks, likes, etc. within a predetermined period of time.
Content features: include the characteristics of the content displayed on the display interface of the target service, such as the attributes of the pushed content when information is pushed, or the value of the bid item when an advertisement is bid on.
In addition, executing the service action according to the action instruction may change the running state of the sender. For example, if the target service is watching videos with a video application and the action instruction is to push a video, then after receiving the action instruction the terminal may display the pushed video list when the page is refreshed.
In addition, in the embodiment of the invention, training the action determination model requires massive data, that is, the feature information and action instructions of a large number of services together with the users' feedback on the functions triggered by those action instructions. During training, the feature information, action instructions and feedback information of the services first need to be spliced into training samples, where one training sample contains the feature information of one service, the action instruction corresponding to that service, and the user's feedback on the function triggered by that action instruction.
Specifically, the data format of a training sample may be as follows (a hypothetical code sketch is given after the list):
Feature information: a dictionary-like structure in which each entry maps a feature name to its value;
Action instruction: a dictionary-like structure in which each entry corresponds to an action instruction, or to supplementary information given by the model;
Feedback information: a dictionary-like structure in which each entry corresponds to one form of reward, such as a click or a like;
Action instruction sequence number (episode id): identifies which action instruction sequence the training sample belongs to. For example, the sequence of action instructions for one user within one day may be one episode; in that case the user id may be used as the episode id.
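The following is a minimal sketch, in Python, of how one training sample could be assembled from these four parts; the class name, field names and values are assumptions made for illustration, not the exact schema of this embodiment.

from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class TrainingSample:
    """One training sample: feature info, action instruction, feedback, episode id.

    The field names are illustrative; the embodiment only requires that each part
    be a dictionary-like structure and that an episode id identify the action
    instruction sequence the sample belongs to.
    """
    features: Dict[str, Any]     # feature name -> value
    action: Dict[str, Any]       # action instruction (and supplementary model info)
    feedback: Dict[str, float]   # reward entries, e.g. click, like
    episode_id: str              # e.g. the user id for a one-day sequence
    timestamp: float = 0.0       # used later for state-transition splicing

sample = TrainingSample(
    features={"network_type": "4G", "clicks_last_7d": 35},
    action={"push_video": 1},
    feedback={"click": 1.0, "like": 0.0},
    episode_id="user_123",
    timestamp=1575630900.0,
)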
After the training samples are obtained, state transition splicing may be performed on them. Unlike supervised learning, reinforcement learning can optimize the overall effect of a whole action sequence. Therefore, in the embodiment of the invention, performing state transition splicing on the training samples during training of the action determination model allows reinforcement learning to optimize the overall effect of the action instruction sequence, so that the action instructions output by the action determination model realize functions that better match the user's needs; that is, the accuracy of the action instructions generated by the action determination model is improved.
After the state transition sample stream is obtained, the model can be trained on it to obtain the action determination model.
Optionally, the process of splicing the training samples into the state transition sample stream by the data stream unit includes:
splicing two training samples that belong to the same user and are adjacent in time into one state transition sample stream.
That is, in the embodiment of the present invention, two training samples that belong to the same user and are adjacent in time are spliced together. For example, suppose a first, a second and a third training sample are obtained for user A, ordered in time as the first, the second and then the third training sample. The first and second training samples may then be spliced into one state transition, and the second and third training samples into another; together these transitions form the state transition sample stream.
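A minimal sketch of this splicing step, assuming samples shaped like the TrainingSample sketch above (with the user id used as the episode id and a timestamp field); the grouping and pairing logic is illustrative, not the patent's exact implementation.

from collections import defaultdict

def splice_transitions(samples):
    """Pair temporally adjacent samples of the same user into state transitions.

    Each transition has the form (state, action, reward, next_state), which is
    what temporal-difference training consumes.  `samples` is an iterable of
    TrainingSample objects as sketched above (an assumption of this example).
    """
    by_user = defaultdict(list)
    for s in samples:
        by_user[s.episode_id].append(s)

    transitions = []
    for user_samples in by_user.values():
        user_samples.sort(key=lambda s: s.timestamp)
        for cur, nxt in zip(user_samples, user_samples[1:]):
            transitions.append((cur.features, cur.action, cur.feedback, nxt.features))
    return transitions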
Optionally, the process of training the state transition sample stream by the model training unit includes:
training the state transition sample stream by using a temporal-difference method.
The temporal-difference method simulates (or experiences) a sequence and, after each step (or every few steps), uses the value estimate of the newly reached state to update the value estimate of the state that preceded it. Training on the state transition samples with the temporal-difference method therefore yields an action determination model whose output action instructions match the feature information of the services, that is, the accuracy of the generated action instructions is improved.
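As an illustration of the temporal-difference idea only, a one-step TD(0)-style update on a spliced transition might look like the following; the value estimates, learning rate and discount factor are placeholders rather than parameters specified by this embodiment.

def td_target(reward, next_state_value, gamma=0.99):
    # One-step temporal-difference target: r + gamma * V(s').
    return reward + gamma * next_state_value

def td_update(state_value, reward, next_state_value, lr=0.01, gamma=0.99):
    # Move the estimate for the preceding state toward the TD target.
    return state_value + lr * (td_target(reward, next_state_value, gamma) - state_value)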
Optionally, the process in which the model training unit trains the state transition sample stream by using the temporal-difference method includes:
storing the state transition sample stream into a data warehouse (Hive);
converting the data stored in the data warehouse into a data format matched with the action determination model to obtain samples to be trained;
storing the samples to be trained into a distributed file system (HDFS); and
reading the data in the distributed file system and training with the temporal-difference method.
The data format matched with the action determination model may be, for example, the TFRecord format. TFRecord uses the Protocol Buffer binary encoding, occupies only one memory block, and only needs to be loaded once as a binary file, which is simple and fast and particularly friendly to large-scale training data. When the amount of training data is large, the data can be split into multiple TFRecord files to improve processing efficiency.
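A minimal sketch of the conversion step, assuming TensorFlow is used and each transition is flattened into float lists; the field names and shapes are assumptions, since the embodiment only states that samples are converted into a model-compatible format such as TFRecord and stored into HDFS.

import tensorflow as tf

def to_tf_example(state, action, reward, next_state):
    """Serialize one spliced state transition into a tf.train.Example.

    The flat float encoding and field names are assumptions for illustration; the
    embodiment only states that samples are converted into a format the model can
    read (e.g. TFRecord) before being stored into HDFS.
    """
    def float_feature(values):
        return tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))
    return tf.train.Example(features=tf.train.Features(feature={
        "state": float_feature(state),
        "action": float_feature(action),
        "reward": float_feature([reward]),
        "next_state": float_feature(next_state),
    }))

# Large training sets can be split across several TFRecord shard files.
with tf.io.TFRecordWriter("transitions-00000-of-00010.tfrecord") as writer:
    example = to_tf_example([0.3, 1.0], [1.0], 1.0, [0.5, 0.0])
    writer.write(example.SerializeToString())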
Therefore, in the embodiment of the present invention, training the state transition sample stream with the temporal-difference method goes through a series of standardized steps: the state transition sample stream is first stored into Hive, then converted into the format required by the action determination model and stored into HDFS, and finally the data on HDFS is read to train the action determination model. During training, the algorithm is separated from the service: on the algorithm side, reinforcement learning algorithms such as DQN (a deep state-action value function network) and DDPG (a deep deterministic policy gradient network) and their corresponding evaluation metrics are supported, so this layer is common to all services; on the service side, each service has its own feature configuration and service metrics. Consequently, for a brand-new service, only the service-specific part needs to be defined before the model can be trained.
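The separation of algorithm and service described above could, for example, be expressed as a per-service configuration like the hypothetical sketch below; all keys and values are assumptions introduced purely for illustration.

# Hypothetical per-service configuration: the algorithm layer (e.g. DQN or DDPG
# plus its evaluation metrics) stays generic, and each service only declares its
# own feature schema and business reward.  All keys and values are illustrative.
service_config = {
    "service_name": "video_push_reminder",
    "algorithm": "dqn",                      # or "ddpg"
    "features": ["network_type", "clicks_last_7d", "watch_time_last_7d"],
    "actions": ["push", "no_push"],
    "reward": {"click": 1.0, "like": 2.0, "disable_notification": -5.0},
}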
Hive is a data warehouse tool based on Hadoop. It can map structured data files onto database tables and provides an SQL-like query capability. Hive offers scalability (for example, the size of the cluster can be extended freely, generally without restarting the service), extensibility (for example, user-defined functions are supported, so users can implement functions according to their needs) and good fault tolerance (for example, when a node fails, the SQL query can still complete). Therefore, in the embodiment of the invention, storing the state transition sample stream into Hive during training of the action determination model facilitates querying and analyzing the data.
In addition, HDFS is characterized by high fault tolerance and is designed to be deployed on inexpensive hardware. It provides high-throughput access to application data, is suitable for applications with very large data sets, and allows the data in the file system to be accessed in a streaming fashion. Therefore, when training the action determination model, storing the state transition sample stream that has been converted into the matching data format into HDFS facilitates persistence and distributed training.
Optionally, the system further comprises:
a data queue, configured to receive the feature information of the multiple services sent by the feature calling unit and the action instructions, corresponding to the feature information, sent by the action determination model, store the feature information and the action instructions according to the time interval between the feedback information of two adjacent action instructions, and send them to the sample splicing unit;
in that case, the process of splicing the feature information, the action instructions and the feedback information into training samples by the sample splicing unit includes:
splicing the feature information, the action instructions and the feedback information into training samples according to the time interval between two adjacent pieces of feedback information and the delay time between two adjacent pieces of feature information in the data queue.
In this way, the embodiment of the invention can also update the action determination model with the action instructions it outputs and the user's feedback on those action instructions, which further improves the accuracy of the generated action instructions and further improves the user experience.
When the action determination model serves online requests, there is often a delay between the time an action instruction is output and the time its feedback information is obtained. For example, when the action instruction is to push a video and the feedback is watching the video, the user may watch the video a long time after the system decides to push it. The feature information and the corresponding action instruction therefore need to be stored in the data queue according to the time interval between adjacent pieces of feedback information, so that the correspondence between feature information, action instructions and feedback information can be determined from that interval and from the delay time between two adjacent pieces of feature information in the data queue. This prevents feature information and action instructions from being matched with the wrong feedback and guarantees the correctness of the data.
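A minimal sketch of this delayed matching, assuming each request carries an identifier shared by the stored (feature, action) pair and the later feedback; the queue structure, key and expiry window are assumptions, not details given by the embodiment.

import time
from collections import OrderedDict

class PendingActionQueue:
    """Hold (features, action) pairs until the delayed feedback arrives.

    The request-id key and the expiry window are assumptions for illustration;
    the aim is simply to keep each (features, action) pair aligned with the
    feedback that arrives later, so that spliced samples stay consistent.
    """
    def __init__(self, max_delay_seconds=3600):
        self.max_delay = max_delay_seconds
        self.pending = OrderedDict()   # request_id -> (timestamp, features, action)

    def put(self, request_id, features, action):
        self.pending[request_id] = (time.time(), features, action)

    def match_feedback(self, request_id, feedback):
        """Return a spliced (features, action, feedback) sample, or None."""
        entry = self.pending.pop(request_id, None)
        if entry is None:
            return None
        stored_at, features, action = entry
        if time.time() - stored_at > self.max_delay:
            return None                # feedback arrived too late; discard
        return {"features": features, "action": action, "feedback": feedback}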
In addition, when the action determination model is updated with the action instructions it outputs and their corresponding feature information and feedback information, the feature information, action instruction and feedback information are likewise spliced into training samples, where one training sample contains the feature information of one service, the action instruction corresponding to that service, and the user's feedback on the function triggered by that action instruction. After the training samples are obtained, state transition splicing is performed on them to obtain a state transition sample stream, the stream is trained with the temporal-difference method, and the action determination model is thereby updated.
To sum up, a schematic diagram of a specific implementation of the system for determining action execution according to the embodiment of the present invention may be as shown in fig. 3; that is, the sender may be a client, and the feature calling unit may be provided as an RPC (remote procedure call) service.
As shown in fig. 3, the implementation can be divided into two parts: the online service and the reinforcement learning data stream.
For the online service part: the client accesses the RPC (remote procedure call) service with the necessary information; the RPC service collects the feature information of the service, feeds it into the action determination model, stores the model's output in the data queue, and returns the output to the client.
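A minimal sketch of this online path, with placeholder stand-ins for the feature store, the action determination model and the data queue; none of these names or signatures come from the patent.

def handle_access_request(account_id, feature_store, model, data_queue):
    """Online path sketched from the description above; all names are placeholders."""
    features = feature_store[account_id]               # collect feature information
    action = model(features)                           # model outputs the action instruction
    data_queue.append((account_id, features, action))  # buffer for later sample splicing
    return action                                      # returned to the client (RPC reply)

# Trivial stand-ins for the feature store, the model and the data queue.
store = {"user_123": {"network_type": "4G"}}
queue = []
reply = handle_access_request("user_123", store, lambda f: {"push_video": 1}, queue)
print(reply)   # {'push_video': 1}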
For the reinforcement learning data stream part: it mainly provides training data for the machine learning model, and the whole process involves the following modules:
Client log module: records the feedback information produced after an action instruction is generated, for example whether the user clicks after a push is sent. The client log is consumed as a stream: one part serves as the historical behavior record, while the other part, namely the feedback on the most recent action instruction (that is, the reward for reinforcement learning), is saved into the sample splicing unit.
Note that at the initial stage, when the action determination model has not yet been built, what is stored is the client's feedback on the action instructions generated by the existing scheme (for example, popping up a prompt window at fixed intervals, or showing the page that was displayed when the application last exited). Once the offline training of the action determination model is completed, the client log module instead stores the user's feedback on the functions triggered by the action instructions output by the action determination model.
Data queue: stores the feature information of the services. When the action determination model has not yet been built, the data queue initially stores the feature information of the services corresponding to the action instructions generated by the conventional scheme (for example, popping up a prompt window at fixed intervals, or showing the page that was displayed when the application last exited). Once the offline training of the action determination model is completed, the RPC service saves a piece of feature information into the data queue whenever it receives an online request.
Sample splicing unit: splices the feature information and action instructions in the data queue with the feedback information in the client log module to form training samples. The feature information is kept in a data queue because it always arrives earlier than the feedback information. Moreover, because exactly the feature information that was fed into the action determination model is what is stored in the data queue, consistency between the data used for model training and the data used for model serving is guaranteed at the root, eliminating, from an engineering standpoint, the possibility of data inconsistency at any stage.
Data stream unit: splices temporally adjacent training samples of the same user into state transitions, which makes it convenient to train the reinforcement learning model in a temporal-difference manner. After this step, a stream of state transition samples is formed.
It should be noted that, before the action determination model has been built, what the data stream unit outputs are the action instructions generated by the conventional scheme together with their corresponding service feature information and feedback information; in this case the purpose of training on the state transition samples is to obtain the action determination model in the first place. The specific training process is as follows.
Training module: processes the state transition sample stream through a series of standardized steps, for example storing the stream into Hive, converting it into the format required by the action determination model and storing it into HDFS, and then training the model on the data read from HDFS. After training is finished, the model is imported into the model service.
The offline-trained action determination model then provides the online service.
After the action determination model goes online, the action instructions it outputs, together with their corresponding service feature information and feedback information, can be read directly from the data stream unit to form the state transition sample stream, so that the action determination model is updated in real time and the continuously updated model can be used to provide services externally.
In summary, in the system for determining action execution according to the embodiment of the present invention, the feature calling unit responds to a received access request carrying account information, acquires the feature information of the target service corresponding to the account information, and sends that feature information to the action determination model. The action determination model outputs an action instruction corresponding to the target service and returns it to the feature calling unit, which in turn returns the action instruction to the sender of the access request. The action determination model is obtained by training on training samples that contain feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions. Because the model has been trained on the feature information and action instructions of many services together with the users' feedback on the functions triggered by those action instructions, the action instruction it generates to indicate whether to execute the service action corresponding to the target service better matches the feature information of each service. The generated action instructions are therefore more accurate and better meet the users' needs, which improves the user experience. In addition, the embodiment of the invention can also update the action determination model with the action instructions it outputs and the feedback information and feature information corresponding to those instructions, which further improves the accuracy of the generated action instructions and further improves the user experience.
Example three
Referring to fig. 4, a flowchart of the steps of a method for determining action execution according to a third embodiment of the present invention is shown. The method for determining action execution may include the following steps:
step 401: and responding to the received access request carrying the account information, and acquiring the characteristic information of the target service corresponding to the account information.
In the embodiment of the invention, the target service is a certain service which can be realized by the terminal, such as a service for watching a video by using a video application program, a service for searching information by using a browser, a service for switching a network by the terminal, and the like. It should be understood that the target service is only illustrated here, and the content included in the target service is not limited to the description here.
The feature information includes at least one of a user portrait, scene features, historical behavior, and content features. These items are described as follows:
The user portrait includes attribute information of the user, such as age, sex, hobbies, and the historical browsing records on the terminal.
Scene features: used to describe the application scenario of the target service, such as the time at which the target service is used and the type of network in use.
Historical behavior: includes the user's operation behavior with respect to the target service, such as the number of clicks, likes, etc. within a predetermined period of time.
Content features: include the characteristics of the content displayed on the display interface of the target service, such as the attributes of the pushed content when information is pushed, or the value of the bid item when an advertisement is bid on.
Step 402: input the feature information of the target service into a predetermined action determination model, and output an action instruction corresponding to the target service.
The action determination model is obtained by training on training samples that contain feature information of multiple services, action instructions corresponding to the feature information, and feedback information from the corresponding accounts on execution of the action instructions.
The multiple services for obtaining the action determination model are services that can be implemented by the terminal, such as a service for watching a video using a video application program, a service for searching information using a browser, a service for switching networks by the terminal, and the like. It should be understood that the service is only illustrated here, and the content included in the service is not limited to the description here. The action instruction corresponding to the characteristic information is a control instruction for controlling the sender to execute the service corresponding to the characteristic information, so that the corresponding function is realized. The feedback information is feedback of the function realized by the sender after executing the action command.
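Purely as an illustration of step 402: if the action determination model were a value-based model over two candidate instructions (execute or skip the service action), the output step could amount to an argmax over estimated values, as in the hedged sketch below; the action set and value function are assumptions, not elements specified by this embodiment.

def decide_action(feature_vector, q_function, actions=("execute", "skip")):
    """Pick the action instruction with the highest estimated long-term value.

    `q_function(features, action)` stands in for the trained action determination
    model; the two-action set is an assumption made only for this illustration.
    """
    scores = {a: q_function(feature_vector, a) for a in actions}
    best = max(scores, key=scores.get)
    return {"action": best, "scores": scores}

# Example with a toy value function standing in for the trained model.
toy_q = lambda features, action: (1.0 if action == "execute" else 0.2) * features.get("clicks_last_7d", 0)
print(decide_action({"clicks_last_7d": 35}, toy_q))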
Optionally, the action determination model is obtained by training through the following process:
acquiring feature information of a plurality of services, action instructions corresponding to the feature information and feedback information of corresponding accounts for executing the action instructions;
splicing the characteristic information, the action instruction and the feedback information into a training sample;
splicing the training samples into a state transition sample stream;
and training the state transition sample stream to obtain the action determination model.
In the embodiment of the invention, training the action determination model requires massive data, that is, the feature information and action instructions of a large number of services together with the users' feedback on the functions triggered by those action instructions. During training, the feature information, action instructions and feedback information of the services first need to be spliced into training samples, where one training sample contains the feature information of one service, the action instruction corresponding to that service, and the user's feedback on the function triggered by that action instruction.
Specifically, the data format of a training sample may be as follows:
Feature information: a dictionary-like structure in which each entry maps a feature name to its value;
Action instruction: a dictionary-like structure in which each entry corresponds to an action instruction, or to supplementary information given by the model;
Feedback information: a dictionary-like structure in which each entry corresponds to one form of reward, such as a click or a like;
Action instruction sequence number (episode id): identifies which action instruction sequence the training sample belongs to. For example, the sequence of action instructions for one user within one day may be one episode; in that case the user id may be used as the episode id.
After the training samples are obtained, state transition splicing may be performed on them. Unlike supervised learning, reinforcement learning can optimize the overall effect of an action sequence. Therefore, in the embodiment of the invention, state transition splicing is performed on the training samples during training of the action determination model, so that the overall effect of an action instruction sequence can be optimized through reinforcement learning. As a result, the action instructions output by the action determination model realize functions that better match user requirements, and the accuracy of the action instructions generated by the model is improved.
After the state transition sample stream is obtained, it may then be trained on to obtain the action determination model.
Optionally, the splicing the training samples into a state transition sample stream includes:
and splicing two training samples which belong to the same user and are adjacent in time into one state transition sample stream.
For example, suppose a first, a second and a third training sample of user A are obtained, ordered in time as first, second, third. The first and second training samples may then be spliced into one state transition sample stream, and the second and third training samples may be spliced into another state transition sample stream.
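A minimal sketch, reusing the hypothetical sample dictionaries above, of how samples belonging to the same user and adjacent in time could be spliced into state transition pairs:

```python
from collections import defaultdict

def splice_state_transitions(training_samples):
    """Splice training samples of the same user that are adjacent in time
    into (current sample, next sample) state transition pairs.

    Each sample is assumed to be a dict with at least "episode_id" and
    "timestamp" keys (hypothetical field names).
    """
    by_user = defaultdict(list)
    for sample in training_samples:
        by_user[sample["episode_id"]].append(sample)

    transitions = []
    for samples in by_user.values():
        samples.sort(key=lambda s: s["timestamp"])
        # adjacent pairs within one user's sequence: (s_t, s_{t+1})
        for current, nxt in zip(samples, samples[1:]):
            transitions.append((current, nxt))
    return transitions
```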
Optionally, training the state transition sample stream includes:
and training the state transition sample stream by adopting a time sequence difference method.
The temporal difference (time sequence difference) method simulates (or walks through) a sequence, advancing one step (or several steps) per action, and then estimates the value of the state before execution based on the value of the new state. Training the state transition samples with the temporal difference method therefore allows the action instructions output by the action determination model to match the characteristic information of the services, that is, the accuracy of the generated action instructions is improved.
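Purely as an illustration of the idea, a minimal TD(0)-style value update over one spliced transition; the reward convention, discount factor and state key below are assumptions for the sketch, not the procedure claimed by the invention:

```python
def state_key(sample):
    # Hypothetical: use a sorted tuple of feature items as a hashable state key.
    return tuple(sorted(sample["features"].items()))

def td0_update(value_fn, transition, alpha=0.1, gamma=0.9):
    """One temporal-difference (TD(0)) update on a spliced state transition.

    value_fn: dict mapping a state key to its estimated value.
    transition: (current_sample, next_sample) as produced by splice_state_transitions;
    the reward is taken from the current sample's feedback (hypothetical convention).
    """
    current, nxt = transition
    s, s_next = state_key(current), state_key(nxt)
    reward = float(current["feedback"].get("click", 0))

    # Estimate the value of the state before execution from the value of the new state.
    td_target = reward + gamma * value_fn.get(s_next, 0.0)
    td_error = td_target - value_fn.get(s, 0.0)
    value_fn[s] = value_fn.get(s, 0.0) + alpha * td_error
    return td_error
```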
Optionally, the training the state transition sample stream by using a time sequence difference method includes:
storing the state transition sample stream into a data warehouse hive;
converting the data stored in the data warehouse hive into a data format matched with the action determination model to obtain a sample to be trained;
storing the sample to be trained in a distributed file system hdfs;
and reading data in the hdfs of the distributed file system, and training by adopting a time sequence difference method.
The data format matched with the action determination model may be, for example, the TFRecord format. TFRecord uses the Protocol Buffer binary encoding scheme, occupies only one block of memory, and only needs to load a binary file once, which is simple, fast, and particularly friendly to large training data. When the amount of training data is large, the data can be split into multiple TFRecord files to improve processing efficiency.
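As a sketch of this conversion step, assuming TensorFlow is available, one way the training samples might be serialized into sharded TFRecord files; the actual feature schema used by the invention is not specified here, so the JSON-packed fields below are only illustrative:

```python
import json
import tensorflow as tf

def sample_to_example(sample):
    """Encode one training sample as a tf.train.Example.
    For simplicity the dictionary-like fields are JSON-serialized; a real
    pipeline would more likely flatten them into typed feature columns."""
    def bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode("utf-8")]))

    return tf.train.Example(features=tf.train.Features(feature={
        "episode_id": bytes_feature(sample["episode_id"]),
        "features": bytes_feature(json.dumps(sample["features"])),
        "action": bytes_feature(json.dumps(sample["action"])),
        "feedback": bytes_feature(json.dumps(sample["feedback"])),
    }))

def write_tfrecords(samples, path_prefix, num_shards=4):
    """Split the samples across several TFRecord files to improve processing efficiency."""
    writers = [tf.io.TFRecordWriter(f"{path_prefix}-{i:05d}-of-{num_shards:05d}")
               for i in range(num_shards)]
    for i, sample in enumerate(samples):
        writers[i % num_shards].write(sample_to_example(sample).SerializeToString())
    for w in writers:
        w.close()
```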
Therefore, in the embodiment of the present invention, when the state transition sample stream is trained with the temporal difference method, a standardized pipeline is used: the state transition sample stream is first stored in Hive, then converted into the format required by the action determination model and stored in HDFS, so that the data on HDFS can be read to train the action determination model. In the training process, the algorithm is separated from the services. At the algorithm level, reinforcement learning algorithms such as DQN (a deep state-action value function network) and DDPG (a deep deterministic policy gradient network) and their corresponding evaluation metrics are supported, so the algorithm is common to all services; at the service level, each service has its own feature configuration and business metrics. Therefore, for a brand-new service, only the service-specific parts need to be defined before the model can be trained.
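To illustrate this algorithm/service separation, a hypothetical configuration sketch in Python; the service names, feature lists and metrics are invented examples, not the invention's actual configuration:

```python
# Hypothetical: the reinforcement-learning algorithm (e.g. DQN or DDPG) and its
# evaluation metrics are shared across all services, while each service only
# defines its own feature schema, action space and business metrics.
SERVICE_CONFIGS = {
    "video_push": {
        "features": ["battery_level", "network_type", "content_category"],
        "actions": ["push_video", "skip"],
        "business_metrics": ["click", "watch_time_s"],
    },
    "network_switch": {
        "features": ["signal_strength", "network_type", "moving_speed"],
        "actions": ["switch_to_wifi", "stay_on_cellular"],
        "business_metrics": ["throughput", "handover_success"],
    },
}

ALGORITHM_CONFIG = {
    "algorithm": "dqn",                       # shared across services
    "gamma": 0.9,
    "eval_metrics": ["td_error", "avg_reward"],
}
```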
Hive is a data warehouse tool based on Hadoop that can map structured data files to database tables and provides an SQL-like query capability. Hive offers scalability (for example, the cluster size can be freely extended, generally without restarting the service), extensibility (for example, Hive supports user-defined functions, so users can implement functions according to their needs), and good fault tolerance (for example, when a node fails, the SQL can still finish executing). Therefore, in the embodiment of the invention, storing the state transition sample stream in Hive during training of the action determination model facilitates querying and analyzing the data.
In addition, HDFS is characterized by high fault tolerance and is designed to be deployed on inexpensive hardware. It provides high-throughput access to application data, is suitable for applications with very large data sets, and supports streaming access to data in the file system. Therefore, when training the action determination model, storing the state transition sample stream converted into the data format matched by the model into HDFS facilitates persistence and distributed training.
Optionally, the obtaining of the action instruction corresponding to the feature information includes:
acquiring action instructions which are output by the action determining model and correspond to the characteristic information;
the splicing the characteristic information, the action instruction and the feedback information into a training sample comprises:
storing the characteristic information and the action instruction according to the time interval of the feedback information of two adjacent action instructions;
and splicing the characteristic information, the action instruction and the feedback information into a training sample according to the time interval between two adjacent pieces of feedback information and the stored delay time between two adjacent pieces of characteristic information.
Therefore, in the embodiment of the invention, the action determination model can also be updated with the action instructions it outputs and the users' feedback on those action instructions, which further improves the accuracy of the generated action instructions and further improves user experience.
When the action determination model serves online requests, there is often a delay between the time an action instruction is output and the time the feedback information is acquired. For example, if the action instruction is to push a video and the feedback information is that the video was viewed, the user typically watches the video only after it has been determined that the video should be pushed. Therefore, the corresponding characteristic information and action instructions are stored in a data queue according to the time interval between adjacent pieces of feedback information, so that the correspondence between characteristic information, action instructions and feedback information can be determined from the interval of the feedback information and the delay time between two adjacent pieces of characteristic information in the data queue. This prevents characteristic information and action instructions from being paired with the wrong feedback and ensures the correctness of the data.
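A simplified, hypothetical sketch of such a data queue that joins delayed feedback with the earlier characteristic information and action instruction; the matching rule used here (oldest pending pair within a maximum delay) is an assumption for illustration, not the rule claimed by the invention:

```python
from collections import deque

class DelayedFeedbackJoiner:
    """Queue (features, action) pairs at serving time and join each piece of
    feedback with the oldest pending pair whose delay is still plausible."""

    def __init__(self, max_delay_s=3600):
        self.pending = deque()      # entries: (timestamp, features, action)
        self.max_delay_s = max_delay_s

    def record_action(self, timestamp, features, action):
        self.pending.append((timestamp, features, action))

    def record_feedback(self, timestamp, feedback):
        # Discard pairs that are too old to match any feedback.
        while self.pending and timestamp - self.pending[0][0] > self.max_delay_s:
            self.pending.popleft()
        if not self.pending:
            return None
        action_ts, features, action = self.pending.popleft()
        return {                     # spliced training sample
            "features": features,
            "action": action,
            "feedback": feedback,
            "delay_s": timestamp - action_ts,
        }
```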
In addition, when updating the action determination model with the action instructions it outputs and the corresponding characteristic information and feedback information, the characteristic information, action instructions and feedback information again need to be spliced into training samples, where one training sample includes the characteristic information of one service, the action instruction corresponding to the service, and the user's feedback information on the function corresponding to the action instruction. After the training samples are obtained, state transition splicing can be performed on them to obtain a state transition sample stream, the stream is then trained with the temporal difference method, and the action determination model is finally updated.
Step 403: and sending the action instruction to a sender of the access request.
The operating state of the sender may change when it executes a service action according to the action instruction. For example, if the target service is watching a video with a video application and the action instruction is to push the video, then after receiving the action instruction the terminal may display the pushed video list when the page is refreshed.
In summary, in the method for determining action execution according to the embodiment of the present invention, in response to a received access request carrying account information, the characteristic information of the target service corresponding to the account information is acquired, the characteristic information is input to an action determination model, an action instruction corresponding to the target service is output, and the action instruction is returned to the sender of the access request, where the action determination model is trained on samples containing the characteristic information of multiple services, the action instructions corresponding to the characteristic information, and the feedback information of the corresponding accounts executing the action instructions. Because the method is built on training and learning from the characteristic information and action instructions of many services and the users' feedback on the functions corresponding to the executed action instructions, the action instruction generated to indicate whether to execute the service action corresponding to the target service better matches the characteristic information of each service; the generated action instructions are therefore more accurate and better meet user requirements, improving user experience.
Example four
Referring to fig. 5, a block diagram of a terminal according to a fourth embodiment of the present invention is shown.
The terminal of the embodiment of the invention may comprise: a memory, a processor, and a program for determining action execution that is stored on the memory and executable on the processor, where the program, when executed by the processor, implements the steps of any method for determining action execution described in the invention.
Fig. 5 is a block diagram illustrating a terminal 500 according to an example embodiment. For example, the terminal 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, terminal 500 may include one or more of the following components: processing component 502, memory 504, power component 506, multimedia component 508, audio component 510, input/output (I/O) interface 512, sensor component 514, and communication component 516.
The processing component 502 generally controls overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the terminal 500. Examples of such data include instructions for any application or method operating on terminal 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power components 506 provide power to the various components of the terminal 500. The power components 506 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal 500.
The multimedia component 508 includes a screen providing an output interface between the terminal 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the terminal 500 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the terminal 500 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, the audio component 510 also includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing various aspects of status assessment for the terminal 500. For example, sensor assembly 514 can detect an open/closed state of terminal 500, the relative positioning of components, such as a display and keypad of terminal 500, the change in position of terminal 500 or a component of terminal 500, the presence or absence of user contact with terminal 500, the orientation or acceleration/deceleration of apparatus 500, and the change in temperature of terminal 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communications between the terminal 500 and other devices in a wired or wireless manner. The terminal 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing a method for determining the execution of an action, in particular a method for determining the execution of an action comprising:
responding to a received access request carrying account information, and acquiring characteristic information of a target service corresponding to the account information;
inputting the characteristic information of the target service into a predetermined action determination model, and outputting an action instruction corresponding to the target service;
sending the action instruction to a sender of the access request;
the action determination model is obtained by training on training samples that contain the characteristic information of multiple services, the action instructions corresponding to the characteristic information, and the feedback information of the corresponding accounts executing the action instructions.
Optionally, the action determination model is obtained by training through the following process:
acquiring feature information of a plurality of services, action instructions corresponding to the feature information and feedback information of corresponding accounts for executing the action instructions;
splicing the characteristic information, the action instruction and the feedback information into a training sample;
splicing the training samples into a state transition sample stream;
and training the state transition sample stream to obtain the action determination model.
Optionally, the splicing the training samples into a state transition sample stream includes:
and splicing two training samples which belong to the same user and are adjacent in time into one state transition sample stream.
Optionally, training the state transition sample stream includes:
and training the state transition sample stream by adopting a time sequence difference method.
Optionally, the training the state transition sample stream by using a time sequence difference method includes:
storing the state transition sample stream into a data warehouse hive;
converting the data stored in the data warehouse hive into a data format matched with the action determination model to obtain a sample to be trained;
storing the sample to be trained in a distributed file system hdfs;
and reading data in the hdfs of the distributed file system, and training by adopting a time sequence difference method.
Optionally, the obtaining of the action instruction corresponding to the feature information includes:
acquiring an action instruction which is output by the action determination model and corresponds to the characteristic information;
the splicing the characteristic information, the action instruction and the feedback information into a training sample comprises:
storing the characteristic information and the action instruction according to the time interval of the feedback information of two adjacent action instructions;
and splicing the characteristic information, the action instruction and the feedback information into a training sample according to the time interval between two adjacent pieces of feedback information and the stored delay time between two adjacent pieces of characteristic information.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 504 comprising instructions, executable by the processor 520 of the terminal 500 to perform the above-described method for determining an action execution is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. The instructions in the storage medium, when executed by a processor of the terminal, enable the terminal to perform the steps of any of the methods for determining the performance of an action described in the present invention.
According to the method for determining action execution provided by the embodiment of the invention, in response to a received access request carrying account information, the characteristic information of the target service corresponding to the account information is acquired, the characteristic information is input to an action determination model to output an action instruction corresponding to the target service, and the action instruction is returned to the sender of the access request, where the action determination model is trained on samples containing the characteristic information of multiple services, the action instructions corresponding to the characteristic information, and the feedback information of the corresponding accounts executing the action instructions. Because the method is built on training and learning from the characteristic information and action instructions of many services and the users' feedback on the functions corresponding to the executed action instructions, the action instruction generated to indicate whether to execute the service action corresponding to the target service better matches the characteristic information of each service; the generated action instructions are therefore more accurate and better meet user requirements, improving user experience.
For the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The approaches provided herein for determining the execution of an action are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The structure required to construct a system incorporating aspects of the present invention will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the scheme for determining the execution of an action according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (12)

1. A system for determining performance of an action, comprising:
the system comprises a feature calling unit, a feature analyzing unit and a feature analyzing unit, wherein the feature calling unit is used for responding to a received access request carrying account information and acquiring feature information of a target service corresponding to the account information;
the action determining model is used for receiving the feature information of the target service sent by the feature calling unit and generating an action instruction corresponding to the target service based on the feature information of the target service, wherein the action instruction is used for indicating whether to execute a service action corresponding to the target service;
the feature calling unit is further configured to return the received action instruction sent by the action determining model to a sender of the access request;
the action determining model is obtained by training according to a plurality of state transition sample streams, one state transition sample stream comprises two training samples which belong to the same user and are adjacent in time, and one training sample comprises characteristic information of one service, an action instruction corresponding to the characteristic information and feedback information of a corresponding account for executing the action instruction;
further comprising:
the data queue is used for receiving the feature information of the multiple services sent by the feature calling unit and the action instruction corresponding to the feature information sent by the action determining model, storing the feature information and the action instruction according to the time interval of the feedback information of two adjacent action instructions and sending the feature information and the action instruction to the sample splicing unit;
the sample splicing unit is used for acquiring the characteristic information, the action instruction corresponding to the characteristic information and feedback information of the corresponding account for executing the action instruction, and splicing the feedback information into a training sample;
wherein, the process of splicing the characteristic information, the action instruction and the feedback information into the training sample by the sample splicing unit comprises:
and splicing the characteristic information, the action instruction and the feedback information into a training sample according to the time interval between two adjacent feedback information and the delay time between two adjacent characteristic information in the data queue.
2. The system for determining action execution of claim 1, further comprising: a data stream unit and a model training unit;
the sample splicing unit is further configured to send the spliced training samples to the data stream unit;
the data stream unit is used for splicing the training samples into a state transition sample stream;
the model training unit is used for receiving the state transition sample stream sent by the data stream unit, and training the state transition sample stream to obtain the action determination model.
3. The system for determining action execution according to claim 2, wherein the data stream unit comprises, in the process of splicing the training samples into a state transition sample stream:
and splicing two training samples which belong to the same user and are adjacent in time into one state transition sample stream.
4. The system for determining action execution according to claim 2, wherein the process of training the state transition sample stream by the model training unit comprises:
and training the state transition sample stream by adopting a time sequence difference method.
5. The system for determining action execution according to claim 4, wherein the model training unit adopts a time-series difference method, and the process of training the state transition sample stream comprises:
storing the state transition sample stream into a data warehouse hive;
converting the data stored in the data warehouse hive into a data format matched with the action determination model to obtain a sample to be trained;
storing the sample to be trained in a distributed file system hdfs;
and reading data in the hdfs of the distributed file system, and training by adopting a time sequence difference method.
6. A method for determining performance of an action, comprising:
responding to a received access request carrying account information, and collecting characteristic information of a target service corresponding to the account information;
inputting the characteristic information of the target service into a predetermined action determining model, and outputting an action instruction corresponding to the target service;
sending the action instruction to a sender of the access request;
the action determining model is obtained by training according to a plurality of state transition sample streams, one state transition sample stream comprises two training samples which belong to the same user and are adjacent in time, and one training sample comprises characteristic information of one service, an action instruction corresponding to the characteristic information and feedback information of a corresponding account for executing the action instruction;
the motion determination model is obtained by training the following process:
acquiring feature information of a plurality of services, action instructions corresponding to the feature information and feedback information of corresponding accounts for executing the action instructions;
splicing the characteristic information, the action instruction and the feedback information into a training sample;
training according to the training sample to obtain the action determination model;
wherein, obtaining the action instruction corresponding to the feature information comprises:
acquiring action instructions which are output by the action determining model and correspond to the characteristic information;
the splicing the characteristic information, the action instruction and the feedback information into a training sample comprises:
storing the characteristic information and the action instruction according to the time interval of the feedback information of two adjacent action instructions;
and splicing the characteristic information, the action instruction and the feedback information into the training sample according to the time interval between two adjacent pieces of feedback information and the stored delay time between two adjacent pieces of characteristic information.
7. The method for determining the performance of a motion according to claim 6, wherein training the motion determination model based on the training samples comprises:
splicing the training samples into a state transition sample stream;
and training the state transition sample flow to obtain the action determination model.
8. The method for determining action execution of claim 7, wherein said stitching the training samples into a stream of state transition samples comprises:
and splicing two training samples which belong to the same user and are adjacent in time into one state transition sample stream.
9. The method for determining action execution of claim 7, wherein training the state transition sample stream comprises:
and training the state transition sample stream by adopting a time sequence difference method.
10. The method for determining action execution according to claim 9, wherein the training the state transition sample stream using a timing difference method comprises:
storing the state transition sample stream into a data warehouse hive;
converting the data stored in the data warehouse hive into a data format matched with the action determination model to obtain a sample to be trained;
storing the sample to be trained in a distributed file system hdfs;
and reading data in the hdfs of the distributed file system, and training by adopting a time sequence difference method.
11. A terminal, comprising: memory, processor and program for determining the execution of an action stored on the memory and executable on the processor, which when executed by the processor implements the steps of a method for determining the execution of an action as claimed in any one of claims 6 to 10.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a program for determining the execution of an action, which when executed by a processor implements the steps of the method for determining the execution of an action according to any one of claims 6 to 10.
CN201911244065.XA 2019-12-06 2019-12-06 System, method and terminal for determining action execution Active CN112929751B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911244065.XA CN112929751B (en) 2019-12-06 2019-12-06 System, method and terminal for determining action execution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911244065.XA CN112929751B (en) 2019-12-06 2019-12-06 System, method and terminal for determining action execution

Publications (2)

Publication Number Publication Date
CN112929751A CN112929751A (en) 2021-06-08
CN112929751B true CN112929751B (en) 2022-11-18

Family

ID=76161827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911244065.XA Active CN112929751B (en) 2019-12-06 2019-12-06 System, method and terminal for determining action execution

Country Status (1)

Country Link
CN (1) CN112929751B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115412373B (en) * 2022-11-01 2023-03-21 中网信安科技有限公司 Method and system for safely accessing mechanical-electrical integrated industrial control network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940801A (en) * 2016-01-04 2017-07-11 中国科学院声学研究所 A kind of deeply for Wide Area Network learns commending system and method
CN107885796A (en) * 2017-10-27 2018-04-06 阿里巴巴集团控股有限公司 Information recommendation method and device, equipment
CN108287921A (en) * 2018-02-27 2018-07-17 苏州竹语网络科技有限公司 Information recommendation method and device
CN109314722A (en) * 2016-06-23 2019-02-05 皇家飞利浦有限公司 For measuring the method, apparatus and machine readable media of the user's feasibility or ability to accept that are directed to notice
CN109451038A (en) * 2018-12-06 2019-03-08 北京达佳互联信息技术有限公司 A kind of information-pushing method, device, server and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940801A (en) * 2016-01-04 2017-07-11 中国科学院声学研究所 A kind of deeply for Wide Area Network learns commending system and method
CN109314722A (en) * 2016-06-23 2019-02-05 皇家飞利浦有限公司 For measuring the method, apparatus and machine readable media of the user's feasibility or ability to accept that are directed to notice
CN107885796A (en) * 2017-10-27 2018-04-06 阿里巴巴集团控股有限公司 Information recommendation method and device, equipment
CN108287921A (en) * 2018-02-27 2018-07-17 苏州竹语网络科技有限公司 Information recommendation method and device
CN109451038A (en) * 2018-12-06 2019-03-08 北京达佳互联信息技术有限公司 A kind of information-pushing method, device, server and computer readable storage medium

Also Published As

Publication number Publication date
CN112929751A (en) 2021-06-08

Similar Documents

Publication Publication Date Title
US11048983B2 (en) Method, terminal, and computer storage medium for image classification
CN107888981B (en) Audio and video preloading method, device, equipment and storage medium
RU2614137C2 (en) Method and apparatus for obtaining information
US11520824B2 (en) Method for displaying information, electronic device and system
CN107992604B (en) Task item distribution method and related device
RU2640632C2 (en) Method and device for delivery of information
EP3316527A1 (en) Method and device for managing notification messages
US11544496B2 (en) Method for optimizing image classification model, and terminal and storage medium thereof
CN114003326A (en) Message processing method, device, equipment and storage medium
US20210389856A1 (en) Method and electronic device for displaying interactive content
CN112291614A (en) Video generation method and device
CN113254784A (en) Information display method and device, electronic equipment and storage medium
CN107402767B (en) Method and device for displaying push message
CN112464096A (en) Information processing method and device
CN109683760B (en) Recent content display method, device, terminal and storage medium
CN112766498B (en) Model training method and device
CN112929751B (en) System, method and terminal for determining action execution
CN112153218B (en) Page display method and device, wearable device and storage medium
CN111209381B (en) Time management method and device in dialogue scene
CN112784151A (en) Method and related device for determining recommendation information
CN112416707B (en) Link detection method and device
CN109145151B (en) Video emotion classification acquisition method and device
CN113343028A (en) Method and device for training intention determination model
CN112685641A (en) Information processing method and device
CN110019358B (en) Data processing method, device and equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant