CN111651226A - Virtual button sorting method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN111651226A
Authority
CN
China
Prior art keywords
sorting
weight
sequencing
model
weights
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010357419.8A
Other languages
Chinese (zh)
Other versions
CN111651226B (en)
Inventor
徐佳威
黄璟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010357419.8A priority Critical patent/CN111651226B/en
Publication of CN111651226A publication Critical patent/CN111651226A/en
Application granted granted Critical
Publication of CN111651226B publication Critical patent/CN111651226B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A virtual button sorting method, comprising: acquiring environmental characteristic data and acquiring a plurality of groups of sorting weights of virtual buttons of a target application program; determining the current environment state according to the environmental characteristic data; acquiring the priority corresponding to each group of sorting weights in the current environment state; determining a first sorting weight from the plurality of groups of sorting weights according to the priority; acquiring first operation data corresponding to the first sorting weight; determining a target reward value corresponding to the first sorting weight according to the first operation data; adjusting the priority of the first sorting weight in a pre-trained sorting model according to the target reward value and a preset reward value, and determining a target sorting weight from the plurality of groups of sorting weights through the adjusted sorting model; and sorting the virtual buttons according to the target sorting weight. The invention also provides a virtual button sorting device, electronic equipment and a medium. The invention can improve the sorting effect of the virtual buttons. In addition, the application also relates to artificial intelligence technology.

Description

Virtual button sorting method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of intelligent terminals, in particular to a virtual button sorting method and device, electronic equipment and a storage medium.
Background
Currently, the virtual buttons of an application (APP) can be ordered by a sorting algorithm. In practice, however, such sorting algorithms do not consider the consistency of the operation behaviors on the virtual buttons or their time-series characteristics, so the sorting effect of the virtual buttons is poor.
Therefore, how to improve the sorting effect of the virtual buttons is a technical problem which needs to be solved urgently.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a virtual button sorting method, device, electronic device and storage medium, which can improve the sorting effect of virtual buttons.
A first aspect of the present invention provides a method of virtual button sorting, the method comprising:
acquiring environmental characteristic data of a target application program, and acquiring a plurality of groups of sorting weights of virtual buttons of the target application program, wherein each group of sorting weights comprises one sorting weight of each virtual button;
determining the current environment state according to the environment characteristic data;
acquiring the priority corresponding to each group of the sorting weight in the current environment state;
determining a first ranking weight from the plurality of sets of ranking weights according to the priority;
acquiring first operation data corresponding to the first sequencing weight, wherein the first operation data refer to the number of times of clicking and the service time of the virtual buttons after the virtual buttons are sequenced according to the first sequencing weight;
determining a target reward value corresponding to the first sequencing weight according to the first operation data, wherein the higher the target reward value is, the better the sequencing effect of the virtual buttons in sequencing according to the first sequencing weight is indicated;
according to the target reward value and a preset reward value, adjusting the priority of the first sequencing weight in a pre-trained sequencing model;
determining a target sorting weight from the plurality of sets of sorting weights through the adjusted sorting model;
and sorting the virtual buttons according to the target sorting weight.
In one possible implementation, before the obtaining the environmental characteristic data of the target application and obtaining the plurality of sets of sorting weights of the virtual buttons of the target application, the method further includes:
acquiring historical operation data corresponding to the virtual button;
determining an active weight of the virtual button according to the historical operation data;
and sorting the virtual buttons according to the active weights to obtain a first sorting order.
In one possible implementation manner, after the sorting all the virtual buttons according to the active weights and obtaining a first sorting order, the method further includes:
acquiring multiple groups of random weights of the virtual buttons;
determining a plurality of second arrangement orders of the virtual buttons according to the plurality of groups of random weights and the active weights;
acquiring second operation data corresponding to the second arrangement sequence;
determining a first reward value for each of the second arrangement orders based on the second operation data;
judging whether the first reward value is larger than a preset reward value threshold value or not;
if the first reward value is larger than a preset reward value threshold value, initializing parameters of the ranking model to be trained according to the second ranking order to obtain an initial ranking model, wherein the initial ranking model comprises an action value function.
In a possible implementation manner, after initializing parameters of the ranking model to be trained according to the second ranking order and obtaining an initial ranking model, the method further includes:
acquiring third operation data, wherein the third operation data is the number of times the virtual buttons are clicked and the service time after the virtual buttons are sorted according to the sorting weight output by the initial sorting model;
and training the initial ranking model according to a second reward value corresponding to the third operation data to obtain the trained ranking model.
In a possible implementation manner, the adjusting the priority of the first ranking weight in the pre-trained ranking model according to the target reward value and a preset reward value includes:
judging whether the target reward value is larger than a preset reward value or not;
if the target reward value is larger than the preset reward value, upgrading the priority of the first sequencing weight in the sequencing model; or
if the target reward value is smaller than or equal to the preset reward value, downgrading the priority of the first sequencing weight in the sequencing model.
In one possible implementation, the method further includes:
resetting the first parameter of the trained sequencing model every other preset time period to obtain a first pre-training model;
and retraining the first pre-training model to obtain a first sequencing model so as to reorder the virtual buttons through the first sequencing model.
In one possible implementation, the method further includes:
receiving a feedback value for the trained ranking model;
judging whether the feedback value is smaller than a preset feedback value threshold value or not;
if the feedback value is smaller than a preset feedback value threshold value, resetting a second parameter of the trained sequencing model to obtain a second pre-training model;
and retraining the second pre-training model to obtain a second sequencing model.
A second aspect of the present invention provides a virtual button sorting apparatus, the apparatus comprising:
an obtaining module, configured to obtain environmental characteristic data of a target application program and obtain a plurality of groups of sorting weights of virtual buttons of the target application program, wherein each group of sorting weights comprises one sorting weight of each virtual button;
the determining module is used for determining the current environment state according to the environment characteristic data;
the obtaining module is further configured to obtain a priority corresponding to each group of the ranking weights in the current environment state;
the determining module is further configured to determine a first sorting weight from the plurality of sets of sorting weights according to the priority;
the obtaining module is further configured to obtain first operation data corresponding to the first sorting weight, where the first operation data refers to the number of times that the virtual button is clicked and the service time after being sorted according to the first sorting weight;
the determining module is further configured to determine a target reward value corresponding to the first sorting weight according to the first operation data, wherein the higher the target reward value is, the better the sorting effect of the virtual buttons sorted according to the first sorting weight is indicated;
the adjusting module is used for adjusting the priority of the first sequencing weight in the pre-trained sequencing model according to the target reward value and a preset reward value;
the determining module is further configured to determine a target ranking weight from the plurality of sets of ranking weights through the adjusted ranking model;
and the sorting module is used for sorting the virtual buttons according to the target sorting weight.
A third aspect of the invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the virtual button sorting method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the virtual button sorting method.
With this technical scheme, different environment states of the target application program can be determined from different environmental characteristic data of the target application program, and the priorities of the plurality of groups of sorting weights differ between environment states; that is, the virtual buttons are sorted differently in different usage environments. In addition, the priorities of the sorting weights can be adjusted according to the current operation data so as to continuously update the sorting model, so that the sorting model can always output the optimal order and the sorting effect of the virtual buttons is improved.
Drawings
FIG. 1 is a flowchart illustrating a method for sorting virtual buttons according to a preferred embodiment of the present invention.
FIG. 2 is a functional block diagram of a preferred embodiment of a virtual button sorting apparatus according to the present disclosure.
FIG. 3 is a schematic structural diagram of an electronic device implementing a virtual button sorting method according to a preferred embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The virtual button sorting method provided by the embodiment of the invention is applied to an electronic device. It can also be applied to a hardware environment formed by the electronic device and a server connected to it through a network, in which case the method is executed jointly by the server and the electronic device. Networks include, but are not limited to: a wide area network, a metropolitan area network, or a local area network.
A server may refer to a computer system that provides services to other devices (e.g., electronic devices) in a network. A personal computer may also be called a server if it can externally provide a File Transfer Protocol (FTP) service. In a narrow sense, a server refers to a high-performance computer, which can provide services to the outside through a network, and compared with a common personal computer, the server has higher requirements on stability, security, performance and the like, and therefore, hardware such as a CPU, a chipset, a memory, a disk system, a network and the like is different from that of the common personal computer.
The electronic device is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and the hardware thereof includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network device, a server group consisting of a plurality of network devices, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network devices, wherein the Cloud Computing is one of distributed Computing, and is a super virtual computer consisting of a group of loosely coupled computers. The user device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), or the like.
Referring to fig. 1, fig. 1 is a flowchart illustrating a virtual button sorting method according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted. The execution subject of the virtual button sorting method may be an electronic device.
S11, obtaining environment characteristic data of the target application program, and obtaining a plurality of groups of sorting weights of the virtual buttons of the target application program, wherein each group of sorting weights comprises one sorting weight of each virtual button.
The environment characteristic data may include user basic information, virtual button scheduling click data, usage time of a function corresponding to the virtual button, and the like (i.e., historical behavior information).
The plurality of sets of sorting weights may be used to represent various arrangement orders. That is, the virtual buttons may be sorted according to their corresponding weights, with virtual buttons that have larger sorting weights placed nearer the front. A set of sorting weights can be represented by a vector, such as U = (U1, U2, …).
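As a minimal sketch of how one set of sorting weights maps to an arrangement order, assuming that buttons are identified by names (the names and weight values below are hypothetical and do not come from the patent) and that a larger weight means an earlier position:

```python
def order_from_weights(buttons, weights):
    """Return button names sorted by descending sorting weight.

    `buttons` and `weights` are parallel sequences (illustrative names only).
    Ties keep their original relative order because sorted() is stable.
    """
    if len(buttons) != len(weights):
        raise ValueError("each virtual button needs exactly one sorting weight")
    paired = zip(buttons, weights)
    return [name for name, _ in sorted(paired, key=lambda p: p[1], reverse=True)]


buttons = ["pay", "claims", "quote", "profile"]
U = (0.2, 0.9, 0.5, 0.1)  # one set of sorting weights, one weight per button
print(order_from_weights(buttons, U))
```

Storing several such vectors is then equivalent to storing several candidate arrangement orders.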
In the embodiment of the invention, the virtual buttons have a plurality of groups of arrangement sequences, and the arrangement sequences are stored in a sequencing weight mode.
As an optional implementation, before the obtaining the environmental characteristic data of the target application and obtaining the plurality of sets of sorting weights of the virtual buttons of the target application, the method may further include:
acquiring historical operation data corresponding to the virtual button;
determining an active weight of the virtual button according to the historical operation data;
and sorting the virtual buttons according to the active weights to obtain a first sorting order.
In this alternative embodiment, an active weight may be set for each virtual button according to the historical operation data corresponding to the virtual button, and the more used virtual buttons correspond to the greater active weight. Then, the virtual buttons with large active weights are arranged in front, so that the sorting effect can be improved.
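The text does not specify how the active weight is computed from the historical operation data. The sketch below assumes, purely for illustration, that the active weight of a button is its share of all recorded clicks; the log format and button names are hypothetical.

```python
from collections import Counter


def active_weights(click_log):
    """Map each button to its share of all recorded clicks (assumed definition)."""
    counts = Counter(click_log)
    total = sum(counts.values())
    return {button: n / total for button, n in counts.items()}


def first_order(click_log):
    """Sort buttons so that those with larger active weights come first."""
    weights = active_weights(click_log)
    return sorted(weights, key=weights.get, reverse=True)


log = ["pay", "quote", "pay", "claims", "pay", "quote"]
print(first_order(log))
```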
As an optional implementation, the method further comprises:
judging whether a repeated arrangement sequence exists in the first arrangement sequence;
and if the first arrangement sequence has repeated arrangement sequences, deleting the repeated arrangement sequences.
In this optional embodiment, because different users have different habits and correspond to different historical operation data, a plurality of first arrangement sequences are obtained, so that repeated first arrangement sequences can be deleted, and the time for later reinforcement learning can be reduced.
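One way to drop repeated arrangement orders while keeping the first occurrence of each is sketched below. Representing each order as a sequence of button names is an assumption made for the example.

```python
def unique_orders(orders):
    """Remove repeated arrangement orders, preserving first-seen order."""
    seen = set()
    result = []
    for order in orders:
        key = tuple(order)  # tuples are hashable, lists are not
        if key not in seen:
            seen.add(key)
            result.append(list(order))
    return result


orders = [["pay", "quote"], ["quote", "pay"], ["pay", "quote"]]
print(unique_orders(orders))
```

Fewer distinct orders means fewer candidate actions for the later reinforcement learning stage to evaluate.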
As an optional implementation, after all the virtual buttons are sorted according to the active weight and a first sorting order is obtained, the method further includes:
acquiring multiple groups of random weights of the virtual buttons;
determining a plurality of second arrangement orders of the virtual buttons according to the plurality of groups of random weights and the active weights;
acquiring second operation data corresponding to the second arrangement sequence;
determining a first reward value for each of the second arrangement orders based on the second operation data;
judging whether the first reward value is larger than a preset reward value threshold value or not;
if the first reward value is larger than a preset reward value threshold value, initializing parameters of the ranking model to be trained according to the second ranking order to obtain an initial ranking model, wherein the initial ranking model comprises an action value function.
The second operation data may be data of the number of times the virtual button is clicked, the time when the function corresponding to the virtual button is used, and the like, in the second arrangement order of the virtual button.
The ranking model can use an action value function to determine the action with the highest return in a given state; that is, it determines the optimal order of the virtual buttons in the current environment state of the target application, such that the reward value obtained under this order is the highest.
The action value function is also called the Q function. It mainly constructs a table (Q-table) indexed by the environment states and the sets of ranking weights, and stores the priority of each set of ranking weights in each environment state. The set of ranking weights that yields the maximum reward value may be determined according to the priority, and the virtual buttons may then be ranked according to that set of ranking weights.
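Assuming that the reward decay coefficient is γ and the learning rate is α, the update equation of the action value function is as follows:
$$Q_{\text{new}}(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$
where $s_t$ is the environment state at step $t$, $a_t$ is the set of ranking weights selected in that state, and $r_{t+1}$ is the reward value obtained afterwards.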
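The update rule can be sketched as a tabular Q-learning step. The state labels, action labels, and the values of α and γ below are assumptions chosen for illustration; the maximum is taken over all candidate sets of ranking weights in the next state, as in standard Q-learning.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Best priority reachable from the next state over all candidate actions.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


q = defaultdict(float)          # Q-table: (state, action) -> priority
actions = ["U1", "U2"]          # hypothetical sets of ranking weights
first = q_update(q, "morning", "U1", reward=10.0,
                 next_state="morning", actions=actions)
second = q_update(q, "morning", "U1", reward=10.0,
                  next_state="morning", actions=actions)
print(first, second)
```

Starting from an all-zero table, the first update gives 0.1 × 10 = 1.0; the second update already benefits from the stored value of the next state.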
In the embodiments of the present invention, for example: each click on a virtual button increases the reward value by 5; each completed use of the function corresponding to a virtual button increases the reward value by 20; and the function usage time is converted into a reward value at a preset proportion, for example, at a proportion of 20%, a usage time of 10 minutes is converted into a reward value of 2. If the first reward value is larger than the preset reward value threshold, the second arrangement order is a good order, and the parameters of the ranking model to be trained can be initialized according to that second arrangement order to obtain the initial ranking model.
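The initialization stage described above can be sketched in two parts: generating candidate second arrangement orders from the active weights plus random weights, and keeping only those whose measured first reward value exceeds the threshold. How the random and active weights are combined is not specified in the text; adding a small uniform perturbation is an assumption, as are all names and numbers below.

```python
import random


def candidate_orders(active, n_candidates, noise=0.1, seed=0):
    """Perturb active weights with random weights and derive candidate orders."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_candidates):
        # Assumed combination rule: active weight plus a small random weight.
        combined = {b: w + rng.uniform(-noise, noise) for b, w in active.items()}
        orders.append(tuple(sorted(combined, key=combined.get, reverse=True)))
    return orders


def select_initial_orders(order_rewards, threshold):
    """Keep the orders whose measured first reward value exceeds the threshold."""
    return [order for order, reward in order_rewards.items() if reward > threshold]


active = {"pay": 0.5, "quote": 0.45, "claims": 0.05}
candidates = candidate_orders(active, n_candidates=4)
# Hypothetical measured rewards for two candidate orders.
measured = {("pay", "quote", "claims"): 37.0, ("quote", "pay", "claims"): 12.0}
print(select_initial_orders(measured, threshold=20.0))
```

The surviving orders would then seed the parameters of the ranking model to be trained.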
As an optional implementation manner, according to the second arrangement order, after initializing parameters of the ranking model to be trained and obtaining an initial ranking model, the method further includes:
acquiring third operation data, wherein the third operation data is the number of times the virtual buttons are clicked and the service time after the virtual buttons are sorted according to the sorting weight output by the initial sorting model;
and training the initial ranking model according to a second reward value corresponding to the third operation data to obtain a pre-trained ranking model.
The third operation data may be operation data of the target application program, which is obtained by the user after the virtual button is sorted according to the sorting weight output by the initial sorting model, and includes, but is not limited to, the number of times the virtual button is clicked, the time when the function corresponding to the virtual button is used, and the like.
In this optional embodiment, based on machine learning and model training technologies in the field of artificial intelligence, the parameters of the initial ranking model may be trained according to the second reward value so as to adjust the priority parameters of the arrangement orders in the initial ranking model: if the reward value is high, the model parameter representing the priority of the corresponding second arrangement order may be increased, and if the reward value is low, that parameter may be decreased.
And S12, determining the current environment state according to the environment characteristic data.
Wherein the current environmental state (S) includes user basic information and historical behavior information.
And S13, acquiring the priority corresponding to the sorting weight in each group in the current environment state.
Wherein, the same group of sorting weights can correspond to different priorities in different environment states.
And S14, determining a first sorting weight from the plurality of groups of sorting weights according to the priority.
In this embodiment of the present invention, the set of ranking weights with the highest priority may be determined as the first ranking weight, or a set of ranking weights may be randomly selected from the plurality of sets as the first ranking weight, where a higher priority corresponds to a higher selection probability. In this way, the button order matches the user's operation habits in most cases. Because those habits may change, a randomly chosen button order is output occasionally, so that data reflecting the change in user operation can be collected sooner and the parameters can be adjusted sooner to adapt.
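Both selection strategies can be sketched with the standard library. The priority values and labels are hypothetical, and treating priorities directly as sampling weights is an assumption.

```python
import random


def choose_first_weight(priorities, explore=True, rng=None):
    """Pick one set of ranking weights by priority.

    explore=False: always return the highest-priority set.
    explore=True:  sample a set with probability proportional to its priority.
    """
    rng = rng or random.Random()
    names = list(priorities)
    if not explore:
        return max(names, key=priorities.get)
    weights = [priorities[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


priorities = {"U_a": 7, "U_b": 2, "U_c": 1}
print(choose_first_weight(priorities, explore=False))
print(choose_first_weight(priorities, rng=random.Random(42)))
```

Sampling keeps low-priority orders in play, which is what lets the model notice when user habits shift.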
S15, acquiring first operation data corresponding to the first sequencing weight, wherein the first operation data refer to the number of times of clicks and the service time of the virtual buttons after the virtual buttons are sequenced according to the first sequencing weight.
The first operation data may be operation data of the target application program, which is obtained by the user after the virtual buttons are sorted according to the first sorting weight, and includes, but is not limited to, the number of times the virtual buttons are clicked, the time when the function corresponding to the virtual buttons is used, and the like.
S16, determining a target reward value corresponding to the first sorting weight according to the first operation data, wherein the higher the target reward value is, the better the sorting effect of the virtual buttons sorted according to the first sorting weight is indicated.
In the embodiments of the present invention, for example: each click on a virtual button increases the reward value by 5; each completed use of the function corresponding to a virtual button increases the reward value by 20; and the function usage time is converted into a reward value at a preset proportion, for example, at a proportion of 20%, a usage time of 10 minutes is converted into a reward value of 2.
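The example figures above translate directly into a small reward function. The function and parameter names are hypothetical; the constants 5, 20, and 20% are the ones given in the example.

```python
def reward_value(clicks, completions, usage_minutes,
                 click_reward=5, completion_reward=20, time_ratio=0.2):
    """Combine clicks, completed functions, and usage time into one reward value."""
    return (clicks * click_reward
            + completions * completion_reward
            + usage_minutes * time_ratio)


# 3 clicks, 1 completed function, 10 minutes of use: 15 + 20 + 2
print(reward_value(clicks=3, completions=1, usage_minutes=10))
```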
S17, adjusting the priority of the first sequencing weight in the pre-trained sequencing model according to the target reward value and a preset reward value.
As an optional implementation manner, the adjusting the priority of the first ranking weight in the pre-trained ranking model according to the target reward value and the preset reward value includes:
judging whether the target reward value is larger than a preset reward value or not;
if the target reward value is larger than the preset reward value, upgrading the priority of the first sequencing weight in the sequencing model; or
if the target reward value is smaller than or equal to the preset reward value, downgrading the priority of the first sequencing weight in the sequencing model.
In this optional embodiment, if the target reward value is greater than the preset reward value, the first sorting weight performs well and its priority may be increased; if the target reward value is less than or equal to the preset reward value, the first sorting weight does not perform well enough and its priority may be decreased.
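A minimal sketch of this threshold rule follows. The step size of 1 and the floor of 0 are assumptions; the text only states that the priority is raised or lowered.

```python
def adjust_priority(priorities, key, target_reward, preset_reward,
                    step=1, floor=0):
    """Raise the priority if the reward beats the preset value, else lower it."""
    if target_reward > preset_reward:
        priorities[key] += step
    else:
        # Equal-to counts as "not good enough"; never drop below the floor.
        priorities[key] = max(floor, priorities[key] - step)
    return priorities[key]


prio = {"U_a": 7, "U_b": 0}
print(adjust_priority(prio, "U_a", target_reward=37, preset_reward=30))
print(adjust_priority(prio, "U_b", target_reward=30, preset_reward=30))
```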
And S18, determining a target sorting weight from the plurality of groups of sorting weights through the adjusted sorting model.
The ranking model outputs the ranking weights probabilistically according to their priorities. For example, if the priority of ranking weight A is 7, ranking weight A may be output with a probability of 70%.
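The text does not say how a priority is mapped to a probability. One reading consistent with the example is to normalize priorities so they sum to 1; with priorities totalling 10, a priority of 7 then gives 70%. The labels and values below are hypothetical.

```python
def output_probabilities(priorities):
    """Normalize priorities into output probabilities (assumed mapping)."""
    total = sum(priorities.values())
    if total <= 0:
        raise ValueError("at least one priority must be positive")
    return {name: p / total for name, p in priorities.items()}


probs = output_probabilities({"A": 7, "B": 2, "C": 1})
print(probs)
```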
And S19, sorting the virtual buttons according to the target sorting weight.
Optionally, an importance vector W = (W1, W2, …) may be set according to the importance of the virtual buttons. Assuming that the vector corresponding to the target sorting weight is U, the corresponding components of W and U (the terms of their inner product) may be multiplied to obtain a combined weight for each virtual button, and the virtual buttons are sorted by combined weight from largest to smallest. If no importance vector is set, the virtual buttons are sorted directly according to the arrangement order corresponding to the target sorting weight.
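The sketch below reads the per-button weight as the product of the corresponding components of W and U. This is an interpretation, since a full inner product would yield a single scalar rather than one weight per button; all names and values are hypothetical.

```python
def sort_with_importance(buttons, U, W=None):
    """Sort buttons by U, optionally scaled per button by importance vector W."""
    if W is None:
        combined = list(U)
    else:
        if len(W) != len(U):
            raise ValueError("W and U must have one entry per virtual button")
        combined = [w * u for w, u in zip(W, U)]
    order = sorted(range(len(buttons)), key=lambda i: combined[i], reverse=True)
    return [buttons[i] for i in order]


buttons = ["pay", "quote", "claims"]
U = (0.5, 0.4, 0.1)
W = (1, 2, 1)  # "quote" is treated as twice as important
print(sort_with_importance(buttons, U))
print(sort_with_importance(buttons, U, W))
```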
As an optional implementation, the method further comprises:
resetting the first parameter of the trained sequencing model every other preset time period to obtain a first pre-training model;
and retraining the first pre-training model to obtain a first sequencing model so as to reorder the virtual buttons through the first sequencing model.
In this alternative embodiment, because the function requirements may be different in different time periods, that is, the virtual buttons used may be different, the ranking model may be retrained periodically according to the latest usage data, some parameters of the ranking model may be retained, and only another part of the parameters need to be adjusted, so that the retraining speed is faster.
As an optional implementation, the method further comprises:
receiving a feedback value for the trained ranking model;
judging whether the feedback value is smaller than a preset feedback value threshold value or not;
if the feedback value is smaller than a preset feedback value threshold value, resetting a second parameter of the trained sequencing model to obtain a second pre-training model;
and retraining the second pre-training model to obtain a second sequencing model.
In this alternative embodiment, as the user group or the users' operating habits change, an order of virtual buttons output by the ranking model that worked well earlier may no longer be what users need. Therefore, part of the parameters of the ranking model need to be reset, either according to feedback on user operations or at intervals, and retraining with the remaining parameters retained yields the latest ranking model (the second ranking model) faster.
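The text does not identify which parameters make up the "second parameter". The sketch below assumes a table-based model and resets only a chosen subset of entries when the feedback value falls below the threshold, leaving the rest as a warm start for retraining; the keys and values are hypothetical.

```python
def reset_on_low_feedback(q_table, feedback, threshold, keys_to_reset,
                          default=0.0):
    """Return (table, was_reset); reset selected entries if feedback is low."""
    if feedback >= threshold:
        return q_table, False
    refreshed = dict(q_table)          # keep the original table untouched
    for key in keys_to_reset:
        if key in refreshed:
            refreshed[key] = default   # only this subset is forgotten
    return refreshed, True


q_table = {("morning", "U_a"): 6.5, ("morning", "U_b"): 1.2,
           ("evening", "U_a"): 3.0}
stale = [("morning", "U_a"), ("morning", "U_b")]
new_table, was_reset = reset_on_low_feedback(q_table, feedback=0.3,
                                             threshold=0.5, keys_to_reset=stale)
print(was_reset, new_table)
```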
In the method flow described in fig. 1, different environment states of the target application program may be determined from different environment feature data of the target application program, and the priorities of the multiple sets of sorting weights differ between environment states; that is, the virtual buttons are sorted differently in different usage environments. In addition, the priorities of the sorting weights may be adjusted according to the current operation data to continuously update the sorting model, so that the sorting model can always output an optimal order and the sorting effect of the virtual buttons is improved.
Referring to fig. 2, fig. 2 is a functional block diagram of a virtual button sorting apparatus according to a preferred embodiment of the present invention.
In some embodiments, the virtual button sorting apparatus operates in an electronic device. The virtual button sorting apparatus may include a plurality of functional modules composed of program code segments. Program code for various program segments in the virtual button sorting apparatus may be stored in a memory and executed by at least one processor to perform some or all of the steps of the virtual button sorting method described in fig. 1.
In this embodiment, the virtual button sorting apparatus may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: an obtaining module 201, a determining module 202, an adjusting module 203, and a sorting module 204. A module, as referred to herein, is a series of computer program segments that is stored in memory, can be executed by at least one processor, and performs a fixed function.
The obtaining module 201 is configured to obtain environment feature data of a target application program, and obtain a plurality of sets of sorting weights of virtual buttons of the target application program, where each set of sorting weights includes one sorting weight of each virtual button.
The environment characteristic data may include user basic information, virtual button scheduling click data, usage time of a function corresponding to the virtual button, and the like (i.e., historical behavior information).
The plurality of sets of sorting weights may be used to represent various arrangement orders. That is, the virtual buttons may be sorted according to their corresponding weights, with virtual buttons that have larger sorting weights placed nearer the front. A set of sorting weights can be represented by a vector, such as U (U1, U2, …).
In the embodiment of the invention, the virtual buttons have a plurality of groups of arrangement sequences, and the arrangement sequences are stored in a sequencing weight mode.
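As a minimal sketch of how an arrangement order can be stored as sorting weights, the following Python snippet orders buttons by descending weight. The button names and weight values are hypothetical and chosen only for illustration; the source does not specify them.

```python
from typing import Dict, List


def order_buttons(weights: Dict[str, float]) -> List[str]:
    """Return button names ordered by descending sorting weight."""
    # sorted() is stable, so buttons with equal weights keep their insertion order.
    return sorted(weights, key=lambda name: weights[name], reverse=True)


# Two hypothetical groups of sorting weights over the same three buttons.
weight_groups = [
    {"pay": 0.9, "claim": 0.5, "quote": 0.1},
    {"pay": 0.2, "claim": 0.3, "quote": 0.8},
]
orders = [order_buttons(group) for group in weight_groups]
```

Each group of weights yields its own arrangement order, which is why several groups can encode several candidate layouts.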
A determining module 202, configured to determine a current environment state according to the environment feature data.
Wherein the current environmental state (S) includes user basic information and historical behavior information.
The obtaining module 201 is further configured to obtain a priority corresponding to each group of the ranking weights in the current environment state.
Wherein, the same group of sorting weights can correspond to different priorities in different environment states.
The determining module 202 is further configured to determine a first sorting weight from the plurality of sets of sorting weights according to the priority.
In this embodiment of the present invention, the set of ranking weights with the highest priority may be determined as the first ranking weight, or a set of ranking weights may be randomly selected from the plurality of sets as the first ranking weight, where a higher priority corresponds to a higher selection probability. In this way, the output button order matches the user's operating habits in most cases. Because those habits may change, a random button order is also output occasionally, so that data reflecting changes in user operation can be collected sooner and the parameters can be adjusted sooner to adapt.
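The two selection strategies described above can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function name `select_group` is hypothetical, and the random branch assumes that the selection probability is proportional to the priority, which the source implies but does not state exactly.

```python
import random
from typing import Optional, Sequence


def select_group(priorities: Sequence[float], greedy: bool = False,
                 rng: Optional[random.Random] = None) -> int:
    """Return the index of the group of sorting weights to use."""
    if greedy:
        # Strategy 1: always take the group with the highest priority.
        return max(range(len(priorities)), key=lambda i: priorities[i])
    # Strategy 2: sample a group with probability proportional to its priority
    # (assumed normalisation: priority divided by the sum of all priorities).
    rng = rng or random.Random()
    return rng.choices(range(len(priorities)), weights=priorities, k=1)[0]
```

With priorities `[1, 7, 2]`, the greedy branch always returns index 1, while the random branch returns index 1 about 70% of the time and still explores the other two groups occasionally.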
The obtaining module 201 is further configured to obtain first operation data corresponding to the first sorting weight, where the first operation data refers to the number of times that the virtual button is clicked and the usage time after being sorted according to the first sorting weight.
The first operation data may be data on the user's operation of the target application program, collected after the virtual buttons are sorted according to the first sorting weight, and includes, but is not limited to, the number of times the virtual buttons are clicked, the time for which the functions corresponding to the virtual buttons are used, and the like.
The determining module 202 is further configured to determine a target reward value corresponding to the first sorting weight according to the first operation data, where the higher the target reward value is, the better the sorting effect of the virtual buttons sorted according to the first sorting weight is.
In the embodiments of the present invention, for example: each time a virtual button is clicked, the reward value is increased by 5; each time the function corresponding to a virtual button is completed, the reward value is increased by 20; and the function usage time is converted into a reward value according to a preset ratio. For example, with a usage time of 10 minutes and a ratio of 20%, the converted reward value is 2.
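The reward arithmetic in this example can be written out as a short sketch. The function and parameter names are hypothetical; the default values (5 per click, 20 per completed function, 20% of the usage time in minutes) are the example figures given above.

```python
def reward_value(clicks: int, completions: int, usage_minutes: float,
                 click_reward: float = 5.0, completion_reward: float = 20.0,
                 time_ratio: float = 0.2) -> float:
    """Combine clicks, completed functions and usage time into one reward value."""
    return (clicks * click_reward
            + completions * completion_reward
            + usage_minutes * time_ratio)


# One click, one completed function and 10 minutes of use: 5 + 20 + 2.
example_reward = reward_value(clicks=1, completions=1, usage_minutes=10)
```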
And the adjusting module 203 is configured to adjust the priority of the first ranking weight in the pre-trained ranking model according to the target reward value and a preset reward value.
The determining module 202 is further configured to determine a target ranking weight from the plurality of sets of ranking weights through the adjusted ranking model.
The ranking model outputs the ranking weights probabilistically according to the different priorities of the different ranking weights. For example, if the priority of ranking weight A is 7, ranking weight A may be output with a probability of 70%.
A sorting module 204, configured to sort the virtual buttons according to the target sorting weight.
Optionally, an importance vector W (W1, W2, …) may be set according to the importance of the virtual buttons. Assuming that the vector corresponding to the target sorting weight is U, the inner product of the vector W and the vector U may be calculated to obtain inner product weights, and the virtual buttons are sorted in descending order of the inner product weight corresponding to each virtual button. If no importance vector is set, the virtual buttons are sorted directly according to the arrangement order corresponding to the target sorting weight.
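A sketch of this optional step follows. Because a weight is needed for each virtual button, the sketch assumes that the "inner product weight" of button i is the single term W_i · U_i rather than the summed scalar; this reading, the function name `rank_buttons`, and the sample values are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def rank_buttons(names: Sequence[str], u: Sequence[float],
                 w: Optional[Sequence[float]] = None) -> List[str]:
    """Order buttons by W_i * U_i, or by U_i alone when no importance vector is set."""
    scores = list(u) if w is None else [wi * ui for wi, ui in zip(w, u)]
    # Sort button indices by descending score, then map back to names.
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order]


names = ["pay", "claim", "quote"]
u = [0.9, 0.5, 0.1]   # target sorting weight U
w = [1.0, 1.0, 10.0]  # hypothetical importance vector W
```

Here `rank_buttons(names, u)` keeps the order given by U, while `rank_buttons(names, u, w)` moves "quote" to the front because its importance outweighs its low sorting weight.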
As an optional implementation manner, the obtaining module 201 is further configured to obtain historical operation data corresponding to the virtual button;
the determining module 202 is further configured to determine an active weight of the virtual button according to the historical operation data;
the sorting module 204 is further configured to sort the virtual buttons according to the active weights, so as to obtain a first sorting order.
In this alternative embodiment, an active weight may be set for each virtual button according to the historical operation data corresponding to the virtual button: the more a virtual button is used, the greater its active weight. Virtual buttons with larger active weights are then placed nearer the front, which can improve the sorting effect.
As an optional implementation manner, the obtaining module 201 is further configured to obtain multiple sets of random weights of the virtual buttons after the sorting module 204 sorts all the virtual buttons according to the active weights and obtains the first sorting order;
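One simple way to derive active weights from historical operation data is to use each button's share of past uses. This is only a sketch under that assumption; the source does not fix a formula, and the names `active_weights` and `first_order` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def active_weights(history: Iterable[str]) -> Dict[str, float]:
    """Map each button to its share of all recorded uses."""
    counts = Counter(history)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def first_order(history: Iterable[str]) -> List[str]:
    """Order buttons by descending active weight (the first sorting order)."""
    weights = active_weights(history)
    return sorted(weights, key=lambda name: weights[name], reverse=True)


# Hypothetical usage log: one entry per recorded use of a button.
usage_log = ["pay", "claim", "pay", "quote", "pay", "claim"]
```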
the determining module 202 is further configured to determine a plurality of second arrangement orders of the virtual buttons according to the plurality of sets of random weights and the active weights;
the obtaining module 201 is further configured to obtain second operation data corresponding to the second arrangement order;
the determining module 202 is further configured to determine, according to the second operation data, a first reward value of each of the second arrangement orders;
the virtual button sorting apparatus may further include:
the first judgment module is used for judging whether the first reward value is larger than a preset reward value threshold value or not;
and the initialization module is used for initializing the parameters of the sequencing model to be trained according to the second arrangement sequence to obtain an initial sequencing model if the first reward value is larger than a preset reward value threshold, wherein the initial sequencing model comprises an action value function.
The second operation data may be data such as the number of times the virtual buttons are clicked and the time for which the functions corresponding to the virtual buttons are used while the virtual buttons are arranged in the second arrangement order.
The ranking model can use an action-value function to determine the action with the highest return in a given state; that is, it determines the optimal ordering of the virtual buttons in the current environment state of the target application, such that the reward value obtained under this ordering is the highest.
The action-value function is also called the Q function. It mainly builds a table (Q-table) indexed by the environment states and the sets of ranking weights, and stores the priority of each set of ranking weights in each environment state. The set of ranking weights with the highest expected reward value may be determined according to the priority, and the virtual buttons may then be sorted according to that set of ranking weights.
Assuming that the reward discount factor is γ and the learning rate is α, the update equation of the action-value function is as follows:
$$Q_{\text{new}}(s_t, a_t) = Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$
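The update equation can be illustrated with a small tabular sketch, in which a state is an environment state, an action is the index of a group of sorting weights, and the stored value is that group's priority. The function name `q_update`, the state labels, and the numeric values are hypothetical; the maximum is taken over all actions available in the next state, as in standard Q-learning.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

QTable = DefaultDict[Tuple[Hashable, int], float]


def q_update(q: QTable, state: Hashable, action: int, reward: float,
             next_state: Hashable, n_actions: int,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one Q-learning update to the entry for (state, action)."""
    # Best priority reachable from the next state, over all groups of weights.
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q_table: QTable = defaultdict(float)  # unseen entries start at priority 0
q_table[("evening", 0)] = 10.0
updated = q_update(q_table, "morning", 1, reward=27.0, next_state="evening",
                   n_actions=2, alpha=0.5, gamma=0.9)
```

In this run the entry for ("morning", 1) moves from 0 to 0.5 × (27 + 0.9 × 10 − 0) = 18.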
In the embodiments of the present invention, for example: each time a virtual button is clicked, the reward value is increased by 5; each time the function corresponding to a virtual button is completed, the reward value is increased by 20; and the function usage time is converted into a reward value according to a preset ratio. For example, with a usage time of 10 minutes and a ratio of 20%, the converted reward value is 2. If the first reward value is larger than a preset reward value threshold, the second arrangement order is a better order, and the parameters of the ranking model to be trained can be initialized according to the second arrangement order to obtain an initial ranking model.
As an optional implementation manner, the obtaining module 201 is further configured to obtain third operation data after the initialization module initializes the parameters of the ranking model to be trained according to the second arrangement order and obtains the initial ranking model, where the third operation data is the number of times the virtual buttons are clicked and their usage time after the virtual buttons are sorted according to the ranking weight output by the initial ranking model;
the virtual button sorting apparatus may further include:
and the first training module is used for training the initial ranking model according to a second reward value corresponding to the third operation data to obtain the trained ranking model.
The third operation data may be data on the user's operation of the target application program, collected after the virtual buttons are sorted according to the sorting weight output by the initial sorting model, and includes, but is not limited to, the number of times the virtual buttons are clicked, the time for which the functions corresponding to the virtual buttons are used, and the like.
In this alternative embodiment, the parameters of the initial ranking model may be trained according to the second reward value by adjusting the priority parameters of the orderings in the initial ranking model: if the reward value is high, the model parameter for the corresponding second arrangement order may be increased, and if the reward value is low, it may be decreased.
As an optional implementation manner, the manner of adjusting the priority of the first ranking weight in the pre-trained ranking model by the adjusting module 203 according to the target reward value and the preset reward value is specifically:
judging whether the target reward value is larger than a preset reward value or not;
if the target reward value is larger than the preset reward value, raising the priority of the first sorting weight in the sorting model; or
if the target reward value is smaller than or equal to the preset reward value, lowering the priority of the first sorting weight in the sorting model.
In this optional embodiment, if the target reward value is greater than the preset reward value, the first sorting weight performs well and its priority may be increased; if the target reward value is less than or equal to the preset reward value, the first sorting weight does not perform well enough and its priority may be decreased.
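The threshold rule above can be sketched in a few lines. The fixed step size and the lower bound of zero are assumptions added for illustration; the source only states the direction of the adjustment.

```python
def adjust_priority(priority: float, target_reward: float, preset_reward: float,
                    step: float = 1.0, floor: float = 0.0) -> float:
    """Raise the priority when the reward beats the preset value, otherwise lower it."""
    if target_reward > preset_reward:
        return priority + step
    # Rewards equal to or below the preset value lower the priority,
    # but never below the assumed floor.
    return max(floor, priority - step)
```

For example, with a preset reward value of 20, a target reward value of 30 raises a priority of 5 to 6, while a target reward value of exactly 20 lowers it to 4.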
As an optional implementation, the virtual button sorting apparatus may further include:
the first resetting module is used for resetting the first parameter of the trained sequencing model every other preset time period to obtain a first pre-training model;
and the second training module is used for retraining the first pre-training model to obtain a first sequencing model so as to reorder the virtual buttons through the first sequencing model.
In this alternative embodiment, because the function requirements may be different in different time periods, that is, the virtual buttons used may be different, the ranking model may be retrained periodically according to the latest usage data, some parameters of the ranking model may be retained, and only another part of the parameters need to be adjusted, so that the retraining speed is faster.
As an optional implementation, the virtual button sorting apparatus may further include:
a receiving module, configured to receive a feedback value for the trained ranking model;
the second judgment module is used for judging whether the feedback value is smaller than a preset feedback value threshold value or not;
the second resetting module is used for resetting a second parameter of the trained sequencing model to obtain a second pre-training model if the feedback value is smaller than a preset feedback value threshold;
and the third training module is used for retraining the second pre-training model to obtain a second sequencing model.
In this alternative embodiment, as the user group or the users' operating habits change, a virtual button order output by the ranking model that worked well earlier may no longer meet users' needs later. Some parameters of the ranking model therefore need to be reset, either according to feedback on user operations or at regular intervals. Because the remaining parameters are kept, retraining yields the latest ranking model (the second ranking model) more quickly.
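A possible shape for the feedback-triggered partial reset is sketched below. The source does not say which parameters make up the "second parameter", so the sketch assumes they are the stored priorities of a chosen subset of environment states in a Q-table; the function name, the state labels, and the initial priority of zero are likewise hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, int], float]


def reset_if_poor_feedback(q: QTable, feedback: float, threshold: float,
                           states_to_reset: Iterable[Hashable],
                           initial_priority: float = 0.0) -> bool:
    """Reset the priorities of selected states when feedback falls below the threshold."""
    if feedback >= threshold:
        return False  # feedback is acceptable, keep the trained model as it is
    targets = set(states_to_reset)
    for state, action in list(q):
        if state in targets:
            q[(state, action)] = initial_priority
    return True  # the caller should now retrain, starting from the retained entries
```

Entries for states outside `states_to_reset` are untouched, which is what lets retraining start from a partly trained model rather than from scratch.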
In the virtual button sorting apparatus described in fig. 2, different environment states of the target application program may be determined according to different environment characteristic data of the target application program, and priorities of multiple sets of sorting weights corresponding to the different environment states are different, that is, in different usage environments, the virtual buttons may be sorted differently, and the priorities of the sorting weights may be adjusted according to the current operation data to continuously update the sorting model, so that the sorting model may always output an optimal sorting, and the sorting effect of the virtual buttons is improved.
As shown in fig. 3, fig. 3 is a schematic structural diagram of an electronic device implementing a virtual button sorting method according to a preferred embodiment of the invention. The electronic device 3 comprises a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
Those skilled in the art will appreciate that the schematic diagram shown in fig. 3 is merely an example of the electronic device 3, and does not constitute a limitation of the electronic device 3, and may include more or less components than those shown, or combine some components, or different components, for example, the electronic device 3 may further include an input/output device, a network access device, and the like.
The electronic device 3 may also include, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an Internet Protocol Television (IPTV), an intelligent wearable device, and the like. The Network where the electronic device 3 is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
The at least one Processor 32 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a transistor logic device, a discrete hardware component, etc. The processor 32 may be a microprocessor or the processor 32 may be any conventional processor or the like, and the processor 32 is a control center of the electronic device 3 and connects various parts of the whole electronic device 3 by various interfaces and lines.
The memory 31 may be used to store the computer program 33 and/or the module/unit, and the processor 32 may implement various functions of the electronic device 3 by running or executing the computer program and/or the module/unit stored in the memory 31 and calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data) created according to the use of the electronic device 3, and the like. In addition, the memory 31 may include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash memory Card (FlashCard), at least one disk storage device, a flash memory device, and the like.
With reference to fig. 1, the memory 31 of the electronic device 3 stores a plurality of instructions to implement a virtual button ordering method, and the processor 32 executes the plurality of instructions to implement:
acquiring environmental characteristic data of a target application program, and acquiring a plurality of groups of sorting weights of virtual buttons of the target application program, wherein each group of sorting weights comprises one sorting weight of each virtual button;
determining the current environment state according to the environment characteristic data;
acquiring the priority corresponding to each group of the sorting weight in the current environment state;
determining a first ranking weight from the plurality of sets of ranking weights according to the priority;
acquiring first operation data corresponding to the first sequencing weight, wherein the first operation data refer to the number of times of clicking and the service time of the virtual buttons after the virtual buttons are sequenced according to the first sequencing weight;
determining a target reward value corresponding to the first sequencing weight according to the first operation data, wherein the higher the target reward value is, the better the sequencing effect of the virtual buttons in sequencing according to the first sequencing weight is indicated;
according to the target reward value and a preset reward value, adjusting the priority of the first sequencing weight in a pre-trained sequencing model;
determining a target sorting weight from the plurality of sets of sorting weights through the adjusted sorting model;
and sorting the virtual buttons according to the target sorting weight.
Specifically, the processor 32 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the instruction, which is not described herein again.
In the electronic device 3 depicted in fig. 3, different environment states of the target application program may be determined according to different environment feature data of the target application program, and priorities of multiple sets of sorting weights corresponding to the different environment states are different, that is, in different usage environments, the virtual buttons are sorted differently, and the priorities of the sorting weights may be adjusted according to the current operation data to continuously update the sorting model, so that the sorting model may always output an optimal sorting, and the sorting effect of the virtual buttons is improved.
The integrated modules/units of the electronic device 3 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program code may be in source code form, object code form, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A virtual button sorting method, comprising:
acquiring environmental characteristic data of a target application program, and acquiring a plurality of groups of sorting weights of virtual buttons of the target application program, wherein each group of sorting weights comprises one sorting weight of each virtual button;
determining the current environment state according to the environment characteristic data;
acquiring the priority corresponding to each group of the sorting weight in the current environment state;
determining a first ranking weight from the plurality of sets of ranking weights according to the priority;
acquiring first operation data corresponding to the first sequencing weight, wherein the first operation data refer to the number of times of clicking and the service time of the virtual buttons after the virtual buttons are sequenced according to the first sequencing weight;
determining a target reward value corresponding to the first sequencing weight according to the first operation data, wherein the higher the target reward value is, the better the sequencing effect of the virtual buttons in sequencing according to the first sequencing weight is indicated;
according to the target reward value and a preset reward value, adjusting the priority of the first sequencing weight in a pre-trained sequencing model;
determining a target sorting weight from the plurality of sets of sorting weights through the adjusted sorting model;
and sorting the virtual buttons according to the target sorting weight.
2. The virtual button sorting method according to claim 1, wherein before the obtaining the environmental characteristic data of the target application and the obtaining the plurality of sets of sorting weights for the virtual buttons of the target application, the virtual button sorting method further comprises:
acquiring historical operation data corresponding to the virtual button;
determining an active weight of the virtual button according to the historical operation data;
and sorting the virtual buttons according to the active weights to obtain a first sorting order.
3. The virtual button sorting method according to claim 2, wherein said sorting all the virtual buttons according to the active weights, after obtaining a first sorting order, further comprises:
acquiring multiple groups of random weights of the virtual buttons;
determining a plurality of second arrangement orders of the virtual buttons according to the plurality of groups of random weights and the active weights;
acquiring second operation data corresponding to the second arrangement sequence;
determining a first reward value for each of the second arrangement orders based on the second operation data;
judging whether the first reward value is larger than a preset reward value threshold value or not;
if the first reward value is larger than a preset reward value threshold value, initializing parameters of the ranking model to be trained according to the second ranking order to obtain an initial ranking model, wherein the initial ranking model comprises an action value function.
4. The virtual button sorting method according to claim 3, wherein after initializing parameters of the sorting model to be trained according to the second arrangement order and obtaining an initial sorting model, the virtual button sorting method further comprises:
acquiring third operation data, wherein the third operation data is the number of times the virtual buttons are clicked and the service time after the virtual buttons are sorted according to the sorting weight output by the initial sorting model;
and training the initial ranking model according to a second reward value corresponding to the third operation data to obtain the trained ranking model.
5. The virtual button sorting method according to any one of claims 1 to 4, wherein the adjusting the priority of the first sorting weight in a pre-trained sorting model according to the target reward value and a preset reward value comprises:
judging whether the target reward value is larger than the preset reward value;
if the target reward value is larger than the preset reward value, raising the priority of the first sorting weight in the sorting model; or
if the target reward value is smaller than or equal to the preset reward value, lowering the priority of the first sorting weight in the sorting model.
6. The virtual button sorting method according to any one of claims 1 to 4, further comprising:
resetting the first parameter of the trained sequencing model every other preset time period to obtain a first pre-training model;
and retraining the first pre-training model to obtain a first sequencing model so as to reorder the virtual buttons through the first sequencing model.
7. The virtual button sorting method according to any one of claims 1 to 4, further comprising:
receiving a feedback value for the trained ranking model;
judging whether the feedback value is smaller than a preset feedback value threshold value or not;
if the feedback value is smaller than a preset feedback value threshold value, resetting a second parameter of the trained sequencing model to obtain a second pre-training model;
and retraining the second pre-training model to obtain a second sequencing model.
8. A virtual button sorting apparatus, comprising:
an obtaining module, configured to obtain environmental characteristic data of a target application program and obtain a plurality of groups of sorting weights of virtual buttons of the target application program, wherein each group of sorting weights comprises one sorting weight of each virtual button;
the determining module is used for determining the current environment state according to the environment characteristic data;
the obtaining module is further configured to obtain a priority corresponding to each group of the ranking weights in the current environment state;
the determining module is further configured to determine a first sorting weight from the plurality of sets of sorting weights according to the priority;
the obtaining module is further configured to obtain first operation data corresponding to the first sorting weight, where the first operation data refers to the number of times that the virtual button is clicked and the service time after being sorted according to the first sorting weight;
the determining module is further configured to determine a target reward value corresponding to the first sorting weight according to the first operation data, wherein the higher the target reward value is, the better the sorting effect of the virtual buttons sorted according to the first sorting weight is indicated;
the adjusting module is used for adjusting the priority of the first sequencing weight in the pre-trained sequencing model according to the target reward value and a preset reward value;
the determining module is further configured to determine a target ranking weight from the plurality of sets of ranking weights through the adjusted ranking model;
and the sorting module is used for sorting the virtual buttons according to the target sorting weight.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the virtual button sorting method according to any of claims 1 to 7.
10. A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements a virtual button sorting method as recited in any one of claims 1-7.
CN202010357419.8A 2020-04-29 2020-04-29 Virtual button ordering method and device, electronic equipment and storage medium Active CN111651226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010357419.8A CN111651226B (en) 2020-04-29 2020-04-29 Virtual button ordering method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010357419.8A CN111651226B (en) 2020-04-29 2020-04-29 Virtual button ordering method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111651226A (en) 2020-09-11
CN111651226B (en) 2023-10-20

Family

ID=72349415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010357419.8A Active CN111651226B (en) 2020-04-29 2020-04-29 Virtual button ordering method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111651226B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102087575A (en) * 2009-12-03 2011-06-08 鸿富锦精密工业(深圳)有限公司 Electronic device and method for dynamic arrangement of icons
CN103309661A (en) * 2013-05-27 2013-09-18 深圳市金立通信设备有限公司 Method and terminal for controlling interface application icons
US20180314972A1 (en) * 2017-04-26 2018-11-01 Citrix Systems, Inc. Application display and discovery by predicting behavior through machine-learning
CN109614022A (en) * 2018-11-26 2019-04-12 Oppo广东移动通信有限公司 Icon sort method, device, electronic equipment and medium

Also Published As

Publication number Publication date
CN111651226B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN109902706B (en) Recommendation method and device
CN109062919B (en) Content recommendation method and device based on deep reinforcement learning
CN108763494B (en) Knowledge sharing method between conversation systems, conversation method and device
CN110012060B (en) Information pushing method and device of mobile terminal, storage medium and server
CN109814933B (en) Service data processing method and device
CN106250403A (en) Customer loss Forecasting Methodology and device
CN110647921B (en) User behavior prediction method, device, equipment and storage medium
US20150213360A1 (en) Crowdsourcing system with community learning
CN110520871A (en) Training machine learning model
CN108021708B (en) Content recommendation method and device and computer readable storage medium
CN109086742A (en) scene recognition method, scene recognition device and mobile terminal
CN109359217B (en) User interest degree calculation method, server and readable storage medium
CN109218769B (en) Recommendation method for live broadcast room and related equipment
CN110489641A (en) Information recommendation data processing method and device
CN112768056A (en) Disease prediction model establishing method and device based on joint learning framework
CN108369538A (en) Download vision assets
CN109582581A (en) Crowdsourcing-task-based result determination method and related device
Gummadi et al. Mean field analysis of multi-armed bandit games
US11188035B2 (en) Continuous control of attention for a deep learning network
CN111651226B (en) Virtual button ordering method and device, electronic equipment and storage medium
CN111009299A (en) Similar medicine recommendation method and system, server and medium
CN107943535B (en) Application cleaning method and device, storage medium and electronic equipment
CN113038242B (en) Method, device and equipment for determining display position of live broadcast card and storage medium
CN113536111B (en) Recommendation method and device for insurance knowledge content and terminal equipment
CN113806077A (en) Data center server regulation and control method and device based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant