CN111950733A - Information flow sorting method and device and computer storage medium
- Publication number
- CN111950733A (application CN201910407187.XA)
- Authority
- CN
- China
- Prior art keywords
- information
- information flow
- current time
- recommendation list
- sorting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention discloses a method and an apparatus for sorting information streams, and a computer storage medium. The method comprises: obtaining an information flow recommendation list according to user characteristics and a current time state; obtaining a score for each information flow material in the information flow recommendation list according to the information flow recommendation list and information flow characteristics; and sorting the information streams in the information flow recommendation list according to the scores. According to the embodiment, long-term return is taken into account during information flow sorting and the cumulative return is maximized, thereby improving user experience.
Description
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a method and an apparatus for sorting information streams, and a computer storage medium.
Background
In the information flow short video scene, a user can see a plurality of short videos arranged in sequence every time the user refreshes an information flow short video list. In general, it is desirable to display short videos that are more likely to be of interest to the user at positions further forward in the short video list to attract the user's click. If the short video of most interest to the user cannot be ranked to the front, the user needs to pull down the short video list to find the short video of most interest, which increases the user cost. It can be seen that the ordering of the information streams is very important.
In the prior art, a click rate prediction model is usually used to sort the information streams in the information stream sorting stage. However, the click rate prediction model mainly considers maximizing the click probability in the current scene. Sorting information streams with such a model therefore only maximizes the short-term return, and the resulting ranking does not necessarily put the information of most interest to the user first, so the user experience is poor.
Accordingly, the inventors have determined that there is a need for improvement in at least one of the problems of the prior art described above.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a new technical solution for ordering information streams.
According to a first aspect of the embodiments of the present invention, there is provided a method for sorting information streams, the method including:
obtaining an information flow recommendation list according to the user characteristics and the current time state;
obtaining the score of each information flow material in the information flow recommendation list according to the information flow recommendation list and the information flow characteristics;
and sorting the information streams in the information stream recommendation list according to the scores.
Optionally, obtaining an information flow recommendation list according to the user characteristics and the current time state, including:
calculating to obtain an evaluation value according to the current moment state, the return value of the current moment, the next moment state, the user action and a first mapping function;
and calculating to obtain the information flow recommendation list according to the user characteristics, the current time state, the evaluation value and a second mapping function.
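Optionally, the return value reward at the current time is calculated by the following formula:
$\text{reward} = \sum_{pos=1}^{N} \lambda^{pos}\,(\text{click} + \beta \cdot \text{read\_time})$
wherein click + β·read_time is the user action, β is a weighting factor, λ is the position weighting factor, pos is the position of an information stream in the list, and N is the number of information streams.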
Optionally, the expression of the loss function critic _ loss of the first mapping function is:
critic_loss = reward + gamma * v_{t+1} - v_t;
wherein reward is the return value of the current time; gamma is a smoothing factor; v_t corresponds to the current time state; v_{t+1} corresponds to the next time state.
Optionally, the expression of the loss function actor _ loss of the second mapping function is:
actor_loss = reward_gain * td_error;
wherein td_error is the evaluation value; reward_gain = reward - origin_reward is the return value gain of the current time, origin_reward is the original return value, and reward is the return value of the current time.
Optionally, the method further comprises:
acquiring log information;
updating the first mapping function according to the current time state, the return value of the current time, the next time state and the user action in the log information;
and updating the second mapping function according to the user characteristics, the current time state and the evaluation value in the log information.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for sorting information streams, the apparatus including:
the acquisition module is used for acquiring an information flow recommendation list according to the user characteristics and the current time state;
the scoring module is used for obtaining the score of each information flow material in the information flow recommendation list according to the information flow recommendation list and the information flow characteristics;
and the sorting module is used for sorting the information streams in the information stream recommendation list according to the scores.
Optionally, the obtaining module is specifically configured to:
calculating to obtain an evaluation value according to the current moment state, the return value of the current moment, the next moment state, the user action and a first mapping function;
and calculating to obtain the information flow recommendation list according to the user characteristics, the current time state, the evaluation value and a second mapping function.
According to a third aspect of the embodiments of the present invention, there is provided an apparatus for sorting information streams, the apparatus including: a memory for storing instructions and a processor; the instructions are configured to control the processor to operate so as to perform the method of ordering information streams according to any one of the first aspect of the embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method of ordering of information streams according to any one of the first aspect of embodiments of the present invention.
The embodiments of the present invention have the advantage that long-term return can be taken into account during information flow sorting and the cumulative return is maximized, thereby improving user experience.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a block diagram showing a hardware configuration of a client 1000 that can be used to implement an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a method for sorting information streams according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of building an AC model according to an embodiment of the invention.
Fig. 4 is a schematic diagram of position adjustment of an information flow according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of AC model updating according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a structure of an information flow sorting apparatus 600 according to the present invention.
Fig. 7 is a schematic hardware configuration diagram of an information flow sorting apparatus 700 according to another embodiment.
Fig. 8 is a schematic illustration of an information flow ordering obtained without using the method of an embodiment of the invention.
Fig. 9 is a schematic diagram of the information flow ordering obtained by the method according to the embodiment of the invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Various embodiments and examples according to embodiments of the present invention are described below with reference to the accompanying drawings.
< hardware configuration >
Fig. 1 is a block diagram showing a hardware configuration of a client 1000 that can be used to implement an embodiment of the present invention.
As shown in fig. 1, the client 1000 of the embodiment may be a portable computer, a desktop computer, a mobile phone, a tablet computer, or the like.
As shown in fig. 1, client 1000 may include a processor 1010, memory 1020, interface device 1030, communication device 1040, display device 1050, input device 1060, speaker 1070, microphone 1080, and the like. The processor 1010 may be a central processing unit CPU, a microprocessor MCU, or the like. The memory 1020 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1030 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1040 can perform wired or wireless communication, for example. The display device 1050 is, for example, a liquid crystal display panel, a touch panel, or the like. The input device 1060 may include, for example, a touch screen, a keyboard, and the like. A user can input/output voice information through the speaker 1070 and the microphone 1080.
In this embodiment, the memory 1020 of the client 1000 is configured to store instructions for controlling the processor 1010 to operate at least to perform the method of ordering information streams according to any embodiment of the invention. It should be understood by those skilled in the art that although a plurality of devices of the client 1000 are shown in fig. 1, the present invention may relate only to some of the devices, for example, the client 1000 relates only to the memory 1020, the processor 1010 and the display device 1050. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
< method >
Fig. 2 is a flowchart illustrating a method for sorting information streams according to an embodiment of the present invention. The method may be implemented by the client 1000.
As shown in fig. 2, the method for sorting information streams in this embodiment may include the following steps 2100 to 2300:
The user characteristics refer to preference characteristic information of the user, and for example, the user characteristics may be user interest information, user preference information, a user history browsing record, and the like. The current time state refers to the state of the user at the current time, such as browsing a list of information streams, watching a certain information stream, etc.
The information flow recommendation list comprises a plurality of information flows, and the information flows are obtained by calculation according to the user characteristics and the current time state.
Specifically, when the client 1000 calculates the information flow recommendation list, an AC (actor-critic) model from reinforcement learning may be used as a base model, and the model that calculates the information flow recommendation list in this embodiment is built on that base model. A in the AC model is the actor neural network, i.e. Policy in fig. 3, and C in the AC model is the critic neural network, i.e. Value in fig. 3.
As shown in fig. 3, the client 1000 may calculate the evaluation value t_d according to the current time state S_t, the return value r_t of the current time, the next time state S_{t+1}, the user action a_t and the first mapping function Value. It then calculates the information flow recommendation list (Environment in fig. 3) according to the user characteristics, the current time state S_t, the evaluation value t_d and the second mapping function Policy.
The second mapping function Policy is mainly used for decision behavior, and the first mapping function Value is used for evaluating the quality of the decision behavior and feeding the result back to the second mapping function Policy for correction. Different user actions a_t cause the information streams in the information flow recommendation list to be sorted differently. After browsing the information streams, the user may perform a click operation, for which a return value is defined; for example, the more the user clicks, the higher the return value. After the user clicks or does not click on an information stream, the current time state S_t of the user changes to the next time state S_{t+1}. The first mapping function Value calculates an evaluation value t_d from the current time state S_t, the return value r_t of the current time, the next time state S_{t+1} and the user action a_t; the evaluation value is used for evaluating the return value r_t of the current time against the expected return value. In practical applications, the goal of the first mapping function Value is to minimize the evaluation value t_d, while at the same time feeding the evaluation value t_d back to the second mapping function Policy, so that the second mapping function Policy corrects itself according to the evaluation value t_d.
In this embodiment, 8 information streams are recommended on each refresh when a user accesses the information flow service. The return value reward takes into account whether the user clicks on an information stream and the viewing duration, together with a position weighting factor: a click on an information stream at a position nearer the front carries a larger position weighting factor. The return values at the positions of the 8 information streams are added up to give an overall benefit, which is defined as the original return value origin_reward. The client 1000 needs to move, by calculation, information streams that were clicked and viewed for a long time to positions nearer the front. For example, as shown in fig. 4, the information stream of R0.3 is moved from the original third position to the first position, the information stream of R0.2 from the original first position to the second position, and the information stream of R0 from the original second position to the third position.
Specifically, the expression of the loss function critic_loss adopted by the first mapping function Value is as follows:
critic_loss = reward + gamma * v_{t+1} - v_t;
wherein reward is the return value of the current time; gamma is a smoothing factor, which can take a value of, for example, 0.8; v_t corresponds to the current time state; v_{t+1} corresponds to the next time state.
The expression of the loss function actor_loss adopted by the second mapping function Policy is as follows:
actor_loss = reward_gain * td_error;
wherein td_error is the evaluation value; reward_gain = reward - origin_reward is the return value gain of the current time, origin_reward is the original return value, and reward is the return value of the current time.
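The two expressions above can be illustrated with a minimal Python sketch. The function names `td_error` and `actor_loss`, and the numeric inputs, are hypothetical and chosen only for illustration; gamma defaults to the example value 0.8. The sketch evaluates the expressions literally as written. Implementations of actor-critic models commonly minimize the square of the temporal-difference term, which is an assumption not stated in this embodiment.

```python
def td_error(reward, v_t, v_next, gamma=0.8):
    """Evaluation value as written above: reward + gamma * v_{t+1} - v_t."""
    return reward + gamma * v_next - v_t


def actor_loss(reward, origin_reward, td):
    """reward_gain * td_error, where reward_gain = reward - origin_reward."""
    reward_gain = reward - origin_reward
    return reward_gain * td


# Illustrative numbers only (not taken from the embodiment).
td = td_error(reward=1.0, v_t=0.5, v_next=0.5)
loss = actor_loss(reward=1.2, origin_reward=1.0, td=td)
```

Under these inputs the evaluation value is positive, so an action whose return exceeds the original return contributes a positive actor term.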
The return value reward at the current time can be calculated by the following formula:
$\text{reward} = \sum_{pos=1}^{N} \lambda^{pos}\,(\text{click} + \beta \cdot \text{read\_time})$
wherein click + β·read_time is the user action, which comprises the user's click on an information stream and the viewing duration; β is a weighting factor, which can take a value of, for example, 0.1; λ is the position weighting factor, which can take a value of, for example, 0.9; N is the number of information streams; and pos is the position of an information stream in the information stream list. For example, if the position of the information stream in the information flow recommendation list is 1, the corresponding position weight is 0.9; if the position is 2, the corresponding position weight is 0.9^2 = 0.81.
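As a hedged illustration, the return value can be read as a sum over list positions of the position weight raised to the position, multiplied by click + β·read_time. That reading is an assumption drawn from the worked values above (0.9 at position 1, 0.9^2 at position 2) and from the statement that the per-position values are added up. The function name `position_weighted_reward`, the parameter name `decay`, and the interpretation of the R values of fig. 4 as per-position user action values are hypothetical.

```python
def position_weighted_reward(actions, beta=0.1, decay=0.9):
    """Sum of decay**pos * (click + beta * read_time) over 1-based positions.

    actions: list of (click, read_time) pairs ordered by list position.
    The unit of read_time is not specified in the source.
    """
    total = 0.0
    for pos, (click, read_time) in enumerate(actions, start=1):
        total += (decay ** pos) * (click + beta * read_time)
    return total


# Hypothetical reading of fig. 4: per-position action values 0.2, 0, 0.3,
# encoded as click values with zero viewing time.
origin_reward = position_weighted_reward([(0.2, 0.0), (0.0, 0.0), (0.3, 0.0)])
# After moving the highest-valued stream to the front: 0.3, 0.2, 0.
reward = position_weighted_reward([(0.3, 0.0), (0.2, 0.0), (0.0, 0.0)])
reward_gain = reward - origin_reward
```

With these assumed inputs the reordered list yields a larger return than the original order, so reward_gain is positive.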
The information flow characteristics are characteristic information in the information flow, such as subject information, content information, and the like of the information flow. In this step, the client 1000 fuses the information flow recommendation list and the information flow characteristics to score each information flow material in the information flow recommendation list.
In practical applications, the information streams in the information stream recommendation list may be sorted according to the scores, and displayed on the display screen of the client 1000.
For example, when the information streams are not sorted using the method of the present embodiment, the information streams displayed on the display screen of the client 1000 are sorted as shown in fig. 8, and the information stream 1, the information stream 2, the information stream 3, the information stream 4, and the information stream 5 in the information stream recommendation list are sequentially sorted and displayed. According to the method of this embodiment, after the information flow recommendation list is obtained, each information flow material in the information flow recommendation list is scored according to the information flow recommendation list and the information flow characteristics, as shown in fig. 9, the score of the information flow 1 is 0.1, the score of the information flow 2 is 0.3, the score of the information flow 3 is 0.8, the score of the information flow 4 is 0.2, and the score of the information flow 5 is 0.6, the information flows in the information flow recommendation list are sorted according to the scores by the client 1000, and the information flows displayed on the display screen of the client 1000 are the information flow 3, the information flow 5, the information flow 2, the information flow 4, and the information flow 1. Therefore, by using the method of the embodiment, the obtained information flow sequence is most interesting for the user, so that the user can be attracted to click, and the user experience is improved.
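The sorting step in the example above can be sketched in a few lines of Python. The scores are the ones given for fig. 9; the dictionary layout and variable names are assumptions made for illustration.

```python
# Scores from the fig. 9 example, keyed by information flow label.
scores = {
    "information flow 1": 0.1,
    "information flow 2": 0.3,
    "information flow 3": 0.8,
    "information flow 4": 0.2,
    "information flow 5": 0.6,
}

# Sort labels by descending score.
ranked = sorted(scores, key=scores.get, reverse=True)
# ranked == ["information flow 3", "information flow 5",
#            "information flow 2", "information flow 4", "information flow 1"]
```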
Further, the method for sorting information streams of this embodiment further includes: acquiring log information; updating the first mapping function according to the current time state, the return value of the current time, the next time state and the user action in the log information; and updating the second mapping function according to the user characteristics, the current time state and the evaluation value in the log information.
As shown in fig. 5, the client 1000 obtains the Log information Log, and updates the first mapping function and the second mapping function after extracting, converting, and loading (ETL) the Log information Log, so as to update the AC model (AC Train) of this embodiment. After the AC model update is completed, access is provided to the online portion.
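The log-driven update can be sketched as follows, under strong simplifying assumptions. The embodiment uses neural networks for both mapping functions; this sketch substitutes plain lookup tables so that it stays self-contained. The record layout (state, action, reward, next_state), the learning rate `lr`, and the function name `update_from_logs` are all hypothetical.

```python
from collections import defaultdict


def update_from_logs(records, values=None, prefs=None, gamma=0.8, lr=0.1):
    """Replay log records and update a tabular critic and actor.

    values: state -> estimated value (stand-in for the first mapping function).
    prefs: (state, action) -> preference (stand-in for the second mapping function).
    """
    values = defaultdict(float) if values is None else values
    prefs = defaultdict(float) if prefs is None else prefs
    for state, action, reward, next_state in records:
        # Evaluation value from the current state, reward and next state.
        td = reward + gamma * values[next_state] - values[state]
        # Critic: move the state value in the direction of the evaluation value.
        values[state] += lr * td
        # Actor: reinforce the logged action in proportion to the evaluation value.
        prefs[(state, action)] += lr * td
    return values, prefs


# One hypothetical record, as it might look after ETL.
values, prefs = update_from_logs([("browsing", "click", 1.0, "watching")])
```

The tables can be passed back into the function on the next batch of logs, which mirrors the periodic update before the model is made available to the online portion.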
According to the method for sorting information streams of this embodiment, the client obtains an information flow recommendation list according to the user characteristics and the current time state; obtains the score of each information flow material in the information flow recommendation list according to the information flow recommendation list and the information flow characteristics; and sorts the information streams in the information flow recommendation list according to the scores. Long-term return can thus be taken into account during information flow sorting and the cumulative return is maximized, thereby improving user experience.
< apparatus >
Fig. 6 is a schematic diagram of a structure of an information flow sorting apparatus 600 according to the present invention.
As shown in fig. 6, the information flow ranking apparatus 600 may include an obtaining module 610, a scoring module 620, and a ranking module 630.
The obtaining module 610 is configured to obtain an information flow recommendation list according to the user characteristics and the current time state.
The scoring module 620 is configured to obtain a score of each information flow material in the information flow recommendation list according to the information flow recommendation list and the information flow characteristics.
The sorting module 630 is configured to sort the information streams in the information stream recommendation list according to the scores.
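The division into three modules can be sketched as a small Python class. This is only an illustration of how the modules might be composed: the class name `InformationFlowSorter`, its fields, and the callables passed in are hypothetical, and the actual obtaining and scoring logic is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InformationFlowSorter:
    # Stand-in for the obtaining module: (user_features, state) -> candidate list.
    obtain: Callable[[dict, str], List[str]]
    # Stand-in for the scoring module: (flow_id, flow_features) -> score.
    score: Callable[[str, dict], float]

    def rank(self, user_features: dict, state: str,
             flow_features: Dict[str, dict]) -> List[str]:
        candidates = self.obtain(user_features, state)
        scored = {c: self.score(c, flow_features.get(c, {})) for c in candidates}
        # Stand-in for the sorting module: descending by score.
        return sorted(candidates, key=scored.get, reverse=True)


# Hypothetical usage with fixed candidates and a precomputed "quality" feature.
sorter = InformationFlowSorter(
    obtain=lambda user, state: ["a", "b", "c"],
    score=lambda flow_id, feats: feats["quality"],
)
order = sorter.rank({}, "browsing",
                    {"a": {"quality": 0.1}, "b": {"quality": 0.8}, "c": {"quality": 0.6}})
```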
Specifically, the obtaining module 610 may specifically be configured to: calculating to obtain an evaluation value according to the current moment state, the return value of the current moment, the next moment state, the user action and a first mapping function; and calculating to obtain the information flow recommendation list according to the user characteristics, the current time state, the evaluation value and a second mapping function.
The expression of the loss function critic_loss of the first mapping function is as follows:
critic_loss = reward + gamma * v_{t+1} - v_t;
wherein reward is the return value of the current time; gamma is a smoothing factor; v_t corresponds to the current time state; v_{t+1} corresponds to the next time state.
The expression of the loss function actor_loss of the second mapping function is:
actor_loss = reward_gain * td_error;
wherein td_error is the evaluation value; reward_gain = reward - origin_reward is the return value gain of the current time, origin_reward is the original return value, and reward is the return value of the current time.
In practical application, the return value reward at the current time can be calculated by the following formula:
$\text{reward} = \sum_{pos=1}^{N} \lambda^{pos}\,(\text{click} + \beta \cdot \text{read\_time})$
wherein click + β·read_time is the user action, β is a weighting factor, λ is the position weighting factor, pos is the position of an information stream in the list, and N is the number of information streams.
Further, the obtaining module 610 may be further configured to: acquiring log information; updating the first mapping function according to the current time state, the return value of the current time, the next time state and the user action in the log information; and updating the second mapping function according to the user characteristics, the current time state and the evaluation value in the log information.
Fig. 7 is a schematic hardware configuration diagram of an information flow sorting apparatus 700 according to another embodiment.
As shown in fig. 7, the apparatus 700 for sorting information flow of the present embodiment may include a memory 710 and a processor 720.
Memory 710 is configured to store instructions for controlling processor 720 to operate to perform the method of sorting information streams of any of the embodiments of the present invention.
The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
The information flow sorting apparatus of this embodiment may be configured to execute the technical solutions of the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and are not described herein again.
< computer storage Medium >
In this embodiment, a computer storage medium is further provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the method for sorting information streams according to any embodiment of the present invention.
Those skilled in the art should understand that, in the field of electronic technology, the above method can be embodied in products by software, hardware and a combination of software and hardware, and those skilled in the art can easily generate an information processing apparatus including modules for performing respective operations in the information processing method according to the above embodiment based on the method of the above embodiment of the invention.
It is well known to those skilled in the art that, with the development of electronic information technology such as large-scale integrated circuit technology and the trend toward implementing software in hardware, it has become difficult to clearly divide the software and hardware boundaries of a computer system, since any operation may be implemented in software or in hardware, and any instruction may be executed by hardware as well as by software. Whether a hardware implementation or a software implementation is employed for a certain machine function depends on non-technical factors such as price, speed, reliability, storage capacity, change period, and the like. A software implementation and a hardware implementation are equivalent for the skilled person. The skilled person can choose software or hardware to implement the above described scheme as desired. Therefore, specific software or hardware is not limited herein.
The present invention may be an apparatus, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of the computer-readable program instructions, so that the electronic circuit can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.
Claims (10)
1. A method for ordering information streams, the method comprising:
obtaining an information flow recommendation list according to the user characteristics and the current time state;
obtaining the score of each information flow material in the information flow recommendation list according to the information flow recommendation list and the information flow characteristics;
and sorting the information streams in the information stream recommendation list according to the scores.
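The three steps of claim 1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `rank_information_flow`, `toy_recommend`, and `toy_score`, the toy feature values, and the choice of descending order are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Sequence


def rank_information_flow(
    user_features: Sequence[float],
    current_state: Sequence[float],
    item_features: Dict[str, Sequence[float]],
    recommend: Callable[[Sequence[float], Sequence[float]], List[str]],
    score: Callable[[str, Sequence[float]], float],
) -> List[str]:
    """Return the recommended items ordered by score (highest first, assumed)."""
    # Step 1: obtain the recommendation list from user features and current state.
    candidates = recommend(user_features, current_state)
    # Step 2: score each item in the list using its information flow features.
    scores = {item: score(item, item_features[item]) for item in candidates}
    # Step 3: sort the list according to the scores.
    return sorted(candidates, key=lambda item: scores[item], reverse=True)


# Hypothetical stand-ins for the recommendation and scoring models.
def toy_recommend(user_features, current_state):
    return ["a", "b", "c"]


def toy_score(item_id, features):
    return sum(features)


ITEM_FEATURES = {"a": [0.1, 0.2], "b": [0.9, 0.3], "c": [0.4, 0.4]}
ranked = rank_information_flow([1.0], [0.0], ITEM_FEATURES, toy_recommend, toy_score)
print(ranked)  # ['b', 'c', 'a']
```

Passing the recommendation and scoring steps in as callables keeps the sketch independent of how the mapping functions are actually realized.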
2. The method of claim 1, wherein obtaining the information flow recommendation list according to the user characteristics and the current time state comprises:
calculating to obtain an evaluation value according to the current moment state, the return value of the current moment, the next moment state, the user action and a first mapping function;
and calculating to obtain the information flow recommendation list according to the user characteristics, the current time state, the evaluation value and a second mapping function.
4. The method of claim 3, wherein the loss function critic_loss of the first mapping function is expressed as:
critic_loss = reward + gamma * v_{t+1} - v_t;
wherein reward is the return value of the current time; gamma is a smoothing factor; v_t is the current time state; and v_{t+1} is the next time state.
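As a worked illustration of the expression in claim 4, the following Python sketch evaluates critic_loss for one set of values. The function name, the argument names `v_next` and `v_curr` (standing for v_{t+1} and v_t), and the numbers are assumptions chosen for the example. The expression has the form of the temporal-difference error used in actor-critic methods, where gamma is usually called the discount factor.

```python
def critic_loss(reward: float, gamma: float, v_next: float, v_curr: float) -> float:
    """Evaluate reward + gamma * v_{t+1} - v_t as written in claim 4."""
    return reward + gamma * v_next - v_curr


# Hypothetical values: 1.0 + 0.5 * 4.0 - 2.5
loss = critic_loss(reward=1.0, gamma=0.5, v_next=4.0, v_curr=2.5)
print(loss)  # 0.5
```

In common actor-critic practice the square of this quantity is minimized when training the critic; the claim itself states only the unsquared expression.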
5. The method of claim 3, wherein the loss function actor_loss of the second mapping function is expressed as:
actor_loss = reward_gain * td_error;
wherein td_error is the evaluation value; reward_gain = reward - origin_reward is the return value gain of the current time; origin_reward is the original return value; and reward is the return value of the current time.
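The expression in claim 5 can be evaluated with a short Python sketch. It assumes the definition in the claim reads reward_gain = reward - origin_reward; the function name and the sample numbers are hypothetical.

```python
def actor_loss(reward: float, origin_reward: float, td_error: float) -> float:
    """Evaluate reward_gain * td_error, assuming reward_gain = reward - origin_reward."""
    reward_gain = reward - origin_reward  # gain of the current return over the original one
    return reward_gain * td_error


# Hypothetical values: (1.5 - 1.0) * 0.5
loss = actor_loss(reward=1.5, origin_reward=1.0, td_error=0.5)
print(loss)  # 0.25
```

Under this reading, the loss is zero whenever the current return value equals the original return value, regardless of the evaluation value.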
6. The method of claim 2, further comprising:
acquiring log information;
updating the first mapping function according to the current time state, the return value of the current time, the next time state and the user action in the log information;
and updating the second mapping function according to the user characteristics, the current time state and the evaluation value in the log information.
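The log-driven update of claim 6 might be sketched as below. This is a deliberately simplified tabular stand-in, not the method of the patent: the two mapping functions are replaced by lookup tables (`values` for the first, `preferences` for the second), user features are folded into the state key, and the names `LogRecord`, `update_from_logs`, `gamma`, and `lr` are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Hashable, Iterable, Tuple


@dataclass
class LogRecord:
    state: Hashable       # current time state (user features folded in)
    action: Hashable      # user action
    reward: float         # return value of the current time
    next_state: Hashable  # next time state


def update_from_logs(
    logs: Iterable[LogRecord],
    values: DefaultDict[Hashable, float],
    preferences: DefaultDict[Tuple[Hashable, Hashable], float],
    gamma: float = 0.9,
    lr: float = 0.1,
) -> None:
    """One pass over the log, updating both tables in place."""
    for rec in logs:
        # Evaluation value from the first mapping (here: a value table).
        td_error = rec.reward + gamma * values[rec.next_state] - values[rec.state]
        # Update the first mapping toward the observed return.
        values[rec.state] += lr * td_error
        # Update the second mapping using the evaluation value.
        preferences[(rec.state, rec.action)] += lr * td_error


values: DefaultDict[Hashable, float] = defaultdict(float)
preferences: DefaultDict[Tuple[Hashable, Hashable], float] = defaultdict(float)
logs = [LogRecord("s0", "click", 1.0, "s1"), LogRecord("s1", "skip", 0.0, "s2")]
update_from_logs(logs, values, preferences, gamma=0.5, lr=0.5)
print(values["s0"], preferences[("s0", "click")])  # 0.5 0.5
```

A learned model would replace both tables in a realistic system; the tables are used here only so the update order (evaluation value first, then both mappings) is easy to follow.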
7. An apparatus for sorting a stream of information, the apparatus comprising:
the acquisition module is used for acquiring an information flow recommendation list according to the user characteristics and the current time state;
the scoring module is used for obtaining the score of each information flow material in the information flow recommendation list according to the information flow recommendation list and the information flow characteristics;
and the sorting module is used for sorting the information streams in the information stream recommendation list according to the scores.
8. The apparatus of claim 7, wherein the obtaining module is specifically configured to:
calculating to obtain an evaluation value according to the current moment state, the return value of the current moment, the next moment state, the user action and a first mapping function;
and calculating to obtain the information flow recommendation list according to the user characteristics, the current time state, the evaluation value and a second mapping function.
9. An apparatus for sorting a stream of information, the apparatus comprising: a memory and a processor, wherein the memory is used for storing instructions, and the instructions are used for controlling the processor to operate so as to carry out the method for sorting information streams according to any one of claims 1 to 6.
10. A computer storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the method of sorting of information flows according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910407187.XA CN111950733B (en) | 2019-05-15 | 2019-05-15 | Method and device for ordering information streams and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111950733A (en) | 2020-11-17 |
CN111950733B (en) | 2024-06-11 |
Family
ID=73335858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910407187.XA Active CN111950733B (en) | 2019-05-15 | 2019-05-15 | Method and device for ordering information streams and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111950733B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090171932A1 (en) * | 2007-12-27 | 2009-07-02 | Sihem Amer Yahia | System and method for annotation and ranking of reviews personalized to prior user experience |
WO2014089776A1 (en) * | 2012-12-12 | 2014-06-19 | Google Inc. | Ranking search results based on entity metrics |
US20170286860A1 (en) * | 2016-03-29 | 2017-10-05 | Microsoft Corporation | Multiple-action computational model training and operation |
CN107346138A (en) * | 2017-06-16 | 2017-11-14 | 武汉理工大学 | A kind of unmanned boat method for lateral control based on enhancing learning algorithm |
CN107463701A (en) * | 2017-08-15 | 2017-12-12 | 北京百度网讯科技有限公司 | Method and apparatus based on artificial intelligence pushed information stream |
CN108345630A (en) * | 2017-12-27 | 2018-07-31 | 北京字节跳动网络技术有限公司 | Method, apparatus, intelligent terminal and the readable storage medium storing program for executing of digital content push |
CN108491529A (en) * | 2018-03-28 | 2018-09-04 | 百度在线网络技术(北京)有限公司 | Information recommendation method and device |
CN109033460A (en) * | 2018-08-30 | 2018-12-18 | 优视科技新加坡有限公司 | Sort method, device and equipment/terminal/server in a kind of information flow |
CN109246450A (en) * | 2018-08-06 | 2019-01-18 | 上海大学 | A kind of video display preferentially recommender system and method based on implicit information scoring |
CN109471963A (en) * | 2018-09-13 | 2019-03-15 | 广州丰石科技有限公司 | A kind of proposed algorithm based on deeply study |
CN109598403A (en) * | 2018-10-23 | 2019-04-09 | 阿里巴巴集团控股有限公司 | A kind of resource allocation methods, device, equipment and medium |
WO2019081778A1 (en) * | 2017-10-27 | 2019-05-02 | Deepmind Technologies Limited | Distributional reinforcement learning for continuous control tasks |
Non-Patent Citations (2)
Title |
---|
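M. KHAMASSI ET AL.: "Actor–Critic models of reinforcement learning in the basal ganglia: from natural to artificial rats", Adaptive Behavior * |
LIU Quan; ZHAI Jianwei; ZHANG Zongzhang; ZHONG Shan; ZHOU Qian; ZHANG Peng; XU Jin: "A Survey of Deep Reinforcement Learning" (深度强化学习综述), Chinese Journal of Computers (计算机学报), no. 01 * |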
Also Published As
Publication number | Publication date |
---|---|
CN111950733B (en) | 2024-06-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||