CN110648182A - Method, system, medium and computing device for automatic pricing of commodities - Google Patents

Method, system, medium and computing device for automatic pricing of commodities

Info

Publication number
CN110648182A
CN110648182A
Authority
CN
China
Prior art keywords
pricing
price
deep learning
state information
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910937462.9A
Other languages
Chinese (zh)
Inventor
董家骥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority: CN201910937462.9A
Publication: CN110648182A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/02 - Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 - Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 - Price or cost determination based on market factors
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/02 - Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 - Discounts or incentives, e.g. coupons or rebates

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the invention provides an automatic commodity pricing method comprising the following steps: obtaining environmental state information from a plurality of historical data, wherein the environmental state information comprises response information indicating users' responses to historical pricing behavior; inputting the environmental state information into a reinforcement deep learning neural network, so that the network determines a pricing strategy based on the environmental state information; and determining a pricing price of the commodity based on the pricing strategy. The reinforcement deep learning neural network is configured to determine the pricing strategy of the commodity based on the environmental state information and a pricing strategy model, and to score the pricing strategy with a reward model, so that the network updates the pricing strategy model according to the score. Embodiments of the invention further provide an automatic commodity pricing system, a computer-readable medium, and a computing device.

Description

Method, system, medium and computing device for automatic pricing of commodities
Technical Field
Embodiments of the invention relate to the technical field of the Internet, and in particular to an automatic commodity pricing method, an automatic commodity pricing system, a medium and a computing device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Pricing is an important activity in commodity sales management and directly affects the value of a commodity and the reasonableness of its price. Before or during the sale of a commodity, its price is therefore set or adjusted according to factors such as market conditions and the cost of the commodity.
At present, some automatic commodity pricing models have emerged, but they are typically determined empirically. An empirically determined pricing model requires staff to check and maintain it regularly, which consumes substantial labor cost, and its reliance on experience makes it difficult to discover optimal pricing.
Disclosure of Invention
In the prior art, therefore, automatic commodity pricing consumes substantial labor cost, and finding the optimal price is a cumbersome process.
For this reason, there is a strong need for an improved automatic commodity pricing method that lowers the labor cost of automatic pricing and produces more reasonable and accurate prices.
In this context, embodiments of the present invention are intended to provide a method of automatic pricing of commodities and an automatic pricing system, medium, and computing device for commodities.
In a first aspect of embodiments of the present invention, there is provided a method comprising: obtaining environmental state information from a plurality of historical data, wherein the environmental state information comprises response information indicating users' responses to historical pricing behavior; inputting the environmental state information into a reinforcement deep learning neural network, so that the network determines a pricing strategy based on the environmental state information; and determining a pricing price of the commodity based on the pricing strategy, wherein the reinforcement deep learning neural network is configured to determine the pricing strategy of the commodity based on the environmental state information and a pricing strategy model, and to score the pricing strategy with a reward model, so that the network updates the pricing strategy model according to the score.
In one embodiment of the invention, the determination of the pricing strategy by the reinforcement deep learning neural network based on the environmental state information and the pricing strategy model comprises a first determination mode and a second determination mode. The first determination mode comprises: determining a price-adjustment ratio of the pricing price relative to the current price based on the environmental state information and a first pricing strategy model. The second determination mode comprises: the environmental state information comprises a plurality of state data; an action vector for the plurality of state data is determined based on the environmental state information and a second pricing strategy model, wherein elements of the action vector correspond one-to-one to the plurality of state data; and the dot product of the action vector and the state vector formed by the plurality of state data is used as the price-adjustment ratio of the pricing price relative to the current price.
In another embodiment of the present invention, the method further comprises, in the second determination mode: determining the action vector of the plurality of state data once the second pricing strategy model is in a converged state; and deleting from the environmental state information the state data corresponding to action-vector elements whose values are smaller than a preset threshold, so as to update the environmental state information.
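The feature-pruning step above can be sketched as follows. This is a minimal illustration only: the function name `prune_state_features`, the dict-based state representation and the default threshold are assumptions, not taken from the patent.

```python
def prune_state_features(state, action_vector, threshold=0.05):
    """Drop state features whose converged action-vector weight falls
    below the threshold.

    state: mapping from feature name to normalized value.
    action_vector: one weight per feature, in the same order as `state`.
    Returns the reduced state dict and the surviving weights, mirroring
    the patent's idea of deleting low-weight state data to update the
    environmental state information.
    """
    if len(state) != len(action_vector):
        raise ValueError("action vector must have one element per state feature")
    kept = {name: value
            for (name, value), w in zip(state.items(), action_vector)
            if w >= threshold}
    weights = [w for w in action_vector if w >= threshold]
    return kept, weights
```

After pruning, the reduced state would be fed back to the network (the patent switches back to the first determination mode once the state is updated).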
In yet another embodiment of the present invention, the reinforcement deep learning neural network switches from the second determination mode to the first determination mode in response to the environmental state information being updated.
In yet another embodiment of the present invention, the method further comprises: obtaining a reward function for calculating the impact of the pricing price on sales of the commodity; acquiring the environmental-state-information characteristics of the commodity after it is priced at the pricing price; calculating, based on the reward function and those characteristics, the actual reward value that the pricing price generates for the sale of the commodity; and inputting the actual reward value and the characteristics into the reinforcement deep learning neural network, so that the network updates the reward model and the pricing strategy model based on the actual reward value.
In yet another embodiment of the present invention, determining the pricing price of the commodity based on the pricing strategy comprises: acquiring a preset maximum value and a preset minimum value of the commodity price; determining a predetermined price for the commodity based on the pricing strategy; setting the pricing price of the commodity to the larger of the preset minimum value and the predetermined price when the predetermined price is lower than the current price of the commodity; and setting the pricing price of the commodity to the smaller of the preset maximum value and the predetermined price when the predetermined price is higher than the current price of the commodity.
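The min/max banding described above can be sketched in a few lines. The function name `clamp_price` and its signature are illustrative, not from the patent.

```python
def clamp_price(proposed, current, floor, ceiling):
    """Clamp a proposed pricing price to the preset [floor, ceiling] band.

    Per the scheme above: when the proposed price is below the current
    price, the final price is max(floor, proposed); when it is above,
    the final price is min(ceiling, proposed); an unchanged price passes
    through as-is.
    """
    if proposed < current:
        return max(floor, proposed)
    if proposed > current:
        return min(ceiling, proposed)
    return proposed
```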
In yet another embodiment of the present invention, the reinforcement deep learning neural network includes a Deep Deterministic Policy Gradient (DDPG) algorithm module.
In a second aspect of embodiments of the present invention, there is provided an automatic commodity pricing system, comprising: a first obtaining module for obtaining environmental state information from a plurality of historical data, wherein the environmental state information comprises response information indicating users' responses to historical pricing behavior; a reinforcement deep learning module for inputting the environmental state information into a reinforcement deep learning neural network, so that the network determines a pricing strategy based on the environmental state information; and a first determination module for determining a pricing price of the commodity based on the pricing strategy, wherein the reinforcement deep learning module is configured to determine the pricing strategy of the commodity based on the environmental state information and the pricing strategy model, and to score the pricing strategy with a reward model, so that the reinforcement deep learning neural network updates the pricing strategy model according to the score.
In one embodiment of the invention, the reinforcement deep learning module comprises a first sub-module and a second sub-module. The first sub-module executes the first determination mode, which comprises determining the price-adjustment ratio of the pricing price relative to the current price based on the environmental state information and a first pricing strategy model. The second sub-module executes the second determination mode, in which the environmental state information comprises a plurality of state data; an action vector for the plurality of state data is determined based on the environmental state information and a second pricing strategy model, wherein elements of the action vector correspond one-to-one to the plurality of state data; and the dot product of the action vector and the state vector formed by the plurality of state data is used as the price-adjustment ratio of the pricing price relative to the current price.
In yet another embodiment of the present invention, the system further comprises: a second determination module for determining, in the second determination mode, the action vector of the plurality of state data once the second pricing strategy model is in a converged state; and an updating module for deleting from the environmental state information the state data corresponding to action-vector elements whose values are smaller than a preset threshold, so as to update the environmental state information.
In yet another embodiment of the present invention, the system further comprises: an adjustment module, configured to switch the reinforcement deep learning neural network from the second determination mode to the first determination mode in response to the environmental status information being updated.
In yet another embodiment of the present invention, the system further comprises: a second obtaining module for obtaining a reward function used to calculate the impact of the pricing price on sales of the commodity; a third obtaining module for acquiring the environmental-state-information characteristics of the commodity after it is priced at the pricing price; a calculation module for calculating, based on the reward function and those characteristics, the actual reward value that the pricing price generates for the sale of the commodity; and an input module for inputting the actual reward value and the characteristics into the reinforcement deep learning neural network, so that the network updates the reward model and the pricing strategy model based on the actual reward value.
In yet another embodiment of the present invention, the first determination module includes: an acquisition sub-module for acquiring a preset maximum value and a preset minimum value of the commodity price; a first determining sub-module for determining a predetermined price for the commodity based on the pricing strategy; a second determining sub-module for setting the pricing price of the commodity to the larger of the preset minimum value and the predetermined price when the predetermined price is lower than the current price of the commodity; and a third determining sub-module for setting the pricing price of the commodity to the smaller of the preset maximum value and the predetermined price when the predetermined price is higher than the current price of the commodity.
In a third aspect of embodiments of the present invention, there is provided a medium storing computer-executable instructions that, when executed by a processing unit, implement the automatic commodity pricing method described above.
In a fourth aspect of embodiments of the present invention, there is provided a computing device comprising: a processing unit; and a storage unit storing computer-executable instructions that, when executed by the processing unit, implement the automatic commodity pricing method described above.
With the automatic commodity pricing method and system of embodiments of the invention, commodities can be priced automatically according to the environmental state information and the pricing strategy model, and the pricing strategy model adjusts itself automatically, so no manual inspection or maintenance of the pricing model is needed. This significantly reduces labor cost and makes automatic pricing more reasonable and accurate.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 schematically illustrates an exemplary system architecture for a method for automatic pricing of goods and a system thereof, according to an embodiment of the invention;
FIG. 2 schematically illustrates a flow chart of a method for automatic pricing of goods according to an embodiment of the invention;
FIG. 3 schematically illustrates an architecture diagram of a reinforcement deep learning neural network according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method for automatic pricing of goods according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method of determining a pricing price for a good according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a system architecture that may be applied to a method of automatic pricing of goods according to another embodiment of the present disclosure;
FIG. 7A schematically illustrates a block diagram of an automatic merchandise pricing system according to an embodiment of the invention;
FIG. 7B schematically illustrates a block diagram of an enhanced deep learning module according to an embodiment of the invention;
FIG. 7C schematically illustrates a block diagram of an automatic merchandise pricing system according to another embodiment of the invention;
FIG. 7D schematically illustrates a block diagram of a first determination module in accordance with an embodiment of the present invention;
FIG. 8 schematically shows a schematic view of a computer-readable storage medium product according to an embodiment of the invention; and
FIG. 9 schematically shows a block diagram of a computing device according to an embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to embodiments of the present invention, a method, medium, system, and computing device for automatic pricing of goods are presented.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Summary of The Invention
The inventor has found that, in the related art, automatic commodity pricing is generally realized by a dynamic pricing model based on empirical rules. However, pricing based on empirical rules generalizes and scales poorly, and requires staff to check and maintain the rules regularly. Moreover, such a model often cannot discover the optimal pricing strategy, which directly affects the value of the commodity and the reasonableness of its price.
An embodiment of the invention provides an automatic commodity pricing method. The method inputs environmental state information into a reinforcement deep learning neural network, uses the network to determine a pricing strategy, and then determines the pricing price from that strategy. The reinforcement deep learning neural network may include a pricing strategy model and a reward model: the pricing strategy model determines a pricing strategy from the environmental state information, and the reward model scores that strategy, so that the pricing strategy model optimizes itself according to the reward model's scores. The method therefore needs no manual maintenance of the pricing strategy model, and the prices it produces are more reasonable and accurate.
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Application scene overview
An exemplary system architecture of a method for automatic pricing of goods and a system thereof according to an embodiment of the present invention will be first explained in detail with reference to fig. 1.
As shown in fig. 1, the system architecture may include at least an intelligent pricing data warehouse, a data filtering and cleaning module, a data transformation module, and an enhanced deep learning module, for example.
According to the embodiment of the disclosure, the intelligent pricing data warehouse may store relevant data for multiple automatic pricing of commodities, for example. The relevant data may include, for example, time of each pricing, price of each pricing, sales of the good after each pricing over different time periods, click rate of the good after each pricing, rating of the good after each pricing, inventory of the good, and so forth.
According to the embodiment of the disclosure, the related data in the intelligent pricing data warehouse can be input into the data filtering and cleaning module, which extracts the data relevant to automatic commodity pricing. Such data may include, for example, the price of the last pricing, the last inventory level, the sales volume of the commodity in different time periods after the last pricing, the click volume of the commodity, and so on. For example, Hive (a data warehouse tool based on Hadoop) can be used to filter and clean the relevant data in the intelligent pricing data warehouse.
According to the embodiment of the disclosure, after the data is filtered by using Hive, the obtained data can be input into the data transformation module. And the data conversion module performs format conversion on the data from the data filtering and cleaning module according to a preset conversion rule so as to generate the environmental state information. For example, the environmental status information may be a vector, and the data transformation module transforms the data from the data filtering and cleansing module into a vector.
The environmental state information includes user response information to historical pricing behavior. The environmental state information s can be represented, for example, as s = (pageprice, cost, sales1, sales3, sales7, store, pv, comments), where pageprice is the page price, cost the cost price, sales1, sales3 and sales7 the sales volume over one, three and seven days respectively, store the foreground inventory, pv the commodity click volume, and comments the number of commodity reviews in one week.
And then, inputting the obtained environment state information into an enhanced deep learning module, and determining a pricing strategy by an enhanced deep learning neural network based on the environment state information and a pricing strategy model of the enhanced deep learning neural network.
According to the embodiment of the disclosure, the pricing price of the commodity is determined according to the pricing strategy, and the intelligent pricing data warehouse can collect and store the pricing price and the environment state information after pricing.
Exemplary method
A method for automatic pricing of items according to an exemplary embodiment of the invention is described below with reference to fig. 2-6 in conjunction with the system architecture of fig. 1. It should be noted that the above-described system architecture is only shown for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in any way in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
Fig. 2 schematically shows a flowchart of an automatic commodity pricing method according to an embodiment of the present invention.
As shown in fig. 2, the method for automatically pricing goods may include operations S210 to S230.
In operation S210, environmental state information is acquired from a plurality of historical data, wherein the environmental state information includes response information indicating users' responses to historical pricing behavior.
For example, in the system architecture shown in fig. 1, relevant data related to the automatic pricing may be obtained from an intelligent pricing data warehouse, and the environmental state information is generated after the relevant data related to the automatic pricing is subjected to data filtering and cleaning and data transformation.
According to an embodiment of the present disclosure, the environmental status information may be, for example, a vector. Specifically, the environmental status information s may be defined as:
s=(pageprice,cost,sales1,sales3,sales7,store,pv,comments)
where pageprice is the page price, cost the cost price, sales1, sales3 and sales7 the sales volume over one, three and seven days respectively, store the foreground inventory, pv the commodity click volume, and comments the number of commodity reviews in one week.
Or, in order to better distinguish different commodities and avoid mutual influence between similar or like commodities, the core long-term characteristics of the commodities can be added into the environmental state information. For example, if the product a and the product B are similar products, and the response information of the product B or the reward model for the product B is used by mistake when the product a is automatically priced, the product automatic pricing is inaccurate, and the model training of the deep learning neural network is inaccurate. The core long term characteristic of the article may be a characteristic of the article that does not change over time. For example, the characteristics of the primary category, the secondary category, the commodity grade and the like of the commodity can be included. In addition, considering that the change of the environmental state information of the goods may be related to the previous two pricing, the response information of the user in two pricing periods may be used as the environmental state information. Specifically, the environmental status information s may be defined as:
s = (Cat1id, Cat2id, recent_expired_count, competitive_product_price, grade, pageprice1, cost1, sales11, sales31, sales71, store1, pv1, comments1, pageprice2, cost2, sales12, sales32, sales72, store2, pv2, comments2)
where Cat1id is the primary-category id, Cat2id the secondary-category id, recent_expired_count the quantity of goods approaching expiry, competitive_product_price the price of competing products, and grade the commodity grade; pageprice1, cost1, sales11, sales31, sales71, store1, pv1 and comments1 are the user response information in the first pricing period, and pageprice2, cost2, sales12, sales32, sales72, store2, pv2 and comments2 are the user response information in the second pricing period.
In a concrete implementation, because different dimensions of the environmental state information have different magnitudes, each feature value may be normalized to the [0, 1] interval before further processing. For example, the page price, cost price, one-, three- and seven-day sales volumes, foreground inventory, commodity click volume and one-week review count may all be normalized to the [0, 1] interval.
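The normalization step can be sketched as a simple min-max scaling per feature. The function name, the dict representation and the per-feature (min, max) bounds are illustrative assumptions; the patent does not specify how the bounds are obtained.

```python
def normalize_state(raw, bounds):
    """Min-max normalize each dimension of the environment state
    vector into [0, 1].

    raw: mapping from feature name (e.g. "pageprice", "sales1") to value.
    bounds: mapping from feature name to its (min, max) pair.
    Out-of-range observations are clipped so the result stays in [0, 1].
    """
    normed = {}
    for name, value in raw.items():
        lo, hi = bounds[name]
        if hi == lo:
            normed[name] = 0.0  # degenerate range: no information to scale
        else:
            v = min(max(value, lo), hi)  # clip before scaling
            normed[name] = (v - lo) / (hi - lo)
    return normed
```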
In operation S220, the environmental state information is input into the reinforcement deep learning neural network, so that the network determines a pricing strategy based on the environmental state information. The reinforcement deep learning neural network is configured to determine a pricing strategy for the commodity based on the environmental state information and the pricing strategy model, and to score the pricing strategy with the reward model, so that the network updates the pricing strategy model according to the score.
According to the embodiment of the disclosure, reinforcement deep learning generally involves single-step updating and round updating. Single-step updating means the model in the reinforcement deep learning neural network learns once after every pricing action; round updating means the model learns once after a series of pricing actions. A single-step-updated model is therefore better suited to automatic commodity pricing; examples include DDPG (Deep Deterministic Policy Gradient), DPG (Deterministic Policy Gradient) and DQN (Deep Q-Network). Action policies of reinforcement deep learning neural networks can generally be divided into stochastic policies and deterministic policies. Stochastic policies converge slowly and demand large amounts of data, whereas deterministic policies converge toward good solutions faster and require markedly less data. Accordingly, a reinforcement deep learning neural network with a deterministic action policy, such as DPG or DDPG, may be selected.
According to embodiments of the present disclosure, automatic pricing of goods may be achieved using the DDPG algorithm. When the stability of DDPG is found to be unsatisfactory, DQN may be used instead to achieve automatic pricing of goods.
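The deterministic actor at the heart of a DPG/DDPG-style pricer can be sketched as follows. This is a toy NumPy version, not the patent's implementation; the single linear layer and the ±20% per-step adjustment bound are assumptions for illustration:

```python
import numpy as np

class DeterministicPricingActor:
    """Maps a normalized state vector to a price adjustment ratio in [-max_adj, +max_adj]."""

    def __init__(self, state_dim: int, max_adj: float = 0.2, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=state_dim)  # would be trained by the critic's gradient
        self.b = 0.0
        self.max_adj = max_adj

    def act(self, state: np.ndarray) -> float:
        # tanh squashes to (-1, 1); scaling bounds the per-step price change
        return self.max_adj * float(np.tanh(self.w @ state + self.b))

actor = DeterministicPricingActor(state_dim=8)
state = np.full(8, 0.5)        # a normalized environmental state
a = actor.act(state)           # price adjustment ratio
new_price = (a + 1) * 99.0     # (a + 1) × current page price
```

In a full DDPG setup this actor would be paired with a critic network and updated per pricing action (single-step updating), which is what makes the deterministic-policy family a fit here.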
One skilled in the art will appreciate that any reinforcement deep learning neural network may be applied to the embodiments of the present disclosure, and the present disclosure does not limit the specific algorithm of the reinforcement deep learning neural network.
Fig. 3 schematically illustrates an architecture diagram of a reinforcement deep learning neural network according to an embodiment of the present disclosure.
As shown in fig. 3, the reinforcement deep learning neural network may include a pricing strategy model and a reward model. The pricing strategy model can determine a pricing strategy according to a last pricing strategy (for example, the previous adjustment proportion) and environment state information generated after price adjustment, inputs the pricing strategy into the reward model to score the pricing strategy through the reward model, and feeds the scoring result back to the pricing strategy model so that the pricing strategy model optimizes itself.
According to the embodiment of the disclosure, the reward model can obtain the actual reward value after pricing according to the pricing strategy, update itself according to the actual reward value, and thereby update the pricing strategy model. The actual reward value may be characterized, for example, by the GMV (Gross Merchandise Value) generated after pricing.
According to the embodiment of the disclosure, the automatic commodity pricing method does not need manpower to maintain the pricing strategy model, and can optimize the pricing strategy model according to the grading result of the reward model, so that the price of automatic commodity pricing is more reasonable and accurate.
According to an embodiment of the present disclosure, the reinforcement deep learning neural network determines the pricing policy based on the environmental state information and a pricing policy model in one of two modes: a first determination mode and a second determination mode. The first determination mode includes: determining the price adjustment proportion of the pricing price relative to the current price based on the environmental state information and the first pricing strategy model. The second determination mode includes: the environmental state information includes a plurality of state data; an action vector for the plurality of state data is determined based on the environmental state information and the second pricing policy model, with the elements of the action vector corresponding one-to-one to the plurality of state data; the action vector is then dot-multiplied with the state vector formed by the plurality of state data, and the resulting dot product is taken as the price adjustment proportion of the pricing price relative to the current price.
According to an embodiment of the present disclosure, in the first determination mode, the pricing strategy output by the reinforcement deep learning neural network according to the first pricing strategy model may be a price adjustment proportion of the pricing price relative to the current price. For example, if the reinforcement deep learning neural network outputs a according to the first pricing strategy model, the pricing price of the commodity determined based on the pricing strategy in operation S230 may be (a + 1) × pageprice0, where pageprice0 is the current price.
According to an embodiment of the disclosure, in the second determination mode, the second pricing policy model of the reinforcement deep learning neural network may determine an action vector over a plurality of state data, where the elements of the action vector correspond one-to-one to the state data included in the environmental state information. For example, if the environmental state information is s = (pageprice, cost, sales1, sales3, sales7, store, pv, comments), the action vector may be a = (a1, a2, ..., a8). Next, the reinforcement deep learning neural network computes the dot product of the action vector a and the state vector s formed by the plurality of state data to obtain the price adjustment proportion of the pricing price relative to the current price. Denoting the adjustment proportion by p, then p = a · s. In operation S230, the pricing price of the commodity determined based on the pricing strategy may be (p + 1) × pageprice0, where pageprice0 is the current price.
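A sketch of the second determination mode. The field ordering follows the example state vector in the text; the numeric values of the (normalized) state and of the action vector are illustrative:

```python
import numpy as np

# Normalized state vector s = (pageprice, cost, sales1, sales3, sales7, store, pv, comments)
s = np.array([0.9, 0.6, 0.1, 0.3, 0.7, 0.5, 0.8, 0.2])

# Action vector a produced by the second pricing policy model (one weight per state datum)
a = np.array([-0.05, 0.02, 0.01, 0.01, 0.02, -0.03, 0.01, 0.001])

p = float(a @ s)                        # price adjustment proportion p = a · s
current_price = 99.0
pricing_price = (p + 1) * current_price  # (p + 1) × pageprice0
```

Here p comes out slightly negative, so the commodity is marked down a little for the next pricing period.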
According to the embodiment of the disclosure, the second determination mode can control the pricing strategy to be more optimal by changing the action vector and updating the action vector.
According to an embodiment of the present disclosure, the method for automatically pricing commodities may further include: in the second determination mode, determining the action vector of the plurality of state data when the second pricing strategy model is in a converged state, and deleting from the environmental state information the state data corresponding to any element of the action vector whose value is smaller than a preset threshold, so as to update the environmental state information. For example, when the second pricing strategy model is in a converged state and the element a8 in the action vector a is found to be much smaller than the values of the other elements of a, the state data corresponding to a8 can be deleted from the environmental state information vector s. Specifically, in the example of the above embodiment where the environmental state information is s = (pageprice, cost, sales1, sales3, sales7, store, pv, comments), the state data corresponding to a8 is comments, and the updated environmental state information s' is (pageprice, cost, sales1, sales3, sales7, store, pv).
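Pruning low-weight state data after convergence, as described above, might look like the following sketch (the threshold value is an assumed hyperparameter):

```python
import numpy as np

def prune_state_features(names, action_vector, threshold=0.01):
    """Drop state data whose converged action-vector weight is negligible."""
    keep = np.abs(action_vector) >= threshold
    return [n for n, k in zip(names, keep) if k]

names = ["pageprice", "cost", "sales1", "sales3", "sales7", "store", "pv", "comments"]
a = np.array([-0.05, 0.02, 0.01, 0.01, 0.02, -0.03, 0.01, 0.0003])  # a8 ≈ 0
kept = prune_state_features(names, a)
```

Dropping the near-zero `comments` dimension shrinks the state vector and hence the later computation, which is exactly the simplification the method claims.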
According to the embodiment of the disclosure, the method can delete the state data which has less influence on the pricing price in the environment state information, thereby simplifying the environment state information and reducing the calculation complexity.
According to the embodiment of the disclosure, after the environmental state information is updated, the reinforcement deep learning neural network can be switched from the second determination mode to the first determination mode to accelerate model convergence. After switching to the first determination mode, the reinforcement deep learning neural network determines the price adjustment proportion of the pricing price relative to the current price according to the environmental state information and the first pricing strategy model.
FIG. 4 schematically illustrates a flow chart of a method for automatic pricing of items according to another embodiment of the disclosure.
As shown in fig. 4, the method for automatically pricing commodities may further include operations S410 to S440 based on the operations S210 to S230 described in fig. 2.
In operation S410, a reward function for calculating an impact of a pricing price on sales of the goods is acquired.
According to an embodiment of the present disclosure, after the commodity is priced, the pricing price may influence, for example, the sales volume, sales amount, and attention of the commodity. The influence of the pricing price on the sale of the goods may be calculated by a reward function.
According to embodiments of the present disclosure, the reward function may be, for example, the GMV of the good for one pricing period. One pricing cycle may be the time interval between two adjacent pricing actions. One pricing period may be, for example, 7 days, 3 days, etc. According to the embodiment of the disclosure, when the reinforcement deep learning neural network does not converge, the pricing period can be set to be longer, and in the process that the reinforcement deep learning neural network gradually tends to converge, the pricing period can be gradually shortened.
For example, where a pricing period is 7 days, the reward function R may be calculated as follows:

R = Σ_{i=1}^{7} sales_i × pageprice

where i denotes the day, sales_i denotes the sales volume on day i, and pageprice is the price after pricing.
According to the embodiment of the disclosure, the reward function may also add prior knowledge on the basis of the GMV, so that the pricing action can be planned as a whole among multiple objectives, flexibly coping with the different requirements of each business scenario. For example, when inventory of goods needs to be cleared, the pricing action may be balanced between inventory clearing and sales. In this embodiment, the reward function R may be calculated as follows:

R = Σ_{i=1}^{7} sales_i × pageprice + λ × f

where f is a function containing the prior knowledge and λ is an empirically determined weight. For example, f may be equal to -store, or may be equal to b × (pageprice - cost) - a × store, where store is the inventory, a and b are empirically determined scale parameters, and pageprice is the pricing price.
In operation S420, environmental status information characteristics of the goods priced according to the pricing price are acquired.
For example, the environmental status information characteristics of the priced goods may be obtained according to the reward function. If the reward function is, for example, R = Σ_{i=1}^{7} sales_i × pageprice + λ × (-store), then the environmental status information characteristics obtained may include the daily sales volume, the pricing price, and the inventory.
In operation S430, an actual bonus value that the pricing price generates for the sale of the goods is calculated based on the bonus function and the environmental status information characteristics.
In operation S440, the actual reward value and the environmental status information characteristics are input into the reinforcement deep learning neural network, so that the reinforcement deep learning neural network updates the reward model and the pricing strategy model based on the actual reward value.
Referring to fig. 3 to illustrate operation S440: as shown in the reinforcement deep learning neural network architecture of fig. 3, the actual reward value after pricing may be used to optimize the reward model. The reward model optimizes itself according to the actual reward value of each round, so that it can score the pricing policy of the pricing policy model more accurately, and the pricing policy model in turn optimizes its own model according to the score.
According to the embodiment of the disclosure, the automatic commodity pricing method can calculate the actual reward value generated by the pricing price for the sale of the commodity according to the environmental state information characteristics of the priced commodity, and the reinforcement deep learning neural network can update the reward model and the pricing strategy model according to the actual reward value, thereby further improving the rationality and accuracy of automatic commodity pricing.
FIG. 5 schematically illustrates a flow chart of a method of determining a pricing price for a good according to an embodiment of the present disclosure.
As shown in fig. 5, the method may include operations S231 to S234.
In operation S231, a preset maximum value and a preset minimum value of the commodity price are acquired.
According to an embodiment of the present disclosure, the preset maximum value and the preset minimum value of the commodity price may be set according to the needs of the salesperson.
In operation S232, a predetermined price of the goods is determined based on the pricing policy.
For example, if the price adjustment proportion relative to the current price output by the reinforcement deep learning neural network is a, the predetermined price of the commodity determined based on the pricing strategy may be (a + 1) × pageprice0, where pageprice0 is the current price.
In operation S233, in the case where the predetermined price is less than the current price of the commodity, it is determined that the pricing price of the commodity is set to the maximum value of the preset minimum value and the predetermined price.
According to an embodiment of the present disclosure, in a case where the predetermined price is less than the current price of the commodity, the pricing price of the commodity is determined to be the maximum value of the preset minimum value and the predetermined price. In other words, when the price of the commodity is reduced, the pricing price is not less than the preset minimum value, and when the pricing price is less than the preset minimum value, the pricing price of the commodity is determined to be the preset minimum value.
In operation S234, in the case where the predetermined price is greater than the current price of the goods, the pricing price of the goods is set to the minimum of the preset maximum value and the predetermined price. In other words, when the price of the commodity is increased, the pricing price is not larger than the preset maximum value, and when the predetermined price is larger than the preset maximum value, the pricing price of the commodity is determined to be the preset maximum value.
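Operations S231 to S234 amount to clamping the predetermined price on the side toward which it moves, as in this sketch (the price band values are illustrative):

```python
def clamp_pricing_price(predetermined, current, preset_min, preset_max):
    """Keep a price cut above the preset minimum and a price rise below the preset maximum."""
    if predetermined < current:
        return max(preset_min, predetermined)   # operation S233
    if predetermined > current:
        return min(preset_max, predetermined)   # operation S234
    return predetermined

# current price 99.0, allowed band [80.0, 120.0]
price_cut = clamp_pricing_price(70.0, 99.0, 80.0, 120.0)    # floored at the preset minimum
price_rise = clamp_pricing_price(130.0, 99.0, 80.0, 120.0)  # capped at the preset maximum
```

A predetermined price that already falls inside the band passes through unchanged.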
According to the embodiment of the disclosure, the pricing method can determine the final pricing price according to the actual demand on the basis of the pricing strategy, so that the automatic pricing of the commodities is more humanized.
Fig. 6 schematically shows a system architecture that can be applied to an automatic commodity pricing method according to another embodiment of the present disclosure.
As shown in fig. 6, the system architecture may include a data processing module 610, a reinforcement deep learning neural network module 630, a pricing module 650, and a reward calculation module 670.
The data processing module 610 may, for example, utilize a spark calculation engine to perform data processing on the relevant data obtained from the intelligent pricing data store. The data processing may, for example, include processing the associated data into data conforming to an environmental status information format to obtain environmental status information.
The environmental state information is input into the reinforced deep learning neural network module 630, and a pricing strategy, which may be a price ratio, is determined and output by the reinforced deep learning neural network module 630 according to the environmental state information and the pricing strategy model.
According to an embodiment of the present disclosure, the price adjustment proportion determined by the reinforcement deep learning neural network module 630 may be changed manually. For example, when a person skilled in the art explores whether the price adjustment proportion output by the pricing strategy model is optimal, the proportion can be manually adjusted to another value.
According to embodiments of the present disclosure, the pricing price may be calculated by the pricing module 650 after the price adjustment proportion is obtained.
According to the embodiment of the present disclosure, the pricing price may also be adjusted manually; for example, operations S231 to S234 described above with reference to fig. 5 may be performed, which are not repeated here.
According to the embodiment of the present disclosure, after the currently calculated pricing price takes effect, the reward calculation module 670 may calculate the reward generated by the current pricing according to the actual pricing price (for example, the reward may be the GMV), so that the reinforcement deep learning neural network module 630 learns online and optimizes the pricing policy model and the reward model.
According to an embodiment of the present disclosure, the reward calculation module 670 may further obtain the environmental status information characteristics generated by past pricing prices from the data processing module 610 and calculate a reward value from those characteristics, so that the reinforcement deep learning neural network module 630 can learn online and optimize the pricing strategy model and the reward model.
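The module interaction of fig. 6 forms a closed pricing loop; a highly simplified single-cycle sketch follows, where each step stands in for one of the modules 610/630/650/670 (all values and the linear policy are illustrative assumptions):

```python
import numpy as np

def pricing_cycle(state, policy_weights, current_price, preset_min, preset_max, daily_sales):
    # 630: the policy model determines the price adjustment proportion (second mode: p = a · s)
    p = float(policy_weights @ state)
    # 650: the pricing module computes the pricing price and clamps it to the allowed band
    price = (p + 1) * current_price
    price = max(preset_min, min(preset_max, price))
    # 670: the reward module scores the cycle with the realized GMV
    reward = sum(daily_sales) * price
    return price, reward

state = np.full(8, 0.5)      # normalized state from data processing module 610
weights = np.full(8, 0.01)   # current action vector of the policy model
price, reward = pricing_cycle(state, weights, 99.0, 80.0, 120.0, [12, 20, 15])
```

In the real system the returned reward would then flow back into module 630 to update the reward model and the pricing policy model online.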
Exemplary System
Having described the method of an exemplary embodiment of the present invention, an automatic merchandise pricing system of an exemplary embodiment of the present invention is next described with reference to fig. 7A-7D.
Fig. 7A schematically illustrates a block diagram of an automatic merchandise pricing system 700 according to an embodiment of the invention.
As shown in fig. 7A, the automatic merchandise pricing system 700 may include a first obtaining module 710, an enhanced deep learning module 720, and a first determining module 730.
The first obtaining module 710, for example, may perform operation S210 described above with reference to fig. 2, for obtaining environmental status information from a plurality of historical data, wherein the environmental status information includes response information indicating a user's response to historical pricing behavior.
The reinforcement deep learning module 720, for example, may perform operation S220 described above with reference to fig. 2, for inputting the environmental state information into the reinforcement deep learning neural network, so that the reinforcement deep learning neural network determines the pricing strategy based on the environmental state information. The reinforcement deep learning module is configured to determine a pricing strategy of the commodity based on the environmental state information and the pricing strategy model, and score the pricing strategy based on the reward model, so that the reinforcement deep learning neural network updates the pricing strategy model according to the score.
The first determining module 730, for example, may perform operation S230 described above with reference to fig. 2 for determining a pricing price of the good based on the pricing policy.
FIG. 7B schematically shows a block diagram of the reinforcement deep learning module 720 according to an embodiment of the present invention.
As shown in fig. 7B, the enhanced deep learning module 720 may include a first sub-module 721 and a second sub-module 722.
The first sub-module 721 is used to perform a first determination mode, which includes determining a price adjustment ratio of the pricing price with respect to the current price based on the environmental status information and the first pricing strategy model.
The second sub-module 722 is configured to execute the second determination mode, which includes: the environmental status information includes a plurality of status data; an action vector of the plurality of status data is determined based on the environmental status information and the second pricing policy model, with the elements of the action vector corresponding one-to-one to the plurality of status data; and the action vector is dot-multiplied with the state vector formed by the plurality of status data, the resulting dot product being used as the price adjustment proportion of the pricing price relative to the current price.
According to an embodiment of the present disclosure, the automatic commodity pricing system 700 may further include a second determining module and an updating module. The second determining module is configured to determine, in the second determination mode, the action vector of the plurality of state data when the second pricing policy model is in a converged state; the updating module is configured to delete from the environmental state information the state data corresponding to elements of the action vector whose values are smaller than a preset threshold, so as to update the environmental state information.
According to an embodiment of the present disclosure, the automatic merchandise pricing system 700 may further include an adjustment module for switching the reinforcement deep learning neural network from the second determination mode to the first determination mode in response to the environmental status information being updated.
Fig. 7C schematically illustrates a block diagram of an automatic merchandise pricing system 800 according to another embodiment of the invention.
As shown in fig. 7C, the automatic commodity pricing system 800 may further include a second obtaining module 810, a third obtaining module 820, a calculating module 830 and an input module 840 on the basis of the automatic commodity pricing system 700 according to the foregoing embodiment.
The second obtaining module 810, for example, may perform operation S410 described above with reference to fig. 4, for obtaining a reward function for calculating the influence of the pricing price on the sales of the goods.
The third obtaining module 820, for example, may perform operation S420 described above with reference to fig. 4, for obtaining the environmental status information characteristic of the goods priced according to the pricing price.
The calculation module 830, for example, may perform operation S430 described above with reference to fig. 4, for calculating an actual reward value generated by the pricing price for the sale of the goods based on the reward function and the environmental status information characteristics.
The input module 840, for example, may perform operation S440 described above with reference to fig. 4, for inputting the actual reward value and the environmental status information feature to the reinforcement deep learning neural network, such that the reinforcement deep learning neural network updates the reward model and the pricing policy model based on the actual reward value.
Fig. 7D schematically illustrates a block diagram of the first determination module 730 according to an embodiment of the invention.
As shown in fig. 7D, the first determination module 730 may include an acquisition sub-module 731, a first determination sub-module 732, a second determination sub-module 733, and a third determination sub-module 734.
The obtaining sub-module 731, for example, may perform the operation S231 described above with reference to fig. 5, for obtaining the preset maximum value and the preset minimum value of the commodity price.
The first determining sub-module 732 may, for example, perform operation S232 described above with reference to fig. 5 for determining a predetermined price of the goods based on the pricing policy.
The second determining sub-module 733, for example, may perform operation S233 described above with reference to fig. 5 for determining to set the pricing price of the commodity to the maximum value of the preset minimum value and the predetermined price in a case where the predetermined price is less than the current price of the commodity.
The third determining sub-module 734, for example, may perform operation S234 described above with reference to fig. 5, for determining to set the pricing price of the commodity to the minimum value of the preset maximum value and the predetermined price in case that the predetermined price is greater than the current price of the commodity.
Exemplary Medium
Having described the system of an exemplary embodiment of the present invention, a program product for implementing the method for automatic pricing of merchandise of any of the above method embodiments of the present invention is described next with reference to fig. 8.
In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product including program code for causing a computing device to perform the steps in the method for automatic pricing of goods according to various exemplary embodiments of the present invention described in the above section "exemplary method" of this specification, when the program product is run on the computing device. For example, the computing device may perform operation S210 as shown in fig. 2: obtaining environmental state information from a plurality of historical data, wherein the environmental state information includes response information indicating a user's response to historical pricing behavior; operation S220: inputting the environmental state information into a reinforcement deep learning neural network, so that the reinforcement deep learning neural network determines a pricing strategy based on the environmental state information; and operation S230: determining a pricing price of the commodity based on the pricing strategy.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 8, a program product 80 for the automatic commodity pricing method according to an embodiment of the present invention is depicted, which may employ a portable compact disc read only memory (CD-ROM), include program code, and be run on a computing device, such as a personal computer. However, the program product of the present invention is not limited in this regard; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Exemplary computing device
Having described the method, medium, and apparatus of exemplary embodiments of the present invention, a computing device of exemplary embodiments of the present invention is described next with reference to FIG. 9.
The embodiment of the invention also provides a computing device. As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
In some possible embodiments, a computing device according to the present invention may include at least one processing unit and at least one storage unit. The storage unit stores program code that, when executed by the processing unit, causes the processing unit to perform the steps of the method for automatic pricing of merchandise according to various exemplary embodiments of the present invention described in the above section "exemplary method" of this specification. For example, the processing unit may perform operation S210 as shown in fig. 2: obtaining environmental state information from a plurality of historical data, wherein the environmental state information includes response information indicating a user's response to historical pricing behavior; operation S220: inputting the environmental state information into a reinforcement deep learning neural network, so that the reinforcement deep learning neural network determines a pricing strategy based on the environmental state information; and operation S230: determining a pricing price of the commodity based on the pricing strategy.
A computing device 90 for automatic pricing of items according to this embodiment of the invention is described below with reference to fig. 9. The computing device 90 shown in FIG. 9 is only one example and should not be taken to limit the scope of use and functionality of embodiments of the present invention.
As shown in fig. 9, computing device 90 is embodied in the form of a general purpose computing device. Components of computing device 90 may include, but are not limited to: the at least one processing unit 901, the at least one memory unit 902, and the bus 903 connecting the various system components (including the memory unit 902 and the processing unit 901).
Bus 903 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The storage unit 902 may include readable media in the form of volatile memory, such as a Random Access Memory (RAM)9021 and/or a cache memory 9022, and may further include a Read Only Memory (ROM) 9023.
Storage unit 902 may also include a program/utility 9025 having a set (at least one) of program modules 9024, such program modules 9024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Computing device 90 may also communicate with one or more external devices 904 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with computing device 90, and/or with any devices (e.g., router, modem, etc.) that enable computing device 90 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 905. Moreover, computing device 90 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network, such as the internet) via network adapter 906. As shown, network adapter 906 communicates with the other modules of computing device 90 via bus 903. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 90, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
It should be noted that although several units/modules or sub-units/modules of the system are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more of the units/modules described above may be embodied in a single unit/module. Conversely, the features and functions of one unit/module described above may be further divided among a plurality of units/modules.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. The division into aspects is for convenience of description only; features belonging to different aspects may be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. An automatic commodity pricing method, comprising:
obtaining environmental state information from a plurality of historical data, wherein the environmental state information comprises response information indicating users' responses to historical pricing behavior;
inputting the environmental state information into a reinforcement deep learning neural network, so that the reinforcement deep learning neural network determines a pricing strategy based on the environmental state information; and
determining a pricing price for the commodity based on the pricing strategy,
wherein the reinforcement deep learning neural network is configured to determine the pricing strategy for the commodity based on the environmental state information and a pricing strategy model, and to score the pricing strategy based on a reward model, so that the reinforcement deep learning neural network updates the pricing strategy model according to the score.
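The loop in claim 1 — a strategy model proposes a pricing action, a reward model scores it, and the score drives an update of the strategy model — can be sketched minimally as follows. This is an illustration only, not the patented implementation: the linear policy, the toy reward model, the finite-difference update, and every name here are assumptions.

```python
# Illustrative sketch of the claim-1 loop: policy -> pricing action -> reward
# score -> policy update. All models and names are stand-ins, not the patent's.

def pricing_policy(state, weights):
    """Linear stand-in for the pricing-strategy model: maps environmental
    state to a price-adjustment ratio clipped to [-0.2, 0.2]."""
    raw = sum(w * s for w, s in zip(weights, state))
    return max(-0.2, min(0.2, raw))

def reward_model(ratio, demand_sensitivity=2.0):
    """Toy reward model: raising price lifts revenue per unit but cuts
    sales volume; returns relative revenue as the score."""
    price_factor = 1.0 + ratio
    sales_factor = 1.0 - demand_sensitivity * ratio
    return price_factor * sales_factor

def step(state, weights, lr=0.01):
    """One policy-improvement step: score the current action, then nudge
    each weight along a finite-difference estimate of the score gradient."""
    ratio = pricing_policy(state, weights)
    score = reward_model(ratio)
    new_weights = []
    for i, w in enumerate(weights):
        bumped = list(weights)
        bumped[i] = w + 1e-4
        grad = (reward_model(pricing_policy(state, bumped)) - score) / 1e-4
        new_weights.append(w + lr * grad)
    return new_weights, ratio, score

state = [0.5, -0.1, 0.3]      # e.g. demand, stock pressure, seasonality
weights = [0.1, 0.1, 0.1]
weights, ratio, score = step(state, weights)
```

Repeating `step` over many observed states is the sense in which the network "updates the pricing strategy model according to the score".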
2. The method of claim 1, wherein the reinforcement deep learning neural network determines the pricing strategy based on the environmental state information and the pricing strategy model in one of two modes: a first determination mode and a second determination mode,
the first determination mode includes:
determining a price adjustment ratio of the pricing price relative to the current price based on the environmental state information and a first pricing strategy model;
the second determination mode includes:
the environmental state information comprises a plurality of state data; an action vector for the plurality of state data is determined based on the environmental state information and a second pricing strategy model, wherein the elements of the action vector correspond one-to-one to the state data; and a dot product is computed between the action vector and a state vector formed from the plurality of state data, the result of the dot product being used as the price adjustment ratio of the pricing price relative to the current price.
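The second determination mode can be sketched directly: the action vector and the state vector are combined element-wise by a dot product, and the scalar result serves as the price-adjustment ratio. The numeric values below are illustrative, not from the patent.

```python
# Dot product of an action vector with a state vector, as in the second
# determination mode of claim 2. Values are illustrative examples.

def price_adjustment_ratio(action_vector, state_vector):
    """Return the dot product of action and state, used as the ratio by
    which the current price is adjusted."""
    if len(action_vector) != len(state_vector):
        raise ValueError("action and state vectors must align one-to-one")
    return sum(a * s for a, s in zip(action_vector, state_vector))

action = [0.05, -0.02, 0.01]   # output of the (hypothetical) second model
state = [1.2, 0.8, 2.0]        # e.g. demand index, inventory ratio, margin
ratio = price_adjustment_ratio(action, state)   # 0.064
new_price = 100.0 * (1 + ratio)                 # current price 100.0 -> 106.4
```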
3. The method of claim 2, further comprising, in the second determination mode:
determining an action vector for the plurality of state data when the second pricing strategy model is in a converged state; and
deleting, from the environmental state information, the state data corresponding to elements of the action vector whose values are smaller than a preset threshold, so as to update the environmental state information.
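The pruning step of claim 3 can be sketched as follows. The threshold value, the use of absolute magnitudes, and the feature names are all assumptions for illustration; the claim itself only specifies dropping state data whose action-vector elements fall below a preset threshold.

```python
# Sketch of claim-3 pruning: once the second model converges, state data
# with small action-vector weights are removed from the environment state.
# Threshold and feature names are illustrative assumptions.

def prune_state(state_names, action_vector, threshold=0.01):
    """Keep only the state data whose corresponding action-vector element
    has magnitude at or above the threshold."""
    kept = [(name, a) for name, a in zip(state_names, action_vector)
            if abs(a) >= threshold]
    return [name for name, _ in kept], [a for _, a in kept]

names = ["demand", "inventory", "weekday", "weather"]
action = [0.08, 0.03, 0.002, -0.04]
new_names, new_action = prune_state(names, action)
# "weekday" is dropped because |0.002| < 0.01
```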
4. The method of claim 3, wherein the reinforcement deep learning neural network switches from the second determination mode to the first determination mode in response to the environmental state information being updated.
5. The method of claim 1, further comprising:
obtaining a reward function for calculating an impact of the pricing price on sales of the commodity;
acquiring environmental state information characteristics of the commodity after it has been priced at the pricing price;
calculating, based on the reward function and the environmental state information characteristics, an actual reward value generated by the pricing price for the sales of the commodity; and
inputting the actual reward value and the environmental state information characteristics into the reinforcement deep learning neural network, so that the reinforcement deep learning neural network updates the reward model and the pricing strategy model based on the actual reward value.
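Claim 5 does not fix a concrete reward function, so the revenue-based form below is an assumption chosen for illustration: the actual reward is computed from observations made after the price took effect, and that value is what gets fed back to update the models.

```python
# Hypothetical reward function for claim 5: reward is the revenue change
# relative to a pre-adjustment baseline. The form and numbers are assumptions.

def reward_function(pricing_price, units_sold, baseline_revenue):
    """Actual reward value: relative revenue change produced by the price."""
    revenue = pricing_price * units_sold
    return (revenue - baseline_revenue) / baseline_revenue

# Observed after repricing at 95.0: 120 units sold, versus a baseline of
# 110 units at price 100.0 before the adjustment.
actual_reward = reward_function(95.0, 120, 100.0 * 110)
```

A positive `actual_reward` here means the price cut more than paid for itself in volume, which is exactly the signal the network needs to reinforce or weaken the strategy that produced it.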
6. The method of claim 1, wherein the determining a pricing price for the commodity based on the pricing strategy comprises:
acquiring a preset maximum value and a preset minimum value of the commodity price;
determining a predetermined price for the commodity based on the pricing strategy;
setting the pricing price of the commodity to the maximum of the preset minimum value and the predetermined price when the predetermined price is lower than the current price of the commodity; and
setting the pricing price of the commodity to the minimum of the preset maximum value and the predetermined price when the predetermined price is higher than the current price of the commodity.
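The bounding rule of claim 6 translates into a few lines: a price cut is floored at the preset minimum and a price rise is capped at the preset maximum, so the learned policy can never move the price outside the allowed band. Function and parameter names are illustrative.

```python
# Claim-6 price bounding: floor price cuts at preset_min, cap price rises
# at preset_max. Names are illustrative, not from the patent.

def bounded_price(predetermined, current, preset_min, preset_max):
    if predetermined < current:           # price cut: do not go below floor
        return max(preset_min, predetermined)
    return min(preset_max, predetermined)  # price rise: do not exceed cap

price_cut = bounded_price(predetermined=40.0, current=50.0,
                          preset_min=45.0, preset_max=60.0)   # floored: 45.0
price_rise = bounded_price(predetermined=70.0, current=50.0,
                           preset_min=45.0, preset_max=60.0)  # capped: 60.0
```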
7. The method of claim 1, wherein the reinforcement deep learning neural network comprises a Deep Deterministic Policy Gradient (DDPG) algorithm module.
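The patent names DDPG but does not detail it. One DDPG-specific mechanism worth illustrating is the soft update of target networks, theta_target := tau * theta + (1 - tau) * theta_target; the sketch below applies it to plain Python lists, whereas a real DDPG module would operate on actor and critic network parameters in an ML framework.

```python
# Soft target-network update used by DDPG, sketched on plain lists.
# tau and the parameter values are illustrative.

def soft_update(params, target_params, tau=0.005):
    """Blend online parameters into target parameters by a factor tau,
    so the targets trail the online networks slowly and stably."""
    return [tau * p + (1.0 - tau) * t for p, t in zip(params, target_params)]

online = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(online, target, tau=0.1)
# each target parameter moves 10% of the way toward its online value
```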
8. An automatic merchandise pricing system, comprising:
the system comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining environment state information from a plurality of historical data, and the environment state information comprises response information indicating a user to historical pricing behaviors;
the reinforced deep learning module is used for inputting the environment state information into a reinforced deep learning neural network so that the reinforced deep learning neural network determines a pricing strategy based on the environment state information; and
a first determination module to determine a pricing price for the good based on the pricing strategy,
wherein the reinforcement deep learning module is configured to determine a pricing strategy for the commodity based on the environmental status information and a pricing strategy model, and score the pricing strategy based on a reward model, such that the reinforcement deep learning neural network updates the pricing strategy model according to the score.
9. A computer-readable medium storing computer-executable instructions which, when executed by a processing unit, implement the automatic commodity pricing method of any one of claims 1 to 7.
10. A computing device, comprising:
a processing unit; and
a storage unit storing computer-executable instructions which, when executed by the processing unit, implement the automatic commodity pricing method of any one of claims 1 to 7.
CN201910937462.9A 2019-09-29 2019-09-29 Method, system, medium and computing device for automatic pricing of commodities Pending CN110648182A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910937462.9A CN110648182A (en) 2019-09-29 2019-09-29 Method, system, medium and computing device for automatic pricing of commodities

Publications (1)

Publication Number Publication Date
CN110648182A true CN110648182A (en) 2020-01-03

Family

ID=68993254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910937462.9A Pending CN110648182A (en) 2019-09-29 2019-09-29 Method, system, medium and computing device for automatic pricing of commodities

Country Status (1)

Country Link
CN (1) CN110648182A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950320A (en) * 2021-03-03 2021-06-11 刘晓斌 Automatic commodity online and offline method and device based on machine self-learning and electronic equipment
CN113573264A (en) * 2020-04-28 2021-10-29 中国移动通信集团浙江有限公司 Pricing processing method and device of 5G slice based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
US10719803B2 (en) Automatic learning of weight settings for multi-objective models
Lee et al. A multiagent approach to $ q $-learning for daily stock trading
AU2020372607B2 (en) Model selection in a forecasting pipeline to optimize tradeoff between forecast accuracy and computational cost
CN110264270B (en) Behavior prediction method, behavior prediction device, behavior prediction equipment and storage medium
CN109961198B (en) Associated information generation method and device
CN113435606A (en) Method and device for optimizing reinforcement learning model, storage medium and electronic equipment
CN110648182A (en) Method, system, medium and computing device for automatic pricing of commodities
CN113506143A (en) Commodity discount generation method, device, equipment and computer readable storage medium
CN112132356A (en) Stock price prediction method based on space-time diagram attention mechanism
CN110827069A (en) Data processing method, device, medium, and electronic apparatus
CN115130894A (en) Production planning method and device based on artificial intelligence, computer equipment and medium
CN110837857A (en) Industrial electricity load prediction method, system and storage medium thereof
CN113848936A (en) Path planning method and device, electronic equipment and computer readable storage medium
US11556447B2 (en) Usage prediction method and storage medium
CN110619407A (en) Object sales prediction method and system, electronic device, and storage medium
CN112288484A (en) Commodity automatic pricing method and device, electronic equipment and storage medium
CN111861004A (en) Method, system, apparatus and storage medium for automatic commission prediction of daily income production
CN112785111A (en) Production efficiency prediction method, device, storage medium and electronic equipment
Macedo et al. A machine learning approach for spare parts lifetime estimation
CN113761365B (en) Data processing system for determining target information
CN115719194A (en) Big data prediction based material purchasing method and system
Fu et al. Binary tree pricing method of farmland management right mortgage based on machine learning and complex network algorithm
CN115185606A (en) Method, device, equipment and storage medium for obtaining service configuration parameters
Alamdari et al. Deep reinforcement learning in seat inventory control problem: an action generation approach
CN113065686A (en) Scheduling data optimization processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200103