WO2022160826A1 - A page layout method and apparatus - Google Patents
A page layout method and apparatus
- Publication number
- WO2022160826A1 (PCT/CN2021/127122)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/38—Creation or generation of source code for implementing user interfaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
Definitions
- the present application relates to the technical field of artificial intelligence (AI), and in particular, to a page layout method and apparatus.
- Page layout is a core part of design work, and it is also the most time-consuming task for designers when producing pages. Page layout requires deploying page elements of different numbers, shapes, and sizes within a limited page, while satisfying specific rules and certain aesthetic requirements between the page elements and the page, and among the page elements themselves.
- Automated layout uses computer technology to replace human designers for page layout, applying artificial intelligence technology to meet the requirements of page layout. Automated layout usually does not require trained staff to operate, which can greatly reduce designers' labor and training time.
- In one existing approach, multiple page elements can be deployed on the canvas in a random layout, the laid-out page is then converted into image data, an evaluation model is used to score the image data, and finally the user can select the page layout with the higher score.
- an embodiment of the present application provides a page layout method, the method includes:
- acquiring page information, element information of at least one page element to be laid out, and a layout rule; using a reinforcement learning algorithm to obtain at least one candidate page layout strategy based on the page information, the element information, and the layout rule;
- a target page layout strategy is determined from the at least one candidate page layout strategy using an imitation learning algorithm.
- In the above method, candidate page layout strategies are first obtained through a reinforcement learning algorithm, and an imitation learning algorithm is then used to determine the target page layout strategy, so as to obtain a page layout that better conforms to the layout rules and the user's aesthetic habits.
- the layout rule includes a first priority layout rule and a second priority layout rule; the at least one candidate page layout strategy satisfies the first priority layout rule, and the reward obtained under the constraints of the second priority layout rule is greater than a preset reward threshold.
- the layout rules are divided into a first priority layout rule and a second priority layout rule, where the first priority layout rule has a higher priority than the second priority layout rule; the first priority layout rule is used as a constraint condition when the agent performs actions, and the second priority layout rule is used as the basis for the environment subject to determine the reward value.
- in this way, the generated candidate page layout strategy can fully satisfy the hard rules and satisfy the soft rules as much as possible.
- obtaining the at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule includes:
- Step 1: according to the page information and the element information of the first page element to be laid out, laying out the first page element on the page according to the layout rule, and generating the page state of the page;
- Step 2: according to the page state and the element information of the next page element to be laid out, determining the layout position of the next page element to be laid out in the page according to the layout rule;
- Step 3: determining the reward value corresponding to the page state and the layout position, and updating the page state;
- iterating Steps 2 and 3 until the at least one page element to be laid out is laid out on the page, and generating a page layout strategy;
- accumulating the reward value determined in each iteration, and using the page layout strategy whose accumulated reward value satisfies a preset condition as the candidate page layout strategy.
- the decision-making process of page layout is modeled as a reinforcement learning decision-making process, and in each iteration the layout rule only constrains the layout position of one page element to be laid out, so that the final candidate page layout strategy can completely satisfy the layout rules.
- iterating Step 2 and Step 3 until the at least one page element to be laid out is laid out on the page and a page layout strategy is generated includes: determining the current cumulative reward value, and determining the strategy value of the current page layout strategy according to the cumulative reward value; and
- when the strategy value is greater than or equal to a preset value threshold, continuing to iterate Steps 2 and 3 until the at least one page element to be laid out is laid out on the page, and generating a page layout strategy.
- the Monte Carlo tree search algorithm is used in the reinforcement learning process to reduce the search space of the page layout, which can greatly reduce the search complexity when there are many page elements to be laid out, and a page layout strategy with a higher reward value can be obtained while reducing the search space.
- the use of an imitation learning algorithm to determine a target page layout strategy from the at least one candidate page layout strategy includes:
- the score of each candidate page layout strategy is determined according to a reward function obtained by pre-training an imitation learner, where the reward function is determined by the ranking loss between the positive samples and negative samples used to train the imitation learner, and the positive samples include page layout strategies that conform to the user's aesthetics;
- a target page layout strategy is determined from the at least one candidate page layout strategy according to the score.
- the imitation learning algorithm is used to learn the page layout strategies of human experts, so that the aesthetics of the imitation learner are closer to those of human experts.
- the method further includes:
- receiving the user's adjustment to the target page layout strategy; using the adjusted target page layout strategy as a positive sample for training the imitation learner, and training the imitation learner to obtain an optimized reward function.
- a mechanism for interaction between the imitation learner and the user is provided. If the user is dissatisfied with the optimal page layout strategy output by the imitation learner and adjusts it, the imitation learner can learn the adjusted page layout strategy, thereby optimizing the reward function; the imitation learner can thus continuously optimize its own performance according to the user's preferences and continuously enhance its learning ability.
- acquiring the page information, the element information of at least one page element to be laid out, and the layout rules includes:
- receiving the page information, the element information of at least one page element to be laid out, and the layout rules configured by the user in the user interface.
- an interactive interface between the page layout apparatus and the user is provided, and the user can configure various information in this interactive interface.
- the at least one page element to be laid out is set to be laid out on the page in descending order of size.
- the at least one page element to be laid out is laid out in descending order of size, which can improve the layout efficiency and speed up the convergence speed of the reinforcement learning algorithm.
- laying out the first page element to be laid out on the page according to the layout rule includes:
- setting the size of the grid according to the size of the smallest element among the at least one page element to be laid out;
- laying out the first page element on the page according to the layout rule, so that at least one vertex of the minimum bounding box of the first page element to be laid out coincides with a vertex of the grid.
- the grid layout method makes it easier to align the page elements to be laid out and also improves the efficiency of the page layout.
- the page state includes a non-layoutable area and a layoutable area in the page, where a grid corresponding to the non-layoutable area is provided with a masking flag.
- determining the reward value corresponding to the page state and the layout position includes:
- the soft rules are quantified as reward values, so that the generated layout strategy satisfies the soft rules as much as possible.
- determining the reward value corresponding to the page state and the layout position includes:
- the reward value is determined based on the standard deviation.
- the soft rules are quantified as reward values, so that the generated layout strategy satisfies the soft rules as much as possible.
- an embodiment of the present application provides a page layout device, characterized in that it includes:
- an initial information acquisition module, used to acquire page information, element information of at least one page element to be laid out, and layout rules;
- a reinforcement learning module configured to obtain at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule;
- the imitation learning module is used for determining the target page layout strategy from the at least one candidate page layout strategy by using the imitation learning algorithm.
- the layout rule includes a first priority layout rule and a second priority layout rule; the at least one candidate page layout strategy satisfies the first priority layout rule, and the reward obtained under the constraints of the second priority layout rule is greater than a preset reward threshold.
- the reinforcement learning module is specifically used for:
- Step 1: according to the page information and the element information of the first page element to be laid out, laying out the first page element on the page according to the layout rule, and generating the page state of the page;
- Step 2: according to the page state and the element information of the next page element to be laid out, determining the layout position of the next page element to be laid out in the page according to the layout rule;
- Step 3: determining the reward value corresponding to the page state and the layout position, and updating the page state;
- iterating Steps 2 and 3 until the at least one page element to be laid out is laid out on the page, and generating a page layout strategy;
- accumulating the reward value determined in each iteration, and using the page layout strategy whose accumulated reward value satisfies a preset condition as the candidate page layout strategy.
- the reinforcement learning module is further used for:
- continuing to iterate Steps 2 and 3 until the at least one page element to be laid out is laid out on the page, and generating a page layout strategy.
- the imitation learning module is specifically used for:
- the score of each candidate page layout strategy is determined according to a reward function obtained by pre-training an imitation learner, where the reward function is determined by the ranking loss between the positive samples and negative samples used to train the imitation learner, and the positive samples include page layout strategies that conform to the user's aesthetics;
- a target page layout strategy is determined from the at least one candidate page layout strategy according to the score.
- the imitation learning module is further used for:
- receiving the user's adjustment to the target page layout strategy; using the adjusted target page layout strategy as a positive sample for training the imitation learner, and training the imitation learner to obtain an optimized reward function.
- the initial information acquisition module is specifically used for:
- receiving the page information, the element information of at least one page element to be laid out, and the layout rules configured by the user in the user interface.
- an embodiment of the present application provides a computing device, comprising: a processor; and a memory for storing instructions executable by the processor; where the processor is configured, when executing the instructions, to implement the page layout method of the first aspect or of one or more of the multiple possible implementation manners of the first aspect.
- embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, where, when the computer program instructions are executed by a processor, the page layout method of the first aspect or of one or more of the multiple possible implementation manners of the first aspect is implemented.
- embodiments of the present application provide a computer program product, comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the page layout method of the first aspect or of one or more of the multiple possible implementation manners of the first aspect.
- FIG. 1 shows a schematic structural diagram of a page layout apparatus 100 according to an embodiment of the present application.
- FIG. 2 shows a schematic diagram of a page layout scenario according to an embodiment of the present application.
- FIG. 3 shows a schematic flowchart of a page layout method according to an embodiment of the present application.
- FIG. 4 shows a schematic structural diagram of a Markov model according to an embodiment of the present application.
- FIG. 5 shows a schematic flowchart of generating a page layout strategy using a reinforcement learning algorithm according to an embodiment of the present application.
- FIG. 6 shows a schematic diagram of a scenario according to an embodiment of the present application.
- FIG. 7 shows a schematic diagram of a scenario according to an embodiment of the present application.
- FIG. 8 shows a schematic diagram of a scenario according to an embodiment of the present application.
- FIG. 9 shows a schematic diagram of a scenario according to an embodiment of the present application.
- FIG. 10 shows a schematic flowchart of generating a page layout strategy by using a reinforcement learning algorithm according to an embodiment of the present application.
- FIG. 11 shows a schematic structural diagram of another page layout apparatus 100 provided according to an embodiment of the present application.
- FIG. 12 shows a schematic diagram of a module structure of a computing device 1200 according to an embodiment of the present application.
- the core of page layout is to arrange multiple page elements with different numbers, shapes and sizes on a limited page.
- To measure whether a page layout is excellent, it is necessary to judge whether it meets the following conditions: first, whether it meets the hard rules; second, whether it meets the soft rules; and third, whether it conforms to human aesthetics or to users' usage habits, etc.
- the hard rules may include basic rules to be followed by a page layout, for example, that different page elements do not overlap and that page elements are not placed in the edge area of the page.
- the soft rules may include rules that do not have to be followed but can improve the aesthetics of the page layout, for example, that page elements are aligned in the horizontal and vertical directions, that the spacing between page elements is uniform, and the like.
- the embodiments of the present application provide a page layout method.
- the method can obtain at least one candidate page layout strategy by using a reinforcement learning algorithm based on page information of the page, element information of at least one page element to be laid out, and layout rules. Then, a target page layout strategy is determined from the at least one candidate page layout strategy using an imitation learning algorithm.
- the layout rules may include first-priority layout rules and second-priority layout rules, the first-priority layout rules may include hard rules, and the second-priority rules may include soft rules.
- on the one hand, the first priority layout rule can be used as the action constraint condition of reinforcement learning, so that the generated candidate page layout strategy completely satisfies the hard rules; on the other hand, the second priority layout rule is used as the reward function of reinforcement learning, so that the generated candidate page layout strategies satisfy the soft rules as much as possible.
- the imitation learner can be trained to continuously interact with the user and learn the user's experience and knowledge, so that the page layout produced by the imitation learner's decisions better conforms to the user's aesthetics and habits. It can be seen that the target page layout strategy obtained by this method fully satisfies the hard rules, satisfies the soft rules as much as possible, and also better conforms to the user's aesthetics and habits, meeting the above requirements for an excellent page layout.
- the page layout method provided by the embodiment of the present application can be applied to various application scenarios such as graphic design, typesetting of printed matter, web page design, typesetting of industrial design drawings, typesetting of PPT, self-media typesetting, and chip back-end design.
- the embodiment of the present application provides an exemplary application scenario, and the scenario may include a page layout apparatus 100 , and the page layout apparatus 100 may include a reinforcement learning module 101 and an imitation learning module 103 .
- the user may configure page information, element information of at least one page element to be laid out, and layout rules in the user interface.
- the page information may include page size, background color, background image, etc.
- the page elements to be laid out may include images, videos, texts, or a combination of any of the above elements, and the element information may include the dimensions of the page elements to be laid out, the RGBA channel values of image pixels (for bitmaps), descriptions of vector elements (for vector graphics), the source of the image or video, text content, etc.
- the layout rules may include first priority rules (ie hard rules) and second priority rules (ie soft rules) and the like.
- a first priority rule and a second priority rule matching the application scenario may be set, which is not limited in this application.
- Users can also configure other information in the user interface, such as the cumulative reward threshold of reinforcement learning and the threshold of the policy value.
- FIG. 2 shows the internal execution flow of the reinforcement learning module 101 and the imitation learning module 103 .
- a reinforcement learning model environment can be constructed first; for example, a Markov decision process model can be constructed, which specifically includes the definition of parameters such as the state, action, reward, and state transition of the model.
- a page layout strategy may be generated based on the constructed reinforcement learning model environment and according to the page information, element information and layout rules provided by the user.
- in step 209, the page layout strategy that satisfies the reward threshold is used as the candidate page layout strategy; otherwise, in step 207, the layout rules (including hard rules and soft rules) can be used to generate the next page layout strategy.
- a reward function can be used to determine the score of the at least one candidate page layout strategy 105, and, according to the score, at least one page layout strategy that best conforms to human aesthetics is determined. Then, in step 213, the determined at least one page layout strategy may be presented to the user to determine whether it conforms to the user's aesthetics. If it does not, the user may adjust the page layout strategy so that the adjusted page layout strategy conforms to the user's aesthetics.
- the imitation learning module 103 in step 215 may receive the user's adjustment to the page layout strategy, and in step 217, learn the adjusted page layout strategy, so that the imitation learning module 103 is more in line with the user's aesthetics. Through the iteration of steps 211-217, a page layout strategy 107 that conforms to the user's aesthetics can be finally generated.
- an embodiment of the page layout method provided by the present application is shown in FIG. 3.
- the method may be executed by the aforementioned page layout apparatus 100 , and the method may include:
- S301 Acquire page information, element information of at least one page element to be laid out, and a layout rule.
- the page information of the page can be obtained.
- the page information may include the size of the page (including width and height), the position description information of the layoutable area and the non-layoutable area of the page, the background information of the page (including background color, background transparency, background image) and so on.
- the page elements to be laid out may include images, videos, texts, page controls, or a combination of any of the foregoing elements.
- the element information may include the size of the page element to be laid out, for example, the size of an image is 640 pixels ⁇ 480 pixels. When the page element to be laid out has an irregular shape, the size of the smallest circumscribed rectangle of the page element to be laid out may be used as the size of the page element to be laid out.
- the element information may also include RGBA channel values of image pixels (for bitmaps), descriptions of vector elements (for vector graphics), styles of video playback windows, sources of images or videos, and the like.
- the element information may also include the size, style, text content, text style and the like of the text box.
- the layout rule may be used as a rule to be followed when using reinforcement learning for page layout. Because a specific reinforcement learning algorithm is involved, layout rules expressed simply as text cannot be applied to the page layout directly.
- the layout rules may be quantified.
- for the hard rules, it can be determined whether two page elements overlap, or whether a page element exceeds the range of the layoutable area, according to the coordinate values of each side of the minimum circumscribed rectangle of the page element.
- for the soft rules, whether the page elements are aligned can be determined according to the distance values between the page elements and the four sides of the page, and then according to the number of repeated distance values along the four directions in the page.
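To illustrate such quantification, a hedged Python sketch of the two hard-rule checks and the distance-value collection for the alignment soft rule; all names and rectangle conventions are assumptions, not taken from the patent:

```python
def overlaps(a, b):
    """Hard rule: two page elements must not overlap. Rectangles are
    (x, y, w, h) with (x, y) the top-left corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def inside(elem, area):
    """Hard rule: a page element must stay inside the layoutable area."""
    ex, ey, ew, eh = elem
    rx, ry, rw, rh = area
    return rx <= ex and ry <= ey and ex + ew <= rx + rw and ey + eh <= ry + rh

def distance_values(elems, page_w, page_h):
    """Soft rule input: each element's distances to the four page sides;
    repeated values along a direction indicate alignment."""
    dists = {"u": [], "d": [], "l": [], "r": []}
    for x, y, w, h in elems:
        dists["l"].append(x)
        dists["r"].append(page_w - (x + w))
        dists["u"].append(y)
        dists["d"].append(page_h - (y + h))
    return dists
```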
- the reinforcement learning model environment may be constructed based on a Markov Decision Process (MDP).
- FIG. 4 is a schematic diagram of the MDP model provided by the present application.
- the MDP involves two interacting subjects: an agent (Agent) and the environment, where the Agent is the subject that makes decisions and the environment is the subject that feeds back information.
- MDP can be represented by a quadruple ⁇ S,A,R,T>, where,
- S is the state space (State Space), which can contain the set of environmental states that the Agent may perceive;
- A is the Action Space, which can contain the set of actions that the Agent can take on each environmental state;
- R is the reward function (Reward Function), R(s, a, s') can represent the reward obtained by the agent from the environment when the action a is performed on the state s and transferred to the state s';
- T is the environmental state transition function (State Transition Function), and T(s, a, s') can represent the probability of executing action a on state s and transferring to state s'.
- the Agent perceives that the environment state at time t is s_t, and based on s_t, the Agent can select an action a_t from the action space A to execute; after the environment receives the action selected by the Agent, it feeds back the corresponding reward signal r_{t+1} to the Agent and transfers to the new environment state s_{t+1}, waiting for the Agent to make a new decision.
- the Agent's goal is to find an optimal policy π*, such that π* obtains the maximum long-term cumulative reward in any state s and at any time step t. π* can be defined as formula (1):
- π* = argmax_π E_π[ Σ_{k=0}^{∞} γ^k · r_{t+k} ]    (1)
- ⁇ represents a certain strategy of the Agent (that is, the probability distribution from state to action)
- E ⁇ represents the expected value under the policy ⁇
- ⁇ is the discount rate (Discount Rate)
- k is the future time step
- r t+k represents the Agent in the Instant reward earned at time step (t+k).
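As a worked illustration of formula (1), a few lines of Python computing the discounted cumulative reward for a finite episode (the function name is an assumption):

```python
def discounted_return(rewards, gamma=0.9):
    """Long-term cumulative reward sum_k gamma^k * r_{t+k} of formula (1)
    for a finite episode; gamma is the discount rate."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Example: three layout steps rewarded 0.2, 0.5, 1.0
# discounted_return([0.2, 0.5, 1.0]) == 0.2 + 0.9 * 0.5 + 0.81 * 1.0
```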
- the reinforcement learning model environment in the embodiments of the present application may be constructed based on the above MDP model. Specifically, it can be set to:
- State s_t: the state of the page at time t, which can include the position description of the layoutable area and the non-layoutable area;
- Action a_t: the layout position of the page element to be laid out in the page at time t, which can be expressed as the position (x_t, y_t) of the upper-left corner vertex of the minimum circumscribed rectangle of the page element to be laid out;
- State transition T(s_t, a_t, s_{t+1}): the probability that the next state s_{t+1} occurs after the action a_t is performed on the state s_t;
- Reward r_t: the selected soft rules converted into rewards in a quantitative way.
- the agent can act as the subject that implements page layout decisions.
- when the agent perceives the page state, it can perform corresponding actions according to the page state under the constraints of the layout rules (mainly including the hard rules); the executed actions can include laying out the page elements to be laid out in the page.
- the environment subject can give the agent a reward signal according to the layout rules (mainly including soft rules), and transfer to a new page state.
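To make this mapping concrete, a minimal Python sketch of such an environment, assuming the grid-mask page-state representation described later in this document; the class and method names are illustrative, not from the patent:

```python
import numpy as np

class PageLayoutEnv:
    """Hypothetical sketch of the page-layout MDP described above.

    State s_t: an M x N grid mask (1 = non-layoutable, 0 = layoutable).
    Action a_t: top-left grid coordinates (x, y) for the next element.
    Reward r_t: quantified soft rules (stubbed here; see the reward
    sketches later in this document)."""

    def __init__(self, rows, cols):
        self.state = np.zeros((rows, cols), dtype=np.int8)

    def valid_actions(self, elem_rows, elem_cols):
        """Hard rules as action constraints: every cell covered by the
        element's bounding box must still be layoutable."""
        acts = []
        rows, cols = self.state.shape
        for y in range(rows - elem_rows + 1):
            for x in range(cols - elem_cols + 1):
                if not self.state[y:y + elem_rows, x:x + elem_cols].any():
                    acts.append((x, y))
        return acts

    def step(self, action, elem_rows, elem_cols):
        """Place the element, set the masking flag on its cells, and
        return the new state with a reward for the transition."""
        x, y = action
        self.state[y:y + elem_rows, x:x + elem_cols] = 1
        return self.state.copy(), self._soft_rule_reward()

    def _soft_rule_reward(self):
        return 0.0  # placeholder for the quantified soft rules
```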
- FIG. 5 shows a schematic flowchart of the method.
- based on the page information, the element information of the at least one page element to be laid out, and the layout rule, using a reinforcement learning algorithm to obtain at least one candidate page layout strategy includes:
- the page may be laid out in a manner of laying out one page element to be laid out at a time.
- in each iteration, the layout rule can constrain the layout position of only one page element to be laid out, so that the final candidate page layout strategy can fully satisfy the layout rules.
- the at least one page element to be laid out may be laid out in descending order of size. As shown in the upper part of FIG. 6, a total of 5 page elements need to be laid out on the page. After sorting these 5 page elements in descending order of size, the layout sequence shown in the lower part of FIG. 6 is obtained; that is, the five page elements can be laid out in the order 4-3-1-5-2.
- Arranging the at least one page element to be laid out in descending order of size can improve the layout efficiency and speed up the convergence speed of the reinforcement learning algorithm.
- the at least one page element to be laid out may also be laid out in any other order, which is not limited in this application.
- S501: according to the page information and the element information of the first page element to be laid out, the first page element may be laid out in the page according to the layout rule, and the page state of the page is generated.
- a page 701 is obtained, and the five page elements to be laid out shown in FIG. 6 are laid out on the page 701 .
- page element 4 is taken as the first page element to be laid out, and its layout in the page 701 needs to be constrained by the layout rule.
- the layout rules may include, for example, that different page elements to be laid out do not overlap, the page elements to be laid out do not occupy non-layoutable areas in the page, and the spacing of page elements to be laid out is uniform, and so on.
- for example, the layout rules include maintaining 3 cm margins on the left and right and 5 cm margins on the top and bottom of page 701. Based on this, in the process of placing the page elements on the page 701, it is necessary to obtain the distances of each page element from the upper, lower, left, and right sides of the page 701.
- FIG. 7 shows a schematic diagram of the distances between page element 4 and the four sides of the page.
- when d1 and d3 are both greater than or equal to 3 cm, and d2 and d4 are both greater than or equal to 5 cm, it is determined that the layout of page element 4 satisfies the layout rules.
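A one-function sketch of this margin check, with the distances in centimetres as in the example (the function name is an assumption):

```python
def satisfies_margins(d1, d2, d3, d4, lr=3.0, tb=5.0):
    """Hard-rule check for the example above: d1/d3 are the distances (cm)
    to the left/right sides of page 701, d2/d4 to the top/bottom sides."""
    return d1 >= lr and d3 >= lr and d2 >= tb and d4 >= tb
```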
- the page 701 may also be divided into multiple grids 703 .
- the size of the grid 703 is set according to the size of the smallest element among the at least one page element to be laid out. For example, among the five page elements to be laid out, page element 2 has the smallest size; therefore, the side length of the grid 703 can be set equal to the minimum side length of page element 2.
- at least one vertex of the bounding box of the page element to be laid out may be made to coincide with a vertex of the grid. As shown in FIG. 7, the upper-left corner vertex of page element 4 is set to coincide with one of the grid vertices.
- the grid layout method makes it easier to align the page elements to be laid out and also improves the efficiency of the page layout.
- the size of the grid 703 can also be set to any other value, such as a fixed value, a minimum page margin, etc., which is not limited in this application.
- the page state of the page 701 can be generated.
- the page state may include non-layoutable areas and layoutable areas in the page 701 .
- the non-layoutable area and the layoutable area may be described by means of coordinates, for example, may be described by multiple key points.
- a masking flag may be set on the non-layoutable area; the masking flag is used to indicate that page elements cannot be laid out at the corresponding position.
- for example, the masking flag can be set to "1", and other layoutable areas are set to "0" by default.
- the page state of the page 701 can be described by a matrix of size M × N. Using the combination of grid and matrix to describe the page state can not only simplify the description of the page state, but also reduce the search space for subsequent layouts, speed up the search, and improve layout efficiency.
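A small sketch of this grid-and-matrix page-state representation, under the assumptions just described (grid side length from the smallest element, masking flag 1 for non-layoutable cells); all names are illustrative:

```python
import numpy as np

def grid_side(elements):
    """Grid side length from the smallest side among the (w, h) pairs of
    the page elements to be laid out, as described above."""
    return min(min(w, h) for w, h in elements)

def page_state_matrix(page_w, page_h, cell):
    """Describe the page as an M x N matrix; 1 marks non-layoutable cells
    (the masking flag), 0 marks layoutable cells."""
    return np.zeros((page_h // cell, page_w // cell), dtype=np.int8)

def snap_to_grid(x, y, cell):
    """Make the top-left vertex of a bounding box coincide with the
    nearest grid vertex."""
    return round(x / cell) * cell, round(y / cell) * cell
```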
- S503 According to the page status and the element information of the next page element to be laid out, determine the layout position of the next page element to be laid out on the page according to the layout rule.
- the agent in the MDP model can obtain the above page state, and perform the following action according to the page state and the element information of the next page element to be laid out: determining, according to the layout rule, the layout position of the next page element to be laid out in the page. For example, in FIG. 6, the next page element to be laid out is page element 3, and page element 3 can be laid out on page 701 in the same way as in the above embodiment.
- S505 Determine the page state and the reward value corresponding to the layout position, and update the page state.
- the environment subject in the MDP model can obtain the action performed by the agent (ie, the layout position), and determine the page state and the reward value corresponding to the action.
- the reward value may be determined according to the layout rules; for example, the soft rules may be quantified as reward values. In the example shown in FIG. 8, the distance values between the laid-out page elements and the four sides of the page 701 can be obtained respectively; for example, the distances between page element 4 and the four sides are d1, d2, d3, and d4, and the distances between page element 3 and the four sides are d5, d6, d7, and d8. The number of distance values that are not repeated among the above distance values is then counted.
- according to the number of non-repeated distance values, the reward value corresponding to the page state and the action is determined: the more repeated distance values there are, the more aligned the page layout is. The reward value can then be calculated with a formula over n, the number of laid-out page elements, and n_u, n_d, n_l, and n_r, the numbers of non-repeated distance values along the up, down, left, and right directions in the page 701. In FIG. 8, for example, d2 and d6 are repeated distance values.
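A hedged sketch of this alignment quantification: it counts the non-repeated distance values per direction and maps them to a reward. The patent's exact formula is not reproduced in the text above, so the normalisation by 4·n below is an assumption:

```python
from collections import Counter

def alignment_reward(dists, n):
    """Alignment soft rule: `dists` maps each direction ("u", "d", "l",
    "r") to the distance values of the n laid-out elements from that page
    side. Fewer non-repeated values means better alignment; the reward is
    normalised here by the 4 * n distance values in total (an assumption)."""
    non_repeated = sum(
        sum(1 for _, c in Counter(dists[side]).items() if c == 1)
        for side in ("u", "d", "l", "r")
    )
    return 1.0 - non_repeated / (4.0 * n)
```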
- the soft rule of uniform spacing between page elements can also be quantified as a reward value. The sets of spacings (A_h, A_v) between each laid-out page element and its adjacent laid-out page elements in the horizontal and vertical directions can be obtained, as can the sets of distances (A_u, A_d, A_l, A_r) from each laid-out page element to the four sides of the page 701; the standard deviations of these sets can then be calculated, where a larger standard deviation means more uneven spacing of the laid-out page elements. The reward value can then be calculated from these standard deviations, where std(*) represents the standard deviation operation.
- the soft rules that can be used are not limited to the above examples.
- soft rules related to application scenarios can be set.
- for example, a higher reward value can be set when touch components are arranged on the right side of the page.
- the reward values contributed by different soft rules are not necessarily the same. The weights of the reward values quantified by the different soft rules can be set, and the final reward value can be determined according to the weights. For example, the determined final reward value can be expressed as r = α·r_1 + β·r_2, where α is the weight of the reward value r_1 quantified according to the alignment of page elements, and β is the weight of the reward value r_2 quantified according to the uniform spacing of page elements.
- after the action is performed, the layoutable area and the non-layoutable area in the page 701 change. Therefore, the environment subject can also update the page state, and send the calculated reward value and the updated page state to the agent.
- S507 Iterate S503 and S505 until the at least one page element to be laid out is laid out on the page, and a page layout policy is generated.
- the page layout strategy may include the layout positions of each to-be-layout page element in the page 701 .
- the number of page elements laid out in each iteration is not limited to one. When there are many page elements to be laid out, two or more page elements can also be laid out in each iteration, which is not limited in this application.
- the reward value determined in each iteration can be accumulated, and the page layout strategy whose accumulated reward value satisfies the preset condition is used as the candidate page layout strategy.
- the preset condition may include, for example, that the accumulated reward value is greater than a preset reward threshold.
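Putting S501-S507 together, a hedged sketch of generating candidate strategies and filtering them by the cumulative reward threshold; `env_factory`, `agent`, and the helper methods used below are hypothetical, not from the patent:

```python
def generate_candidates(env_factory, agent, episodes, reward_threshold):
    """Roll out complete layouts, accumulate the per-step rewards, and keep
    the strategies whose cumulative reward exceeds the preset threshold."""
    candidates = []
    for _ in range(episodes):
        env, total, strategy = env_factory(), 0.0, []
        for elem in env.elements_by_descending_size():  # layout order
            action = agent.choose(env.state, elem)      # constrained by hard rules
            _, reward = env.step(action, *elem)         # reward from soft rules
            total += reward
            strategy.append((elem, action))
        if total > reward_threshold:                    # preset condition
            candidates.append(strategy)
    return candidates
```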
- S305 Use an imitation learning algorithm to determine a target page layout strategy from the at least one candidate page layout strategy.
- FIG. 9 shows a flowchart of an implementation scenario.
- the imitation learner 901 can be obtained by training with a positive sample set 903 and a negative sample set 905.
- the positive sample set 903 can include many page layout strategy samples that conform to the users' aesthetics or habits, where the users may include designers, ordinary consumers, etc.
- the negative sample set 905 may include page layout strategy samples obtained by using a reinforcement learning algorithm corresponding to the positive samples.
- the reward function of the imitation learner 901 may be determined by using the ranking loss between positive samples and negative samples.
- a set of state-action pairs D_human {(s_i, a_i)} can be obtained from the positive samples, and a set of state-action pairs D_agent {(s_j, a_j)} can be obtained from the negative samples; D_agent includes the complete history of the agent's states and actions in the process of generating page layout strategies with the reinforcement learning algorithm.
- the imitation learner 901 can be trained with D_human and D_agent to obtain the reward function r(s, a), such that the reward obtained by D_human is greater than or equal to the reward obtained by D_agent; that is, the reward obtained by the positive samples always ranks ahead of the reward obtained by the negative samples.
- the imitation learner 901 can use the state s_i as input data and the action a_i corresponding to s_i as a label to perform imitation learning; each state-action pair (s_i, a_i) obtains a reward, and the reward obtained by D_human is the cumulative reward over its state-action pairs.
- the imitation learner 901 may include a multi-layer feedforward neural network, such as a convolutional neural network, which is not limited in this application.
- using the reward function, the at least one candidate page layout strategy 105 can be scored, and one or more target page layout strategies 107 with the highest scores can be determined. The one or more target page layout strategies 107 may then be presented to the user 907. If the user 907 is satisfied with the displayed target page layout strategies 107, they may be used directly.
- if the user 907 adjusts a target page layout strategy, the adjusted target page layout strategy can be used as a positive sample, and the imitation learner 901 can be trained to learn the adjusted target page layout strategy and obtain an optimized reward function, so that the aesthetics of the imitation learner 901 are closer to those of the user 907.
- the positive sample set 903 and the negative sample set 905 may both be in image format.
- the candidate page layout strategy 105 may therefore be converted into an image format, consistent with the format of the training data.
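As a sketch of how such a ranking-loss-trained reward function might look, the following PyTorch code scores rendered layout images with a small convolutional network and trains it so that positive (human) samples receive a higher cumulative reward than negative (agent) samples. The network shape, the margin loss, and all names are assumptions; the patent only states that a multi-layer feedforward network such as a CNN may be used and that the reward function is determined by the ranking loss between positive and negative samples:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardNet(nn.Module):
    """Sketch of the imitation learner's reward function r(s, a): a small
    convolutional network scoring a rendered layout image (assumed
    architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 1),
        )

    def forward(self, x):          # x: (batch, 1, H, W) layout images
        return self.net(x)

def ranking_loss(reward_net, pos_imgs, neg_imgs, margin=1.0):
    """Margin ranking loss so the cumulative reward of positive (human)
    samples always ranks ahead of negative (agent) samples."""
    r_pos = reward_net(pos_imgs).sum().view(1)
    r_neg = reward_net(neg_imgs).sum().view(1)
    target = torch.ones(1)         # require r_pos > r_neg
    return F.margin_ranking_loss(r_pos, r_neg, target, margin=margin)
```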
- the layout rules can be divided into first priority layout rules and second priority layout rules.
- the first priority placement rules may include hard rules
- the second priority placement rules may include soft rules.
- the at least one candidate page layout strategy satisfies the first priority layout rule, and the reward obtained under the constraints of the second priority layout rule is greater than a preset reward threshold.
- the first priority layout rule may be used as a constraint condition when the agent performs an action
- the second priority layout rule may be used as a basis for the environmental agent to determine the reward value
- the search space of the page layout strategy can be reduced by the Monte Carlo tree search algorithm, and the layout efficiency can be improved thereby.
- specifically, this may include:
- S1001 Determine the current cumulative reward value, and determine the strategy value of the current page layout strategy according to the cumulative reward value.
- S1003 Determine whether the policy value of the current page layout policy is greater than or equal to a preset value threshold.
- in each iteration of the reinforcement learning stage, a reward value can be obtained, and the cumulative reward value of the iterations so far can be determined from the rewards obtained.
- according to the cumulative reward value, the strategy value of the current page layout strategy (equivalently, of the current page state) can be determined.
- the relationship between the strategy value and the cumulative reward can be expressed, for example, as V_m = Σ_{t=1}^{m} r(s_t, a_t), where m represents the current number of iterations and r(s_t, a_t) represents the reward obtained by the state-action pair (s_t, a_t).
- if the strategy value is greater than or equal to the preset value threshold, step S507 may be continued; otherwise, the current page layout strategy can be excluded, and the process returns to step S501 to start a new layout.
- the Monte Carlo tree search algorithm is used in the reinforcement learning process to reduce the search space of the page layout, which can greatly reduce the search complexity when there are many page elements to be laid out, and a page layout strategy with a higher reward value can be obtained while reducing the search space.
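A hedged sketch of this pruning, reusing the hypothetical `env`/`agent` interface from the earlier sketch: rewards are accumulated step by step and the current layout strategy is abandoned (returning to S501) once its value falls below the preset threshold:

```python
def rollout_with_pruning(env, agent, value_threshold):
    """Accumulate rewards step by step and abandon the current page layout
    strategy once its strategy value (here, the cumulative reward of the m
    completed iterations) falls below the preset value threshold, instead
    of expanding the whole search tree."""
    total, strategy = 0.0, []
    for elem in env.elements_by_descending_size():
        action = agent.choose(env.state, elem)
        _, reward = env.step(action, *elem)
        total += reward
        strategy.append((elem, action))
        if total < value_threshold:   # S1003: prune, return to S501
            return None
    return strategy                   # S507: full layout kept
```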
- the page layout apparatus 100 may include:
- an initial information acquisition module 1101, configured to acquire page information, element information of at least one page element to be laid out, and layout rules;
- a reinforcement learning module 101 configured to obtain at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule;
- the imitation learning module 103 is configured to use an imitation learning algorithm to determine a target page layout strategy from the at least one candidate page layout strategy.
- the layout rule includes a first priority layout rule and a second priority layout rule; the at least one candidate page layout strategy satisfies the first priority layout rule, and the reward obtained under the constraints of the second priority layout rule is greater than the preset reward threshold.
- the reinforcement learning module is specifically used for:
- Step 1: according to the page information and the element information of the first page element to be laid out, laying out the first page element on the page according to the layout rule, and generating the page state of the page;
- Step 2: according to the page state and the element information of the next page element to be laid out, determining the layout position of the next page element to be laid out in the page according to the layout rule;
- Step 3: determining the reward value corresponding to the page state and the layout position, and updating the page state;
- iterating Steps 2 and 3 until the at least one page element to be laid out is laid out on the page, and generating a page layout strategy;
- accumulating the reward value determined in each iteration, and using the page layout strategy whose accumulated reward value satisfies a preset condition as the candidate page layout strategy.
- the reinforcement learning module is further used for:
- continuing to iterate Steps 2 and 3 until the at least one page element to be laid out is laid out on the page, and generating a page layout strategy.
- the imitation learning module is specifically used for:
- the score of each candidate page layout strategy is determined according to a reward function obtained by pre-training an imitation learner, where the reward function is determined by the ranking loss between the positive samples and negative samples used to train the imitation learner, and the positive samples include page layout strategies that conform to the user's aesthetics;
- a target page layout strategy is determined from the at least one candidate page layout strategy according to the score.
- the imitation learning module is also used for:
- receiving the user's adjustment to the target page layout strategy; using the adjusted target page layout strategy as a positive sample for training the imitation learner, and training the imitation learner to obtain an optimized reward function.
- the initial information acquisition module is specifically used for:
- receiving the page information, the element information of at least one page element to be laid out, and the layout rules configured by the user in the user interface.
- the page layout apparatus 100 may correspondingly execute the methods described in the embodiments of the present application, and the above and other operations and/or functions of the modules in the page layout apparatus 100 are respectively intended to implement the corresponding flows of the methods in FIG. 2, FIG. 3, FIG. 5, and FIG. 10; for the sake of brevity, they are not repeated here.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- An embodiment of the present application further provides a device 1200 for implementing the functions of the page layout apparatus 100 in the system architecture diagram shown in FIG. 1 above.
- the device 1200 may be a physical device or a cluster of physical devices, or a virtualized device, such as at least one cloud virtual machine in a cloud computing cluster.
- the present application illustrates the structure of the device 1200 as an example.
- FIG. 12 provides a schematic structural diagram of a device 1200 .
- the device 1200 includes a bus 1201 , a processor 1202 , a communication interface 1203 and a memory 1204 . Communication between the processor 1202 , the memory 1204 and the communication interface 1203 is through the bus 1201 .
- the bus 1201 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus or the like.
- the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is shown in FIG. 12, but it does not mean that there is only one bus or one type of bus.
- the communication interface 1203 is used for external communication, for example, obtaining the page information or the element information of the at least one page element to be laid out, and so on.
- the processor 1202 may be a central processing unit (central processing unit, CPU).
- Memory 1204 may include volatile memory, such as random access memory (RAM).
- RAM random access memory
- Memory 1204 may also include non-volatile memory, such as read-only memory (ROM), flash memory, HDD, or SSD.
- Executable code is stored in the memory 1204, and the processor 1202 executes the executable code to perform the aforementioned page layout method.
- when each module of the page layout apparatus 100 described in the embodiment of FIG. 1 is implemented by software, the software or program code required for the functions of the reinforcement learning module 101 and the imitation learning module 103 is stored in the memory 1204.
- the processor 1202 executes the program code corresponding to each module stored in the memory 1204, such as the program code corresponding to the reinforcement learning module 101 and the imitation learning module 103, to obtain at least one candidate page layout strategy and determine the target page layout strategy from the at least one candidate page layout strategy.
- the processor 1202 may also execute program codes corresponding to the initial information acquisition module 1101 as described in FIG. 11 .
- Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, implement the above method.
- Embodiments of the present application provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
- a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
- the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital video discs (DVD), memory sticks, floppy disks, and mechanically encoded devices such as punch cards or raised structures in grooves on which instructions are stored, as well as any suitable combination of the foregoing.
- the computer readable program instructions or code described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network.
- the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
- the computer program instructions used to perform the operations of the present application may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
- In some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), are personalized by utilizing state information of the computer-readable program instructions, and these electronic circuits can execute the computer-readable program instructions to implement various aspects of the present application.
- These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or the other programmable data processing apparatus, produce means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
- These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a specific manner, such that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
- The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, causing a series of operation steps to be performed on the computer, the other programmable data processing apparatus, or the other device to produce a computer-implemented process, such that the instructions executed on the computer, the other programmable data processing apparatus, or the other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
- Each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- The functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in hardware (for example, circuits or ASICs (application-specific integrated circuits)) that performs the corresponding functions or actions, or can be implemented with a combination of hardware and software, such as firmware.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Document Processing Apparatus (AREA)
Abstract
The present application relates to a page layout method and apparatus. The method includes: obtaining page information, element information of at least one page element to be laid out, and a layout rule; obtaining at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule; and determining a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm.
Description
This application claims priority to Chinese Patent Application No. 202110118584.2, filed with the China National Intellectual Property Administration on January 28, 2021 and entitled "Page Layout Method and Apparatus", which is incorporated herein by reference in its entirety.
This application relates to the field of artificial intelligence (AI) technologies, and in particular, to a page layout method and apparatus.
Page layout is a core part of design work, and is also the most time-consuming part of a designer's page production process. Page layout requires deploying page elements of varying number, shape, and size on a limited page, and the relationships between page elements and the page, and among the page elements themselves, must satisfy certain rules and a certain degree of aesthetics.
In recent years, with the development of artificial intelligence technologies, using machine learning in place of manual work has gradually become a focus of page layout development. Automated layout uses computer technology in place of a human designer to perform page layout, meeting page layout requirements by means of artificial intelligence. Automated layout generally requires no trained staff to operate, and can therefore greatly reduce designers' labor and training time. In the related art, a plurality of page elements may be deployed on a canvas in a random layout manner, the completed page is then converted into image data, the image data is scored by an evaluation model, and finally the user may select a page layout with a higher score. As can be seen from this way of automatically generating page layouts in the related art, random layout can hardly produce pages that comply with the rules; in particular, as the number of page elements and the page size increase, it is easy to generate non-compliant page layouts in which page elements overlap or are placed within boundary regions.
Therefore, an automated page layout approach that complies with layout rules is urgently needed in the related art.
Summary
In view of this, a page layout method and apparatus are proposed.
According to a first aspect, an embodiment of this application provides a page layout method, the method including:
obtaining page information, element information of at least one page element to be laid out, and a layout rule;
obtaining at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule; and
determining a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm.
In the foregoing method, candidate page layout strategies are first obtained through a reinforcement learning algorithm based on the layout rule, and the target page layout strategy is then determined by using an imitation learning algorithm, so that a page layout that better complies with the layout rule and the user's aesthetic habits can be obtained.
According to a first possible implementation of the first aspect, the layout rule includes a first-priority layout rule and a second-priority layout rule, the at least one candidate page layout strategy satisfies the first-priority layout rule, and a reward obtained under the constraint of the second-priority layout rule is greater than a preset reward threshold.
In this embodiment, the layout rule is divided into a first-priority layout rule and a second-priority layout rule, where the first-priority layout rule has a higher priority than the second-priority layout rule. The first-priority layout rule may serve as the constraint on the agent's actions, and the second-priority layout rule may serve as the basis on which the environment entity determines reward values. In this way, the generated candidate page layout strategies can fully satisfy the hard rules and satisfy the soft rules as far as possible.
According to a second possible implementation of the first aspect, the obtaining at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule includes:
Step 1: laying out the first page element to be laid out on the page according to the layout rule based on the page information and element information of the first page element to be laid out, and generating a page state of the page;
Step 2: determining, according to the layout rule, a layout position of the next page element to be laid out on the page based on the page state and element information of the next page element to be laid out;
Step 3: determining a reward value corresponding to the page state and the layout position, and updating the page state;
iterating Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generating a page layout strategy; and
accumulating the reward values determined in each iteration, and taking a page layout strategy whose accumulated reward value satisfies a preset condition as a candidate page layout strategy.
In this embodiment of this application, the page layout decision process is modeled as a reinforcement learning decision process, and in each iteration the layout rule constrains the layout position of only one page element to be laid out, so that the finally completed candidate page layout strategy can fully satisfy the layout rule.
According to a third possible implementation of the first aspect, the iterating Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generating a page layout strategy includes:
calculating the accumulated value of the reward values corresponding to each page state and each layout position in the iterations performed so far, and determining a strategy value of the current page layout strategy based on the accumulated reward value; and
when it is determined that the strategy value is not less than a preset value threshold, continuing to iterate Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generating a page layout strategy.
In this embodiment of this application, a Monte Carlo tree search algorithm is used in the reinforcement learning process to reduce the search space of the page layout, which can greatly reduce search complexity when there are many page elements to be laid out. In addition, estimating the long-term return of intermediate nodes through a value function makes it possible to obtain page layout strategies with higher reward values while reducing the search space.
According to a fourth possible implementation of the first aspect, the determining a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm includes:
determining a score of each candidate page layout strategy based on a reward function obtained by pre-training an imitation learner, where the reward function is determined by a ranking loss between positive samples and negative samples used for training the imitation learner, and the positive samples include page layout strategies that conform to the user's aesthetics; and
determining the target page layout strategy from the at least one page layout strategy based on the scores.
In this embodiment of this application, the imitation learning algorithm learns the page layout strategies of human experts, so that the aesthetics of the imitation learner become more similar to those of human experts.
According to a fifth possible implementation of the first aspect, after the determining the target page layout strategy from the at least one page layout strategy based on the scores, the method further includes:
sending the target page layout strategy;
receiving a user's adjustment to the target page layout strategy; and
taking the adjusted target page layout strategy as a positive sample for training the imitation learner, and training the imitation learner to obtain an optimized reward function.
In this embodiment of this application, a mechanism for interaction between the imitation learner and the user is provided. If the user is dissatisfied with the optimal page layout strategy output by the imitation learner and adjusts it, the imitation learner can further learn from the adjusted page layout strategy, thereby optimizing the reward function. The imitation learner can thus continuously optimize its own performance according to the user's preferences, enhancing its capability for continual learning.
According to a sixth possible implementation of the first aspect, the obtaining page information, element information of at least one page element to be laid out, and a layout rule includes:
receiving page information, element information of at least one page element to be laid out, and a layout rule configured by a user in a user interface.
In this embodiment of this application, an interactive interface between the page layout and the user is provided, and the user can configure various kinds of information in the interface.
According to a seventh possible implementation of the first aspect, the at least one page element to be laid out is set to be laid out on the page in descending order of size.
In this embodiment of this application, laying out the at least one page element to be laid out in descending order of size can improve layout efficiency and accelerate the convergence of the reinforcement learning algorithm.
According to an eighth possible implementation of the first aspect, the laying out the first page element to be laid out on the page according to the layout rule includes:
dividing the page into a plurality of grid cells, where the size of the grid cells is set based on the size of the smallest element among the at least one page element; and
laying out the first page element to be laid out on the page according to the layout rule, so that at least one vertex of the minimum bounding box of the first page element to be laid out coincides with a vertex of the grid.
In this embodiment of this application, the grid-based layout manner makes it easier to align page elements to be laid out, and also speeds up the page layout.
According to a ninth possible implementation of the first aspect, the page state includes a non-layoutable region and a layoutable region of the page, where grid cells corresponding to the non-layoutable region are provided with a mask flag.
In this embodiment of this application, describing the page state by combining gridding with a matrix not only simplifies the description of the page state, but also reduces the search space for subsequent layout, accelerating the search and improving layout efficiency.
According to a tenth possible implementation of the first aspect, the determining a reward value corresponding to the page state and the layout position includes:
determining distance values from the page elements already laid out on the page to the four edges of the page;
counting the number of non-repeated distance values along the directions of the four edges of the page; and
determining the reward value based on the number.
In this embodiment of this application, the soft rules are quantified into reward values, so that the generated layout strategies satisfy the soft rules as far as possible.
According to an eleventh possible implementation of the first aspect, the determining a reward value corresponding to the page state and the layout position includes:
determining the sets of distances from the page elements already laid out on the page to the four edges of the canvas, as well as the sets of horizontal spacings and vertical spacings between adjacent laid-out page elements;
determining the standard deviations of the distance values in the distance sets, the horizontal spacing set, and the vertical spacing set respectively; and
determining the reward value based on the standard deviations.
In this embodiment of this application, the soft rules are quantified into reward values, so that the generated layout strategies satisfy the soft rules as far as possible.
According to a second aspect, an embodiment of this application provides a page layout apparatus, including:
an initial information obtaining module, configured to obtain page information, element information of at least one page element to be laid out, and a layout rule;
a reinforcement learning module, configured to obtain at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule; and
an imitation learning module, configured to determine a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm.
According to a first possible implementation of the second aspect, the layout rule includes a first-priority layout rule and a second-priority layout rule, the at least one candidate page layout strategy satisfies the first-priority layout rule, and a reward obtained under the constraint of the second-priority layout rule is greater than a preset reward threshold.
According to a second possible implementation of the second aspect, the reinforcement learning module is specifically configured to:
Step 1: lay out the first page element to be laid out on the page according to the layout rule based on the page information and element information of the first page element to be laid out, and generate a page state of the page;
Step 2: determine, according to the layout rule, a layout position of the next page element to be laid out on the page based on the page state and element information of the next page element to be laid out;
Step 3: determine a reward value corresponding to the page state and the layout position, and update the page state;
iterate Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generate a page layout strategy; and
accumulate the reward values determined in each iteration, and take a page layout strategy whose accumulated reward value satisfies a preset condition as a candidate page layout strategy.
According to a third possible implementation of the second aspect, the reinforcement learning module is further configured to:
calculate the accumulated value of the reward values corresponding to each page state and each layout position in the iterations performed so far, and determine a strategy value of the current page layout strategy based on the accumulated reward value; and
when it is determined that the strategy value is not less than a preset value threshold, continue to iterate Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generate a page layout strategy.
According to a fourth possible implementation of the second aspect, the imitation learning module is specifically configured to:
determine a score of each candidate page layout strategy based on a reward function obtained by pre-training an imitation learner, where the reward function is determined by a ranking loss between positive samples and negative samples used for training the imitation learner, and the positive samples include page layout strategies that conform to the user's aesthetics; and
determine the target page layout strategy from the at least one page layout strategy based on the scores.
According to a fifth possible implementation of the second aspect, the imitation learning module is further configured to:
send the target page layout strategy;
receive a user's adjustment to the target page layout strategy; and
take the adjusted target page layout strategy as a positive sample for training the imitation learner, and train the imitation learner to obtain an optimized reward function.
According to a sixth possible implementation of the second aspect, the initial information obtaining module is specifically configured to:
receive page information, element information of at least one page element to be laid out, and a layout rule configured by a user in a user interface.
According to a third aspect, an embodiment of this application provides a computing device, including: a processor; and a memory configured to store processor-executable instructions, where the processor is configured to implement, when executing the instructions, the page layout method of the first aspect or of one or more of the plurality of possible implementations of the first aspect.
According to a fourth aspect, an embodiment of this application provides a non-volatile computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the page layout method of the first aspect or of one or more of the plurality of possible implementations of the first aspect.
According to a fifth aspect, an embodiment of this application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, where when the computer-readable code runs in an electronic device, a processor in the electronic device performs the page layout method of the first aspect or of one or more of the plurality of possible implementations of the first aspect.
These and other aspects of this application will be more concise and comprehensible in the following description of the embodiment(s).
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of this application together with the specification, and serve to explain the principles of this application.
FIG. 1 is a schematic structural diagram of a page layout apparatus 100 according to an embodiment of this application.
FIG. 2 is a schematic diagram of a page layout scenario according to an embodiment of this application.
FIG. 3 is a schematic flowchart of a page layout method according to an embodiment of this application.
FIG. 4 is a schematic structural diagram of a Markov model according to an embodiment of this application.
FIG. 5 is a schematic flowchart of generating a page layout strategy by using a reinforcement learning algorithm according to an embodiment of this application.
FIG. 6 is a schematic diagram of a scenario according to an embodiment of this application.
FIG. 7 is a schematic diagram of a scenario according to an embodiment of this application.
FIG. 8 is a schematic diagram of a scenario according to an embodiment of this application.
FIG. 9 is a schematic diagram of a scenario according to an embodiment of this application.
FIG. 10 is a schematic flowchart of generating a page layout strategy by using a reinforcement learning algorithm according to an embodiment of this application.
FIG. 11 is a schematic structural diagram of another page layout apparatus 100 according to an embodiment of this application.
FIG. 12 is a schematic diagram of the module structure of a computing device 1200 according to an embodiment of this application.
Various exemplary embodiments, features, and aspects of this application are described in detail below with reference to the accompanying drawings. The same reference numerals in the accompanying drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the accompanying drawings, the drawings are not necessarily drawn to scale unless otherwise specified.
The word "exemplary" used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments. The terms "first", "second", and the like in the specification, the claims, and the foregoing accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the terms used in this way are interchangeable where appropriate; this is merely the manner adopted in the embodiments of this application to distinguish objects of the same attribute when describing them.
In addition, to better describe this application, numerous specific details are given in the following detailed description. Those skilled in the art should understand that this application can also be implemented without certain specific details. In some instances, methods and means well known to those skilled in the art are not described in detail, so as to highlight the gist of this application.
To help those skilled in the art understand the technical solutions provided in the embodiments of this application, the technical environment in which the technical solutions are implemented is described first.
The core of page layout is to lay out, on a limited page, a plurality of page elements of varying number, shape, and size. To judge whether a page layout is an excellent one, it is necessary to determine whether the page layout satisfies the following conditions: first, whether it satisfies the hard rules; second, whether it satisfies the soft rules; and third, whether it conforms to human aesthetics or to the user's usage habits. The hard rules may include the basic rules that a page layout must follow, for example, that different page elements do not overlap and that no page element is placed within the edge region of the page. The soft rules may include rules that are not mandatory but can enhance the aesthetics of the page layout, for example, that page elements are aligned horizontally and vertically and that the spacing between page elements is uniform.
Based on actual technical needs similar to those described above, an embodiment of this application provides a page layout method. The method can obtain at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information of a page, element information of at least one page element to be laid out, and a layout rule, and then determine a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm. The layout rule may include a first-priority layout rule and a second-priority layout rule, where the first-priority layout rule may include the hard rules and the second-priority layout rule may include the soft rules. In the reinforcement learning stage, on the one hand, the first-priority layout rule can serve as the action constraint of reinforcement learning, so that the generated candidate page layout strategies fully satisfy the hard rules; on the other hand, the second-priority layout rule can serve as the reward function of reinforcement learning, so that the generated candidate page layout strategies satisfy the soft rules as far as possible. In the imitation learning stage, an imitation learner can be trained to continuously interact with the user and learn the user's empirical knowledge, so that the page layouts decided by the imitation learner better conform to the user's aesthetics and habits. The target page layout strategy obtained by this method can therefore not only fully satisfy the hard rules and satisfy the soft rules as far as possible, but also better conform to the user's aesthetics and habits, meeting the above requirements for an excellent page layout.
The page layout method provided in the embodiments of this application can be applied to a variety of application scenarios, such as graphic design, print typesetting, web page design, industrial design drawing layout, presentation slide layout, self-media typesetting, and chip back-end design.
As shown in FIG. 1, an embodiment of this application provides an exemplary application scenario. The scenario may include a page layout apparatus 100, and the page layout apparatus 100 may include a reinforcement learning module 101 and an imitation learning module 103.
In a specific implementation, a user can configure page information, element information of at least one page element to be laid out, and a layout rule in a user interface. The page information may include the size, background color, background image, and the like of the page; the page elements to be laid out may include images, videos, text, or a combination of any of the above; and the element information may include the size of the page element to be laid out, the RGBA channel values of image pixels (for bitmaps), descriptions of vector elements (for vector graphics), the source of an image or video, text content, and so on. The layout rule may include a first-priority rule (that is, hard rules) and a second-priority rule (that is, soft rules). In addition, for different application scenarios, first-priority rules and second-priority rules matching the application scenario can be set, which is not limited in this application. The user can also configure other information in the user interface, such as the accumulated reward threshold for reinforcement learning and the threshold of the strategy value.
FIG. 2 shows the internal execution flows of the reinforcement learning module 101 and the imitation learning module 103. For the execution flow of the reinforcement learning module 101, in step 201, a reinforcement learning model environment can first be constructed; for example, a Markov decision model can be constructed, which specifically includes defining parameters of the model such as states, actions, rewards, and state transitions. In step 203, a page layout strategy can be generated based on the constructed reinforcement learning model environment according to the page information, element information, and layout rule provided by the user. In step 205, it can be determined whether the cumulative reward of the page layout strategy satisfies the reward threshold. If the determination result is that the cumulative reward satisfies the reward threshold, then in step 209 the page layout strategy satisfying the reward threshold is taken as a candidate page layout strategy; otherwise, in step 207, the next page layout strategy can be generated under the constraint of the layout rule (including the hard rules and the soft rules). As shown in FIG. 1, through multiple iterations of steps 203, 205, and 207, at least one candidate page layout strategy 105 whose cumulative reward satisfies the reward threshold can finally be generated.
For the internal workflow of the imitation learning module 103, in step 211, the score of the at least one candidate page layout strategy 105 can be determined by using a reward function, and at least one page layout strategy that best conforms to human aesthetics can be determined based on the score. Then, in step 213, the determined at least one page layout strategy can be sent to the user to determine whether it conforms to the user's aesthetics. If it does not conform to the user's aesthetics, the user can adjust the page layout strategy so that the adjusted page layout strategy conforms to the user's aesthetics. In step 215, the imitation learning module 103 can receive the user's adjustment to the page layout strategy, and in step 217, learn the adjusted page layout strategy, so that the imitation learning module 103 better conforms to the user's aesthetics. Through iterations of steps 211 to 217, a page layout strategy 107 that conforms to the user's aesthetics can finally be generated.
The page layout method described in this application is described in detail below with reference to the accompanying drawings. FIG. 3 is a schematic flowchart of an embodiment of the page layout method provided in this application. Although this application provides the method operation steps shown in the following embodiments or accompanying drawings, the method may include more or fewer operation steps based on routine practice or without creative effort. For steps that have no logically necessary causal relationship, the execution order of these steps is not limited to the execution order provided in the embodiments of this application. In an actual page layout process, or when the method is executed by an apparatus, the steps may be executed sequentially according to the method order shown in the embodiments or the accompanying drawings, or executed in parallel (for example, in a parallel-processor or multi-threaded environment).
Specifically, an embodiment of the page layout method provided in this application is shown in FIG. 3. The method may be performed by the aforementioned page layout apparatus 100, and may include:
S301: Obtain page information, element information of at least one page element to be laid out, and a layout rule.
In the preparation stage of page layout, the page information of the page can be obtained. The page information may include the size of the page (including width and height), position description information of the layoutable region and the non-layoutable region of the page, background information of the page (including background color, background transparency, and background image), and so on. The page elements to be laid out may include images, videos, text, page controls, or a combination of any of the above. The element information may include the size of the page element to be laid out; for example, the size of an image is 640 pixels × 480 pixels. When a page element to be laid out has an irregular shape, the size of the minimum bounding rectangle of the page element to be laid out can be taken as its size. For an image or a video, the element information may further include the RGBA channel values of image pixels (for bitmaps), descriptions of vector elements (for vector graphics), the style of the video playback window, the source of the image or video, and the like. For text, the element information may further include the size and style of the text box, the text content, the text style, and so on.
In this embodiment of this application, the layout rule may serve as the rule to be followed when performing page layout by means of reinforcement learning. Since a specific reinforcement learning algorithm is involved, the layout rule cannot be applied to the page layout in purely textual form. In this embodiment of this application, the layout rule can be quantified. In a specific implementation, for a hard rule, whether two page elements overlap or whether a page element exceeds the layoutable region can be determined based on the coordinate values of the edges of the minimum bounding rectangles of the page elements. As another example, for a soft rule, whether page elements are aligned can be determined based on the distance values from each page element to the four edges of the page and the number of repeated distance values along the four directions of the page.
S303: Obtain at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule.
To express more clearly how reinforcement learning is used in page layout, the basic theoretical model of reinforcement learning, the Markov decision process (MDP), is introduced first.
FIG. 4 is a schematic diagram of the MDP model provided in this application. As shown in FIG. 4, an MDP involves two interacting entities, an agent and an environment, where the agent is the entity that makes decisions and the environment is the entity that feeds information back. An MDP can be represented by a four-tuple <S, A, R, T>, where:
(1) S is the state space, which may contain the set of environment states that the agent may perceive;
(2) A is the action space, which may contain the set of actions that the agent can take in each environment state;
(3) R is the reward function, where R(s, a, s') may represent the reward that the agent obtains from the environment when action a is executed in state s and the state transitions to s';
(4) T is the state transition function, where T(s, a, s') may represent the probability of transitioning to state s' when action a is executed in state s.
In the interaction process between the agent and the environment in the MDP shown in FIG. 4, the agent perceives the environment state s_t at time t. Based on the environment state s_t, the agent can select an action a_t from the action space A and execute it. After receiving the action selected by the agent, the environment feeds back a corresponding reward signal r_{t+1} to the agent, transitions to a new environment state s_{t+1}, and waits for the agent to make a new decision. In the process of interacting with the environment, the agent's goal is to find an optimal policy π* such that π* can obtain the maximum long-term cumulative reward in any state s and at any time step t. In an example, π* can be defined by formula (1):

$$\pi^{*} = \arg\max_{\pi} E_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k}\right] \quad (1)$$

where π denotes a policy of the agent (that is, a probability distribution from states to actions), E_π denotes the expected value under policy π, γ is the discount rate, k is a future time step, and r_{t+k} denotes the immediate reward obtained by the agent at time step (t+k).
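As a concrete illustration of the discounted sum inside formula (1), the short Python function below (an illustrative sketch, not code from this application) computes the cumulative discounted reward of a finite reward sequence:

```python
def discounted_return(rewards: list[float], gamma: float = 0.9) -> float:
    """Compute sum over k of gamma^k * r_{t+k} for a finite reward sequence."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Example: rewards [2.0, 1.0, 3.0] with gamma = 0.9
# give 2.0 + 0.9 * 1.0 + 0.81 * 3.0 = 5.33
```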
In this embodiment of this application, the reinforcement learning model environment of this embodiment can be constructed based on the above MDP model. Specifically, the model can be set as follows:
State s_t: the page state at time t, which may include the position descriptions of the layoutable region and the non-layoutable region;
Action a_t: the layout position of the page element to be laid out on the page at time t, which may be expressed as the position (x_t, y_t) of the top-left vertex of the minimum bounding rectangle of the page element to be laid out on the page;
State transition matrix p(s_{t+1}|s_t): the probability that the next state s_{t+1} occurs after action a_t is executed;
Reward r_t: the selected soft rules converted into rewards through quantification.
In the reinforcement learning model environment, the agent can act as the entity that makes the page layout decisions. When perceiving the page state, the agent can execute a corresponding action according to the page state under the constraint of the layout rule (mainly including the hard rules); the executed action may include the layout position of the page element to be laid out on the page. After receiving the agent's action, the environment entity can give the agent a reward signal according to the layout rule (mainly including the soft rules) and transition to a new page state.
The following describes, through a specific embodiment, a method for generating candidate page layout strategies by using reinforcement learning. FIG. 5 is a schematic flowchart of the method. As shown in FIG. 5, the obtaining at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule includes:
S501: Lay out the first page element to be laid out on the page according to the layout rule based on the page information and element information of the first page element to be laid out, and generate the page state of the page.
In this embodiment of this application, the page can be laid out by laying out one page element to be laid out at a time. This manner of laying out the at least one page element to be laid out on the page one by one allows the layout rule to constrain the layout position of only one page element to be laid out in each iteration, so that the finally completed candidate page layout strategy can fully satisfy the layout rule. In an embodiment of this application, the at least one page element to be laid out can be laid out in descending order of size. As shown in the upper part of FIG. 6, a total of five page elements to be laid out need to be laid out on the page; after these five page elements are sorted in descending order of size, the layout order shown in the lower part of FIG. 6 is obtained, that is, the five page elements can be laid out in the order 4-3-1-5-2. Laying out the at least one page element to be laid out in descending order of size can improve layout efficiency and accelerate the convergence of the reinforcement learning algorithm. Of course, in other embodiments, the at least one page element to be laid out can also be laid out in any other order, which is not limited in this application.
In this embodiment of this application, after the first page element to be laid out is determined, the first page element to be laid out can be laid out on the page according to the layout rule based on the page information and element information of the first page element to be laid out, and the page state of the page can be generated. In a specific example, as shown in FIG. 7, a page 701 is obtained, and the five page elements to be laid out shown in FIG. 6 are to be laid out on the page 701. Page element 4 is taken as the first page element to be laid out, and the layout of page element 4 on the page 701 is constrained by the layout rule. The layout rule may include, for example, that different page elements to be laid out do not overlap, that page elements to be laid out do not occupy the non-layoutable region of the page, that the spacing between page elements to be laid out is uniform, and so on. For example, the layout rule includes keeping a 3 cm margin on the left and right of the page 701 and a 5 cm margin on the top and bottom. On this basis, in the process of placing a page element on the page 701, the distances from the page element to the four edges (top, bottom, left, and right) of the page 701 need to be obtained; FIG. 7 shows a schematic diagram of the distances between page element 4 and the four edges. Specifically, when it is determined that d1 and d3 are both greater than or equal to 3 cm and that d2 and d4 are both greater than or equal to 5 cm, it is determined that the layout of page element 4 satisfies the layout rule.
It should be noted that, to speed up the page layout, the page 701 can also be divided into a plurality of grid cells 703, where the size of the grid cells 703 is set based on the size of the smallest element among the at least one page element to be laid out. For example, among the above five page elements to be laid out, page element 2 has the smallest size; therefore, the side length of the grid cells 703 can be set equal to the smallest side length of page element 2. In the process of laying out a page element on the page, at least one vertex of the bounding box of the page element to be laid out can be made to coincide with a vertex of the grid. As shown in FIG. 7, the top-left vertex of page element 4 is set to coincide with one of the grid vertices. The grid-based layout manner makes it easier to align the page elements to be laid out, and also speeds up the page layout. In addition, in other embodiments, the size of the grid cells 703 can also be set to any other value, such as a fixed value or the minimum page margin, which is not limited in this application.
In this embodiment of this application, after the first page element to be laid out is laid out on the page, the page state of the page 701 can be generated. The page state may include the non-layoutable region and the layoutable region of the page 701. In an embodiment, the non-layoutable region and the layoutable region can be described by coordinates, for example, by a plurality of key points. In another embodiment of this application, the non-layoutable region of the page 701 can be described by setting a mask flag. Specifically, after the page 701 is divided into a plurality of grid cells 703, the number of grid cells can be determined as M×N; in FIG. 7, M=8 and N=10. In this embodiment, a mask flag can be set on the non-layoutable region, where the mask flag indicates that no page element can be laid out at the corresponding position. For example, the mask flag can be set to "1", and the other layoutable regions default to "0"; in this way, the page state of the page 701 can be described by a matrix of size M×N. Describing the page state by combining gridding with a matrix in this way not only simplifies the description of the page state, but also reduces the search space for subsequent layout, accelerating the search and improving layout efficiency.
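The grid-and-mask page state described above can be sketched as follows. This is a simplified illustration assuming a NumPy matrix; only the M×N gridding and the convention of marking non-layoutable cells with 1 come from the text, everything else is hypothetical:

```python
import numpy as np

def init_page_state(m: int, n: int) -> np.ndarray:
    # 0 marks a layoutable grid cell, 1 marks a masked (non-layoutable) cell.
    return np.zeros((m, n), dtype=np.int8)

def place_element(state: np.ndarray, row: int, col: int,
                  rows: int, cols: int) -> np.ndarray:
    """Mask the grid cells covered by an element whose top-left corner
    coincides with grid vertex (row, col) and spans rows x cols cells."""
    region = state[row:row + rows, col:col + cols]
    if region.shape != (rows, cols) or region.any():
        raise ValueError("position out of bounds or already occupied")
    new_state = state.copy()
    new_state[row:row + rows, col:col + cols] = 1
    return new_state

# Example: an 8 x 10 grid as in FIG. 7, placing a 3 x 4 element at (1, 2).
state = place_element(init_page_state(8, 10), row=1, col=2, rows=3, cols=4)
```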
S503: Determine, according to the layout rule, the layout position of the next page element to be laid out on the page based on the page state and element information of the next page element to be laid out.
In this embodiment of this application, the agent in the MDP model can obtain the above page state and, based on the page state and element information of the next page element to be laid out, execute the following action: determining, according to the layout rule, the layout position of the next page element to be laid out on the page. For example, in FIG. 6, the next page element to be laid out is page element 3; based on the page state described in FIG. 7 and page element 3, page element 3 can be laid out on the page 701 in the same manner as in the above embodiment.
S505: Determine the reward value corresponding to the page state and the layout position, and update the page state.
In this embodiment of this application, the environment entity in the MDP model can obtain the action executed by the agent (that is, the layout position) and determine the reward value corresponding to the page state and the action. The reward value can be determined according to the layout rule; for example, a soft rule can be quantified into a reward value. In the example shown in FIG. 8, after page element 3 is laid out on the page 701, the distance values from each laid-out page element to the four edges of the page 701 can be obtained; for example, the distances from page element 4 to the four edges are d1, d2, d3, and d4, and the distances from page element 3 to the four edges are d5, d6, d7, and d8. The number of non-repeated distance values among these distance values is counted, and the reward value corresponding to the page state and the action is determined based on this number. The more repeated distance values there are, the better aligned the page layout is. The reward value can then be calculated by the following formula:

$$reward_{align} = 4 \times n - n_u - n_d - n_l - n_r$$

where n is the number of laid-out page elements, and n_u, n_d, n_l, and n_r are the numbers of non-repeated distance values along the up, down, left, and right directions of the page 701, respectively. In the above example, except for d2 and d6, which are repeated distance values, all the others are non-repeated distance values; therefore, the reward value = 8 - 0 - 2 - 2 - 2 = 2.
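A minimal Python sketch of this alignment reward follows; only the counting rule 4×n − n_u − n_d − n_l − n_r comes from the formula above, while the bookkeeping of edge distances is an assumption of the sketch:

```python
def alignment_reward(edge_distances: dict[str, list[float]]) -> float:
    """edge_distances maps each direction 'u', 'd', 'l', 'r' to the list
    of distances from every laid-out element to that page edge.
    Returns reward_align = 4*n - n_u - n_d - n_l - n_r, where n_x counts
    the distance values that appear only once in direction x."""
    n = len(edge_distances["u"])  # number of laid-out page elements
    reward = 4 * n
    for direction in ("u", "d", "l", "r"):
        values = edge_distances[direction]
        reward -= sum(1 for v in values if values.count(v) == 1)
    return reward

# FIG. 8-style example: two elements where only two distances coincide
# (here in direction 'u'), every other distance being unique:
# reward = 8 - 0 - 2 - 2 - 2 = 2.
assert alignment_reward({"u": [5.0, 5.0], "d": [7.0, 9.0],
                         "l": [3.0, 6.0], "r": [4.0, 8.0]}) == 2
```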
In another embodiment of this application, the soft rule that the spacing between page elements should be uniform can also be quantified into a reward value. Specifically, the sets of horizontal and vertical spacings (A_h, A_v) between each laid-out page element and its adjacent laid-out page elements can first be obtained, as well as the sets of distances (A_u, A_d, A_l, A_r) from each laid-out page element to the four edges of the page 701. Then, the standard deviation of each spacing set and each distance set can be calculated; the larger the standard deviation, the less uniform the spacing of the laid-out page elements. The reward value can then be calculated by the following formula:

$$reward_{gap} = -std(A_h) - std(A_v) - std(A_u) - std(A_d) - std(A_l) - std(A_r)$$

where std(*) denotes the standard deviation operation.
It should be noted that the soft rules that can be used in page layout are not limited to the above examples; for different application scenarios, soft rules related to the application scenario can be set. For example, in the layout of a page with touch components, a higher reward value can be set for laying out a touch component on the right side of the page. In addition, different soft rules contribute different reward values. Based on this, in an embodiment of this application, weights can be set for the reward values quantified from different soft rules, and the final reward value can be determined based on the weights. For example, the determined final reward value can be expressed as:

$$r = \alpha \times reward_{align} + \beta \times reward_{gap}$$

where α is the weight of the reward value quantified from page element alignment, and β is the weight of the reward value quantified from uniform page element spacing.
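The spacing reward and the weighted combination can likewise be sketched in Python as follows (an illustrative sketch; the use of the population standard deviation and the default weights are assumptions of this sketch):

```python
import statistics

def gap_reward(distance_sets: list[list[float]]) -> float:
    """reward_gap: negative sum of the standard deviations of each spacing
    or edge-distance set (A_h, A_v, A_u, A_d, A_l, A_r); the more uniform
    the spacing, the closer each standard deviation is to zero and the
    higher the reward."""
    return -sum(statistics.pstdev(s) for s in distance_sets if s)

def total_reward(reward_align: float, reward_gap: float,
                 alpha: float = 1.0, beta: float = 1.0) -> float:
    # Weighted combination r = alpha * reward_align + beta * reward_gap,
    # with alpha and beta tuned per application scenario.
    return alpha * reward_align + beta * reward_gap
```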
In addition, after the next page element to be laid out is laid out on the page 701, the layoutable region and the non-layoutable region of the page 701 change. Based on this, the environment entity can also update the page state and send the calculated reward value and the updated page state to the agent.
S507: Iterate S503 and S505 until the at least one page element to be laid out has been laid out on the page, and generate a page layout strategy.
Through the iteration of the above S503 and S505, all the page elements to be laid out are finally laid out on the page, and a page layout strategy is generated. The page layout strategy may include the layout position of each page element to be laid out on the page 701. It should be noted that the number of page elements to be laid out in each iteration is not limited to one; when there are many page elements to be laid out, two or more page elements to be laid out can also be laid out in each iteration, which is not limited in this application.
S509: Accumulate the reward values determined in each iteration, and take the page layout strategy corresponding to an accumulated reward value that satisfies a preset condition as a candidate page layout strategy.
After all the page elements to be laid out have been laid out on the page 701, the reward values determined in each iteration can be accumulated, and the page layout strategy corresponding to an accumulated reward value that satisfies a preset condition is taken as a candidate page layout strategy. The preset condition may include, for example, that the accumulated reward value is greater than a preset reward threshold.
S305: Determine a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm.
Based on the above overview of the internal flow of the imitation learning module 103, a specific embodiment is used below to describe how the imitation learning module 103 determines the target page layout strategy 107 from the at least one candidate page layout strategy 105. FIG. 9 shows a flowchart of an implementation scenario. As shown in FIG. 9, an imitation learner 901 can be obtained by training with a positive sample set 903 and a negative sample set 905. The positive sample set 903 may include a plurality of page layout strategy samples that conform to the aesthetics or habits of users, where the users may include designers, ordinary consumers, and so on; the negative sample set 905 may include page layout strategy samples, corresponding to the positive samples, that are obtained by using the reinforcement learning algorithm. In this embodiment of this application, the ranking loss between positive samples and negative samples can be used to determine the reward function of the imitation learner 901.
In an embodiment of this application, in the process of training the imitation learner 901, a set of state-action pairs D_human = {(s_i, a_i)} can be obtained from the positive samples; this set D_human may include the complete history of the states and actions of human experts in the process of generating page layout strategies. On the other hand, a set of state-action pairs D_agent = {(s_j, a_j)} can be obtained from the negative samples; this set D_agent includes the complete history of the states and actions of the agent in the process of generating page layout strategies by using the reinforcement learning algorithm. Then, the imitation learner 901 can be trained with D_human and D_agent to obtain a reward function r(s, a) such that the reward obtained by D_human is greater than or equal to the reward obtained by D_agent; in other words, the reward obtained by the positive samples always ranks ahead of the reward obtained by the negative samples. Specifically, the imitation learner 901 can take the state s_i as input data and take the action a_i corresponding to s_i as the label for imitation learning. Each state-action pair (s_i, a_i) can obtain a reward, and the reward obtained by D_human is the cumulative reward obtained by its state-action pairs. On the premise that the reward obtained by D_human is greater than or equal to the reward obtained by D_agent, r(s, a) can finally be obtained. It should be noted that the imitation learner 901 may include a multi-layer feedforward neural network, such as a convolutional neural network, which is not limited in this application.
In this embodiment of this application, as shown in FIG. 9, based on the reward function of the trained imitation learner 901, the at least one candidate page layout strategy 105 can be scored separately, and one or more target page layout strategies 107 with the highest scores can be determined. The one or more target page layout strategies 107 can then be presented to the user 907. If the user 907 is satisfied with the presented one or more target page layout strategies 107, the one or more target page layout strategies 107 can be used directly. If the user 907 is dissatisfied with the one or more target page layout strategies 107 and adjusts them, the adjusted target page layout strategy can be taken as a positive sample, and the imitation learner 901 can be trained to learn the adjusted target page layout strategy to obtain an optimized reward function, so that the aesthetics of the imitation learner 901 become closer to those of the user 907.
In this embodiment of this application, since machine learning techniques for image processing are already relatively mature, in the imitation learning stage both the positive sample set 903 and the negative sample set 905 may be in image format. Correspondingly, before the at least one candidate page layout strategy 105 is input into the imitation learner 901, the candidate page layout strategy 105 can be converted into image format to keep it consistent with the format of the training data.
In actual application scenarios, there are many layout rules involved in page layout, and the layout rules involved may differ across application scenarios. However, different layout rules have different degrees of importance. Based on this, the layout rule can be divided into a first-priority layout rule and a second-priority layout rule, where the first-priority layout rule has a higher priority than the second-priority layout rule. For example, the first-priority layout rule may include the hard rules, and the second-priority layout rule may include the soft rules. In this embodiment of this application, the at least one candidate page layout strategy satisfies the first-priority layout rule, and the reward obtained under the constraint of the second-priority layout rule is greater than a preset reward threshold. In a specific embodiment, the first-priority layout rule can serve as the constraint on the agent's actions, and the second-priority layout rule can serve as the basis on which the environment entity determines reward values. In this way, the generated candidate page layout strategies can fully satisfy the hard rules and satisfy the soft rules as far as possible.
Considering that, in actual application scenarios, the number of possible page layout strategies may grow exponentially when there are many page elements to be laid out, in this embodiment of this application a Monte Carlo tree search algorithm can be used to reduce the search space of page layout strategies and improve layout efficiency. Specifically, as shown in FIG. 10, after S505 and before S507, the method may include:
S1001: Determine the current cumulative reward value, and determine the strategy value of the current page layout strategy based on the cumulative reward value.
S1003: Determine whether the strategy value of the current page layout strategy is greater than or equal to a preset value threshold.
In this embodiment of this application, in the reinforcement learning stage, one reward value is obtained in each iteration, and the cumulative reward value of the iterations performed so far can be determined from the obtained reward values. Based on the cumulative reward value, the strategy value of the current page layout strategy (or of the current page state) can be determined. In an example, the strategy value can be expressed as the expected cumulative discounted reward:

$$V(s_t) = E\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k}\right]$$

The higher the strategy value of the current page layout strategy, the more valuable the current page layout strategy. Based on this, it can be determined whether the strategy value of the current page layout strategy is greater than or equal to the preset value threshold. When it is determined that the strategy value is greater than or equal to the preset value threshold, S507 can continue to be executed. In another embodiment of this application, as shown in FIG. 10, when it is determined that the strategy value is less than the preset value threshold, the current page layout strategy can be excluded, and the process returns to S501 to start a new layout.
In this embodiment, the Monte Carlo tree search algorithm is used in the reinforcement learning process to reduce the search space of the page layout, which can greatly reduce search complexity when there are many page elements to be laid out. In addition, estimating the long-term return of intermediate nodes through a value function makes it possible to obtain page layout strategies with higher reward values while reducing the search space.
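The pruning loop of FIG. 10 can be summarized by the following sketch, in which `env` and `agent` are hypothetical objects standing in for the layout environment and the decision-making agent, and the discounted-sum value estimate is an assumption consistent with formula (1):

```python
def search_layout(env, agent, value_threshold: float, gamma: float = 0.9):
    """Roll out layout strategies, abandoning a partial layout and
    restarting (back to S501) whenever its estimated strategy value
    falls below the preset value threshold."""
    while True:
        state, rewards = env.reset(), []           # S501: start a new layout
        while not env.done():
            action = agent.act(state)              # S503: choose a position
            state, reward = env.step(action)       # S505: reward + new state
            rewards.append(reward)
            value = sum(gamma ** k * r for k, r in enumerate(rewards))
            if value < value_threshold:            # S1003: prune this branch
                break
        else:
            return env.layout()                    # all elements laid out
```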
In another aspect, this application further provides another embodiment of the page layout apparatus 100. As shown in FIG. 11, the page layout apparatus 100 may include:
an initial information obtaining module 1101, configured to obtain page information, element information of at least one page element to be laid out, and a layout rule;
a reinforcement learning module 101, configured to obtain at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule; and
an imitation learning module 103, configured to determine a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm.
Optionally, in an embodiment of this application, the layout rule includes a first-priority layout rule and a second-priority layout rule, the at least one candidate page layout strategy satisfies the first-priority layout rule, and a reward obtained under the constraint of the second-priority layout rule is greater than a preset reward threshold.
Optionally, in an embodiment of this application, the reinforcement learning module is specifically configured to:
Step 1: lay out the first page element to be laid out on the page according to the layout rule based on the page information and element information of the first page element to be laid out, and generate a page state of the page;
Step 2: determine, according to the layout rule, a layout position of the next page element to be laid out on the page based on the page state and element information of the next page element to be laid out;
Step 3: determine a reward value corresponding to the page state and the layout position, and update the page state;
iterate Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generate a page layout strategy; and
accumulate the reward values determined in each iteration, and take a page layout strategy whose accumulated reward value satisfies a preset condition as a candidate page layout strategy.
Optionally, in an embodiment of this application, the reinforcement learning module is further configured to:
calculate the accumulated value of the reward values corresponding to each page state and each layout position in the iterations performed so far, and determine a strategy value of the current page layout strategy based on the accumulated reward value; and
when it is determined that the strategy value is not less than a preset value threshold, continue to iterate Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generate a page layout strategy.
Optionally, in an embodiment of this application, the imitation learning module is specifically configured to:
determine a score of each candidate page layout strategy based on a reward function obtained by pre-training an imitation learner, where the reward function is determined by a ranking loss between positive samples and negative samples used for training the imitation learner, and the positive samples include page layout strategies that conform to the user's aesthetics; and
determine the target page layout strategy from the at least one page layout strategy based on the scores.
Optionally, in an embodiment of this application, the imitation learning module is further configured to:
send the target page layout strategy;
receive a user's adjustment to the target page layout strategy; and
take the adjusted target page layout strategy as a positive sample for training the imitation learner, and train the imitation learner to obtain an optimized reward function.
Optionally, in an embodiment of this application, the initial information obtaining module is specifically configured to:
receive page information, element information of at least one page element to be laid out, and a layout rule configured by a user in a user interface.
The page layout apparatus 100 according to this embodiment of this application may correspondingly perform the methods described in the embodiments of this application, and the foregoing and other operations and/or functions of the modules in the page layout apparatus 100 are respectively intended to implement the corresponding procedures of the methods in FIG. 2, FIG. 3, FIG. 5, and FIG. 10. For brevity, details are not described herein again.
It should also be noted that the embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place, or they may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, the connection relationships between modules indicate that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines.
An embodiment of this application further provides a device 1200, configured to implement the functions of the page layout apparatus 100 in the system architecture diagram shown in FIG. 1. The device 1200 may be a physical device or a cluster of physical devices, or may be a virtualized device, such as at least one cloud virtual machine in a cloud computing cluster. For ease of understanding, this application describes the structure of the device 1200 by way of example.
FIG. 12 is a schematic structural diagram of the device 1200. As shown in FIG. 12, the device 1200 includes a bus 1201, a processor 1202, a communication interface 1203, and a memory 1204. The processor 1202, the memory 1204, and the communication interface 1203 communicate with one another through the bus 1201. The bus 1201 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of representation, only one bold line is used in FIG. 12, but this does not mean that there is only one bus or one type of bus. The communication interface 1203 is configured to communicate with the outside, for example, to obtain the page information or the information of the at least one page element to be laid out.
The processor 1202 may be a central processing unit (CPU). The memory 1204 may include a volatile memory, for example, a random access memory (RAM). The memory 1204 may also include a non-volatile memory, for example, a read-only memory (ROM), a flash memory, an HDD, or an SSD.
The memory 1204 stores executable code, and the processor 1202 executes the executable code to perform the aforementioned page layout method.
Specifically, when the embodiment shown in FIG. 1 is implemented and the modules of the page layout apparatus 100 described in the embodiment of FIG. 1 are implemented by software, the software or program code required to perform the functions of the reinforcement learning module 101 and the imitation learning module 103 in FIG. 1 is stored in the memory 1204. The processor 1202 executes the program code corresponding to each module stored in the memory 1204, such as the program code corresponding to the reinforcement learning module 101 and the imitation learning module 103, to obtain at least one candidate page layout strategy and determine a target page layout strategy from the at least one candidate page layout strategy. The processor 1202 may also execute the program code corresponding to the initial information obtaining module 1101 described in FIG. 11.
An embodiment of this application provides a non-volatile computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the above method.
An embodiment of this application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, where when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the above method.
A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital video disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing.
The computer-readable program instructions or code described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions used to perform the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider). In some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), are personalized by utilizing state information of the computer-readable program instructions, and these electronic circuits can execute the computer-readable program instructions to implement various aspects of this application.
Various aspects of this application are described herein with reference to flowcharts and/or block diagrams of the method, apparatus (system), and computer program product according to the embodiments of this application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or the other programmable data processing apparatus, produce an apparatus for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a specific manner, such that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operation steps are performed on the computer, the other programmable data processing apparatus, or the other device to produce a computer-implemented process, such that the instructions executed on the computer, the other programmable data processing apparatus, or the other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show possible architectures, functions, and operations of the apparatus, system, method, and computer program product according to a plurality of embodiments of this application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of instructions, which contains one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by hardware (for example, a circuit or an ASIC (application-specific integrated circuit)) that performs the corresponding function or action, or can be implemented by a combination of hardware and software, such as firmware.
Although the present invention is described herein with reference to the embodiments, in the process of implementing the present invention as claimed, those skilled in the art can understand and implement other variations of the disclosed embodiments by viewing the accompanying drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or another unit may fulfill several functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The embodiments of this application have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein are chosen to best explain the principles of the embodiments, the practical application, or improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (17)
- A page layout method, comprising: obtaining page information, element information of at least one page element to be laid out, and a layout rule; obtaining at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule; and determining a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm.
- The method according to claim 1, wherein the layout rule comprises a first-priority layout rule and a second-priority layout rule, the at least one candidate page layout strategy satisfies the first-priority layout rule, and a reward obtained under the constraint of the second-priority layout rule is greater than a preset reward threshold.
- The method according to claim 1 or 2, wherein the obtaining at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule comprises: Step 1: laying out the first page element to be laid out on the page according to the layout rule based on the page information and element information of the first page element to be laid out, and generating a page state of the page; Step 2: determining, according to the layout rule, a layout position of the next page element to be laid out on the page based on the page state and element information of the next page element to be laid out; Step 3: determining a reward value corresponding to the page state and the layout position, and updating the page state; iterating Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generating a page layout strategy; and accumulating the reward values determined in each iteration, and taking a page layout strategy whose accumulated reward value satisfies a preset condition as a candidate page layout strategy.
- The method according to claim 3, wherein the iterating Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generating a page layout strategy comprises: calculating the accumulated value of the reward values corresponding to each page state and each layout position in the iterations performed so far, and determining a strategy value of the current page layout strategy based on the accumulated reward value; and when it is determined that the strategy value is not less than a preset value threshold, continuing to iterate Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generating a page layout strategy.
- The method according to any one of claims 1 to 4, wherein the determining a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm comprises: determining a score of each candidate page layout strategy based on a reward function obtained by pre-training an imitation learner, wherein the reward function is determined by a ranking loss between positive samples and negative samples used for training the imitation learner, and the positive samples comprise page layout strategies that conform to the user's aesthetics; and determining the target page layout strategy from the at least one page layout strategy based on the scores.
- The method according to claim 5, wherein after the determining the target page layout strategy from the at least one page layout strategy based on the scores, the method further comprises: sending the target page layout strategy; receiving a user's adjustment to the target page layout strategy; and taking the adjusted target page layout strategy as a positive sample for training the imitation learner, and training the imitation learner to obtain an optimized reward function.
- The method according to any one of claims 1 to 6, wherein the obtaining page information, element information of at least one page element to be laid out, and a layout rule comprises: receiving page information, element information of at least one page element to be laid out, and a layout rule configured by a user in a user interface.
- A page layout apparatus, comprising: an initial information obtaining module, configured to obtain page information, element information of at least one page element to be laid out, and a layout rule; a reinforcement learning module, configured to obtain at least one candidate page layout strategy by using a reinforcement learning algorithm based on the page information, the element information of the at least one page element to be laid out, and the layout rule; and an imitation learning module, configured to determine a target page layout strategy from the at least one candidate page layout strategy by using an imitation learning algorithm.
- The apparatus according to claim 8, wherein the layout rule comprises a first-priority layout rule and a second-priority layout rule, the at least one candidate page layout strategy satisfies the first-priority layout rule, and a reward obtained under the constraint of the second-priority layout rule is greater than a preset reward threshold.
- The apparatus according to claim 8 or 9, wherein the reinforcement learning module is specifically configured to: Step 1: lay out the first page element to be laid out on the page according to the layout rule based on the page information and element information of the first page element to be laid out, and generate a page state of the page; Step 2: determine, according to the layout rule, a layout position of the next page element to be laid out on the page based on the page state and element information of the next page element to be laid out; Step 3: determine a reward value corresponding to the page state and the layout position, and update the page state; iterate Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generate a page layout strategy; and accumulate the reward values determined in each iteration, and take a page layout strategy whose accumulated reward value satisfies a preset condition as a candidate page layout strategy.
- The apparatus according to claim 10, wherein the reinforcement learning module is further configured to: calculate the accumulated value of the reward values corresponding to each page state and each layout position in the iterations performed so far, and determine a strategy value of the current page layout strategy based on the accumulated reward value; and when it is determined that the strategy value is not less than a preset value threshold, continue to iterate Step 2 and Step 3 until the at least one page element to be laid out has been laid out on the page, and generate a page layout strategy.
- The apparatus according to any one of claims 8 to 11, wherein the imitation learning module is specifically configured to: determine a score of each candidate page layout strategy based on a reward function obtained by pre-training an imitation learner, wherein the reward function is determined by a ranking loss between positive samples and negative samples used for training the imitation learner, and the positive samples comprise page layout strategies that conform to the user's aesthetics; and determine the target page layout strategy from the at least one page layout strategy based on the scores.
- The apparatus according to claim 12, wherein the imitation learning module is further configured to: send the target page layout strategy; receive a user's adjustment to the target page layout strategy; and take the adjusted target page layout strategy as a positive sample for training the imitation learner, and train the imitation learner to obtain an optimized reward function.
- The apparatus according to any one of claims 8 to 13, wherein the initial information obtaining module is specifically configured to: receive page information, element information of at least one page element to be laid out, and a layout rule configured by a user in a user interface.
- A computing device, comprising: a processor; and a memory configured to store processor-executable instructions, wherein the processor is configured to implement the method according to any one of claims 1 to 7 when executing the instructions.
- A non-volatile computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 7.
- A computer program product, comprising computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21922417.7A EP4276601A4 (en) | 2021-01-28 | 2021-10-28 | METHOD AND DEVICE FOR LAYING OUT A PAGE |
US18/360,497 US20230376674A1 (en) | 2021-01-28 | 2023-07-27 | Page Layout Method and Apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110118584.2 | 2021-01-28 | ||
CN202110118584.2A CN114816395A (zh) | 2021-01-28 | 2021-01-28 | 一种页面布局方法及装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/360,497 Continuation US20230376674A1 (en) | 2021-01-28 | 2023-07-27 | Page Layout Method and Apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022160826A1 true WO2022160826A1 (zh) | 2022-08-04 |
Family
ID=82525895
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/127122 WO2022160826A1 (zh) | 2021-01-28 | 2021-10-28 | 一种页面布局方法及装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230376674A1 (zh) |
EP (1) | EP4276601A4 (zh) |
CN (1) | CN114816395A (zh) |
WO (1) | WO2022160826A1 (zh) |
- 2021-01-28: CN CN202110118584.2A patent/CN114816395A/zh active Pending
- 2021-10-28: EP EP21922417.7A patent/EP4276601A4/en active Pending
- 2021-10-28: WO PCT/CN2021/127122 patent/WO2022160826A1/zh unknown
- 2023-07-27: US US18/360,497 patent/US20230376674A1/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060200759A1 (en) * | 2005-03-04 | 2006-09-07 | Microsoft Corporation | Techniques for generating the layout of visual content |
US20170344656A1 (en) * | 2016-05-29 | 2017-11-30 | Wix.Com Ltd. | System and method for the creation and update of hierarchical websites based on collected business knowledge |
US20200043359A1 (en) * | 2018-07-31 | 2020-02-06 | Korea Advanced Institute Of Science And Technology | Apparatus and method for eliciting optimal strategy of the humans in the interactive games using artificial intelligence |
US20200134388A1 (en) * | 2018-10-31 | 2020-04-30 | Salesforce.Com, Inc. | Refinement of Machine Learning Engines for Automatically Generating Component-Based User Interfaces |
CN110018869A (zh) * | 2019-02-20 | 2019-07-16 | 阿里巴巴集团控股有限公司 | Method and apparatus for displaying pages to users through reinforcement learning |
CN111353822A (zh) * | 2020-03-03 | 2020-06-30 | 广东博智林机器人有限公司 | Image layout and model training method, apparatus, device, and storage medium |
Non-Patent Citations (1)
Title |
---|
See also references of EP4276601A4 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115599809A (zh) * | 2022-11-16 | 2023-01-13 | 泽恩科技有限公司(Cn) | Database layout design method based on particle swarm optimization algorithm |
CN116107574A (zh) * | 2023-04-12 | 2023-05-12 | 南京数睿数据科技有限公司 | Method and apparatus for automatically constructing an application interface, electronic device, and readable medium |
CN116107574B (zh) * | 2023-04-12 | 2023-06-13 | 南京数睿数据科技有限公司 | Method and apparatus for automatically constructing an application interface, electronic device, and readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN114816395A (zh) | 2022-07-29 |
EP4276601A1 (en) | 2023-11-15 |
EP4276601A4 (en) | 2024-09-11 |
US20230376674A1 (en) | 2023-11-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12086516B2 (en) | Generating integrated circuit floorplans using neural networks | |
WO2021078027A1 (zh) | 构建网络结构优化器的方法、装置及计算机可读存储介质 | |
WO2022160826A1 (zh) | 一种页面布局方法及装置 | |
US10579737B2 (en) | Natural language image editing annotation framework | |
CN111758105A (zh) | 学习数据增强策略 | |
AU2018205185B2 (en) | Scalable font pairing with asymmetric metric learning | |
JP2019091434A (ja) | 複数のディープ・ラーニング・ニューラル・ネットワークを動的に重み付けすることによるフォント認識の改善 | |
WO2017197806A1 (zh) | 基于人工智能提供智能服务的方法、智能服务系统及智能终端 | |
Bahrehmand et al. | Optimizing layout using spatial quality metrics and user preferences | |
CN111126574A (zh) | 基于内镜图像对机器学习模型进行训练的方法、装置和存储介质 | |
CN110196908A (zh) | 数据分类方法、装置、计算机装置及存储介质 | |
KR102179890B1 (ko) | 텍스트 데이터 수집 및 분석을 위한 시스템 | |
JP7430820B2 (ja) | ソートモデルのトレーニング方法及び装置、電子機器、コンピュータ可読記憶媒体、コンピュータプログラム | |
CN110378438A (zh) | 标签容错下的图像分割模型的训练方法、装置及相关设备 | |
CN109643384A (zh) | 用于零样本学习的方法和装置 | |
CN113407709A (zh) | 生成式文本摘要系统和方法 | |
US10936938B2 (en) | Method for visualizing neural network models | |
CN111679829B (zh) | 用户界面设计的确定方法和装置 | |
KR102401112B1 (ko) | UX-bit를 이용한 정책망을 포함하는 자동 디자인 생성 인공신경망 장치 및 방법 | |
CN113536182A (zh) | 长文本网页的生成方法、装置、电子设备和存储介质 | |
Maheshwari et al. | Exemplar based experience transfer | |
CN112560490A (zh) | 知识图谱关系抽取方法、装置、电子设备及存储介质 | |
CN117290515A (zh) | 文本标注模型的训练方法、文生图方法及装置 | |
CN117391497A (zh) | 一种新闻稿件质量主客观评分一致性评价方法及系统 | |
JP6798555B2 (ja) | 情報処理装置、情報処理方法、及びプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21922417 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2021922417 Country of ref document: EP Effective date: 20230807 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |