CN109670264B

CN109670264B - Method and system for optimizing layout of reinforcement learning home

Info

Publication number: CN109670264B
Application number: CN201811633297.XA
Authority: CN
Inventors: 陈旋; 郑龙
Original assignee: Jiangsu Aijia Household Products Co Ltd
Current assignee: Jiangsu Aijia Household Products Co Ltd
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2022-07-08
Anticipated expiration: 2038-12-28
Also published as: CN109670262A; CN109670264A; CN109670262B

Abstract

The invention discloses a reinforcement learning home layout optimization method and system. The scheme comprises the following steps: 1) acquiring the layout state of an existing home design, namely the functional areas, walls, doors and windows, hard furnishings, soft furnishings, furniture, household appliances, decorations and other articles, together with each article's spatial center point coordinates (x, y, z) and its lengths (w, l, h) along the x, y and z axes; 2) obtaining a home layout deduction item list by a scoring calculation, the list comprising deduction items, deduction articles, deviation values and the like; 3) encoding the deduction items as the 'state', selecting the deduction item with the maximum future reward value by reinforcement learning, and optimizing that deduction item according to its deviation value.

Description

Method and system for optimizing layout of reinforcement learning home

Technical Field

This application is a divisional application of the patent application with application number 2018116235931, filed on December 28, 2018, and entitled "Computer-assisted home layout optimization method and system".

The invention relates to the technical field of home furnishing layout, in particular to a home furnishing layout optimization technology.

Background

The evaluation of the traditional home decoration scheme depends on manual examination, so that a large amount of manpower and material resources are consumed, and the scores are difficult to unify due to the difference of different individuals. The invention relates to an automatic home decoration scoring method and system, which are used for reducing labor cost and providing a unified judging standard.

Disclosure of Invention

The invention aims to solve the technical problems that the conventional household layout work needs to be optimized manually, the efficiency is low and the standards are inconsistent. The invention adopts a computer-aided home design layout optimization method, which can effectively reduce the labor cost and provide a uniform judgment standard.

In a first aspect of the present invention, there is provided:

A computer-aided home layout optimization method comprises the following steps:

S1, obtaining the target objects to be optimized in the floor plan;

S2, representing each target object by a space rectangle, and determining the position and the size of the target object;

S3, designing scoring indexes according to the positional relationships between the target objects, and scoring the target objects in the current floor plan with the scoring indexes;

S4, moving the target objects and re-scoring to obtain the floor plan layout with the optimal score.

In one embodiment, the target objects include functional areas, walls, doors and windows, hard furnishings, soft furnishings, furniture, household appliances, decorations, and the like.

In one embodiment, the position of the space rectangle is represented by three-dimensional coordinates, and the size of the space rectangle is represented by the length, width and height of the space rectangle.

In one embodiment, the scoring index is selected from one or more of center alignment, edge alignment, overlap, direction, and distance.

In one embodiment, objects in a room are projected onto walls around the objects, the projections are used as virtual walls, and the virtual walls are used as reference objects for comparing the positional relationships.

In one embodiment, different deviation intervals are designed according to the deviation degree of each position relation, and different deduction values are designed according to the conditions in the different deviation intervals.

In one embodiment, step S4 specifically comprises: encoding the deduction values of the layout scoring items obtained in step S3, and then optimizing different layout states according to the deduction values by a reinforcement learning method to obtain the optimal layout.

In a second aspect of the present invention, there is provided:

A computer-aided home layout optimization system, comprising:

a target object acquisition module, used for acquiring the target objects to be optimized in the floor plan;

a target object position and size determining module, used for representing each target object by a space rectangle and determining the position and size of the target object;

a scoring module, used for designing scoring indexes according to the positional relationships between the target objects and scoring the target objects in the current floor plan with the scoring indexes;

and an optimization module, used for moving the target objects and calling the scoring module to score again, so as to obtain the floor plan layout with the optimal score.

In one embodiment, the target object position size determination module represents the position of the space rectangle by using three-dimensional coordinates, and represents the size of the space rectangle by using the length, width and height of the space rectangle.

In one embodiment, the scoring indexes in the scoring module are selected from one or more of center alignment, edge alignment, overlapping, direction and distance.

In one embodiment, the scoring module projects objects in the room onto walls around the objects, the projections are used as virtual walls, and the virtual walls are used as reference objects for comparing the position relationship.

In one embodiment, the scoring module designs different deviation intervals for each position relationship according to the deviation degree of the position relationship, and designs different deduction values under the conditions in the different deviation intervals.

In a third aspect of the present invention, there is provided:

A computer-readable medium on which is recorded a program capable of executing the computer-aided home layout optimization method described above.

Advantageous effects

According to the automatic home decoration scoring method and system, the deduction items between each pair of articles serve as the judgment standard, and five types of judgment are supported: distance, direction, center alignment, edge alignment and overlap. This unifies the judgment standard and reduces errors caused by individual differences. Each deduction item can also correspond to a plurality of deviation intervals, which further refines the scoring precision.

Drawings

FIG. 1 is a schematic structural diagram of the automatic home scoring system and method.

FIG. 2 is a schematic diagram of a virtual object and its orientation.

FIG. 3 is a schematic diagram of center alignment deviation values.

FIG. 4 is a schematic diagram of edge alignment deviation values.

FIG. 5 is a schematic diagram of overlap deviation values.

FIG. 6 is a schematic diagram of direction deviation values.

FIG. 7 is a flow chart of the home layout optimization method.

FIG. 8 is a schematic diagram of the deduction item coding scheme.

FIG. 9 is a schematic diagram of the reinforcement learning structure.

FIG. 10 is a schematic diagram of the neural network output.

FIG. 11 is a schematic diagram of center alignment optimization.

FIG. 12 is a schematic diagram of edge alignment optimization.

FIG. 13 is a schematic diagram of overlap optimization.

FIG. 14 is a schematic diagram of direction optimization.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only examples or embodiments of the application, from which the application can also be applied to other similar scenarios without inventive effort for a person skilled in the art. It should be understood that these exemplary embodiments are given only for the purpose of enabling those skilled in the relevant art to better understand and to implement the present invention, and are not intended to limit the scope of the present invention in any way.

As used in this application and the appended claims, the terms "a," "an," and "the" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.

Although various references are made herein to certain systems, modules, or elements of a system according to embodiments of the present application, any number of different modules may be used and run on a client and/or server. The modules are merely illustrative and different aspects of the systems and methods may use different modules.

Also, this application uses specific language to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.

The steps of the optimization method in the invention are shown in fig. 1, and are explained in detail as follows:

1. Obtaining a floor plan, and identifying the functional areas, walls, doors and windows, hard furnishings, soft furnishings, furniture, household appliances, decorations and other objects, together with their length, width and height. A functional area in the present invention refers to the function of a room, such as a bedroom, living room, kitchen or toilet; it is identified because the positions of related objects differ between rooms with different functions.

2. Establishing a space coordinate system according to the acquired floor plan, and representing objects such as functional areas, walls, doors and windows, hard furnishings, soft furnishings, furniture, household appliances and decorations as space rectangles, where (x, y, z) denotes the spatial center point of an object and (w, l, h) denotes its lengths along the corresponding coordinate axes. The space rectangle may also be a cuboid; representing every object in this way simplifies the subsequent scoring of positional relationships.
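
As an illustration of this representation, the following minimal Python sketch stores an article as a center point plus axis lengths and derives its extent along each axis. The class name `Box`, its field names and the `bounds` helper are hypothetical and do not come from the original disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical space rectangle: center (x, y, z) and lengths (w, l, h)."""
    name: str
    x: float
    y: float
    z: float
    w: float  # length along the x axis
    l: float  # length along the y axis
    h: float  # length along the z axis

    def bounds(self, axis: str) -> Tuple[float, float]:
        """Return the (min, max) extent of the box along 'x', 'y' or 'z'."""
        center = {"x": self.x, "y": self.y, "z": self.z}[axis]
        length = {"x": self.w, "y": self.l, "z": self.h}[axis]
        return center - length / 2, center + length / 2


# Example article with made-up dimensions (metres are assumed).
bed = Box("bed", x=2.0, y=1.5, z=0.25, w=2.0, l=2.0, h=0.5)
x_min, x_max = bed.bounds("x")
```

Storing the center and lengths rather than corner points matches the (x, y, z) and (w, l, h) convention of the text; the corner extents needed for edge and overlap checks can always be derived on demand.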

3. Generating furniture virtual walls from the furniture and wall information. Each piece of furniture can be projected in four directions onto the surrounding walls, generating virtual walls in up to 4 directions; if there is no wall in a given direction, no virtual wall is generated for that direction. Each virtual wall is the projection of the furniture onto a wall, and only the nearest wall in each direction is considered. Virtual walls are named according to the anticlockwise rotation angle of the furniture, and the direction of each virtual wall points toward the furniture; see FIG. 2. The purpose of generating virtual walls is to make the relative positions between the walls and other objects in the room easier to calculate, which simplifies scoring.
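
The projection step might be sketched as follows. This is only an illustration under simplifying assumptions that are not in the source: furniture is axis-aligned, the nearest wall in each direction is already known and given as a single coordinate, and directions are keyed by the angles 0, 90, 180 and 270. The function name `virtual_walls` is hypothetical.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def virtual_walls(cx: float, cy: float, w: float, l: float,
                  nearest_wall: Dict[int, Optional[float]]) -> Dict[int, Segment]:
    """Project a furniture footprint onto the nearest wall in each direction.

    nearest_wall maps an assumed direction angle (0 = +x, 90 = +y, 180 = -x,
    270 = -y) to the wall coordinate in that direction, or None if no wall.
    """
    x0, x1 = cx - w / 2, cx + w / 2
    y0, y1 = cy - l / 2, cy + l / 2
    walls: Dict[int, Segment] = {}
    for angle, coord in nearest_wall.items():
        if coord is None:
            continue  # no wall in this direction, so no virtual wall
        if angle in (0, 180):
            # Wall is the line x = coord; the projection keeps the y extent.
            walls[angle] = ((coord, y0), (coord, y1))
        else:
            # Wall is the line y = coord; the projection keeps the x extent.
            walls[angle] = ((x0, coord), (x1, coord))
    return walls


# A 2.0 x 1.0 bed at (2.0, 1.0) in a room with no wall in the 270 direction.
bed_walls = virtual_walls(2.0, 1.0, 2.0, 1.0,
                          {0: 4.0, 90: 3.0, 180: 0.0, 270: None})
```

Each resulting segment can then be treated like any other article when computing distances or alignment against the wall.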

4. Selecting pairs of articles according to the deduction detail table, and calculating the deviation value of the deduction item that the table specifies for each pair. There are five types of deduction item: center alignment, edge alignment, overlap, direction, and distance.

The center alignment deviation value is the Euclidean distance between the center points of the two articles. Center alignment covers both alignment along the x axis and alignment along the y axis; in practice, the smaller of the two axis deviations is taken as the final center alignment deviation value. For example, the sofa is center-aligned with the television cabinet, the television cabinet is center-aligned with the television, and the bed is center-aligned with the television cabinet, as shown in FIG. 3.
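
A minimal sketch of this rule, assuming 2D center points and reading "the minimum value of the two directions" as the smaller of the per-axis distances. The function name and the sample coordinates are illustrative only.

```python
from typing import Tuple


def center_alignment_deviation(center_a: Tuple[float, float],
                               center_b: Tuple[float, float]) -> float:
    """Smaller of the x-axis and y-axis distances between two center points."""
    dx = abs(center_a[0] - center_b[0])
    dy = abs(center_a[1] - center_b[1])
    return min(dx, dy)


# Made-up centers: the sofa is 0.25 off the cabinet's center line along x.
sofa = (2.0, 1.0)
tv_cabinet = (2.25, 4.0)
deviation = center_alignment_deviation(sofa, tv_cabinet)
```

Taking the minimum means two articles that face each other across a room count as aligned when they share a center line on either axis.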

The edge alignment deviation value is the Euclidean distance between the boundaries of the two articles. Edge alignment covers 8 alignment modes; in practice, the minimum value over these modes is taken as the final edge alignment deviation value. For example, the edge of a table is aligned with the edge of a sofa, or the edge of a bed is aligned with the edge of a bedside table. See FIG. 4.
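
The text does not enumerate the 8 alignment modes. One possible reading, assumed here purely for illustration, is 2 axes × 2 edges of the first article × 2 edges of the second. Under that assumption, the deviation could be sketched as below; the function name and footprint layout `(cx, cy, w, l)` are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Footprint = Tuple[float, float, float, float]  # (cx, cy, w, l)


def edge_alignment_deviation(a: Footprint, b: Footprint) -> float:
    """Minimum edge-to-edge distance over the 8 assumed alignment modes."""
    candidates: List[float] = []
    for axis in (0, 1):  # 0 = x axis, 1 = y axis
        a_edges = (a[axis] - a[axis + 2] / 2, a[axis] + a[axis + 2] / 2)
        b_edges = (b[axis] - b[axis + 2] / 2, b[axis] + b[axis + 2] / 2)
        # Compare each edge of a with each edge of b on this axis (4 pairs).
        candidates.extend(abs(ea - eb) for ea, eb in product(a_edges, b_edges))
    return min(candidates)


bed = (2.0, 2.0, 2.0, 2.0)            # spans x 1..3 and y 1..3
bedside_table = (3.5, 1.5, 0.5, 0.5)  # spans x 3.25..3.75 and y 1.25..1.75
deviation = edge_alignment_deviation(bed, bedside_table)
```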

The overlap deviation value is the overlapping area of the two articles. For example, a bedside table may overlap a bed, or a bedside table may overlap a wall. See FIG. 5.
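
For axis-aligned footprints, the overlapping area is the product of the overlap lengths on the two axes. The sketch below assumes footprints of the form `(cx, cy, w, l)`; the names are illustrative.

```python
from typing import Tuple

Footprint = Tuple[float, float, float, float]  # (cx, cy, w, l)


def overlap_area(a: Footprint, b: Footprint) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    ax, ay, aw, al = a
    bx, by, bw, bl = b
    overlap_x = min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2)
    overlap_y = min(ay + al / 2, by + bl / 2) - max(ay - al / 2, by - bl / 2)
    # A negative overlap length means the rectangles are apart on that axis.
    return max(overlap_x, 0.0) * max(overlap_y, 0.0)


bed = (1.0, 1.0, 2.0, 2.0)            # spans x 0..2 and y 0..2
bedside_table = (2.5, 1.0, 2.0, 2.0)  # spans x 1.5..3.5 and y 0..2
area = overlap_area(bed, bedside_table)
```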

The direction deviation value is the deviation between the facing (positive) directions of the two articles. For example, the direction of the bed deviates by 180° from the direction of the television cabinet, or by 180° from the direction of the television. See FIG. 6.
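
One way to compute such a value, assuming each article's facing direction is an angle in degrees and the deduction item specifies an expected relative angle (180° in the bed and television example), is sketched below. Both function names are hypothetical.

```python
def angle_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    diff = (a - b) % 360.0
    return min(diff, 360.0 - diff)


def direction_deviation(angle_a: float, angle_b: float,
                        expected: float = 180.0) -> float:
    """How far the relative angle between a and b is from the expected one."""
    relative = (angle_a - angle_b) % 360.0
    return angle_difference(relative, expected)


# A bed facing 90 degrees and a television facing 270 degrees face each other.
facing = direction_deviation(90.0, 270.0)
# Turning the television by 10 degrees produces a 10 degree deviation.
skewed = direction_deviation(90.0, 260.0)
```

The modulo arithmetic keeps the result well defined when angles wrap around 360°.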

Part of the deduction detail table is as follows:

TABLE 1

The deduction detail table is traversed, and the deviation value of each entry whose conditions are met is calculated. In practice, the deviation value of an entry is calculated only when both article 1 and article 2 of that entry are present; that is, the deduction judgment is made only when articles 1 and 2 exist simultaneously.

5. Finding the deviation interval that corresponds to each deviation value to obtain a deduction value.

6. Summing all the deduction values to obtain the final scoring result.

7. Moving the articles that can be moved in the room, recalculating the final scoring result through the above steps, and obtaining the layout with the optimal score.
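
Steps 5 and 6 can be sketched as an interval lookup followed by a sum. The interval bounds and deduction values below are invented for illustration, since the deduction detail table is not reproduced here; `deduction_value` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Sequence


def deduction_value(deviation: float,
                    upper_bounds: Sequence[float],
                    deductions: Sequence[float]) -> float:
    """Map a deviation to the deduction of the interval it falls in.

    upper_bounds is sorted ascending; deductions has one more entry than
    upper_bounds, the last one covering everything above the final bound.
    """
    return deductions[bisect_right(upper_bounds, deviation)]


# Assumed intervals: [0, 0.05) -> 0, [0.05, 0.2) -> 1, [0.2, inf) -> 3.
BOUNDS = [0.05, 0.2]
DEDUCTIONS = [0.0, 1.0, 3.0]

deviations = [0.0, 0.1, 0.5]  # one deviation value per deduction item
total_deduction = sum(deduction_value(d, BOUNDS, DEDUCTIONS) for d in deviations)
```

Step 7 would then repeat this calculation after each candidate move and keep the layout whose total is best.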

To bring further computer assistance to the above process of moving and optimizing articles, the following improved optimization method may be used: on the basis of an existing layout, a reinforcement learning model is built over the deduction items of the scoring system to find the optimal optimization scheme and improve the attractiveness of the layout. The specific steps are shown in FIG. 7:

Step 1, acquiring the data of an existing layout, including the functional areas, walls, doors and windows, hard furnishings, soft furnishings, furniture, household appliances, decorations and other articles, and recording the current layout state as each article's spatial center point coordinates (x, y, z) and its lengths (w, l, h) along the x, y and z axes. In this process, each article is modeled as a cuboid or rectangle whose position and size are given by the coordinates of its center point and its length, width and height.

Step 2, passing the acquired data through the scoring system to obtain a home layout deduction item list comprising deduction items, deduction articles and deviation values. Several different layouts need to be examined during layout design, and each of them is scored in this way; an example of some of the scoring items is shown in Table 2. Each deduction item in the list comprises an object, a reference object, a deduction type, a reference direction and deviation intervals. The deduction articles comprise specific household articles, walls, doors and windows, and the virtual walls mapped from furniture, such as the bed-head wall generated by mapping a bed onto a wall. The deviation value is the edge point distance, the center point distance, the overlapping area or the angle deviation between two objects.

TABLE 2
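
The fields listed in step 2 suggest a simple record per deduction item. The sketch below is one hypothetical way to hold such a record; the class name, field names and the sample entry are assumptions and are not taken from Table 2.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DeductionItem:
    """Hypothetical record for one entry of the deduction item list."""
    obj: str                  # article being judged
    reference: str            # reference object (may be a virtual wall)
    deduction_type: str       # 'center', 'edge', 'overlap', 'direction' or 'distance'
    reference_direction: str  # direction in which the relation is measured
    # Each interval is (lower bound, upper bound, deduction value).
    intervals: List[Tuple[float, float, float]]
    deviation: float = 0.0    # filled in by the scoring system


item = DeductionItem(
    obj="bed",
    reference="bed-head wall",
    deduction_type="distance",
    reference_direction="x",
    intervals=[(0.0, 0.05, 0.0), (0.05, float("inf"), 2.0)],
    deviation=0.12,
)
```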

Step 3, generating a multi-hot code from the deduction items to represent the state of the current system, where each bit of the code represents one deduction item and its deduction value. Each deduction item is assigned one bit position: when the deduction item at a position is in the deduction state, the bit is '1'; otherwise it is '0'. The coding scheme is shown in FIG. 8.

According to the above rules, the deduction value of each individual item can be obtained and accumulated into an overall layout score; a higher score is better.
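
The multi-hot coding rule can be sketched in a few lines. The item names below are invented; only the rule itself (one bit per deduction item, '1' when that item currently incurs a deduction) comes from the text.

```python
from typing import Iterable, List, Sequence


def multi_hot(all_items: Sequence[str], active_items: Iterable[str]) -> List[int]:
    """One bit per deduction item: 1 if the item is in the deduction state."""
    active = set(active_items)
    return [1 if item in active else 0 for item in all_items]


ALL_ITEMS = ["sofa/tv cabinet center", "bed/bedside table edge",
             "bedside table/bed overlap", "bed/tv direction"]
state = multi_hot(ALL_ITEMS, ["bed/bedside table edge", "bed/tv direction"])
```

Unlike a one-hot code, several bits can be set at once, which fits a layout that violates more than one scoring item at the same time.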

Step 4, using reinforcement learning to select the deduction item and deduction article with the maximum future reward, moving or rotating the article according to the deduction type and the corresponding deviation value so as to optimize the deduction item, and updating the layout state. FIG. 9 shows a schematic diagram of the reinforcement learning structure. The scheme is an episodic task, that is, the iteration finishes within a limited number of steps, and the layout with the best score over the whole process is selected as the final optimization scheme. There are two termination conditions: (1) the number of iterations reaches the maximum value; (2) the current layout state has no deductions. The reward function used in this scheme is a delayed reward (penalty), as follows:

where G_t represents the reward value at the current moment, score represents the sum of the score values of all the deduction items of the scoring system, T_ represents the moment at which the optimal scheme was obtained, T represents the moment of the ending state, the parameter alpha measures the effectiveness of the current behavior at the current moment, and the parameter lambda measures the effectiveness of the current behavior over the whole process. The neural network adopted in this scheme is a 3-layer fully-connected neural network, and each output node of the network corresponds to a deduction item and a moving article, as shown in FIG. 10.
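
To make the network description concrete, here is a dependency-free sketch of a forward pass through three fully-connected layers. The layer sizes, the ReLU hidden activation, the softmax output and the random initialization are all assumptions; the text specifies only that the network has 3 fully-connected layers and one output node per deduction item and moving article.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights and zero biases (initialization scheme is assumed)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(values: Sequence[float]) -> List[float]:
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(state: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Hidden layers use ReLU (assumed); the last layer feeds a softmax."""
    hidden = list(state)
    for layer in layers[:-1]:
        hidden = [max(0.0, v) for v in dense(hidden, layer)]
    return softmax(dense(hidden, layers[-1]))


rng = random.Random(0)
# 4 state bits in, 3 candidate (deduction item, article) actions out.
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 3, rng)]
output = forward([1, 0, 1, 1], layers)
```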

The behavior in this scheme is the optimization of a deduction item, and the behavior is selected with the following formula:

action = argmax(output)

where output represents the output probability distribution of the neural network, and action is the action to be executed, comprising the article to be moved and the deduction item to be optimized.
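
A small sketch of this selection rule: the index of the largest network output is mapped back to a (deduction item, article) pair. The action table and probabilities are made up for the example, and `select_action` is a hypothetical helper.

```python
from typing import Sequence, Tuple

Action = Tuple[str, str]  # (deduction item to optimize, article to move)


def select_action(output: Sequence[float], actions: Sequence[Action]) -> Action:
    """Return the action whose output node has the highest probability."""
    if len(output) != len(actions):
        raise ValueError("one output node is expected per action")
    best_index = max(range(len(output)), key=lambda i: output[i])
    return actions[best_index]


ACTIONS = [("center alignment", "sofa"),
           ("overlap", "bedside table"),
           ("direction", "bed")]
chosen = select_action([0.2, 0.7, 0.1], ACTIONS)
```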

Optimizing a deduction item means moving the position of the article or rotating its direction according to the center distance deviation value, the edge distance deviation value, the overlapping area or the angle deviation value of that deduction item.

Step 5, applying the optimization to the deduction item and deduction article selected by reinforcement learning, moving or rotating the article according to the deduction type and the corresponding deviation value, and updating the layout state. The optimizations comprise:

(1) Center alignment optimization. The article is moved so that it is center-aligned in the specified direction, as shown in FIG. 11.

(2) Edge alignment optimization. The article is moved so that its edges are aligned in the specified direction, as shown in FIG. 12.

(3) Overlap optimization. The articles are moved so that they no longer overlap, as shown in FIG. 13.

(4) Direction optimization. The article is rotated so that it faces the specified direction, as shown in FIG. 14.
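
Two of these moves might look like the following for axis-aligned footprints `(cx, cy, w, l)`. This is a sketch under assumed conventions (movement along a single axis, the moved article pushed away from the fixed one); the function names are hypothetical and the source does not prescribe these exact rules.

```python
from typing import Tuple

Footprint = Tuple[float, float, float, float]  # (cx, cy, w, l)


def center_align(article: Footprint, reference: Footprint, axis: str) -> Footprint:
    """Move the article so its center matches the reference on one axis."""
    cx, cy, w, l = article
    if axis == "x":
        return (reference[0], cy, w, l)
    return (cx, reference[1], w, l)


def separate_along_x(article: Footprint, other: Footprint) -> Footprint:
    """Shift the article along x by the overlap length so it only touches."""
    ax, ay, aw, al = article
    bx, by, bw, bl = other
    overlap_x = (aw + bw) / 2 - abs(ax - bx)
    overlap_y = (al + bl) / 2 - abs(ay - by)
    if overlap_x <= 0 or overlap_y <= 0:
        return article  # already disjoint, nothing to do
    direction = 1.0 if ax >= bx else -1.0  # push away from the other article
    return (ax + direction * overlap_x, ay, aw, al)


aligned = center_align((1.0, 1.0, 2.0, 1.0), (3.0, 4.0, 1.0, 1.0), "x")
moved = separate_along_x((1.0, 1.0, 2.0, 2.0), (2.5, 1.0, 2.0, 2.0))
```

In both cases the size of the move equals the deviation value of the deduction item, so applying the move brings that item's deviation to zero.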

Steps 1-5 are executed repeatedly until a stopping condition is met (the maximum number of iterations is reached, or the score threshold is met).

The state with the maximum score among the recorded layout states is selected as the final result.

The specific algorithm steps are as follows:

Input: the existing home layout state (the specific articles, and the center coordinates (x, y, z) and axis lengths (w, l, h) of each article); the maximum number of iterations max_deep; the optimal threshold value threshold.

Initialization: time t = 0, the neural network F, the record of layout states states, the accumulated reward value G, and the training parameters, including epoch.

Process:

1. Call the scoring algorithm to obtain the score score_t and the layout state state_t at the current moment, and record (state_t, score_t) -> states;

2. Loop body 1:

for i = 1 to epoch:
    while True:
        if (t > max_deep) or (score_t > threshold):
            break    // stop condition satisfied, exit the loop
        else:
            use state_t to generate the multi-hot code S_t representing the current system state;
            evaluate the neural network F on S_t to obtain output, and select the action to be executed:
                action = argmax(output);
            execute action (center alignment, edge alignment, direction or overlap optimization) to obtain the new layout state state_t+1;
            call the scoring algorithm to obtain the score score_t+1 of layout state state_t+1, and record (state_t+1, score_t+1) -> states;
            t = t + 1;
    reverse(states)    // flip states
    for state, score_detail in states:
        G_t -> G;
    for state_t, score_t in states:
        compute the loss function
        gradient update

Based on the above process, the optimization method is as follows:

A reinforcement learning home layout optimization method comprises the following steps:

step 1, acquiring the layout state of a home;

step 2, calculating the deduction values of the layout scoring items according to the layout state in step 1;

step 3, encoding the deduction values of all the layout scoring items in step 2, and then optimizing different layout states according to the deduction values by a reinforcement learning method to obtain the optimal layout.

In one embodiment, the layout state refers to a spatial position layout state of the article.

In one embodiment, the articles corresponding to the layout state include: functional areas, walls, doors and windows, hard furnishings, soft furnishings, furniture, household appliances and the like.

In one embodiment, the position and size of the article are represented by its center point coordinates and the length, width, and height of the article.

In one embodiment, the scoring term includes one or more of center alignment, edge alignment, overlap, direction deviation, or distance deviation.

In one embodiment, the deduction values of the layout scoring items in step 3 are encoded as a multi-hot code.

In one embodiment, the bits of the code are arranged by layout scoring item; if the layout scoring item at the current bit is in a deduction state, the bit is "1"; otherwise, the bit is "0".

In one embodiment, the reward function used in reinforcement learning employs a delayed reward function.

In one embodiment, the reward function is:

where G_t represents the reward value at the current moment, score represents the sum of the score values of all the scoring items of the scoring system, T_ represents the moment at which the optimal scheme was obtained, T represents the moment of the ending state, the parameter alpha measures the effectiveness of the current behavior at the current moment, and the parameter lambda measures the effectiveness of the current behavior over the whole process.

In one embodiment, the behavior in the reinforcement learning process is selected with the following function:

action = argmax(output)

where output represents the output probability distribution of the neural network, and action is the action to be performed, namely the optimizing movement for a layout scoring item.

A reinforcement learning home layout optimization system, comprising:

the layout state acquisition module is used for acquiring the layout state of the home;

the deduction value calculating module is used for calculating the deduction value of the layout evaluation item according to the layout state;

the coding module is used for coding the deduction value of the layout scoring item;

and the reinforcement learning module is used for optimizing different layout states according to the deduction values by adopting a reinforcement learning method to obtain the optimal layout.

In one embodiment, the layout state acquisition module represents the position and size of the article by the coordinates of the center point of the article and the length, width and height of the article.

In one embodiment, the encoding module employs multi-hot encoding.

In one embodiment, the coding module is arranged according to each layout score item, and if the layout score item of the current bit is in a deduction state, the position is "1", otherwise, the position is "0".

In one embodiment, the reward function used in the reinforcement learning module is a delayed reward function.

In one embodiment, the reward function is:

In one embodiment, the behavior in the reinforcement learning process in the reinforcement learning module is selected with the following function:

action = argmax(output)

A computer-readable medium is described that carries a program that can perform the above-described method.

Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable classes or contexts, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereon. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.

A computer readable signal medium may comprise a propagated data signal with computer program code embodied therein, for example, on a baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, and the like, or any suitable combination. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code on a computer readable signal medium may be propagated over any suitable medium, including radio, electrical cable, fiber optic cable, radio frequency signals, or the like, or any combination of the preceding.

Computer program code required for the operation of various portions of the present application may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages, and the like. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).

Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.

Claims

1. A method for optimizing a home layout for reinforcement learning is characterized by comprising the following steps:

step 1, acquiring a layout state of a home;

step 2, calculating a deduction value of the layout scoring item according to the layout state in the step 1;

step 3, after the deduction values of all the layout evaluation items in the step 2 are coded, optimizing different layout states according to the deduction values by adopting a reinforcement learning method to obtain an optimal layout;

the layout state refers to the spatial position layout state of the articles; the articles corresponding to the layout state comprise: functional areas, walls, doors and windows, hard furnishings, soft furnishings, furniture and household appliances; the position and the size of an article are represented by the coordinates of the center point of the article and the length, the width and the height of the article;

the scoring item comprises one or more of center alignment deviation, edge alignment deviation, overlapping deviation, direction deviation or distance deviation;

the center alignment deviation is used for calculating the Euclidean distance between the center points of the two articles;

the edge alignment deviation is used for calculating the Euclidean distance at the boundary of two objects;

the overlapping deviation is used for calculating the overlapping area of the two articles;

the direction deviation is used for calculating the deviation value of the positive directions of the two articles;

in the step 3, the deduction values of the layout scoring items are encoded as a multi-hot code; in the code, the bits are arranged by layout scoring item, and if the layout scoring item at the current position is in a deduction state, the position is '1'; otherwise, the position is '0'.

2. The reinforcement learning home layout optimization method according to claim 1, wherein a reward function used in reinforcement learning is a delayed reward function; the reward function is:

wherein G_t represents the reward value at the current moment, score represents the sum of the score values of all the scoring items of the scoring system, T_ represents the moment at which the optimal scheme was obtained, T represents the moment of the ending state, the parameter alpha is used for measuring the effectiveness of the current behavior at the current moment, and the parameter lambda is used for measuring the effectiveness of the current behavior over the whole process.

3. The reinforcement learning home layout optimization method according to claim 1, wherein the behavior in the reinforcement learning process adopts the following function:

action＝argmax(output)

wherein output represents an output probability distribution of the neural network; action is an action to be executed, and refers to the optimization movement of the layout scoring items.

4. A reinforcement learning home layout optimization system is characterized by comprising:

the layout state acquisition module is used for acquiring the layout state of the home;

the deduction value calculating module is used for calculating the deduction value of the layout scoring item according to the layout state;

the coding module is used for coding the deduction value of the layout scoring item;

the reinforcement learning module is used for optimizing different layout states according to the deduction values by adopting a reinforcement learning method to obtain an optimal layout;

the layout state refers to the spatial position layout state of the articles; the articles corresponding to the layout state comprise: functional areas, walls, doors and windows, hard furnishings, soft furnishings, furniture and household appliances; the layout state acquisition module represents the position and size of an article through the coordinates of the center point of the article and the length, width and height of the article; the scoring items comprise one or more of center alignment deviation, edge alignment deviation, overlapping deviation, direction deviation or distance deviation;

the coding module adopts multi-hot coding;

the coding module arranges the bits of the code by layout scoring item; if the layout scoring item at the current position is in a deduction state, the position is '1'; otherwise, the position is '0'.

5. The reinforcement learning home layout optimization system of claim 4, wherein the reward function used in the reinforcement learning module is a delayed reward function; the reward function is:

wherein G_t represents the reward value at the current moment, score represents the sum of the score values of all the scoring items of the scoring system, T_ represents the moment at which the optimal scheme was obtained, T represents the moment of the ending state, the parameter alpha is used for measuring the effectiveness of the current behavior at the current moment, and the parameter lambda is used for measuring the effectiveness of the current behavior over the whole process;

the behavior in the reinforcement learning process in the reinforcement learning module adopts the following functions:

action＝argmax(output)

6. A computer-readable medium recording a program for executing the reinforcement learning home layout optimization method according to any one of claims 1 to 3.