CN112947412B - Method for autonomously selecting advancing destination of vending robot - Google Patents

Method for autonomously selecting advancing destination of vending robot

Info

Publication number
CN112947412B
CN112947412B CN202110104046.8A CN202110104046A CN112947412B CN 112947412 B CN112947412 B CN 112947412B CN 202110104046 A CN202110104046 A CN 202110104046A CN 112947412 B CN112947412 B CN 112947412B
Authority
CN
China
Prior art keywords
robot
vending
vending robot
decision
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110104046.8A
Other languages
Chinese (zh)
Other versions
CN112947412A (en
Inventor
宋杰
赵星辰
冯晓月
王蓓蕾
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学 filed Critical 东北大学
Priority to CN202110104046.8A priority Critical patent/CN112947412B/en
Publication of CN112947412A publication Critical patent/CN112947412A/en
Application granted granted Critical
Publication of CN112947412B publication Critical patent/CN112947412B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention provides a method by which a vending robot autonomously selects a travel destination, and relates to the technical field of machine learning. In this method, the vending robot interacts with various scenes to obtain scene information, evaluates each destination it selects from the feedback it receives, and, through repeated trial and error, finds the destination that brings it the maximum profit. The method applies a reinforcement learning algorithm to the vending robot's destination-decision problem and has practical application value: it overcomes the poor flexibility of fixed-position vending machines and also brings higher economic benefit.

Description

Method for autonomously selecting advancing destination of vending robot
Technical Field
The invention relates to the technical field of machine learning, in particular to a method for autonomously selecting a travel destination by a vending robot.
Background
With economic development and the rapid improvement of living standards, people increasingly pursue a convenient way of life. Traditional markets and supermarkets rely on manual sales, which is slow and inefficient and struggles to meet the demands of fast-paced city life. The advent of vending machines has increased service efficiency and reduced labor intensity; during an epidemic they reduce contact between food and service personnel and so improve food safety; and they can sell around the clock without occupying much space, which greatly facilitates people's lives. According to statistics, in 2016 the United States had 690 thousand vending machines with an annual income of 251 million dollars, the largest share worldwide. Because of its huge population base, the number of vending machines per capita in China is low. At present, the vending machine market in China has developed to a certain extent, and the variety of vending machines is growing. According to incomplete statistics, the domestic market has reached 3 million machines, with annual retail sales of 60 billion yuan. In summary, vending machines are set to become a major industry in China, with great market potential. In recent years, many countries have devoted themselves to vending machine research, but most designs are still traditional vending machines whose position is fixed: they cannot move and cannot serve customers on the go, which greatly weakens their advantages of speed and convenience, so change is imperative.
In places that people frequent in daily life, such as schools, if the vending robot can predict where students often appear and plan a path to such a destination, it saves students the time spent travelling to buy goods, which is a great convenience, and it also sells more goods and brings higher profit to merchants.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method by which a vending robot autonomously selects a travel destination. The method characterizes hot spots on a map and draws on the idea of the reinforcement learning Q-learning algorithm so that the vending robot can decide its destination autonomously.
In order to solve the technical problems, the invention adopts the following technical scheme: a method for autonomously selecting a travel destination by a vending robot, comprising the steps of:
Step 1: modeling and acquiring data related to the travel of the vending robot; the travel-related data comprises site data, pedestrian attribute data and vending robot attribute data;
Step 1.1: site data modeling; the attributes of the site comprise {length, width, barrier, Charger}, where length and width are the length and width of the site; barrier is an obstacle range in the site, namely an area where neither the robot nor pedestrians can walk, expressed as $\{P_{top}, P_{down}\}$, where $P_{top}$ and $P_{down}$ are the upper-left and lower-right corner coordinates of the barrier; Charger is the position of the vending robot's charging pile, which is also the robot's starting position, expressed as {x, y}, where x and y are the abscissa and ordinate of the Charger;
Step 1.2: pedestrian data modeling; the attributes of the pedestrians comprise {Account, expected, probability, Aview}, where Account is the number of pedestrians in the site; expected is the travel speed of the pedestrians; probability is the probability that a pedestrian makes a purchase at the vending robot; Aview is the field of view of a pedestrian, expressed as {radius, angle}, where radius and angle are the radius and central angle of the pedestrian's field of view;
Step 1.3: vending robot (Robot) data modeling; the attributes of the vending robot comprise {Rcount, Rspeed, Rview, electric property}, where Rcount is the number of vending robots in the site, Rspeed and Rview are the travel speed and field of view of the vending robot, and electric property is the group of attributes related to the robot's battery, expressed as {consumeSpeed, chargeSpeed, electric property}, where consumeSpeed and chargeSpeed are the power consumption rate and charging rate of the vending robot, and the electric property inside the group is the robot's remaining charge;
Step 1.4: acquiring site data; after all site-related parameters have been set, the width and length of the site are input to the vending robot, which then walks randomly within the given site and continuously acquires data about it, that is, updates its knowledge of the site; this knowledge is expressed mathematically as shown in equation (1):
where -1 indicates that the coordinate point has not been explored and is therefore a position of interest to the vending robot; 0 and 1 indicate that the coordinate point has been explored, 0 being a non-walkable obstacle area and 1 a walkable area; in equation (1), the left matrix represents the robot's initial knowledge of the site, and the right matrix represents its updated knowledge after it has walked for a period of time;
Step 1.5: judging whether the data acquisition stage is finished, specifically: calculating the proportion λ of the area the vending robot has explored to the total area of the site; if λ is greater than a set threshold, the vending robot enters the autonomous decision stage; otherwise, returning to step 1.4 and continuing to acquire data by random walk;
Step 2: setting hot spot characteristics and determining the hot spots in the site, which the vending robot discovers and records automatically;
determining hot spots in the site; a hot spot is a position that the vending robot tends to choose as a destination, and a coordinate point with any one of the following four characteristics is called a hot spot: (1) the position of the charging pile, ChargePoint (CP); (2) a position where a pedestrian has purchased goods from the vending robot, SalePoint (SP); (3) a position through which a pedestrian has walked, ActorPoint (AP); (4) a position that the vending robot did not explore during the random walk, also called a position of interest to the robot, InterestPoint (IP); characteristic (1) has the highest priority, characteristics (2) and (3) come next, and characteristic (4) has the lowest;
the vending robot discovers and records hot spots automatically; the position of the charging pile is known, being the starting point of the vending robot; an AP is recorded as a coordinate point at which the vending robot's camera detects a pedestrian; an SP is recorded as the coordinate point at which the vending robot's cabinet door is opened and a transaction is completed; the IPs of the vending robot are known;
Step 3: establishing a hot spot decision model, represented by the triplet $\langle P_{current}, P_{target}, T \rangle$, where $P_{current}$ is the sequence of coordinates $(x_{current}, y_{current})$ of all walkable positions at which the vending robot may currently be, $P_{target}$ is the sequence of coordinates $(x_{target}, y_{target})$ of the target positions, T represents the tendency from the current position to the target position, and $T = P_{current} \times P_{target}$; the dynamic process of the decision model is described as follows: the vending robot is at the current position $p_{c0} \in P_{current}$ and autonomously decides on a destination position $p_{t0} \in P_{target}$; after the vending robot moves to the target position, the target position becomes the robot's current position $p_{c1} \in P_{current}$, and the robot receives a tendency $T_0$ as feedback;
the tendency indicates how strongly the vending robot, deciding at its current position, tends toward the target position, and it measures the profit the robot may obtain by going from its current position to the target position; the tendency depends on the distance $L$ between $P_{current}$ and $P_{target}$, on whether $P_{target}$ is a sales history point ($H_{sp}$) or a crowd history point ($H_{ap}$), and on the number $H_{ip}$ of interest points around $P_{target}$, as shown in the following formula:
$$T = \eta_1 \cdot L + \eta_2 \cdot H + \eta_3 \cdot H_{ip} \tag{2}$$
where $H = \max\{H_{sp}, H_{ap}, 0\}$, and $\eta_1$, $\eta_2$, $\eta_3$ are the weights of the three influencing factors, with $\eta_2$ the largest;
Step 4: establishing and updating a decision table; the vending robot trains on the tendency matrix to obtain a decision table that guides its actions;
Step 4.1: establishing the tendency matrix and the decision table for the vending robot's actions; the tendency matrix has $P_{current}$ as rows and $P_{target}$ as columns and describes the tendency of the vending robot from each current position to each target position; the decision table likewise has $P_{current}$ as rows and $P_{target}$ as columns, is of the same order as the tendency matrix, is initialized to 0, and is updated according to the tendency matrix;
Step 4.2: updating the decision table; each element of the decision table is called a decision value D, and the optimal decision value equals the tendency from the current position to the next destination plus the maximum decision value from the next destination to the final destination; each decision value D in the decision table is updated as shown in the following formula:
$$D_t(p_{ct}, p_{tt}) = T_t(p_{ct} \to p_{tt}) + \gamma \max D_t(p_{ct+1}, p_t) \tag{3}$$
where $D_t(p_{ct}, p_{tt})$ and $D_t(p_{ct+1}, p_t)$ are, at the t-th update, the decision value from the vending robot's current position coordinate to the destination coordinate and the decision value from the destination coordinate to the final destination coordinate, respectively; $p_{ct}$ is the current position coordinate of the vending robot; $p_{tt}$ is the destination coordinate the vending robot decides on at the current position; $p_{ct+1}$ is the next destination coordinate after the decision; $p_t$ is the final destination coordinate of the vending robot's travel; $T_t$ is the tendency of the vending robot from the current position coordinate to the destination coordinate; γ is the attenuation factor, $0 \le \gamma < 1$, which expresses the importance of future decisions: when γ = 0 only the benefit of the currently preferred position is considered, and as γ approaches 1 the benefit of future decisions is weighted more heavily;
equation (3) holds when the optimal decision yields the maximum profit; while the decision table is still being built, however, the two sides of equation (3) are not equal, and the error is given by the following formula:
$$\Delta D(p_{ct}, p_{tt}) = T_t(p_{ct} \to p_{tt}) + \gamma \max D_t(p_{ct+1}, p_t) - D_t(p_{ct}, p_{tt}) \tag{4}$$
where $\Delta D(p_{ct}, p_{tt})$ is the update error of the decision value D;
thus, the process by which the vending robot iteratively updates the decision table, approaching the optimal decision step by step, is represented by equation (5):
$$D_{t+1}(p_{ct}, p_{tt}) = D_t(p_{ct}, p_{tt}) + \alpha \Delta D(p_{ct}, p_{tt}) \tag{5}$$
where $D_{t+1}(p_{ct}, p_{tt})$ is the decision value after the (t+1)-th iterative update and α is the learning rate, $0 < \alpha < 1$;
combining equations (4) and (5) gives the final update rule for the decision table, as shown in equation (6):
$$D_{t+1}(p_{ct}, p_{tt}) = D_t(p_{ct}, p_{tt}) + \alpha\left[T_t(p_{ct} \to p_{tt}) + \gamma \max D_t(p_{ct+1}, p_t) - D_t(p_{ct}, p_{tt})\right] \tag{6}$$
Step 5: according to the decision table, the vending robot selects as its destination the column holding the maximum decision value in the row of its current position, and then plans a walking path with the A* shortest-path algorithm;
Step 6: when the number of hot spots the vending robot has recorded grows to μ times the original number, step 4 is re-executed to recompute the tendency matrix, until the tendency matrix no longer changes or changes by less than a set threshold.
The beneficial effects of the above technical scheme are as follows: in the method provided by the invention, the vending robot interacts with a complex dynamic environment; it searches for and records hot spots in the environment according to the hot spot characteristics and receives the current tendency as feedback; it evaluates each selected destination by that tendency; and after trying each destination position many times it learns the optimal strategy, so that each time it can decide on the destination that brings the maximum profit.
Drawings
Fig. 1 is a flowchart of a method for autonomously selecting a travel destination by a vending robot according to an embodiment of the present invention;
Fig. 2 shows the initial data related to the travel of a vending robot according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the site provided in an embodiment of the present invention;
Fig. 4 shows the travel routes of the vending robot and the pedestrians provided by an embodiment of the present invention;
Fig. 5 shows the set of positions with larger decision values in the decision table according to an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
In this embodiment, a method for autonomously selecting a travel destination by a vending robot, as shown in fig. 1, includes the following steps:
Step 1: modeling and acquiring data related to the travel of the vending robot;
Step 1.1: site (Map) data modeling; the attributes of the site comprise {length, width, barrier, Charger}, where length and width are the length and width of the site; barrier is an obstacle range in the site, namely an area where neither the robot nor pedestrians can walk, expressed as $\{P_{top}, P_{down}\}$, where $P_{top}$ and $P_{down}$ are the upper-left and lower-right corner coordinates of the barrier; Charger is the position of the vending robot's charging pile, which is also the robot's starting position, expressed as {x, y}, where x and y are the abscissa and ordinate of the Charger;
Step 1.2: pedestrian (Actor) data modeling; the robot's decisions depend largely on people's behavior, so the invention needs to simulate the attributes and travel trajectories of pedestrians. The attributes of the pedestrians comprise {Account, expected, probability, Aview}, where Account is the number of pedestrians in the site; expected is the travel speed of the pedestrians; probability is the probability that a pedestrian makes a purchase at the vending robot; Aview is the field of view of a pedestrian, expressed as {radius, angle}, where radius and angle are the radius and central angle of the pedestrian's field of view;
Step 1.3: vending robot (Robot) data modeling; the attributes of the vending robot comprise {Rcount, Rspeed, Rview, electric property}, where Rcount is the number of vending robots in the site, Rspeed and Rview are the travel speed and field of view of the vending robot, and electric property is the group of attributes related to the robot's battery, expressed as {consumeSpeed, chargeSpeed, electric property}, where consumeSpeed and chargeSpeed are the power consumption rate and charging rate of the vending robot, and the electric property inside the group is the robot's remaining charge;
Step 1.4: acquiring site data; after all site-related parameters have been set, the width and length of the site are input to the vending robot, which then walks randomly within the given site and continuously acquires data about it, that is, updates its knowledge of the site; this knowledge is expressed mathematically as shown in equation (1):
where -1 indicates that the coordinate point has not been explored and is therefore a position of interest to the vending robot; 0 and 1 indicate that the coordinate point has been explored, 0 being a non-walkable obstacle area and 1 a walkable area; in equation (1), the left matrix represents the robot's initial knowledge of the site, and the right matrix represents its updated knowledge after it has walked for a period of time;
Step 1.5: judging whether the data acquisition stage is finished, specifically: calculating the proportion λ of the area the vending robot has explored to the total area of the site; if λ is greater than a set threshold, most of the site has been explored and the vending robot enters the autonomous decision stage; otherwise, the known area is still too small, so the process returns to step 1.4 and the vending robot continues to acquire data by random walk;
To demonstrate the vending robot's decision and travel process more clearly, this embodiment models a specific vending scenario; the modeled objects include the site, the pedestrians and the vending robot, whose specific attributes are shown in fig. 2. The embodiment uses a simple 10×10 terrain space; the initial scene and coordinates are shown in fig. 3, where the hatched part represents an area occupied by obstacles that cannot be entered, the white part represents the walkable area, and Start is the initial position of the robot, namely the position of the charging pile.
Step 2: setting hot spot characteristics and determining the hot spots in the site, which the vending robot discovers and records automatically;
determining hot spots in the site; a hot spot is a position that the vending robot tends to choose as a destination, and a coordinate point with any one of the following four characteristics is called a hot spot: (1) the position of the charging pile, ChargePoint (CP); (2) a position where a pedestrian has purchased goods from the vending robot, SalePoint (SP); (3) a position through which a pedestrian has walked, ActorPoint (AP); (4) a position that the vending robot did not explore during the random walk, also called a position of interest to the robot, InterestPoint (IP); characteristic (1) has the highest priority, characteristics (2) and (3) come next, and characteristic (4) has the lowest;
the vending robot discovers and records hot spots automatically; the position of the charging pile is known, being the starting point of the vending robot; an AP is recorded as a coordinate point at which the vending robot's camera detects a pedestrian; an SP is recorded as the coordinate point at which the vending robot's cabinet door is opened and a transaction is completed; the IPs of the vending robot are known from step 1;
the reasons for these priorities are as follows: (1) keeping the robot running is most important: at each decision the robot needs to calculate whether its remaining charge can cover the distance from its current position to the next destination and from there to the charging pile; otherwise it may stop midway and cause inconvenience; (2) goods are more likely to be sold at positions with a sales history or a crowd-gathering history; (3) pedestrians may often appear at positions outside the robot's known range, so interest points should also be hot spots, but since there is no reference information for them they have little effect on the result, so their priority is the lowest.
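The charge check in reason (1) can be illustrated with a minimal sketch. It assumes, hypothetically, that the robot moves on a grid, that path lengths are Manhattan distances, and that charge use is proportional to the distance travelled; the names `can_afford_trip` and `consume_per_cell` are illustrative and do not come from the patent.

```python
def manhattan(a, b):
    """Manhattan distance between two grid points (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def can_afford_trip(current, destination, charger, remaining_charge, consume_per_cell):
    """Return True if the charge covers current -> destination -> charger.

    Assumption: the robot spends consume_per_cell units of charge for every
    grid cell it travels; obstacles and detours are ignored in this sketch.
    """
    trip = manhattan(current, destination) + manhattan(destination, charger)
    return remaining_charge >= trip * consume_per_cell
```

A destination that fails this check would simply be excluded from the candidates before the decision is made.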
In this embodiment, to gain some knowledge of the terrain, the vending robot performs a random walk to acquire data; the routes are shown in fig. 4, where the dotted lines are the walking routes of the pedestrians and the solid line is the walking route of the vending robot, whose destinations are denoted $D_i$, $i \in \{1, 2, 3, \ldots\}$. The terrain knowledge gained during the walk is determined by the field of view of the vending robot; this embodiment sets the radius of the robot's field of view to 2 and the central angle to 120°. When the vending robot at (0, 1) is about to move to the right, its visible region, shown in the matrix below, includes the coordinate points (1, 0), (1, 1), (1, 2) and (2, 1); all other positions are outside the field of view.
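The field-of-view rule can be sketched as a circular-sector test. The text does not say how distance or heading is measured, so this sketch assumes Euclidean distance to the cell, a heading given in degrees from the +x axis, and that the robot's own cell is excluded; `visible_cells` is a hypothetical helper name.

```python
import math


def visible_cells(position, heading_deg, radius, central_angle_deg, width, length):
    """Return the grid cells inside a circular-sector field of view.

    Assumptions: Euclidean distance, heading in degrees measured from the
    +x axis, and the robot's own cell is not reported as visible.
    """
    px, py = position
    half_angle = central_angle_deg / 2.0
    cells = []
    for x in range(width):
        for y in range(length):
            dx, dy = x - px, y - py
            if (dx, dy) == (0, 0) or math.hypot(dx, dy) > radius:
                continue
            bearing = math.degrees(math.atan2(dy, dx))
            # Smallest signed difference between the bearing and the heading.
            diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
            if half_angle >= abs(diff):
                cells.append((x, y))
    return cells


print(visible_cells((0, 1), 0.0, 2, 120, 10, 10))
# [(1, 0), (1, 1), (1, 2), (2, 1)]
```

Under these assumptions the sketch returns the same four coordinate points as the embodiment for a robot at (0, 1) facing right with radius 2 and central angle 120°.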
While the vending robot is moving, pedestrians are moving too, and some of them buy goods from the robot. This embodiment simulates three pedestrian walking routes, with starting points A1, A2 and A3 and destination Da. The vending robot records some hot spots during this period: the AP coordinates include (2, 3) and (2, 5); the SP coordinates include (2, 2) and (8, 4). The movement trajectories of the pedestrians and the robot and the points of interest are shown in FIG. 4.
Each time the vending robot reaches a destination, it calculates the area S that it has explored so far and determines whether the proportion of S to the total terrain area exceeds a threshold. The threshold should be greater than 0.5 but not too large; otherwise the random-walk stage lasts too long and may never end, so this embodiment sets the threshold to 0.8. At destination D4, S = 86, that is, 0.86 of the 10×10 terrain, which exceeds the threshold, and the random-walk stage ends. The robot's knowledge of the terrain at this point is shown in the following matrix, and the hot spot IP coordinates are the positions of the matrix elements equal to -1.
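The stopping test of the random-walk stage can be sketched as follows, using the -1 / 0 / 1 encoding of the terrain knowledge. The grid layout below is hypothetical (it only reproduces the count S = 86 of the embodiment, not the actual matrix), and the `knowledge[y][x]` indexing is an assumption.

```python
UNEXPLORED, OBSTACLE, WALKABLE = -1, 0, 1


def explored_ratio(knowledge):
    """Fraction of grid cells whose value is no longer UNEXPLORED."""
    total = sum(len(row) for row in knowledge)
    explored = sum(1 for row in knowledge for cell in row if cell != UNEXPLORED)
    return explored / total


def interest_points(knowledge):
    """Coordinates (x, y) of the cells that are still unexplored (hot spot IP)."""
    return [(x, y) for y, row in enumerate(knowledge)
            for x, cell in enumerate(row) if cell == UNEXPLORED]


# Hypothetical 10x10 knowledge grid with 14 unexplored cells, so S = 86.
knowledge = [[WALKABLE] * 10 for _ in range(10)]
for x, y in [(x, 9) for x in range(10)] + [(9, y) for y in range(5, 9)]:
    knowledge[y][x] = UNEXPLORED

ratio = explored_ratio(knowledge)
print(ratio, ratio > 0.8)   # 0.86 True
```

The remaining -1 cells returned by `interest_points` are exactly the positions that would later be treated as IP hot spots.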
Step 3: establishing a hot spot decision model, represented by the triplet $\langle P_{current}, P_{target}, T \rangle$, where $P_{current}$ is the sequence of coordinates $(x_{current}, y_{current})$ of all walkable positions at which the vending robot may currently be, $P_{target}$ is the sequence of coordinates $(x_{target}, y_{target})$ of the target positions, T represents the tendency from the current position to the target position, and $T = P_{current} \times P_{target}$; the dynamic process of the decision model is described as follows: the vending robot is at the current position $p_{c0} \in P_{current}$ and autonomously decides on a destination position $p_{t0} \in P_{target}$; after the vending robot moves to the target position, the target position becomes the robot's current position $p_{c1} \in P_{current}$, and the robot receives a tendency $T_0$ as feedback;
the tendency indicates how strongly the vending robot, deciding at its current position, tends toward the target position, and it measures the profit the robot may obtain by going from its current position to the target position; the higher the tendency, the more likely the vending robot is to sell goods and the higher the resulting profit; the tendency depends on the distance $L$ between $P_{current}$ and $P_{target}$, on whether $P_{target}$ is a sales history point ($H_{sp}$) or a crowd history point ($H_{ap}$), and on the number $H_{ip}$ of interest points around $P_{target}$, as shown in the following formula:
$$T = \eta_1 \cdot L + \eta_2 \cdot H + \eta_3 \cdot H_{ip} \tag{2}$$
where $H = \max\{H_{sp}, H_{ap}, 0\}$, and $\eta_1$, $\eta_2$, $\eta_3$ are the weights of the three influencing factors, with $\eta_2$ the largest, because whether people frequently appear and purchase goods at a position is the robot's most direct basis for judgment; the hot spots carry different weights according to their priorities: characteristic SP has the greatest influence, characteristic AP comes second and characteristic IP last, hence $H = \max\{H_{sp}, H_{ap}, 0\}$;
In this embodiment, $\eta_1 : \eta_2 : \eta_3 = 0.3 : 0.5 : 0.2$. Next, L, H and $H_{ip}$ are determined. L would naturally be the Manhattan distance, but because the tendency should fall as the distance grows, L is instead computed as the difference between the longest (diagonal) distance on the map and the Manhattan distance, i.e., $L = width + length - (|x_{current} - x_{target}| + |y_{current} - y_{target}|)$, where length and width are the length and width of the map, $x_{current}$ and $y_{current}$ are the abscissa and ordinate of the current position, and $x_{target}$ and $y_{target}$ are the abscissa and ordinate of the target position. $H_{ip}$ is the number of interest points, that is, the number of unexplored positions surrounding the destination in the robot's terrain knowledge; its theoretical maximum is 8 (all 8 surrounding directions). H is the tendency contributed by the hot spot type: this embodiment sets the tendency of a sales history point $H_{sp}$ to 10 and that of a crowd history point $H_{ap}$ to 5.
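Formula (2) with the embodiment's parameters can be sketched as a small function. The sketch follows the formula for L as written and takes width and length as the number of cells along each axis, which is an assumption the text does not settle, so the absolute values it yields depend on that choice. The function name and the `knowledge[y][x]` indexing are illustrative.

```python
ETA_L, ETA_H, ETA_IP = 0.3, 0.5, 0.2   # eta_1 : eta_2 : eta_3 from the embodiment
H_SP, H_AP = 10, 5                     # hot spot tendencies from the embodiment


def tendency(current, target, width, length, sale_points, actor_points, knowledge):
    """T = eta_1 * L + eta_2 * H + eta_3 * H_ip for one (current, target) pair."""
    (xc, yc), (xt, yt) = current, target
    # L: longest map distance minus the Manhattan distance.
    dist_term = width + length - (abs(xc - xt) + abs(yc - yt))
    # H = max{H_sp, H_ap, 0}: the strongest hot spot feature of the target.
    hot = max(H_SP if target in sale_points else 0,
              H_AP if target in actor_points else 0,
              0)
    # H_ip: unexplored (-1) cells among the 8 neighbours of the target.
    h_ip = 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = xt + dx, yt + dy
            if (dx, dy) != (0, 0) and nx in range(width) and ny in range(length):
                h_ip += 1 if knowledge[ny][nx] == -1 else 0
    return ETA_L * dist_term + ETA_H * hot + ETA_IP * h_ip
```

Because a sales history point contributes 0.5 × 10 = 5 while each unexplored neighbour contributes only 0.2, SP targets dominate the ranking unless they are very far away, which matches the priority order of the hot spot characteristics.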
Step 4: establishing and updating a decision table; the vending robot trains on the tendency matrix to obtain a decision table that guides its actions, that is, the decision table is used to decide which destination is most beneficial to the vending robot;
Step 4.1: establishing the tendency matrix and the decision table for the vending robot's actions; the tendency matrix has $P_{current}$ as rows and $P_{target}$ as columns and describes the tendency of the vending robot from each current position to each target position; the decision table likewise has $P_{current}$ as rows and $P_{target}$ as columns, is of the same order as the tendency matrix, is initialized to 0, and is updated according to the tendency matrix;
Step 4.2: updating the decision table; each element of the decision table is called a decision value D, and the optimal decision value equals the tendency from the current position to the next destination plus the maximum decision value from the next destination to the final destination; each decision value D in the decision table is updated as shown in the following formula:
D t (p ct p tt )=T t (p ct ->p tt )+γmaxD t (p ct+1 p t ) (3)
wherein D is t (p ct p tt )、D t (p ct+1 p t ) Selling a decision value of the robot from the current position coordinate to the destination coordinate and a decision value from the destination coordinate to the final destination coordinate at the t-th update, p ct Is the current position coordinate of the vending robot, p tt Is the destination coordinate, p of the vending robot in the current position coordinate decision ct+1 Is the next destination coordinate after decision, p t Is the final destination coordinate of the travel of the vending robot, T t The tendency of the vending robot to the destination coordinate at the current position coordinate is that gamma is an attenuation factor, gamma is more than or equal to 0 and less than 1, the importance degree of future decisions is represented, benefits brought by the current tendency position are only considered when gamma=0, and benefits brought by future decisions are more emphasized when gamma tends to 1;
the update formula (3) for each decision value D holds when the optimal decision yields the maximum profit; while the decision table is still being built, however, the left and right sides of formula (3) are not equal, and the error shown in the following formula exists:
$$\Delta D(p_{ct}, p_{tt}) = T_t(p_{ct} \to p_{tt}) + \gamma \max D_t(p_{ct+1}, p_t) - D_t(p_{ct}, p_{tt}) \qquad (4)$$
wherein $\Delta D(p_{ct}, p_{tt})$ is the update error of the decision value D;
thus, the process by which the vending robot iteratively updates step by step and finally approaches the optimal decision is represented by equation 5:
$$D_{t+1}(p_{ct}, p_{tt}) = D_t(p_{ct}, p_{tt}) + \alpha \Delta D(p_{ct}, p_{tt}) \qquad (5)$$
wherein $D_{t+1}(p_{ct}, p_{tt})$ is the decision value after the (t+1)-th iterative update and $\alpha$ is the learning rate, $0 < \alpha < 1$; the closer $\alpha$ is to 1, the less of the previously learned value is retained;
combining equations 4 and 5 yields the final update rule for the decision table, as shown in equation 6:
$$D_{t+1}(p_{ct}, p_{tt}) = D_t(p_{ct}, p_{tt}) + \alpha \left[ T_t(p_{ct} \to p_{tt}) + \gamma \max D_t(p_{ct+1}, p_t) - D_t(p_{ct}, p_{tt}) \right] \qquad (6)$$
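The role of the learning rate becomes easier to see if the update rule is regrouped as a weighted average. The shorthand $Y_t$ for the bracketed target is introduced here only for illustration and does not appear in the original text:

$$Y_t = T(p_{ct} \to p_{tt}) + \gamma \max D_t(p_{ct+1}, p_t), \qquad D_{t+1}(p_{ct}, p_{tt}) = (1 - \alpha)\, D_t(p_{ct}, p_{tt}) + \alpha\, Y_t$$

Under the simplifying assumption that $Y_t$ stays fixed between two updates, the remaining error shrinks by the factor $(1 - \alpha)$ per update, since $D_{t+1} - Y_t = (1 - \alpha)(D_t - Y_t)$. A value of $\alpha$ near 1 therefore almost replaces the old decision value, while a value near 0 keeps most of it.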
Since the robot's destination decision has no terminating condition (the robot can keep operating indefinitely as long as it is not damaged), the decision table no longer changing is chosen as the condition for terminating the update.
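The iteration described by equation 6, together with the stopping condition above, might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it updates every entry of the table in one synchronous sweep, it assumes that arriving at a target makes that target the next current position, and the names `train_decision_table`, `reachable`, `tol` and `max_sweeps` are hypothetical.

```python
import numpy as np

def train_decision_table(T, reachable, gamma=0.8, alpha=0.5,
                         tol=1e-6, max_sweeps=10_000):
    """Iterate the equation-6 update until the table stops changing.

    T[c, t]         tendency from current position c to target t
    reachable[c, t] True where the robot may choose t from c (assumption:
                    every position has at least one reachable target)
    """
    D = np.zeros_like(T, dtype=float)  # decision table initialised to 0
    for _ in range(max_sweeps):
        # Best decision value available once the robot stands at each position.
        best_next = np.where(reachable, D, -np.inf).max(axis=1)
        # Target of equation 6: tendency plus discounted best follow-up value.
        target = T + gamma * best_next[np.newaxis, :]
        # Unreachable entries are left at their initial value in this sketch.
        D_new = np.where(reachable, D + alpha * (target - D), D)
        if np.max(np.abs(D_new - D)) < tol:  # table no longer changes
            return D_new
        D = D_new
    return D

# Toy example with two positions that can only travel to each other.
T_demo = np.array([[0.0, 2.0],
                   [1.0, 0.0]])
reachable_demo = np.array([[False, True],
                           [True, False]])
D_demo = train_decision_table(T_demo, reachable_demo)
```

For this toy case the fixed point can be solved by hand: with x the value of moving from position 0 to position 1, x = 2 + 0.8(1 + 0.8x), so x = 2.8 / 0.36, which the iteration approaches.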
In the embodiment, a 100×100 tendency matrix is established according to the robot's cognition of the map. Because the matrix is too large to display conveniently, the embodiment shows only the 1×100 tendency vector from the position (8, 0), where the robot is located at the end of the random walk, to the other walkable positions, as given below. The tendency of any area that the vending robot considers unreachable is 0, and the tendencies of the other areas are calculated according to the formula of step 3; meanwhile, the decision table of the same order is initialized to 0.
[3,0,0,0,4.2,0,0,0,0,5.1,2.9,3.2,3.3,3.6,3.9,4.2,4.5,4.8,5.1,0,0,2.7,8,3.3,3.6,3.9,4.2,4.5,4.8,0,0,2.4,5.2,0,0,0,0,4.2,4.5,4.4,0,2.1,2.4,0,0,0,0,3.9,9.2,4.3,1.5,1.8,4.6,0,0,0,0,3.6,3.9,4,0,1.5,1.8,0,0,0,0,3.3,3.6,3.7,0,1.2,1.5,1.8,2.1,2.4,2.7,3,3.3,3.4,0,0.9,1.2,1.5,1.8,2.1,2.4,2.7,3,3.3,0.3,0.6,0,1.4,1.9,2.2,2.5,0,3.1,2.8]
The decision table is then calculated from the tendency matrix; because the destination decision places more weight on future tendencies, γ = 0.8 is used. The final result, once the decision table no longer changes, is as follows:
[39.8,-1,-1,-1,41,-1,-1,-1,36.8,41.9,39.7,40,40.1,40.4,40.7,41,41.3,41.6,41.9,-1,-1,39.5,47.8,40.1,40.4,40.7,41,41.3,41.6,-1,-1,39.2,44.5,-1,-1,-1,-1,41,41.3,41.2,-1,38.9,41.1,-1,-1,-1,-1,40.7,46,41.1,38.3,38.6,42.9,-1,-1,-1,-1,40.4,40.7,40.8,-1,38.3,38.6,-1,-1,-1,-1,40.1,40.4,40.5,-1,38,38.3,38.6,38.9,39.2,39.5,39.8,40.1,40.2,-1,37.7,38,38.3,38.6,38.9,39.2,39.5,39.8,40.1,37.1,37.4,-1,38.2,38.7,39,39.3,-1,39.9,39.6].
step 5: according to the decision table, the vending robot selects as its destination the column holding the maximum decision value in the row corresponding to its current position; it then plans a walking path with the A* shortest-path algorithm, so as to save battery power and increase the effective running time of the robot;
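The path-planning part of step 5 could look roughly like the sketch below. The text only names the A* algorithm; the 4-connected unit-cost grid, the Manhattan heuristic, the convention that a cell value of 1 means walkable, and the function name `a_star` are all assumptions made for illustration.

```python
import heapq

def a_star(grid, start, goal):
    """Shortest 4-connected path on a grid; cells equal to 1 are walkable.

    Returns the path as a list of (row, col) cells, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected unit grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f, g, cell)
    best_cost = {start: 0}
    came_from = {}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cost > best_cost.get(cell, float("inf")):
            continue  # stale heap entry, a cheaper route was found later
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap,
                                   (new_cost + heuristic(nxt), new_cost, nxt))
    return None

# Hypothetical 3x4 site with a two-cell obstacle in the middle row.
demo_grid = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
demo_path = a_star(demo_grid, (0, 0), (2, 3))
blocked_path = a_star([[1, 0, 1]], (0, 0), (0, 2))
```

In the demo site two equally short routes exist, so only the length and the end points of the returned path are fixed; the second call returns `None` because the obstacle separates start and goal.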
step 6: when the number of hot spots recorded by the vending robot at the current moment has grown to μ times the original number, step 4 is re-executed to recalculate the tendency matrix, until the tendency matrix no longer changes or its degree of change is smaller than a set threshold. The reason is that as the robot continues to explore the scene, more hot spot data are collected and the previous tendencies no longer apply. The decision table must also be retrained whenever the tendency matrix is updated.
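The two conditions in step 6 can be expressed as small predicates. The sketch below is only one possible reading: the text does not say how the "degree of change" of the tendency matrix is measured, so the largest absolute entry-wise difference is assumed, and the function names and the example values of μ and the threshold are hypothetical.

```python
import numpy as np

def needs_retraining(hotspot_count, count_at_last_training, mu):
    """True once the recorded hot spots have grown to mu times the earlier count."""
    return count_at_last_training > 0 and hotspot_count >= mu * count_at_last_training

def tendency_has_settled(old_tendency, new_tendency, threshold):
    """True when no entry of the tendency matrix moved by `threshold` or more.

    Assumption: change is measured as the maximum absolute difference.
    """
    diff = np.abs(np.asarray(new_tendency, dtype=float)
                  - np.asarray(old_tendency, dtype=float))
    return float(diff.max()) < threshold

# Hypothetical values: retrain when the hot spot count has grown by half.
retrain_now = needs_retraining(hotspot_count=30, count_at_last_training=20, mu=1.5)
retrain_later = needs_retraining(hotspot_count=29, count_at_last_training=20, mu=1.5)
settled = tendency_has_settled([[1.0, 2.0]], [[1.0, 2.05]], threshold=0.1)
```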
In this embodiment, when the vending robot makes its initial decision at destination D4, the column with the maximum value in the corresponding row of the decision table is column 22, so the vending robot autonomously selects D5 (2, 2) as the destination. Judging from the tendency matrix alone, one might expect the robot to choose (8, 5), since this is the hot spot nearest to the robot; instead the robot chooses the distant point of interest D5, because D5 is the position with the most pedestrian traffic and can yield higher profit in the long term.
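The selection of column 22 can be reproduced directly from the decision row printed in the embodiment. The snippet below is a small check rather than part of the method; the conversion of the column index into grid coordinates assumes the 100 positions are laid out as a grid that is 10 cells wide, which is not stated explicitly in the text.

```python
# Decision row of the embodiment, copied from the description (100 entries).
decision_row = [
    39.8, -1, -1, -1, 41, -1, -1, -1, 36.8, 41.9,
    39.7, 40, 40.1, 40.4, 40.7, 41, 41.3, 41.6, 41.9, -1,
    -1, 39.5, 47.8, 40.1, 40.4, 40.7, 41, 41.3, 41.6, -1,
    -1, 39.2, 44.5, -1, -1, -1, -1, 41, 41.3, 41.2,
    -1, 38.9, 41.1, -1, -1, -1, -1, 40.7, 46, 41.1,
    38.3, 38.6, 42.9, -1, -1, -1, -1, 40.4, 40.7, 40.8,
    -1, 38.3, 38.6, -1, -1, -1, -1, 40.1, 40.4, 40.5,
    -1, 38, 38.3, 38.6, 38.9, 39.2, 39.5, 39.8, 40.1, 40.2,
    -1, 37.7, 38, 38.3, 38.6, 38.9, 39.2, 39.5, 39.8, 40.1,
    37.1, 37.4, -1, 38.2, 38.7, 39, 39.3, -1, 39.9, 39.6,
]

# Step 5: pick the column with the largest decision value.
best_column = max(range(len(decision_row)), key=decision_row.__getitem__)

# Assumed layout: 10 cells per grid row, so the index splits into two digits.
GRID_WIDTH = 10
coordinates = divmod(best_column, GRID_WIDTH)
```

Under that layout assumption the index 22 splits into (2, 2), which matches the coordinates given for D5.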
Analysis of the decision table shows that the destinations with larger decision values share a common characteristic: they lie not far from positions where pedestrians frequently appear or shop, or from areas still unknown to the robot. The robot thus gives greater weight to distant positions when making decisions. The positions with larger decision values are the areas circled by thick lines in fig. 5.
In this embodiment, after the vending robot has operated for a period of time, steps 4 to 5 are re-executed: the tendency matrix is updated and the decision table is iteratively recalculated, and the newly decided destination is found to be more advantageous than the previous one.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.

Claims (4)

1. A method for autonomously selecting a travel destination by a vending robot, characterized by comprising the following steps:
step 1: modeling and acquiring data related to the travel of the vending robot; the travel-related data comprise site data, pedestrian attribute data and vending robot attribute data;
step 2: setting hot spot characteristics and determining the hot spots in the site, the vending robot automatically discovering and recording the hot spots;
step 3: establishing a hot spot decision model, the model being represented by a triplet $\langle P_{current}, P_{target}, T \rangle$, wherein $P_{current}$ represents the coordinate sequence $(x_{current}, y_{current})$ of all positions currently walkable by the vending robot, $P_{target}$ represents the coordinate sequence $(x_{target}, y_{target})$ of target positions, T represents the tendency from the current position to the target position, and $T = P_{current} \times P_{target}$; the dynamic process of the decision model is described as follows: the vending robot is at a current position $p_{c0} \in P_{current}$ and autonomously decides a destination position $p_{t0} \in P_{target}$; after the vending robot moves to the target position, the target position becomes the position $p_{c1} \in P_{current}$ where the vending robot is located, and the tendency $T_0$ is obtained as feedback;
step 4: establishing and updating a decision table; the vending robot is trained on the tendency matrix to obtain a decision table that guides the actions of the robot;
step 4.1: establishing a tendency matrix and a decision table for the actions of the vending robot; the tendency matrix has $P_{current}$ as rows and $P_{target}$ as columns and describes the tendency of the vending robot to travel from each current position to each target position; the decision table likewise has $P_{current}$ as rows and $P_{target}$ as columns, has the same order as the tendency matrix, is initialized to 0, and is updated according to the tendency matrix;
step 4.2: updating the decision table; each element of the decision table is called a decision value D, and the optimal decision value equals the sum of the tendency from the current position to the next destination and the maximum decision value from that next destination to the final destination; each decision value D in the decision table is updated as shown in the following formula:
$$D_t(p_{ct}, p_{tt}) = T_t(p_{ct} \to p_{tt}) + \gamma \max D_t(p_{ct+1}, p_t) \qquad (3)$$
wherein $D_t(p_{ct}, p_{tt})$ and $D_t(p_{ct+1}, p_t)$ are, at the t-th update, the decision value of the vending robot from the current position coordinate to the destination coordinate and the decision value from the destination coordinate to the final destination coordinate, respectively; $p_{ct}$ is the current position coordinate of the vending robot; $p_{tt}$ is the destination coordinate decided by the vending robot at the current position coordinate; $p_{ct+1}$ is the next destination coordinate after the decision; $p_t$ is the final destination coordinate of the vending robot's travel; $T_t$ is the tendency of the vending robot at the current position coordinate toward the destination coordinate; $\gamma$ is the attenuation factor, $0 \le \gamma < 1$, representing the importance of future decisions: when $\gamma = 0$ only the benefit brought by the currently favoured position is considered, and as $\gamma$ approaches 1 the benefit brought by future decisions is weighted more heavily;
the update formula (3) for each decision value D holds when the optimal decision yields the maximum profit; while the decision table is still being built, however, the left and right sides of formula (3) are not equal, and the error shown in the following formula exists:
$$\Delta D(p_{ct}, p_{tt}) = T_t(p_{ct} \to p_{tt}) + \gamma \max D_t(p_{ct+1}, p_t) - D_t(p_{ct}, p_{tt}) \qquad (4)$$
wherein $\Delta D(p_{ct}, p_{tt})$ is the update error of the decision value D;
thus, the process by which the vending robot iteratively updates step by step and finally approaches the optimal decision is represented by equation 5:
$$D_{t+1}(p_{ct}, p_{tt}) = D_t(p_{ct}, p_{tt}) + \alpha \Delta D(p_{ct}, p_{tt}) \qquad (5)$$
wherein $D_{t+1}(p_{ct}, p_{tt})$ is the decision value after the (t+1)-th iterative update and $\alpha$ is the learning rate, $0 < \alpha < 1$;
combining equations 4 and 5 yields the final update rule for the decision table, as shown in equation 6:
$$D_{t+1}(p_{ct}, p_{tt}) = D_t(p_{ct}, p_{tt}) + \alpha \left[ T_t(p_{ct} \to p_{tt}) + \gamma \max D_t(p_{ct+1}, p_t) - D_t(p_{ct}, p_{tt}) \right] \qquad (6)$$
step 5: according to the decision table, the vending robot selects as its destination the column holding the maximum decision value in the row corresponding to its current position; it then plans a walking path with the A* shortest-path algorithm;
step 6: when the number of hot spots recorded by the vending robot at the current moment has grown to μ times the original number, step 4 is re-executed to recalculate the tendency matrix, until the tendency matrix no longer changes or its degree of change is smaller than a set threshold.
2. The method for autonomously selecting a travel destination by a vending robot according to claim 1, wherein step 1 specifically comprises:
step 1.1: modeling the site data for the vending robot's travel; the attributes of the site comprise {length, width, barrier, charger}, wherein length and width are the length and width of the site, respectively; barrier is an obstacle range in the site, namely an area where neither the robot nor pedestrians can walk, expressed in the form $\{P_{top}, P_{down}\}$, where $P_{top}$ and $P_{down}$ are the upper-left and lower-right corner coordinates of the barrier, respectively; charger is the position of the charging pile of the vending robot and is also the starting position of the vending robot, expressed as {x, y}, where x and y are the abscissa and the ordinate of the charger, respectively;
step 1.2: modeling pedestrian data; the attributes of the pedestrians comprise {Account, expected, probability, aview}, wherein Account is the number of pedestrians in the site; expected is the travel speed of a pedestrian; probability is the probability that a pedestrian makes a purchase at the vending robot; aview is the field of view of a pedestrian, expressed as {radius, angle}, where radius and angle are the radius and the central angle of the pedestrian's field of view, respectively;
step 1.3: modeling vending robot (Robot) data; the attributes of the vending robot comprise {Rcount, Rspeed, Rview, electric property}, wherein Rcount is the number of vending robots in the site, Rspeed and Rview are the travel speed and the field of view of the vending robot, respectively, and the electric property groups the attributes related to the battery of the vending robot, expressed as a triple whose first two members, consumeSpeed and chargeSpeed, are the power consumption rate and the charging rate of the vending robot, respectively, and whose third member is the remaining charge of the vending robot;
step 1.4: acquiring site data; after all parameters related to the site of the vending robot have been set, the width and length of the site are input to the vending robot, and the vending robot walks randomly within the given site to continuously acquire data about the site, that is, to update its cognition of the site; this cognition of the site is expressed in mathematical form as shown in equation (1):
wherein -1 indicates that the coordinate point has not been explored and is a position of interest to the vending robot, and 0 and 1 indicate that the coordinate point has been explored, 0 being an obstacle (non-walkable) area and 1 being a walkable area; in formula (1), the left matrix represents the robot's initial cognition of the site, and the right matrix represents the vending robot's updated cognition of the site after it has walked for a period of time;
step 1.5: judging whether the data acquisition stage is finished, specifically: calculating the proportion λ of the area currently explored by the vending robot to the total area of the site; if λ is larger than a set threshold, the vending robot enters the autonomous decision stage; otherwise the method returns to step 1.4 and the vending robot continues to acquire data by random walk.
3. The method for autonomously selecting a travel destination by a vending robot according to claim 2, wherein step 2 specifically comprises:
determining the hot spots in the site; a hot spot is a position that the vending robot tends to select in its decisions, and a coordinate point with any one of the following four characteristics is called a hot spot: (1) the position of the charging pile, ChargePoint (CP); (2) a position where a pedestrian once purchased goods at the vending robot, SalePoint (SP); (3) a position that a pedestrian has walked through, ActorPoint (AP); (4) a position that the vending robot did not explore during the random walk, also referred to as a position of interest to the robot, InterestPoint (IP); characteristic (1) has the highest priority, characteristics (2) and (3) come next, and characteristic (4) has the lowest priority;
the vending robot automatically discovering and recording the hot spots; the position of the charging pile is known, being the starting point of the vending robot; a position AP that a pedestrian has walked through is recorded as the coordinate point at which the camera of the vending robot detects the pedestrian; a position SP where a pedestrian purchased goods at the vending robot is recorded as the coordinate point at which the cabinet door of the vending robot is opened and a transaction is completed; the positions IP of interest to the vending robot are known.
4. The method for autonomously selecting a travel destination by a vending robot according to claim 3, wherein in step 3 the tendency indicates the degree to which the vending robot, deciding at the current position, tends to go to the target position, and is used to measure the profit that may be obtained by travelling from the robot's current position to the target position; the tendency is related to the distance L between $P_{current}$ and $P_{target}$, whether $P_{target}$ is a sales history point $H_{sp}$, whether it is a crowd history point $H_{ap}$, and the number $H_{ip}$ of points of interest around $P_{target}$, as shown in the following formula:
$$T = \eta_1 \cdot L + \eta_2 \cdot H + \eta_3 \cdot H_{ip} \qquad (2)$$
where $H = \max\{H_{sp}, H_{ap}, 0\}$, and $\eta_1$, $\eta_2$, $\eta_3$ are the weights of the three influencing factors, respectively, with $\eta_2$ the largest.
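Formula (2) of claim 4 can be illustrated with a short function. This is a sketch under several assumptions that the claims leave open: the distance L is taken as the Euclidean distance, $H_{sp}$ and $H_{ap}$ are passed in as numeric scores, the weight values are invented for the example (chosen only so that $\eta_2$ is the largest), and a negative $\eta_1$ is assumed so that distant targets are penalised. The function name `tendency` is hypothetical.

```python
import math

def tendency(current, target, h_sp, h_ap, h_ip, eta1, eta2, eta3):
    """Evaluate T = eta1 * L + eta2 * H + eta3 * H_ip with H = max(H_sp, H_ap, 0)."""
    distance = math.dist(current, target)  # assumption: Euclidean distance L
    history = max(h_sp, h_ap, 0)           # H as defined under formula (2)
    return eta1 * distance + eta2 * history + eta3 * h_ip

# Hypothetical weights; eta2 is the largest, as the claim requires.
example_tendency = tendency(
    current=(0, 0), target=(3, 4),
    h_sp=1, h_ap=0, h_ip=2,
    eta1=-0.3, eta2=5.0, eta3=1.0,
)
```

With these illustrative numbers the distance is 5, so the three terms are -1.5, 5.0 and 2.0.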
CN202110104046.8A 2021-01-26 2021-01-26 Method for autonomously selecting advancing destination of vending robot Active CN112947412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110104046.8A CN112947412B (en) 2021-01-26 2021-01-26 Method for autonomously selecting advancing destination of vending robot

Publications (2)

Publication Number Publication Date
CN112947412A (en) 2021-06-11
CN112947412B (en) 2023-09-26

Family

ID=76237043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110104046.8A Active CN112947412B (en) 2021-01-26 2021-01-26 Method for autonomously selecting advancing destination of vending robot

Country Status (1)

Country Link
CN (1) CN112947412B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104580337A (en) * 2013-10-29 2015-04-29 上海沐风数码科技有限公司 Multi-objective optimization calculating method based on internet-of-things whole-course monitoring of 3G communication technology
CN105953804A (en) * 2016-04-20 2016-09-21 腾讯科技(深圳)有限公司 Method and apparatus for updating data of map
CN108171875A (en) * 2017-11-24 2018-06-15 深兰科技(上海)有限公司 A kind of intelligent distribution pallet piling up method and robot
KR101906046B1 (en) * 2017-04-07 2018-10-08 인천대학교 산학협력단 Method for establishing strategy of purchasing of goods and facility operation and implemeting the same
WO2019151995A1 (en) * 2018-01-30 2019-08-08 Ford Global Technologies, Llc Motion planning for autonomous point-of-sale vehicles
CN110135660A (en) * 2019-05-29 2019-08-16 新石器慧通(北京)科技有限公司 A kind of unmanned sales cart and vending method of cruising
CN110216646A (en) * 2019-05-31 2019-09-10 深兰科技(上海)有限公司 One kind peddling robot

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6584375B2 (en) * 2001-05-04 2003-06-24 Intellibot, Llc System for a retail environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
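Research on delivery route optimization for vending machines on university campuses (高校校园自动售货机配货路径优化研究); Wang Yanchun (王艳春); China Master's Theses Full-text Database, Information Science and Technology (No. 1); full text *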

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant