CN114167749A - Control method of football robot and related device - Google Patents
- Publication number
- CN114167749A (application number CN202111361254.2A)
- Authority
- CN
- China
- Prior art keywords
- robot
- football
- soccer
- player
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B17/00—Systems involving the use of models or simulators of said systems
- G05B17/02—Systems involving the use of models or simulators of said systems electric
Abstract
The application provides a control method for a football robot and a related device. The control method comprises: acquiring image data in real time with an image acquisition device; for each football robot, obtaining an action strategy for that robot using the image data and the intelligent player model corresponding to it; and controlling the robot's actions based on the action strategy, thereby providing a sparring (partner-training) function for human players. The method and device address the prior-art problem that a football robot used for sparring imitates real players poorly, which reduces the training effect.
Description
Technical Field
The application relates to the technical field of deep learning, in particular to a control method of a football robot and a related device.
Background
Robots are now widely used in competitive sports and in physical education. Improving the technical level of domestic robots, in particular by designing intelligent robot mechanisms suited to various sports-training purposes and improving their control, helps spread such applications and in turn promotes both public health and the development of competitive sports.
For example, Chinese patent CN109976330A uses an image processing module to process images acquired by a soccer robot and generate image information; a positioning module processes the image information to generate position and distance information for the robot; a communication module sends this information to other soccer robots and receives instruction information from them; and a motion state management and decision module in the motion controller receives the position, distance, and instruction information and controls the robot's actions accordingly. That scheme enables team cooperation among multiple soccer robots.
However, prior-art football robots can only perform simple football actions and cannot adequately imitate real players. When such a robot is used for sparring, the imitation is therefore unconvincing and the training effect suffers.
Disclosure of Invention
The application aims to provide a control method for a football robot, and a related device, that solve the prior-art problems of insufficient player imitation, and hence reduced training effect, when a football robot is used for sparring.
The purpose of the application is realized by adopting the following technical scheme:
in a first aspect, the present application provides a control method for a football robot, the method being used to control one or more football robots to provide a sparring function for human players. The method comprises: acquiring image data in real time with an image acquisition device; for each football robot, obtaining an action strategy for that robot using the image data and the intelligent player model corresponding to it; and controlling the robot's actions based on the action strategy, thereby providing the sparring function for the human players.
The benefit of this scheme is that image data are collected, an action strategy is obtained from the image data and the intelligent player model, and the football robot's actions are controlled according to that strategy. Because the strategy is produced by an intelligent player model, the robot's actions closely imitate those of a real player, giving the human player being trained a highly anthropomorphic sparring experience and improving the training effect.
In some optional embodiments, the image data are obtained by the image acquisition device photographing one or more of: the football, the human players, and the other football robots. The benefit is that the image data then reflect the real scene during sparring, improving the human player's experience.
In some optional embodiments, the intelligent player model corresponding to a football robot is trained as follows: obtain an action strategy for the robot using training data and a preset reinforcement learning model, where the strategy is used to simulate, in a computer virtual environment, the robot acting as a player, and the training data include historical football video; determine a reward value for the action strategy; and update the parameters of the reinforcement learning model based on that reward value to obtain the intelligent player model.
The benefit is that an action strategy is obtained from training data (including historical football video) and a preset reinforcement learning model; the player action corresponding to that strategy is simulated in a virtual environment; the strategy's reward value is determined; and the intelligent player model is trained from the reward value. The resulting model imitates real play to a high degree.
In some optional embodiments, determining the reward value of the football robot's action strategy includes determining it with the goal of creating the maximum number of shooting opportunities. The benefit is that this reward target is closer to real football confrontation than alternatives such as more passing opportunities or higher shooting accuracy, so it best reproduces a real player's habits. Moreover, because the model is trained on historical football video, rewarding the robot for creating the most shooting opportunities makes each robot's behaviour approach that of a specific individual player, especially the distinctive habits of a football star, giving the sparring robots individual character, appealing to consumer groups of non-professional players, and improving the commercial prospects.
In some optional embodiments, determining the reward value includes determining it with the goal of creating the number of first shooting opportunities corresponding to a preset difficulty mode. The benefit is that tying the number of shooting opportunities to a preset difficulty mode lets the player model imitate a real player of a specific level, improving the model's fidelity.
In some optional embodiments, the preset difficulty mode is one of a plurality of difficulty modes, each intelligent player model is matched with one of those modes, and the method further comprises: obtaining a configured difficulty mode, which is one of the plurality of difficulty modes; and determining the intelligent player model matched with the configured difficulty mode as the model corresponding to the football robot.
The benefit is that a configured difficulty mode is chosen from the plurality of modes, and the intelligent player model matched with it becomes the model for the football robot. A suitable difficulty can thus be selected according to the training situation, adapting to different human players, training scenarios, or training cycles, and improving sparring efficiency.
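The matching step above can be sketched as a simple registry lookup. This is a hypothetical illustration only: the names `PlayerModel`, `MODEL_REGISTRY`, and `select_model`, and the mode labels, are assumptions, not part of the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlayerModel:
    name: str
    difficulty: str  # the difficulty mode this model was trained for

# Hypothetical registry of trained intelligent player models
MODEL_REGISTRY = [
    PlayerModel("forward_easy", "easy"),
    PlayerModel("forward_normal", "normal"),
    PlayerModel("forward_hard", "hard"),
]

def select_model(configured_difficulty):
    """Return the intelligent player model matched with the configured mode."""
    for model in MODEL_REGISTRY:
        if model.difficulty == configured_difficulty:
            return model
    raise ValueError(f"no model for difficulty {configured_difficulty!r}")

print(select_model("normal").name)  # forward_normal
```

In the multi-robot variant described later, the same lookup would simply return one matched model per robot.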
In some optional embodiments, determining the reward value includes: when a plurality of intelligent player models are trained jointly, determining the reward value of each football robot's action strategy with the goal that all the robots together create the number of second shooting opportunities corresponding to the preset difficulty mode. The benefit is that, during joint training, this team-level target is closer to the reality of football as a team sport than the per-robot number of first shooting opportunities, so models trained this way give human players a realistic experience and improve the sparring effect. The target also suits sparring against a specific team, better meeting the needs of professional athletes, and at far lower cost than hiring a real team to train against. It can even simulate a direct rival team, genuinely raising the sparring team's competitiveness by technological means.
In some optional embodiments, the method is used to control a plurality of football robots to provide the sparring function, the preset difficulty mode is one of a plurality of difficulty modes, each intelligent player model is matched with one of those modes, and the method further comprises: obtaining a configured difficulty mode, which is one of the plurality of difficulty modes; and determining a plurality of intelligent player models matched with the configured difficulty mode as the models corresponding to the football robots, so that each robot has a corresponding model, the matched models corresponding one-to-one with the robots.
The benefit is that a configured difficulty mode is chosen from the plurality of modes, and the intelligent player models matched with it become the models for the football robots. A suitable difficulty can thus be selected according to the training situation, adapting to different human players, training scenarios, or training cycles, and improving sparring efficiency.
In some alternative embodiments, the plurality of intelligent player models matched with the configured difficulty mode includes multiple identical models. The benefit is that identical models improve training efficiency and can also satisfy personalized demands of specific consumers, for example simulating several Cristiano Ronaldos or several Messis for one's own training.
In some optional embodiments, obtaining the football robot's action strategy using the training data and the preset reinforcement learning model includes: acquiring the values of each player's basic ability parameters corresponding to the configured difficulty mode; adjusting the training data based on those values; and inputting the adjusted training data into the preset reinforcement learning model to obtain the action strategy.
The benefit is that the basic-ability parameter values corresponding to the configured difficulty mode are used to adjust the training data, for example the player action data extracted from historical football video, before the action strategy is obtained. This fully accounts for differences in basic ability across difficulty modes and improves the fidelity of the intelligent player model.
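One way this adjustment could look in code is sketched below. The parameter names (movement speed, kicking power, angular deviation) follow the patent's examples, but the scaling scheme, the dictionary keys, and the numeric values are illustrative assumptions.

```python
import random

# Assumed per-mode basic ability parameters (illustrative values)
DIFFICULTY_PARAMS = {
    "easy": {"move_speed": 0.6, "kick_power": 0.5, "angle_deviation": 8.0},
    "hard": {"move_speed": 1.0, "kick_power": 1.0, "angle_deviation": 1.0},
}

def adjust_sample(sample, difficulty):
    """Scale one training sample so it reflects the chosen difficulty mode."""
    p = DIFFICULTY_PARAMS[difficulty]
    return {
        # weaker modes move and kick at a fraction of the recorded values
        "speed": sample["speed"] * p["move_speed"],
        "kick_power": sample["kick_power"] * p["kick_power"],
        # add angular noise to the shot to model less accurate players
        "shot_angle": sample["shot_angle"] + random.gauss(0.0, p["angle_deviation"]),
    }
```

Each sample extracted from the historical video would be passed through `adjust_sample` before being fed to the reinforcement learning model.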
In some alternative embodiments, the basic ability parameters include one or more of movement speed, kicking power, and angular deviation. Providing several such parameters improves the fidelity of the intelligent player model.
In some optional embodiments, the reinforcement learning algorithm is PPO (Proximal Policy Optimization); the training data further comprise action data for one or more players in the historical football video; the action data comprise one or more of dribbling, passing, receiving, and shooting data; the dribbling data comprise one or more of forward, lateral (translation), and backward data; and the passing data indicate one or more of a target player and a target direction, the target player being a player other than the acting player. The shooting data may include, for example, shot counts and shots-on-target counts; the passing data may include, for example, long-pass counts and success probabilities; and the action data may further include one or more of assist counts, interception counts, and dribble-past counts.
The benefit is that PPO strikes a good balance among sampling efficiency, algorithm performance, and implementation and debugging complexity, and that the varied player action data bring the training closer to a real football match, improving the fidelity of the intelligent player model.
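A possible record layout for the per-player action data just described is sketched below. The field names and types are assumptions for illustration; the patent specifies only the categories (dribbling, passing, receiving, shooting).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DribbleData:
    forward: float = 0.0      # forward dribbling distance
    translation: float = 0.0  # lateral (translation) movement
    backward: float = 0.0

@dataclass
class PassData:
    target_player: Optional[str] = None     # another player's id, if aimed at one
    target_direction: Optional[float] = None  # bearing in degrees, otherwise

@dataclass
class ActionData:
    player_id: str
    dribble: DribbleData = field(default_factory=DribbleData)
    passes: List[PassData] = field(default_factory=list)
    shots: int = 0             # shot attempts
    shots_on_target: int = 0
```

A video-annotation pipeline would emit one `ActionData` record per player per clip, which then forms the training set for the reinforcement learning model.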
In a second aspect, the present application further provides a control device for a football robot, used to control one or more football robots to provide a sparring function for human players. The device comprises: an acquisition module for acquiring image data in real time with an image acquisition device; a decision module for obtaining, for each football robot, an action strategy using the image data and the intelligent player model corresponding to that robot; and a control module for controlling the robot's actions based on the action strategy, thereby providing the sparring function for the human players.
In some optional embodiments, the image data are obtained by the image acquisition device photographing one or more of: the football, the human players, and the other football robots.
In some optional embodiments, the intelligent player model corresponding to a football robot is trained as follows: obtain an action strategy for the robot using training data and a preset reinforcement learning model, where the strategy is used to simulate, in a computer virtual environment, the robot acting as a player, and the training data include historical football video; determine a reward value for the action strategy; and update the parameters of the reinforcement learning model based on that reward value to obtain the intelligent player model.
In some optional embodiments, determining the reward value of the football robot's action strategy includes determining it with the goal of creating the maximum number of shooting opportunities.
In some optional embodiments, determining the reward value includes determining it with the goal of creating the number of first shooting opportunities corresponding to a preset difficulty mode.
In some optional embodiments, the preset difficulty mode is one of a plurality of difficulty modes, each intelligent player model is matched with one of those modes, and the method further comprises: obtaining a configured difficulty mode, which is one of the plurality of difficulty modes; and determining the intelligent player model matched with the configured difficulty mode as the model corresponding to the football robot.
In some optional embodiments, determining the reward value includes: when a plurality of intelligent player models are trained jointly, determining the reward value of each football robot's action strategy with the goal that all the robots together create the number of second shooting opportunities corresponding to the preset difficulty mode.
In some optional embodiments, the method is used to control a plurality of football robots to provide the sparring function, the preset difficulty mode is one of a plurality of difficulty modes, each intelligent player model is matched with one of those modes, and the method further comprises: obtaining a configured difficulty mode, which is one of the plurality of difficulty modes; and determining a plurality of intelligent player models matched with the configured difficulty mode as the models corresponding to the football robots, so that each robot has a corresponding model, the matched models corresponding one-to-one with the robots.
In some alternative embodiments, the plurality of intelligent player models matched with the configured difficulty mode includes multiple identical models.
In some optional embodiments, obtaining the football robot's action strategy using the training data and the preset reinforcement learning model includes: acquiring the values of each player's basic ability parameters corresponding to the configured difficulty mode; adjusting the training data based on those values; and inputting the adjusted training data into the preset reinforcement learning model to obtain the action strategy.
In some alternative embodiments, the basic ability parameters include one or more of movement speed, kicking power, and angular deviation.
In some optional embodiments, the reinforcement learning algorithm is PPO; the training data further comprise action data for one or more players in the historical football video; the action data comprise one or more of dribbling, passing, receiving, and shooting data; the dribbling data comprise one or more of forward, lateral (translation), and backward data; and the passing data indicate one or more of a target player and a target direction, the target player being a player other than the acting player.
In a third aspect, the present application further provides a football robot comprising a memory and a processor; the memory stores a computer program, and the processor, when executing the program, implements the steps of any of the above control methods.
In a fourth aspect, the present application further provides a control system for a football robot, the system comprising an image acquisition device, a remote control device, and one or more football robots. The image acquisition device acquires image data in real time and sends them to the remote control device. The remote control device comprises a memory and a processor; the memory stores a computer program, and the processor, when executing it, implements the steps of any of the above control methods, the remote control device sending each football robot a control instruction to control that robot's actions. Each football robot includes a controller that receives the control instruction and controls the robot's actions accordingly.
In some optional embodiments, the image acquisition device is disposed on the football robot, and the football robot further includes a wheeled mobile chassis and a ball-striking rod (kicking mechanism).
In some alternative embodiments, the image capture device is a depth camera.
In a fifth aspect, the present application further provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the steps of the control method for a soccer robot according to any one of the above aspects.
The foregoing description is only an overview of the technical solutions of the present application, and in order to enable those skilled in the art to more clearly understand the technical solutions of the present application and to implement the technical solutions according to the content of the description, the following description is provided with preferred embodiments of the present application and the accompanying detailed drawings.
Drawings
The present application is further described below with reference to the drawings and examples.
Fig. 1 is a schematic flowchart of a control method of a soccer robot according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a training process of a model of a smart player provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of determining a reward value of an action strategy of a soccer robot according to an embodiment of the present disclosure;
fig. 4 is a partial schematic flow chart of another control method of a soccer robot according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart for acquiring an action strategy of a soccer robot according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a control device of a soccer robot according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a soccer robot according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a program product for implementing a control method of a soccer robot according to an embodiment of the present application.
Detailed Description
The present application is further described with reference to the accompanying drawings and the detailed description, and it should be noted that, in the present application, the embodiments or technical features described below may be arbitrarily combined to form a new embodiment without conflict.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Referring to fig. 1, in a first aspect, the present application provides a control method of a football robot for controlling one or more football robots to provide a sparring function for human players, the method including steps S101 to S103.
A human player here is a person who uses the football robot for football skill or tactical training, for example a club's signed player such as Cristiano Ronaldo or Messi, or a football student, such as a university student taking a football class.
Step S101: acquiring image data in real time with an image acquisition device.
Step S102: for each football robot, obtaining that robot's action strategy using the image data and the intelligent player model corresponding to it.
The intelligent player model may reside in a storage medium inside the football robot and run on the robot's processor, or it may reside in the cloud of a local communication network formed by several football robots and run on a dedicated computer, which sends the action strategy to the robot to control its actions. The action strategy is information guiding the football robot to perform football actions such as moving, stopping, shooting, and passing. It may be simple text, for example "move to coordinates (50, 40) and perform a shooting action there", which the robot executes on its own, or detailed action instructions, for example: "dribble northward with the chassis drive motor at 5000 rpm, brake after 5 seconds of operation, then have the shooting mechanism perform a shot at a ball speed of 50 m/s."
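The simple-text form of the action strategy above can be represented as a small ordered command list. The `ActionCommand` schema and `strategy_to_commands` helper are hypothetical illustrations, not an encoding the patent specifies.

```python
from dataclasses import dataclass

@dataclass
class ActionCommand:
    kind: str   # e.g. "move", "shoot", "pass", "stop"
    x: float = 0.0
    y: float = 0.0

def strategy_to_commands(target_xy, then_shoot=True):
    """Expand a high-level strategy ("move to (x, y) and shoot there")
    into an ordered command list the robot controller executes one by one."""
    x, y = target_xy
    commands = [ActionCommand("move", x, y)]
    if then_shoot:
        commands.append(ActionCommand("shoot", x, y))
    return commands

print([c.kind for c in strategy_to_commands((50, 40))])  # ['move', 'shoot']
```

The detailed-instruction variant would simply carry extra fields (motor speed, timing, ball speed) on each command.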
Step S103: controlling the football robot's actions based on its action strategy, thereby providing the sparring function for the human players.
Thus, image data are collected, an action strategy is obtained from the image data and the intelligent player model, and the robot's actions are controlled according to that strategy. Because the strategy comes from an intelligent player model, the robot's actions closely imitate those of a real player, giving the human player a highly anthropomorphic sparring experience and improving the training effect.
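Steps S101 to S103 can be sketched as one iteration of a control loop. The `camera`, `models`, and `send_command` interfaces are placeholders standing in for the real hardware and model APIs; the fakes below exist only to make the sketch runnable.

```python
def control_step(camera, robots, models, send_command):
    frame = camera.capture()                    # S101: real-time image data
    for robot_id in robots:
        strategy = models[robot_id].act(frame)  # S102: per-robot action strategy
        send_command(robot_id, strategy)        # S103: control the robot's action

# Minimal fakes so the loop can be exercised without hardware
class _FakeCamera:
    def capture(self):
        return "frame"

class _FakeModel:
    def act(self, frame):
        return ("move", 50, 40)

sent = []
control_step(_FakeCamera(), ["robot1"], {"robot1": _FakeModel()},
             lambda rid, s: sent.append((rid, s)))
print(sent)  # [('robot1', ('move', 50, 40))]
```

In the system of the fourth aspect, `control_step` would run on the remote control device, with `send_command` forwarding instructions to each robot's onboard controller.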
In some embodiments, the image data are captured by the image acquisition device photographing one or more of: the football, the human players, and the other football robots.
The image acquisition device is, for example, an electronic device that can sense the robot's surroundings, such as a visible-light camera, an infrared camera, or a millimetre-wave radar. Its field of view may be 120° directly ahead of the robot; the robot's viewing angle then matches a human player's, bringing its perception of the pitch closer to a real player's and improving imitation. It may instead be a 360° surround view, giving the robot stronger awareness of the whole environment and closer cooperation with other football robots, which raises the training intensity.
Thus, the image data, obtained by photographing one or more of these objects in real time, reflect the real sparring scene and improve the human player's experience.
Referring to fig. 2, in some embodiments, the training process of the intelligent player model corresponding to the soccer robot may include the following steps S201 to S203.
Step S201: utilize training data and predetermined reinforcement learning model, acquire football robot's action strategy, football robot's action strategy is arranged in simulating in computer virtual environment football robot is as sportsman's action, training data includes historical football video.
Step S202: determining a reward value of an action strategy of the soccer robot.
Step S203: and updating the parameters of the reinforcement learning model based on the reward value of the action strategy of the football robot to obtain the intelligent player model.
The parameters may be updated in the direction that maximizes the reward value, or until the reward value exceeds a preset reward value.
In this way, an action strategy is obtained using training data that includes historical soccer videos and a preset reinforcement learning model; the player action corresponding to the action strategy is then simulated in a virtual environment, the reward value of the action strategy is determined, and the intelligent player model is trained according to the reward value. The resulting intelligent player model has a high degree of simulation.
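The loop of steps S201 to S203 can be sketched as follows. This is a toy stand-in: the action strategy is a single scalar parameter and the update is simple hill climbing rather than the PPO algorithm the application actually contemplates; the environment and all names are illustrative assumptions.

```python
import random

def train_player_model(env_reward, init_param, episodes=200, seed=0):
    """Toy stand-in for steps S201-S203: propose an action strategy
    (here a single scalar parameter), score it in a simulated
    environment (S202), and keep only updates that raise the reward
    (S203). A production system would use PPO; this hill-climbing
    update is purely illustrative."""
    rng = random.Random(seed)
    param = init_param
    for _ in range(episodes):
        candidate = param + rng.uniform(-0.5, 0.5)     # S201: candidate strategy
        if env_reward(candidate) > env_reward(param):  # S202: reward of strategy
            param = candidate                          # S203: keep the update
    return param

# Toy virtual environment: reward peaks when the strategy parameter is 3.0.
reward = lambda p: -(p - 3.0) ** 2
trained = train_player_model(reward, init_param=0.0)
```

Because updates are only accepted when the reward strictly improves, the trained parameter's reward is never worse than the starting one, mirroring the "update toward the maximum reward value" option above.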
In some embodiments, the step S202 may include step S301.
Step S301: and determining the reward value of the action strategy of the football robot by taking the goal of creating the maximum number of shooting opportunities.
In this way, the reward value targets the creation of the most shooting opportunities, which is closer to real soccer confrontation than other targets, such as more passing opportunities or higher shooting accuracy, and can best imitate the action habits of real players. Moreover, because the intelligent player model is trained on historical soccer videos, having each soccer robot pursue the maximum number of shooting opportunities brings its actions closer to those of a specific individual player, especially the real action habits of a soccer star with distinctive characteristics. This gives the partner-training soccer robots individual character, makes them more popular with consumer groups of non-professional players, and gives them better commercial prospects.
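A minimal sketch of the S301 reward, assuming the reward simply counts shot-opportunity events in one simulated episode (the event encoding and the linear form are assumptions; the application only fixes the objective, not the formula):

```python
def count_shot_opportunities(events):
    """Count shot-opportunity events in one simulated episode."""
    return sum(1 for e in events if e == "shot_opportunity")

def reward_most_shots(events):
    """Reward of an action strategy = number of shooting opportunities it
    created. The event encoding and the linear form are illustrative
    assumptions; the application only fixes the objective."""
    return float(count_shot_opportunities(events))

episode = ["pass", "shot_opportunity", "dribble", "shot_opportunity"]
```

An episode that creates more shooting opportunities scores higher, so the reinforcement learning update favors chance-creating play over, say, safe passing.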
In some embodiments, the step S202 may include the step S401.
Step S401: and determining the reward value of the action strategy of the football robot by taking the number of the first shooting opportunities corresponding to the creation of the preset difficulty mode as a target.
The preset difficulty mode may refer to how challenging the soccer robot is to the real player, classified for example as simple, medium, and difficult; it may also refer to preset training intensities, such as low-, medium-, and high-intensity training. The difficulty grades may also be distinguished more finely by a score range of 1-100, e.g., a score of 1 indicates the lowest difficulty and a score of 100 the highest.
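A hedged sketch of how a fine-grained 1-100 difficulty score might be folded back onto the coarse tiers described above (the cut-points are illustrative assumptions; the application only says a 1-100 grading is possible):

```python
def difficulty_tier(score):
    """Fold a fine-grained 1-100 difficulty score back onto the coarse
    simple/medium/difficult tiers. The cut-points (33/66) are illustrative
    assumptions; the application does not fix them."""
    if not 1 <= score <= 100:
        raise ValueError("difficulty score must be in 1-100")
    if score <= 33:
        return "simple"
    if score <= 66:
        return "medium"
    return "difficult"
```

Keeping both representations lets a product expose three friendly presets while the training pipeline works with the finer numeric scale.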
In this way, by setting the preset difficulty mode to correspond to a number of shooting opportunities, the intelligent player model can imitate a real player of a specific level, which improves the fidelity of the intelligent player model.
Referring to fig. 3, in some embodiments, the preset difficulty mode is each of a plurality of difficulty modes, each intelligent player model is matched with one of the plurality of difficulty modes, and the method may further include steps S204 to S205.
Step S204: obtaining a configuration difficulty mode, wherein the configuration difficulty mode is one of the plurality of difficulty modes.
Step S205: and determining a smart player model matched with the configuration difficulty mode as the smart player model corresponding to the football robot.
In this way, one configuration difficulty mode among the plurality of difficulty modes is obtained, and the intelligent player model matched with that mode is used as the intelligent player model corresponding to the soccer robot. A matching intelligent player model can thus be obtained for any of the plurality of difficulty modes, so that during partner training a suitable difficulty can be selected according to the training situation, adapting to different real players, training scenes, or training cycles, which improves partner training efficiency.
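Steps S204-S205 amount to looking up the trained model keyed by the configured difficulty mode. A minimal sketch, assuming a simple dictionary registry (the registry and the model names are illustrative assumptions):

```python
def select_player_model(models_by_difficulty, configured_mode):
    """Steps S204-S205 as a lookup: each trained intelligent player model is
    keyed by the difficulty mode it was trained for, and the configured mode
    selects one. The dict registry and model names are illustrative."""
    if configured_mode not in models_by_difficulty:
        raise ValueError(f"no player model trained for mode {configured_mode!r}")
    return models_by_difficulty[configured_mode]

models = {"simple": "model_easy", "medium": "model_mid", "difficult": "model_hard"}
```

Rejecting unknown modes explicitly keeps a misconfigured training session from silently falling back to the wrong opponent strength.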
In some embodiments, the step S202 may include the step S501.
Step S501: and when the plurality of intelligent player models are jointly trained, determining the reward value of the action strategy of each football robot by taking the number of second shooting opportunities corresponding to the preset difficulty mode created by all the football robots as a target.
In this way, when a plurality of soccer robots are used for joint training, the reward value of each soccer robot's action strategy is determined with the goal of all the soccer robots together creating the second number of shooting opportunities corresponding to the preset difficulty mode. Compared with the first number of shooting opportunities, the second number is closer to the real state of soccer as a team sport, so a plurality of intelligent player models trained with it bring a more realistic experience to real players during joint training and improve the partner training effect. The second number of shooting opportunities is also better suited to training a specific team and better meets the needs of consumer groups of professional athletes: for example, a match between the real players' team and a simulated famous club such as Real Madrid or Manchester United can be staged at a far lower cost than playing the real clubs. In addition, a direct competitor team can be simulated; for example, the Chinese national team could use a plurality of soccer robots to imitate each player of its Asian qualifier opponents, such as the Japanese and Korean teams, truly improving the combat effectiveness of the trained soccer team by technological means.
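A minimal sketch of the S501 joint reward, assuming every robot receives the same team-level total of shooting opportunities (the shared-sum form is an illustrative assumption; the application only fixes the team-level objective):

```python
def team_reward(per_robot_shot_opportunities):
    """Sketch of the S501 joint reward: every robot's action strategy is
    scored by the shooting opportunities the whole team created, so play
    that lowers the team total is not reinforced. The shared-sum form is
    an illustrative assumption."""
    total = float(sum(per_robot_shot_opportunities.values()))
    # Every robot receives the same team-level reward.
    return {robot: total for robot in per_robot_shot_opportunities}

rewards = team_reward({"robot_1": 2, "robot_2": 0, "robot_3": 1})
```

Under a shared reward, a robot whose pass sets up a teammate's chance is credited just like the shooter, which is what pushes the joint policies toward team play rather than individual shot-hunting.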
Referring to fig. 4, in some embodiments, the method is used for controlling a plurality of soccer robots to provide the partner training function for the real player, the preset difficulty mode is each of a plurality of difficulty modes, each intelligent player model is matched with one of the plurality of difficulty modes, and the method may further include steps S206 to S207.
Step S206: obtaining a configuration difficulty mode, wherein the configuration difficulty mode is one of the plurality of difficulty modes.
Step S207: and determining a plurality of intelligent player models matched with the configuration difficulty mode to serve as a plurality of intelligent player models corresponding to the football robot so as to obtain each intelligent player model corresponding to the football robot.
The plurality of intelligent player models matched with the configuration difficulty mode correspond one-to-one to the plurality of soccer robots.
In this way, one configuration difficulty mode among the plurality of difficulty modes is obtained, and the intelligent player models matched with that mode are used as the intelligent player models corresponding to the soccer robots. Matching intelligent player models can thus be obtained for any of the plurality of difficulty modes, so that during partner training a suitable difficulty can be selected according to the training situation, adapting to different real players, training scenes, or training cycles, which improves partner training efficiency.
In some embodiments, the plurality of intelligent player models matched with the configuration difficulty mode includes a plurality of identical intelligent player models.
In this way, using identical intelligent player models improves the efficiency of model training and can meet the personalized needs of specific consumers, for example simulating multiple Cristiano Ronaldos or multiple Messis for training.
Referring to fig. 5, in some embodiments, the step S201 may include steps S601 to S603.
Step S601: and acquiring a parameter value of the basic ability parameter of each player corresponding to the configuration difficulty mode.
Step S602: and adjusting the training data based on the parameter value of the basic ability parameter of each player corresponding to the configuration difficulty mode.
Step S603: and inputting the adjusted training data into the preset reinforcement learning model to obtain the action strategy of the football robot.
In this way, the basic ability parameter values corresponding to the configuration difficulty mode are obtained, the training data is adjusted accordingly, and the action strategy of the soccer robot is obtained. For example, player action data extracted from historical soccer videos can be adjusted so that differences in players' basic abilities under the specific configuration difficulty mode are fully taken into account, which improves the degree of simulation of the intelligent player model.
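The data adjustment of steps S601-S603 can be sketched as follows, assuming the ability parameters act multiplicatively on speed and force and additively on angular deviation (the field names and scaling rules are illustrative assumptions, not part of the application):

```python
def adjust_training_data(samples, ability):
    """Sketch of steps S601-S603's data adjustment: scale recorded player
    motion data by the basic ability parameters of the configured difficulty
    mode before it is fed to the reinforcement learning model. Field names
    and the scaling rules are illustrative assumptions."""
    return [
        {
            "move_speed": s["move_speed"] * ability["speed_factor"],
            "kick_force": s["kick_force"] * ability["force_factor"],
            "angle_deviation": s["angle_deviation"] + ability["extra_deviation"],
        }
        for s in samples
    ]

# An "easy" mode: slower, weaker, less accurate simulated players.
easy_mode = {"speed_factor": 0.8, "force_factor": 0.8, "extra_deviation": 2.0}
raw = [{"move_speed": 7.0, "kick_force": 100.0, "angle_deviation": 1.0}]
adjusted = adjust_training_data(raw, easy_mode)
```

Lowering speed and force while widening angular deviation degrades the recorded star-player data toward a beatable opponent before the reinforcement learning model ever sees it.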
In some embodiments, the basic ability parameters may include one or more of moving speed, kicking force, and angular deviation. Providing multiple basic ability parameters in this way improves the degree of simulation of the intelligent player model.
In some embodiments, the reinforcement learning algorithm may employ a PPO algorithm; the training data may also include motion data for one or more players in the historical soccer video; the motion data may include one or more of dribbling data, passing data, catching data, and shooting data; the dribbling data may include one or more of forward data, pan data, and reverse data; the pass data is for indicating one or more of a target player and a target direction, the target player being one of the players other than the player.
The shooting data may include the number of shots and the number of shots on target; the passing data may include the number of long passes and their success probability; and the action data may further include one or more of the number of assists, the number of tackles, and the number of dribbles past opponents.
In this way, the PPO algorithm achieves a good balance among sampling efficiency, algorithm performance, and the complexity of implementation and debugging; and providing varied player action data keeps the simulated match closer to real soccer, improving the fidelity of the intelligent player model.
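One possible shape for a structured action record extracted from a historical video, covering the dribbling, passing, receiving, and shooting categories listed above (all field names are assumptions for illustration):

```python
def make_action_record(player, action, **detail):
    """Build one structured action record extracted from a historical match
    video. The schema (player id, action type, free-form detail fields) is
    an illustrative assumption."""
    allowed = {"dribble", "pass", "receive", "shoot"}
    if action not in allowed:
        raise ValueError(f"unknown action {action!r}")
    return {"player": player, "action": action, **detail}

# A few illustrative records covering the categories listed above.
records = [
    make_action_record("player_7", "dribble", direction="forward"),
    make_action_record("player_7", "pass", target_player="player_9"),
    make_action_record("player_9", "shoot", on_target=True),
]
```

Keeping a fixed vocabulary of action types with per-action detail fields is what makes the video-derived data usable as training input for the reinforcement learning model.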
In some application scenarios, an intelligent player model is obtained by training, and the process of using it to provide partner training for real players is as follows:
The behaviors of real players in historical soccer videos are recorded in a structured data format. The specific behaviors include action decisions such as dribbling (advancing, translating, retreating), passing (targeting another real player or soccer robot, or a given direction), receiving, and shooting. The PPO algorithm in reinforcement learning is then used to train algorithm models (i.e., intelligent player models, the same below) for a plurality of agents (i.e., intelligent players, the same below).
In the process of training the plurality of agents, their behaviors are simulated as virtual players in a computer virtual environment, and the goal of creating as many shooting opportunities as possible is used as each intelligent player's action-decision reward value. In the virtual environment, the basic ability parameters of a virtual player may correspond to moving speed, kicking force, and angular deviation. The linked multi-agent partner-training difficulty mode is matched by the combination of behavior pattern and basic ability parameters.
The algorithm models of the agents obtained by training are then transferred to robots in a real scene. The training system consists of a service control platform and a plurality of training robots. Each robot is equipped only with a wheeled mobile chassis, a ball-striking mallet, a ball-stopping rod, a depth camera, and a motion controller with a mobile communication module. The controller receives instructions from the service control platform to control the robot's behavior; the robot body performs no computation or inference itself, and the camera is used only to recognize the soccer ball, human bodies, and other robots.
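The split between the service control platform and the on-robot controller can be sketched as follows, assuming a JSON wire format (the format and field names are illustrative assumptions; the application does not specify them):

```python
import json

def encode_command(robot_id, strategy):
    """Service-control-platform side: all inference happens off the robot;
    the resulting action strategy is serialized into a command message.
    The JSON wire format is an illustrative assumption."""
    return json.dumps({"robot_id": robot_id, "strategy": strategy})

def robot_controller(message):
    """On-robot side: decode the command and return the actuation to carry
    out. No model inference runs on the robot body itself."""
    cmd = json.loads(message)
    return cmd["strategy"]

msg = encode_command("robot_3", {"move_to": [50, 40], "action": "shoot"})
```

Keeping the robot side to decode-and-actuate is what lets the robot hardware stay minimal while the heavy models run on the platform.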
Through the above process, a highly anthropomorphic robot partner training function is realized.
Referring to fig. 6, the present application further provides a control device for a soccer robot, where the device is used to control one or more soccer robots to provide the partner training function for a real player.
Its specific implementation is consistent with the implementation and technical effects described in the embodiments of the control method of the soccer robot, and repeated contents are not described again.
The device comprises: an acquisition module 101 for acquiring image data in real time using an image capture device; a decision module 102 for obtaining, for each soccer robot, an action strategy of the soccer robot using the image data and the intelligent player model corresponding to the soccer robot; and a control module 103 for controlling the action of the soccer robot based on the action strategy of the soccer robot, so as to provide the partner training function for the real player.
In some embodiments, the image data is obtained by the image capture device photographing one or more of a soccer ball, the real player, and other soccer robots.
In some embodiments, the training process of the intelligent player model corresponding to the soccer robot may be as follows: obtaining an action strategy of the soccer robot using training data and a preset reinforcement learning model, where the action strategy of the soccer robot is used to simulate, in a computer virtual environment, the action of the soccer robot acting as a player, and the training data includes historical soccer videos; determining a reward value of the action strategy of the soccer robot; and updating the parameters of the reinforcement learning model based on the reward value of the action strategy of the soccer robot, to obtain the intelligent player model.
In some embodiments, the determining the reward value of the action strategy of the soccer robot may include: determining the reward value of the action strategy of the soccer robot with the goal of creating the maximum number of shooting opportunities.
In some embodiments, the determining the reward value of the action strategy of the soccer robot may include: determining the reward value of the action strategy of the soccer robot with the goal of creating a first number of shooting opportunities corresponding to a preset difficulty mode.
In some embodiments, the preset difficulty mode is each of a plurality of difficulty modes, each intelligent player model is matched with one of the plurality of difficulty modes, and the device may further: obtain a configuration difficulty mode, where the configuration difficulty mode is one of the plurality of difficulty modes; and determine the intelligent player model matched with the configuration difficulty mode as the intelligent player model corresponding to the soccer robot.
In some embodiments, the determining the reward value of the action strategy of the soccer robot may include: when a plurality of intelligent player models are trained jointly, determining the reward value of the action strategy of each soccer robot with the goal of all the soccer robots together creating a second number of shooting opportunities corresponding to the preset difficulty mode.
In some embodiments, the device is used for controlling a plurality of soccer robots to provide the partner training function for the real player, the preset difficulty mode is each of a plurality of difficulty modes, and each intelligent player model is matched with one of the plurality of difficulty modes; the device may further: obtain a configuration difficulty mode, where the configuration difficulty mode is one of the plurality of difficulty modes; and determine a plurality of intelligent player models matched with the configuration difficulty mode as the intelligent player models corresponding to the plurality of soccer robots, so as to obtain the intelligent player model corresponding to each soccer robot; the plurality of intelligent player models matched with the configuration difficulty mode correspond one-to-one to the plurality of soccer robots.
In some embodiments, the plurality of intelligent player models matched with the configuration difficulty mode may include a plurality of identical intelligent player models.
In some embodiments, the obtaining the action strategy of the soccer robot using the training data and the preset reinforcement learning model may include: obtaining a parameter value of the basic ability parameter of each player corresponding to the configuration difficulty mode; adjusting the training data based on the parameter value of the basic ability parameter of each player corresponding to the configuration difficulty mode; and inputting the adjusted training data into the preset reinforcement learning model to obtain the action strategy of the soccer robot.
In some embodiments, the basic capability parameters may include one or more of speed of movement, kicking power, and angular deviation.
In some embodiments, the reinforcement learning algorithm may employ a PPO algorithm; the training data further comprises motion data for one or more players in the historical soccer video; the action data comprises one or more of ball carrying data, ball passing data, ball receiving data and shooting data; the dribbling data comprises one or more of forward data, translation data and backward data; the pass data is for indicating one or more of a target player and a target direction, the target player being one of the players other than the player.
Referring to fig. 7, the present embodiment also provides a soccer robot 200, where the soccer robot 200 includes at least one memory 210, at least one processor 220, and a bus 230 connecting different platform systems.
The memory 210 may include readable media in the form of volatile memory, such as a random access memory (RAM) 211 and/or a cache memory 212, and may further include a read-only memory (ROM) 213.
The memory 210 further stores a computer program, and the computer program can be executed by the processor 220, so that the processor 220 executes the steps of the control method of the soccer robot in the embodiment of the present application, and a specific implementation manner of the control method of the soccer robot is consistent with the implementation manner and the achieved technical effect described in the embodiment of the control method of the soccer robot, and some contents are not described again.
Accordingly, the processor 220 may execute the computer programs described above, and may execute the utility 214.
The soccer robot 200 may also communicate with one or more external devices 240, such as a keyboard, pointing device, bluetooth device, etc., and may also communicate with one or more devices capable of interacting with the soccer robot 200, and/or any devices (e.g., routers, modems, etc.) that enable the soccer robot 200 to communicate with one or more other computing devices. Such communication may be through input-output interface 250. Also, the soccer robot 200 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 260. The network adapter 260 may communicate with other modules of the soccer robot 200 via the bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with soccer robot 200, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
The application also provides a control system of the football robot, wherein the system comprises image acquisition equipment, remote control equipment and one or more football robots; the image acquisition equipment is used for acquiring image data in real time and sending the image data to the remote control equipment; the remote control device comprises a memory and a processor, the memory stores a computer program, the processor implements the steps of the control method of any one of the soccer robots when executing the computer program, the specific implementation manner is consistent with the implementation manner and the achieved technical effect described in the embodiments of the control method of the soccer robot, and some contents are not described again.
For each soccer robot, the remote control device sends a control instruction corresponding to that soccer robot, so as to control the action of the soccer robot; the soccer robot includes a controller, and the controller is configured to receive the control instruction and to control the action of the soccer robot according to the control instruction.
In some embodiments, the image capture device may be disposed on the soccer robot, the soccer robot further including a wheeled mobile chassis, a ball-striking mallet, and a ball-stopping rod.
In some embodiments, the image capture device may be a depth camera.
The embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used for storing a computer program, and when the computer program is executed, the steps of the control method of the soccer robot in the embodiment of the present application are implemented, and a specific implementation manner of the steps is consistent with the implementation manner and the achieved technical effect described in the embodiment of the control method of the soccer robot, and some contents are not described again.
Fig. 8 shows a program product 300 for implementing the above-mentioned control method of the soccer robot according to the present embodiment, which may employ a portable compact disc read only memory (CD-ROM) and include program codes, and may be executed on a terminal device, such as a personal computer. However, the program product 300 of the present invention is not so limited, and in this application, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Program product 300 may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable storage medium may also be any readable medium that can communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the C language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
While the present application is described in terms of various aspects, including exemplary embodiments, the principles of the invention should not be limited to the disclosed embodiments, but are also intended to cover various modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Claims (18)
1. A method of controlling a soccer robot, the method being used for controlling one or more soccer robots to provide a partner training function for a real player, the method comprising:
acquiring image data in real time by using image acquisition equipment;
aiming at each football robot, acquiring an action strategy of the football robot by using the image data and an intelligent player model corresponding to the football robot;
controlling the action of the soccer robot based on the action strategy of the soccer robot, so as to provide the partner training function for the real player.
2. The method of controlling a soccer robot according to claim 1, wherein the image data is obtained by the image capture device photographing one or more of a soccer ball, the real player, and other soccer robots.
3. The method of controlling a soccer robot according to claim 1, wherein the training process of the intelligent player model corresponding to the soccer robot is as follows:
obtaining an action strategy of the soccer robot by using training data and a preset reinforcement learning model, wherein the action strategy of the soccer robot is used to simulate, in a computer virtual environment, the action of the soccer robot acting as a player, and the training data includes historical soccer videos;
determining a reward value of an action strategy of the soccer robot;
and updating the parameters of the reinforcement learning model based on the reward value of the action strategy of the soccer robot, to obtain the intelligent player model.
4. The method for controlling a soccer robot according to claim 3, wherein said determining a reward value of an action strategy of the soccer robot comprises:
determining the reward value of the action strategy of the soccer robot with the goal of creating the maximum number of shooting opportunities.
5. The method for controlling a soccer robot according to claim 3, wherein said determining a reward value of an action strategy of the soccer robot comprises:
determining the reward value of the action strategy of the soccer robot with the goal of creating a first number of shooting opportunities corresponding to a preset difficulty mode.
6. The method of controlling a soccer robot according to claim 5, wherein the preset difficulty mode is each of a plurality of difficulty modes, each intelligent player model being matched with one of the plurality of difficulty modes, the method further comprising:
obtaining a configuration difficulty mode, wherein the configuration difficulty mode is one of the plurality of difficulty modes;
and determining the intelligent player model matched with the configuration difficulty mode as the intelligent player model corresponding to the soccer robot.
7. The method for controlling a soccer robot according to claim 3, wherein said determining a reward value of an action strategy of the soccer robot comprises:
when a plurality of intelligent player models are trained jointly, determining the reward value of the action strategy of each soccer robot with the goal of all the soccer robots together creating a second number of shooting opportunities corresponding to the preset difficulty mode.
8. The method of controlling a soccer robot according to claim 7, wherein the method is used for controlling a plurality of soccer robots to provide the partner training function for the real player, the preset difficulty mode is each of a plurality of difficulty modes, each intelligent player model is matched with one of the plurality of difficulty modes, and the method further comprises:
obtaining a configuration difficulty mode, wherein the configuration difficulty mode is one of the plurality of difficulty modes;
determining a plurality of intelligent player models matched with the configuration difficulty mode as the intelligent player models corresponding to the plurality of soccer robots, so as to obtain the intelligent player model corresponding to each soccer robot;
wherein the plurality of intelligent player models matched with the configuration difficulty mode correspond one-to-one to the plurality of soccer robots.
9. The method of controlling a soccer robot according to claim 8, wherein the plurality of intelligent player models matched with the configuration difficulty mode comprises a plurality of identical intelligent player models.
10. The method for controlling a soccer robot according to claim 6 or 8, wherein the obtaining the action strategy of the soccer robot using the training data and the pre-set reinforcement learning model comprises:
acquiring a parameter value of a basic ability parameter of each player corresponding to the configuration difficulty mode;
adjusting the training data based on the parameter value of the basic ability parameter of each player corresponding to the configuration difficulty mode;
and inputting the adjusted training data into the preset reinforcement learning model to obtain the action strategy of the football robot.
11. The method of controlling a soccer robot according to claim 10, wherein said basic capability parameters include one or more of moving speed, kicking force and angular deviation.
12. The control method of a soccer robot according to claim 3, wherein the reinforcement learning algorithm employs a PPO algorithm;
the training data further comprises motion data for one or more players in the historical soccer video;
the action data comprises one or more of ball carrying data, ball passing data, ball receiving data and shooting data;
the dribbling data comprises one or more of forward data, translation data and backward data;
the pass data is for indicating one or more of a target player and a target direction, the target player being one of the players other than the player.
13. A control device for a soccer robot, the device being configured to control one or more soccer robots to provide a partner training function for a real player, the device comprising:
an acquisition module configured to acquire image data in real time using an image acquisition device;
a decision module configured to obtain, for each soccer robot, an action strategy using the image data and the intelligent player model corresponding to that soccer robot;
and a control module configured to control the action of the soccer robot based on the action strategy of the soccer robot, so as to provide the partner training function for the real player.
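The three modules of claim 13 form a simple perception-decision-actuation loop. Below is a minimal sketch of that loop under assumed interfaces (the callables standing in for the camera, the per-robot player models, and the motion commands are all hypothetical):

```python
# Minimal sketch of the acquisition / decision / control modules of
# claim 13; every interface here is an assumption for illustration.

class AcquisitionModule:
    def __init__(self, camera):
        self.camera = camera
    def capture(self):
        return self.camera()  # image data acquired in real time

class DecisionModule:
    def __init__(self, player_models):
        self.player_models = player_models  # one intelligent player model per robot
    def decide(self, image, robot_id):
        return self.player_models[robot_id](image)  # action strategy for this robot

class ControlModule:
    def act(self, robot_id, strategy):
        # a real system would translate the strategy into motion commands
        return f"robot {robot_id} -> {strategy}"

def control_step(acq, dec, ctl, robot_ids):
    """One tick of the loop: capture once, decide and act per robot."""
    image = acq.capture()
    return [ctl.act(rid, dec.decide(image, rid)) for rid in robot_ids]

acq = AcquisitionModule(camera=lambda: "frame0")
dec = DecisionModule(player_models={0: lambda img: "dribble_forward"})
ctl = ControlModule()
print(control_step(acq, dec, ctl, [0]))
```

Capturing a single image per tick and fanning it out to each robot's model matches the claim's structure, where one acquisition module serves the decision module for every controlled robot.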
14. A soccer robot, characterized in that the soccer robot comprises a memory and a processor, the memory storing a computer program, the processor implementing the steps of the control method of the soccer robot according to any one of claims 1-12 when executing the computer program.
15. A control system for a football robot, characterized by comprising an image acquisition device, a remote control device, and one or more football robots;
the image acquisition equipment is used for acquiring image data in real time and sending the image data to the remote control equipment;
the remote control device comprises a memory storing a computer program and a processor that implements the control method of the soccer robot according to any one of claims 1-12 when executing the computer program, wherein the remote control device sends, to each soccer robot, a control instruction corresponding to that soccer robot so as to control the action of the soccer robot;
the football robot comprises a controller configured to receive the control instruction and to control the action of the football robot according to the control instruction.
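Claim 15 splits computation between a remote control device (which runs the method and emits per-robot instructions) and an on-robot controller (which executes them). A sketch of that message flow, with an assumed JSON instruction format that the patent does not specify:

```python
# Sketch of the remote-control message flow in claim 15. The JSON
# instruction format and field names are assumptions.
import json

def make_instruction(robot_id, strategy):
    # remote control device side: one control instruction per soccer robot
    return json.dumps({"robot_id": robot_id, "action": strategy})

class RobotController:
    """On-robot controller: receives an instruction and acts on it."""
    def __init__(self, robot_id):
        self.robot_id = robot_id
        self.last_action = None
    def receive(self, message):
        cmd = json.loads(message)
        if cmd["robot_id"] == self.robot_id:
            self.last_action = cmd["action"]  # drive the robot accordingly
        return self.last_action

ctrl = RobotController(robot_id=3)
print(ctrl.receive(make_instruction(3, "shoot")))  # shoot
```

Keeping the learned model on the remote device and only a thin instruction executor on the robot is consistent with the claim's division of memory/processor (remote) versus controller (robot).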
16. The control system of a soccer robot according to claim 15, wherein said image capturing device is disposed on said soccer robot, said soccer robot further comprising a wheeled mobile chassis, a hitting mallet, and a stopping rod.
17. The control system of a soccer robot according to claim 16, wherein said image capture device is a depth camera.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the control method of the soccer robot according to any one of claims 1-12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111361254.2A CN114167749A (en) | 2021-11-17 | 2021-11-17 | Control method of football robot and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114167749A true CN114167749A (en) | 2022-03-11 |
Family
ID=80479342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111361254.2A Pending CN114167749A (en) | 2021-11-17 | 2021-11-17 | Control method of football robot and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114167749A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110280019A (en) * | 2019-06-21 | 2019-09-27 | 南京邮电大学 | Soccer robot Defending Policy based on intensified learning |
CN110653819A (en) * | 2019-09-25 | 2020-01-07 | 上海大学 | System and method for generating kicking action of humanoid robot |
CN111389010A (en) * | 2020-02-21 | 2020-07-10 | 网易(杭州)网络有限公司 | Virtual robot training method, device, electronic equipment and medium |
US10800040B1 (en) * | 2017-12-14 | 2020-10-13 | Amazon Technologies, Inc. | Simulation-real world feedback loop for learning robotic control policies |
CN112149344A (en) * | 2020-08-24 | 2020-12-29 | 同济大学 | Football robot with ball strategy selection method based on reinforcement learning |
CN112363402A (en) * | 2020-12-21 | 2021-02-12 | 杭州未名信科科技有限公司 | Gait training method and device of foot type robot based on model-related reinforcement learning, electronic equipment and medium |
CN112476424A (en) * | 2020-11-13 | 2021-03-12 | 腾讯科技(深圳)有限公司 | Robot control method, device, equipment and computer storage medium |
CN112621773A (en) * | 2020-12-07 | 2021-04-09 | 陈贺龄 | Partner training robot control method, system, device and storage medium |
KR102284802B1 (en) * | 2020-10-06 | 2021-08-02 | 김세원 | Apparatus and method for providing condition information of player regarding sports game |
CN113312840A (en) * | 2021-05-25 | 2021-08-27 | 广州深灵科技有限公司 | Badminton playing method and system based on reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11322043B2 (en) | Remote multiplayer interactive physical gaming with mobile computing devices | |
US10600334B1 (en) | Methods and systems for facilitating interactive training of body-eye coordination and reaction time | |
Wydmuch et al. | Vizdoom competitions: Playing doom from pixels | |
Jain et al. | Two body problem: Collaborative visual task completion | |
CN109499068B (en) | Object control method and device, storage medium and electronic device | |
Kober et al. | Reinforcement learning to adjust parametrized motor primitives to new situations | |
US7806777B2 (en) | Automatically adapting virtual equipment model | |
CN103354761B (en) | Virtual golf simulation apparatus and method | |
CN111223170B (en) | Animation generation method and device, electronic equipment and storage medium | |
US20200324206A1 (en) | Method and system for assisting game-play of a user using artificial intelligence (ai) | |
Petri et al. | Using several types of virtual characters in sports-a literature survey | |
CN112621773A (en) | Partner training robot control method, system, device and storage medium | |
Schwab et al. | Learning skills for small size league robocup | |
Lau et al. | FC Portugal-high-level coordination methodologies in soccer robotics | |
US20110039624A1 (en) | Cyber-physical game | |
CN110989839B (en) | System and method for man-machine fight | |
Bai et al. | Wrighteagle and UT Austin villa: RoboCup 2011 simulation league champions | |
CN114167749A (en) | Control method of football robot and related device | |
Reis et al. | Coordination in multi-robot systems: Applications in robotic soccer | |
CN110314379B (en) | Learning method of action output deep training model and related equipment | |
JPWO2019142229A1 (en) | Robot devices, robot device control methods and programs | |
CN114404976A (en) | Method and device for training decision model, computer equipment and storage medium | |
Wu et al. | A training model of wargaming based on imitation learning and deep reinforcement learning | |
Li | Design and implement of soccer player AI training system using unity ML-agents | |
Abreu et al. | XSS: a soccer server extension for automated learning of high-level robotic soccer strategies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||