CN116880269A - Agent control method and device, electronic device and readable storage medium - Google Patents

Agent control method and device, electronic device and readable storage medium Download PDF

Info

Publication number
CN116880269A
CN116880269A CN202310793705.2A
Authority
CN
China
Prior art keywords
action
trained
primitive
actions
executed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310793705.2A
Other languages
Chinese (zh)
Inventor
赵哲一
于非
贺颖
孙喜龙
施斯
陈加壹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen filed Critical Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Priority to CN202310793705.2A priority Critical patent/CN116880269A/en
Publication of CN116880269A publication Critical patent/CN116880269A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00Programme-control systems
    • G05B19/02Programme-control systems electric
    • G05B19/04Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • G05B19/0423Input/output
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/20Pc systems
    • G05B2219/25Pc structure of the system
    • G05B2219/25257Microcontroller
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an agent control method and device, an electronic device, and a readable storage medium. The method comprises the following steps: acquiring a user command and visual information of the external environment; performing target detection on the visual information to obtain target detection information; splicing the user command and the target detection information to obtain a first instruction; using the first instruction to instruct a trained LLM to select an action to be executed from the primitive actions in a primitive action set; and sending the action to be executed to the agent so that the agent executes it. By adding visual information into the reasoning process of the LLM, the application realizes interaction between the LLM and the external environment, so that the decision most beneficial to the agent can be selected based on the environment.

Description

Agent control method and device, electronic device and readable storage medium
Technical Field
The application belongs to the technical field of intelligent control, and particularly relates to an agent control method and device, an electronic device, and a readable storage medium.
Background
Currently, large language models (Large Language Model, LLM) are artificial intelligence models that aim to understand and generate human language. They are trained on large amounts of text data and can perform a wide range of tasks, including text summarization, translation, sentiment analysis, and so forth.
However, large language models lack real-world experience and cannot provide the most favorable decision to an agent in a specific environment.
Disclosure of Invention
The embodiments of the application provide an agent control method, an agent control device, an electronic device, a readable storage medium, and a computer program product, which can solve the problem that a large language model cannot provide the most favorable decision to an agent in a specific environment.
In a first aspect, an embodiment of the present application provides an agent control method, including:
acquiring a user command and visual information of an external environment;
performing target detection on the visual information to obtain target detection information;
splicing the user command and the target detection information to obtain a first instruction;
using the first instruction to instruct the trained LLM to select an action to be executed from the primitive actions in the primitive action set;
and sending the action to be executed to the intelligent agent so that the intelligent agent executes the action to be executed.
In one embodiment, the method further comprises:
inputting the first instruction into a trained action prescreening model to obtain a related action set output by the trained action prescreening model, wherein the related action set comprises the primitive actions related to the user command;
scoring each primitive action in the related action set by using a trained evaluation model to obtain a first scoring result;
using the first instruction to instruct the trained LLM to score each primitive action in the related action set to obtain a second scoring result;
and determining the action to be executed according to the first scoring result and the second scoring result.
In one embodiment, before the inputting of the first instruction into the trained action preliminary screening model, the method further comprises:
obtaining a positive sample and a negative sample, wherein the positive sample comprises actions selected by the trained LLM for a task sample, and the negative sample comprises actions not selected by the trained LLM for the task sample;
and training an action preliminary screening model by utilizing the positive sample and the negative sample to obtain the trained action preliminary screening model.
In one embodiment, the instructing the trained LLM to score each of the primitive actions in the related set of actions using a first instruction to obtain a second scoring result includes:
inputting the first instruction and each primitive action in the related action set to the trained LLM to obtain a second scoring result output by the trained LLM;
after the agent executes the action to be executed, the following step is executed in a loop until the user command is completed:
and inputting the first instruction, the executed actions of the intelligent agent and the primitive actions in the related action set to the trained LLM to obtain the second scoring result output by the trained LLM.
In one embodiment, the determining the action to be performed according to the first scoring result and the second scoring result includes:
determining the score of each primitive action in the related action set according to the first scoring result and the second scoring result;
and selecting the primitive actions corresponding to the target scores, wherein the primitive actions corresponding to the target scores are the actions to be executed, and the target scores are the highest scores among the scores of the primitive actions in the related action set.
In one embodiment, inputting the first instruction, the executed actions of the agent, and each of the primitive actions in the set of related actions into the trained LLM comprises:
splicing the first instruction, the executed actions of the intelligent agent and the primitive actions in the relevant action set to obtain a reconstruction instruction;
and inputting the reconstruction instruction to the trained LLM.
In one embodiment, said scoring each of said primitive actions in said set of related actions using a trained evaluation model to obtain a first scoring result comprises:
inputting each primitive action of the related action set into the trained evaluation model to obtain a probability value of each primitive action in the related action set output by the trained evaluation model, wherein the probability value is used for representing the probability of successful execution of the primitive action in the external environment, and the first scoring result comprises the probability value of each primitive action in the related action set.
In a second aspect, an embodiment of the present application provides an agent control device, including:
the acquisition module is used for acquiring a user command and visual information of an external environment;
the target detection module is used for carrying out target detection on the visual information to obtain target detection information;
the splicing module is used for splicing the user command and the target detection information to obtain a first instruction;
the execution module is used for using the first instruction to instruct the trained LLM to select the action to be executed from the primitive actions in the primitive action set;
the execution module is further used for sending the action to be executed to the agent so that the agent executes the action to be executed.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method according to any one of the first aspects when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements a method as in any of the first aspects above.
In a fifth aspect, embodiments of the present application provide a computer program product for, when run on an electronic device, causing the electronic device to perform the method of any one of the first aspects.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
the embodiment of the application obtains the visual information of the user command and the external environment; performing target detection on the visual information to obtain target detection information; splicing the user command and the target detection information to obtain a first instruction; using a first instruction to instruct the trained LLM to select an action to be executed from all primitive actions in the primitive action set; and sending the action to be executed to the intelligent agent so that the intelligent agent executes the action to be executed, and realizing the interaction between the LLM and the external environment in the reasoning process of adding the visual information into the LLM, so that the decision which is most beneficial to the intelligent agent can be selected based on the environment. + -
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for controlling an agent according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a second flow chart of an agent control method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an intelligent agent control device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
A large language model (Large Language Model, LLM) is an artificial intelligence model that aims to understand and generate human language. LLMs are trained on large amounts of text data and can perform a wide range of tasks, including text summarization, translation, sentiment analysis, and so forth. An LLM has good reasoning ability and can act as a "brain" capable of decomposing high-level language instructions into low-level language instructions that can be executed by the "cerebellum". For example, the high-level language instruction is {I want to eat a sandwich}, and the corresponding decomposed low-level instructions are {1. pick up the plate, 2. put down the plate, 3. pick up the lettuce, 4. put down the lettuce, …}.
However, an LLM has no "eyes" or "hands" and cannot interact well with the environment, so environmental information cannot participate in the LLM's reasoning process. As a result, the LLM lacks real-world experience and cannot provide the most favorable decision to the agent in a specific environment.
In view of the above, the application provides an agent control method, which comprises: acquiring a user command and visual information of the external environment; performing target detection on the visual information to obtain target detection information; splicing the user command and the target detection information to obtain a first instruction; using the first instruction to instruct the trained LLM to select an action to be executed from the primitive actions in the primitive action set; and sending the action to be executed to the agent so that the agent executes it. By adding visual information into the reasoning process of the LLM, interaction between the LLM and the external environment is realized, so that the decision most beneficial to the agent can be selected based on the environment.
Fig. 1 is a schematic flow chart of a method for controlling an agent according to an embodiment of the application. As shown in fig. 1, the method includes:
s11: and acquiring visual information of the user command and the external environment.
In an application, when a user requires the agent to provide a service, the user inputs a user command to the electronic device. The user command may be input by voice or in text form.
The electronic device collects visual information of the external environment through a sensor or an image pickup device. The external environment is an environment where the object operated by the agent is located. The visual information may be photographed images, videos, and the like.
S12: and performing target detection on the visual information to obtain target detection information.
Wherein the target detection information includes a detection frame of the object and a position of the object.
By way of example, the visual information is an image; object detection is performed on the image using YOLOv5 (You Only Look Once v5, a single-stage object detection algorithm based on convolution and attention), and the target detection information is output in text form. The target detection information is represented as {apple (x1, y1), banana (x2, y2), dish (x3, y3)}.
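As a minimal sketch of how detector output might be rendered into the text form shown above (the function name and tuple format are illustrative, not from the patent; a real system would feed YOLOv5 detections into it):

```python
def detections_to_text(detections):
    """Render (label, x, y) detections as the example text form,
    e.g. {apple (x1, y1), banana (x2, y2)}. Coordinates are kept as given."""
    parts = ["{} ({}, {})".format(label, x, y) for label, x, y in detections]
    return "{" + ", ".join(parts) + "}"

print(detections_to_text([("apple", "x1", "y1"),
                          ("banana", "x2", "y2"),
                          ("dish", "x3", "y3")]))
# {apple (x1, y1), banana (x2, y2), dish (x3, y3)}
```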
S13: and splicing the user command and the target detection information to obtain a first instruction.
For example, the user command is {make me a salad} and the target detection information is {apple (x1, y1), banana (x2, y2), dish (x3, y3)}; the first instruction after splicing is {make me a salad, below are the positions of the objects in the image, apple (x1, y1), banana (x2, y2), dish (x3, y3)}.
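The splicing step can be sketched as plain string concatenation; the patent only specifies that the two parts are spliced, so the connective phrase below is illustrative:

```python
def build_first_instruction(user_command, detection_text):
    """Splice the user command with the object-position text to form the
    first instruction. The separator phrase is an assumption."""
    return (user_command
            + ", below are the positions of the objects in the image, "
            + detection_text)

print(build_first_instruction("make me a salad",
                              "apple (x1, y1), banana (x2, y2), dish (x3, y3)"))
```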
S14: and using the first instruction to instruct the trained LLM to select an action to be executed from the primitive actions in the primitive action set.
In one possible implementation, the input format of the LLM is {user command + target detection information + primitive actions in the primitive action set}, where the first instruction is {user command + target detection information}.
Illustratively, based on this input format, the LLM input is {make me a salad, below are the positions of the objects in the image, apple (x1, y1), banana (x2, y2), dish (x3, y3), take the dish, …, take the lettuce, put down the dish}.
In an application, in the electronic device, the LLM selects an action to be performed from among the respective primitive actions of the input primitive action set based on the first instruction.
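The selection step can be sketched as picking the primitive action the LLM rates highest. Here `llm_score` is an illustrative stand-in for a real LLM call (for instance, the likelihood the model assigns to the action text given the first instruction); the toy scorer is only for demonstration:

```python
def select_action(first_instruction, primitive_actions, llm_score):
    """Return the primitive action the LLM scores highest for this instruction.
    `llm_score(instruction, action)` stands in for a real LLM query."""
    return max(primitive_actions,
               key=lambda action: llm_score(first_instruction, action))

# Toy stand-in: prefer actions whose words appear in the instruction text.
def toy_score(instruction, action):
    return sum(1 for word in action.split() if word in instruction)

actions = ["take dish", "take lettuce", "put down dish"]
print(select_action("make me a salad, dish (x3, y3)", actions, toy_score))
# take dish
```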
S15: and sending the action to be executed to the intelligent agent so that the intelligent agent executes the action to be executed.
In an application, an agent is an intelligent device that can perform an action, such as a robotic arm, robot, or the like.
It can be understood that the visual information of the environment is processed to obtain target detection information in text form, which is then appended to the user command to obtain the first instruction. The LLM then selects, based on the first instruction, the instruction most worth executing in the current environment to complete the user command. This frees the LLM from visual-language alignment without compromising its reasoning ability.
The embodiment of the application acquires a user command and visual information of the external environment; performs target detection on the visual information to obtain target detection information; splices the user command and the target detection information to obtain a first instruction; uses the first instruction to instruct the trained LLM to select an action to be executed from the primitive actions in the primitive action set; and sends the action to be executed to the agent so that the agent executes it. By adding visual information into the reasoning process of the LLM, interaction between the LLM and the external environment is realized, so that the decision most beneficial to the agent can be selected based on the environment.
Fig. 2 is a schematic flow chart of a second method for controlling an agent according to an embodiment of the present application. As shown in fig. 2, the method further includes:
s21: and inputting a first instruction into the trained motion preliminary screening model to obtain a related motion set output by the trained motion preliminary screening model.
Wherein the set of related actions includes primitive actions related to the user command.
In the application, in the electronic device, the trained action preliminary screening model selects, according to the first instruction, the primitive actions that may occur during execution of the user command from all primitive actions, and excludes irrelevant primitive actions. Screening out the primitive actions related to the user command reduces the influence of irrelevant primitive actions on the reasoning of the trained LLM, and reduces the chance that the trained LLM selects an irrelevant primitive action due to incidental factors.
In one possible implementation, the primitive actions selected by the LLM and the primitive actions it excluded are used as samples to train the action preliminary screening model.
Before step S21, the method further includes:
s31: and acquiring a positive sample and a negative sample, wherein the positive sample comprises actions selected by the trained LLM for the task sample, and the negative sample comprises actions not selected by the trained LLM for the task sample.
In the application, the primitive actions generated during execution of the task sample are screened in advance by the trained LLM; the primitive actions selected by the trained LLM are taken as positive samples, and the primitive actions not selected by the trained LLM are taken as negative samples.
S32: and training the action preliminary screening model by using the positive sample and the negative sample to obtain a trained action preliminary screening model.
In the application, when the loss value of the model is smaller than a preset loss value, a trained action primary screening model is obtained.
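The construction of the training set for the preliminary screening model (S31) can be sketched as labelling each primitive action by whether the trained LLM selected it for the task sample. The triple format and names are illustrative:

```python
def build_screening_samples(task_sample, all_primitive_actions, llm_selected):
    """Label each primitive action 1 (positive, selected by the trained LLM for
    this task sample) or 0 (negative, not selected), yielding
    (task sample, action, label) triples for training the screening model."""
    selected = set(llm_selected)
    return [(task_sample, action, 1 if action in selected else 0)
            for action in all_primitive_actions]

samples = build_screening_samples("make a salad",
                                  ["take dish", "take hammer"],
                                  ["take dish"])
print(samples)
```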
S22: and scoring each primitive action in the related action set by using the trained evaluation model to obtain a first scoring result.
In one possible implementation, step S22 includes:
and inputting each primitive action of the related action set into the trained evaluation model, and obtaining a probability value of each primitive action in the related action set output by the trained evaluation model, wherein the probability value is used for representing the probability of successful execution of the primitive action in an external environment, and the first scoring result comprises the probability value of each primitive action in the related action set.
Illustratively, the trained evaluation model is a model trained using reinforcement learning. Specifically, primitive actions are evaluated in the given environment: if a primitive action is successfully executed in the environment, the evaluation model is given a reward; otherwise it is given a penalty. When the loss value of the evaluation model is smaller than the preset loss value, the trained evaluation model is obtained.
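The stopping rule just described (train until the loss drops below a preset value) can be sketched generically; `update_step` and `current_loss` are illustrative callbacks, not names from the patent:

```python
def train_until_loss_below(update_step, current_loss, preset_loss,
                           max_iters=10000):
    """Keep updating the evaluation model until its loss falls below the
    preset value; `max_iters` is a safety bound, an assumption here."""
    for _ in range(max_iters):
        if current_loss() < preset_loss:
            return True
        update_step()
    return False
```

In a real setup, `update_step` would perform one reinforcement-learning update using the reward/penalty signal described above.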
S23: and using the first instruction to instruct the trained LLM to score each primitive action in the related action set, and obtaining a second scoring result.
Wherein the score of the primitive action in the second scoring result represents the benefit of the primitive action on the user command execution.
In one possible implementation, step S23 includes:
s231: and inputting each primitive action in the first instruction and related action set to the trained LLM to obtain a second scoring result output by the trained LLM.
The LLM input format is { user command+target detection information+primitive action in related action set }, and the first instruction is { user command+target detection information }.
In the application, the trained LLM scores each primitive action in the relevant action set according to the first instruction, and a second scoring result is obtained.
S232: after the agent executes the action to be executed, the following steps are circularly executed until the user command is completed.
S233: and inputting the first instruction, the executed actions of the intelligent agent and the actions of each primitive in the related action set to the trained LLM to obtain a second scoring result output by the trained LLM.
In the application, if the agent completes the user command after executing a single action, steps S232 and S233 are not executed. If the agent has not yet completed the user command after executing an action, steps S232 and S233 are executed.
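The loop of S232/S233 can be sketched as a closed control loop: after each executed action, the candidate actions are re-scored conditioned on the executed history, until the user command is complete. All callback names below are illustrative:

```python
def control_loop(first_instruction, related_actions,
                 pick_action, execute, command_done):
    """Repeatedly pick and execute an action until `command_done` reports the
    user command is complete. `pick_action(instruction, executed, candidates)`
    stands in for re-scoring with the evaluation model and the trained LLM."""
    executed = []
    while not command_done(executed):
        action = pick_action(first_instruction, executed, related_actions)
        execute(action)
        executed.append(action)
    return executed
```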
Specifically, inputting the first instruction, the executed actions of the agent, and each primitive action in the related action set into the trained LLM, comprising:
s41: and splicing the first instruction, the executed actions of the intelligent agent and the primitive actions in the relevant action set to obtain a reconstruction instruction.
The reconstruction instruction is represented as {user command + target detection information + 1. executed action + 2. executed action + … + primitive actions in the related action set}, where the first instruction is {user command + target detection information}.
S42: the reconfiguration instructions are input to the trained LLM.
In the application, the reconstruction instruction is input to the trained LLM, and the trained LLM scores each primitive action in the related action set according to the reconstruction instruction and the executed actions to obtain the second scoring result.
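The splicing of the reconstruction instruction (S41) can be sketched as follows; the exact separators and the "please score" phrasing are illustrative, taken from the worked example later in the description:

```python
def build_reconstruction_instruction(first_instruction, executed_actions,
                                     related_actions):
    """Splice the first instruction with the numbered executed-action history
    and the candidate primitive actions to be scored."""
    history = ", ".join("{}. {}".format(i + 1, a)
                        for i, a in enumerate(executed_actions))
    return "{}, {}, please score {}".format(first_instruction, history,
                                            related_actions)
```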
S24: and determining an action to be executed according to the first scoring result and the second scoring result.
In one possible implementation, step S24 includes:
s241: and determining the score of each primitive action in the related action set according to the first scoring result and the second scoring result.
S242: and selecting primitive actions corresponding to the target scores.
The primitive actions corresponding to the target scores are actions to be executed, and the target score is the highest score among scores of all primitive actions in the related action set.
In the application, after the integration model receives the first scoring result and the second scoring result, it integrates them using the model's hyperparameters and then selects the primitive action corresponding to the target score. The hyperparameters may be set manually or obtained through training.
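The patent leaves the integration rule unspecified; a weighted sum controlled by a single hyperparameter is one plausible sketch (the weight `alpha` could be hand-set or learned, as stated above):

```python
def combine_and_select(first_scores, second_scores, alpha=0.5):
    """Integrate the evaluation model's success probabilities (first scores)
    with the LLM's usefulness scores (second scores) and return the action
    with the highest combined score. The weighted sum is an assumption."""
    total = {a: alpha * first_scores[a] + (1 - alpha) * second_scores[a]
             for a in first_scores}
    return max(total, key=total.get)

# Toy numbers mirroring the sandwich scenario below: the LLM prefers the dish,
# but the evaluation model judges the bowl far more likely to succeed.
first = {"take the dish": 0.2, "take the bowl": 0.9}   # success probabilities
second = {"take the dish": 0.8, "take the bowl": 0.6}  # LLM scores
print(combine_and_select(first, second))
# take the bowl
```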
Illustratively, the user command obtained is {I want to eat a sandwich}. The image is processed to obtain the target detection information {ham (x1, y1), dish (x2, y2), lettuce (x3, y3), omelette (x4, y4), human (x5, y5)}. The user command and the target detection information are spliced to obtain the first instruction {I want to eat a sandwich, below are the positions of the objects in the image, ham (x1, y1), dish (x2, y2), lettuce (x3, y3), omelette (x4, y4), human (x5, y5)}.
Assuming there are 1000 primitive actions, the trained action preliminary screening model receives the input primitive actions and labels each one with the probability that it occurs during execution of the user command; the primitive actions whose probability is larger than a threshold are selected to form the related action set.
The first instruction and the primitive actions of the related action set are spliced to obtain {I want to eat a sandwich, below are the positions of the objects in the image, ham (x1, y1), dish (x2, y2), lettuce (x3, y3), omelette (x4, y4), human (x5, y5), please score [take the dish, take the sandwich, take the ham, take the bowl, …]}.
The spliced instruction is input into the trained evaluation model to obtain the first scoring result output by the trained evaluation model. An example of the scoring flow: in the current environment, the trained evaluation model judges that taking the dish may cause a collision, or that the mechanical arm cannot reach the position of the dish, so the trained evaluation model scores taking the bowl higher than taking the dish; that is, the probability value of taking the bowl is higher than that of taking the dish.
The spliced instruction is input into the trained LLM to obtain the second scoring result output by the trained LLM. An example of the scoring flow: because there is no sandwich in the current environment, the trained LLM scores taking the sandwich low and taking the dish high, and scores taking the bowl lower than taking the dish.
Combining the first scoring result and the second scoring result, the action to be executed is determined to be taking the bowl. The action to be executed is sent to the agent, which takes the bowl.
Because there is no sandwich in the environment, multiple actions must be executed to complete the user command, so the step of selecting the action to be executed is repeated until the user command is completed:
the first instruction, the executed actions of the agent, and the primitive actions of the related action set are spliced to obtain {I want to eat a sandwich; the following are the positions of objects in the image: ham (x1, y1), dish (x2, y2), lettuce (x3, y3), fried egg (x4, y4), human (x5, y5); executed: take bowl; please score [take dish, take sandwich, take ham, take bowl, …]}. This is input into the trained LLM to obtain the second scoring result output by the trained LLM, in which putting down the bowl scores highest.
The action to be executed is determined to be putting down the bowl according to the first scoring result and the second scoring result.
The first instruction, the executed actions of the agent, and the primitive actions of the related action set are spliced to obtain {I want to eat a sandwich; the following are the positions of objects in the image: ham (x1, y1), dish (x2, y2), lettuce (x3, y3), fried egg (x4, y4), human (x5, y5); executed: 1. take bowl, 2. put down bowl; please score [take dish, take sandwich, take ham, take bowl, …]}. This is input into the trained LLM to obtain the second scoring result output by the trained LLM, in which taking the lettuce scores highest.
The action to be executed is determined to be taking the lettuce according to the first scoring result and the second scoring result, and so on until the task is completed successfully.
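The closed loop walked through above (score, pick, execute, feed the history back, repeat) can be sketched end to end. `score_feasibility`, `score_relevance`, `execute`, and `is_done` stand in for the evaluation model, the trained LLM, the agent, and the task-completion check; all four interfaces, along with the linear score combination, are assumptions made for illustration.

```python
def run_agent(actions, score_feasibility, score_relevance, execute, is_done,
              alpha=0.5, max_steps=20):
    executed = []
    for _ in range(max_steps):
        if is_done(executed):
            break
        # Combine the two scoring results for every related primitive action.
        scores = {
            a: alpha * score_feasibility(a, executed)
               + (1 - alpha) * score_relevance(a, executed)
            for a in actions
        }
        best = max(scores, key=scores.get)
        execute(best)
        executed.append(best)  # history is spliced into the next round's prompt
    return executed
```

A toy run with a relevance function that first favors taking the bowl and then putting it down reproduces the two-step sequence from the example.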
According to the embodiment of the present application, the first instruction is input into the trained action preliminary screening model to obtain the related action set output by the model. Because the related action set contains only the primitive actions relevant to the user command, the influence of irrelevant primitive actions on the reasoning of the trained LLM is reduced, as is the probability that the trained LLM selects an irrelevant primitive action.
Each primitive action in the related action set is scored with the trained evaluation model to obtain a first scoring result; the first instruction is used to instruct the trained LLM to score each primitive action in the related action set, obtaining a second scoring result; and the action to be executed is determined from the two scoring results combined, so that it can be selected accurately.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the methods described in the foregoing embodiments, an agent control device is described below; for convenience of explanation, only the parts relevant to the embodiments of the present application are shown.
Fig. 3 is a schematic structural diagram of an agent control device according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:
an acquisition module 10, configured to acquire a user command and visual information of an external environment;
the target detection module 11 is used for performing target detection on the visual information to obtain target detection information;
a splicing module 12, configured to splice the user command and the target detection information to obtain a first instruction;
an execution module 13, configured to instruct the trained LLM to select an action to be executed from among the primitive actions in the primitive action set by using the first instruction;
the execution module is further configured to send the action to be executed to the agent, so that the agent executes the action to be executed.
In one embodiment, the apparatus further comprises:
and the action prescreening module is used for inputting a first instruction into the trained action prescreening model to obtain a relevant action set output by the trained action prescreening model, wherein the relevant action set comprises primitive actions relevant to the user command.
And the evaluation module is used for scoring each primitive action in the related action set by utilizing the trained evaluation model to obtain a first scoring result.
The execution module is further used for indicating the trained LLM to score each primitive action in the related action set by using the first instruction, and obtaining a second scoring result; and determining an action to be executed according to the first scoring result and the second scoring result.
In one embodiment, the apparatus further comprises:
the training module is used for acquiring a positive sample and a negative sample, wherein the positive sample comprises actions selected by the trained LLM aiming at the task sample, and the negative sample comprises actions not selected by the trained LLM aiming at the task sample; and training the action preliminary screening model by using the positive sample and the negative sample to obtain a trained action preliminary screening model.
In one embodiment, the evaluation module is specifically configured to input each primitive action of the related action set into the trained evaluation model, obtain a probability value of each primitive action in the related action set output by the trained evaluation model, where the probability value is used to characterize a probability that the primitive action is successfully executed in an external environment, and the first scoring result includes the probability value of each primitive action in the related action set.
In one embodiment, the execution module is specifically configured to input the first instruction and each primitive action in the related action set to the trained LLM, obtaining a second scoring result output by the trained LLM; and, after the agent executes the action to be executed, to cyclically execute the following step until the user command is completed: inputting the first instruction, the executed actions of the agent, and each primitive action in the related action set to the trained LLM to obtain a second scoring result output by the trained LLM.
In one embodiment, the execution module is specifically configured to determine a score of each primitive action in the related action set according to the first scoring result and the second scoring result; and selecting primitive actions corresponding to the target scores, wherein the primitive actions corresponding to the target scores are used as actions to be executed, and the target score is the highest score in the scores of all primitive actions in the related action set.
In one embodiment, the execution module is specifically configured to splice the first instruction, the executed actions of the agent, and each primitive action in the related action set to obtain a reconstructed instruction, and to input the reconstructed instruction to the trained LLM.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 4, the electronic apparatus 2 of this embodiment includes: at least one processor 20 (only one is shown in fig. 4), a memory 21 and a computer program 22 stored in the memory 21 and executable on the at least one processor 20, the processor 20 implementing the steps in any of the various method embodiments described above when executing the computer program 22.
The electronic device 2 may be a computing device such as a desktop computer or a cloud server. The electronic device 2 may include, but is not limited to, a processor 20 and a memory 21. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the electronic device 2 and does not constitute a limitation on it; the electronic device 2 may include more or fewer components than shown, combine certain components, or use different components, and may, for example, also include input-output devices, network access devices, etc.
The processor 20 may be a central processing unit (CPU), and may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 21 may in some embodiments be an internal storage unit of the electronic device 2, such as a hard disk or a memory of the electronic device 2. The memory 21 may in other embodiments also be an external storage device of the electronic device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 2. Further, the memory 21 may also include both an internal storage unit and an external storage device of the electronic device 2. The memory 21 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory 21 may also be used for temporarily storing data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the respective method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on an electronic device, causes the electronic device to perform the steps of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. An agent control method, comprising:
acquiring a user command and visual information of an external environment;
performing target detection on the visual information to obtain target detection information;
splicing the user command and the target detection information to obtain a first instruction;
using the first instruction to instruct a trained LLM to select an action to be executed from the primitive actions in a primitive action set;
and sending the action to be executed to the intelligent agent so that the intelligent agent executes the action to be executed.
2. The method as recited in claim 1, further comprising:
inputting the first instruction into a trained action prescreening model to obtain a related action set output by the trained action prescreening model, wherein the related action set comprises the primitive actions related to the user command;
scoring each primitive action in the related action set by using a trained evaluation model to obtain a first scoring result;
using the first instruction to instruct the trained LLM to score each primitive action in the related action set, and obtaining a second scoring result;
and determining the action to be executed according to the first scoring result and the second scoring result.
3. The method of claim 2, wherein before inputting the first instruction into the trained action preliminary screening model, the method further comprises:
obtaining a positive sample and a negative sample, wherein the positive sample comprises actions selected by the trained LLM for a task sample, and the negative sample comprises actions not selected by the trained LLM for the task sample;
and training an action preliminary screening model by utilizing the positive sample and the negative sample to obtain the trained action preliminary screening model.
4. The method of claim 2, wherein the instructing the trained LLM to score each of the primitive actions in the set of related actions with a first instruction to obtain a second scoring result comprises:
inputting the first instruction and each primitive action in the related action set to the trained LLM to obtain a second scoring result output by the trained LLM;
after the intelligent agent executes the action to be executed, the following step is executed cyclically until the user command is completed:
and inputting the first instruction, the executed actions of the intelligent agent and the primitive actions in the related action set to the trained LLM to obtain the second scoring result output by the trained LLM.
5. The method according to any of claims 2 to 4, wherein said determining said action to be performed based on said first scoring result and said second scoring result comprises:
determining the score of each primitive action in the related action set according to the first scoring result and the second scoring result;
and selecting the primitive actions corresponding to the target scores, wherein the primitive actions corresponding to the target scores are the actions to be executed, and the target scores are the highest scores among the scores of the primitive actions in the related action set.
6. The method of claim 4, wherein inputting the first instruction, the executed actions of the agent, and the respective primitive actions in the set of related actions into the trained LLM comprises:
splicing the first instruction, the executed actions of the intelligent agent, and the primitive actions in the related action set to obtain a reconstructed instruction;
and inputting the reconstructed instruction to the trained LLM.
7. The method of claim 2, wherein scoring each of the primitive actions in the set of related actions using a trained evaluation model to obtain a first scoring result comprises:
inputting each primitive action of the related action set into the trained evaluation model to obtain a probability value of each primitive action in the related action set output by the trained evaluation model, wherein the probability value is used for representing the probability of successful execution of the primitive action in the external environment, and the first scoring result comprises the probability value of each primitive action in the related action set.
8. An agent control device, comprising:
the acquisition module is used for acquiring a user command and visual information of an external environment;
the target detection module is used for carrying out target detection on the visual information to obtain target detection information;
the splicing module is used for splicing the user command and the target detection information to obtain a first instruction;
the execution module is used for indicating the trained LLM to select actions to be executed from all primitive actions in the primitive action set by using the first instruction;
the execution module is further used for sending the action to be executed to the intelligent agent so that the intelligent agent executes the action to be executed.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.
CN202310793705.2A 2023-06-29 2023-06-29 Intelligent body control method and device, electronic equipment and readable storage medium Pending CN116880269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310793705.2A CN116880269A (en) 2023-06-29 2023-06-29 Intelligent body control method and device, electronic equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN116880269A true CN116880269A (en) 2023-10-13

Family

ID=88261465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310793705.2A Pending CN116880269A (en) 2023-06-29 2023-06-29 Intelligent body control method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116880269A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556864A (en) * 2024-01-12 2024-02-13 阿里云计算有限公司 Information processing method, electronic device, and storage medium
CN117556864B (en) * 2024-01-12 2024-04-16 阿里云计算有限公司 Information processing method, electronic device, and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination