CN116880269A - Agent control method and apparatus, electronic device, and readable storage medium

Info

Publication number: CN116880269A
Application number: CN202310793705.2A
Authority: CN
Inventors: 赵哲一; 于非; 贺颖; 孙喜龙; 施斯; 陈加壹
Original assignee: Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Current assignee: Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Priority date: 2023-06-29
Filing date: 2023-06-29
Publication date: 2023-10-13

Abstract

The application provides an agent control method, an agent control apparatus, an electronic device, and a readable storage medium. The method comprises: acquiring a user command and visual information of an external environment; performing target detection on the visual information to obtain target detection information; concatenating the user command and the target detection information to obtain a first instruction; using the first instruction to instruct a trained LLM to select an action to be executed from the primitive actions in a primitive action set; and sending the action to be executed to the agent so that the agent executes it. By adding the visual information to the reasoning process of the LLM, the application enables the LLM to interact with the external environment and to select, based on the environment, the decision most favorable to the agent.

Description

Agent control method, apparatus, electronic device, and readable storage medium

Technical field

This application belongs to the field of intelligent control technology, and in particular relates to an agent control method, apparatus, electronic device, and readable storage medium.

Background

A large language model (LLM) is an artificial intelligence model designed to understand and generate human language. LLMs are trained on large amounts of text data and can perform a wide range of tasks, including text summarization, translation, and sentiment analysis.

However, large language models lack real-world experience and cannot provide the most effective decisions to an agent in a specific environment.

Summary of the invention

Embodiments of the present application provide an agent control method, apparatus, electronic device, readable storage medium, and computer program product, which can solve the problem that large language models cannot provide the most effective decisions to an agent in a specific environment.

In a first aspect, embodiments of the present application provide an agent control method, including:

obtaining a user command and visual information of an external environment;

performing target detection on the visual information to obtain target detection information;

concatenating the user command and the target detection information to obtain a first instruction;

using the first instruction to instruct a trained LLM to select an action to be executed from the primitive actions in a primitive action set; and

sending the action to be executed to an agent, so that the agent executes the action to be executed.

In one embodiment, the method further includes:

inputting the first instruction to a trained action pre-screening model to obtain a relevant action set output by the trained action pre-screening model, where the relevant action set includes the primitive actions related to the user command;

scoring each primitive action in the relevant action set using a trained evaluation model to obtain a first scoring result;

using the first instruction to instruct the trained LLM to score each primitive action in the relevant action set to obtain a second scoring result; and

determining the action to be executed based on the first scoring result and the second scoring result.

In one embodiment, before inputting the first instruction to the trained action pre-screening model, the method further includes:

obtaining positive samples and negative samples, where the positive samples include actions selected by the trained LLM for a task sample, and the negative samples include actions not selected by the trained LLM for the task sample; and

training an action pre-screening model using the positive samples and the negative samples to obtain the trained action pre-screening model.

In one embodiment, using the first instruction to instruct the trained LLM to score each primitive action in the relevant action set to obtain the second scoring result includes:

inputting the first instruction and each primitive action in the relevant action set to the trained LLM to obtain the second scoring result output by the trained LLM;

after the agent executes the action to be executed, executing the following step in a loop until the user command is completed:

inputting the first instruction, the actions already executed by the agent, and each primitive action in the relevant action set to the trained LLM to obtain the second scoring result output by the trained LLM.

In one embodiment, determining the action to be executed based on the first scoring result and the second scoring result includes:

determining a score for each primitive action in the relevant action set according to the first scoring result and the second scoring result; and

selecting the primitive action corresponding to a target score as the action to be executed, where the target score is the highest of the scores of the primitive actions in the relevant action set.

In one embodiment, inputting the first instruction, the actions already executed by the agent, and each primitive action in the relevant action set to the trained LLM includes:

concatenating the first instruction, the actions already executed by the agent, and each primitive action in the relevant action set to obtain a reconstructed instruction; and

inputting the reconstructed instruction to the trained LLM.

In one embodiment, scoring each primitive action in the relevant action set using the trained evaluation model to obtain the first scoring result includes:

inputting each primitive action of the relevant action set to the trained evaluation model to obtain a probability value, output by the trained evaluation model, for each primitive action in the relevant action set, where the probability value represents the probability that the primitive action is executed successfully in the external environment, and the first scoring result includes the probability value of each primitive action in the relevant action set.

In a second aspect, embodiments of the present application provide an agent control apparatus, including:

an acquisition module, configured to obtain a user command and visual information of an external environment;

a target detection module, configured to perform target detection on the visual information to obtain target detection information;

a concatenation module, configured to concatenate the user command and the target detection information to obtain a first instruction; and

an execution module, configured to use the first instruction to instruct a trained LLM to select an action to be executed from the primitive actions in a primitive action set;

the execution module being further configured to send the action to be executed to an agent, so that the agent executes the action to be executed.

In a third aspect, embodiments of the present application provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method according to any one of the implementations of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the method according to any one of the implementations of the first aspect.

In a fifth aspect, embodiments of the present application provide a computer program product which, when run on an electronic device, causes the electronic device to execute the method according to any one of the implementations of the first aspect.

Compared with the prior art, the embodiments of the present application have the following beneficial effects:

The embodiments of the present application obtain a user command and visual information of the external environment; perform target detection on the visual information to obtain target detection information; concatenate the user command and the target detection information to obtain a first instruction; use the first instruction to instruct a trained LLM to select an action to be executed from the primitive actions in a primitive action set; and send the action to be executed to the agent so that the agent executes it. Visual information is thereby added to the reasoning process of the LLM, so that the LLM interacts with the external environment and can select, based on the environment, the decision most beneficial to the agent.

It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the relevant description of the first aspect; details are not repeated here.

Description of the drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a first schematic flowchart of the agent control method provided by an embodiment of the present application;

Figure 2 is a second schematic flowchart of the agent control method provided by an embodiment of the present application;

Figure 3 is a schematic structural diagram of the agent control apparatus provided by an embodiment of the present application;

Figure 4 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.

Detailed description

In the following description, specific details such as particular system structures and technologies are set forth for the purpose of explanation rather than limitation, in order to provide a thorough understanding of the embodiments of the present application. However, it will be clear to those skilled in the art that the present application may also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".

In addition, in the description of this specification and the appended claims, the terms "first", "second", "third", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.

Reference in this specification to "one embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", and "in still other embodiments" appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having", and their variations all mean "including but not limited to", unless specifically emphasized otherwise.

A large language model (LLM) is an artificial intelligence model designed to understand and generate human language. LLMs are trained on large amounts of text data and can perform a wide range of tasks, including text summarization, translation, and sentiment analysis. An LLM has good reasoning ability and can act as a "brain" that decomposes high-level language instructions into low-level language instructions that the "cerebellum" can execute. For example, for the high-level instruction {I want to eat a sandwich}, the decomposed low-level instructions are {1. pick up the plate, 2. put down the plate, 3. pick up the lettuce, 4. put down the lettuce, …}.

However, an LLM has no "eyes" or "hands" and cannot interact well with the environment, so environmental information cannot take part in the LLM's reasoning process. As a result, the LLM lacks real-world experience and cannot provide the most effective decisions to an agent in a specific environment.

To address this, the present application provides an agent control method, which includes: obtaining a user command and visual information of the external environment; performing target detection on the visual information to obtain target detection information; concatenating the user command and the target detection information to obtain a first instruction; using the first instruction to instruct a trained LLM to select an action to be executed from the primitive actions in a primitive action set; and sending the action to be executed to the agent so that the agent executes it. Visual information is thereby added to the reasoning process of the LLM, so that the LLM interacts with the external environment and can select, based on the environment, the decision most beneficial to the agent.

Figure 1 is a first schematic flowchart of the agent control method provided by an embodiment of the present application. As shown in Figure 1, the method includes:

S11: Obtain a user command and visual information of the external environment.

In application, when a user needs the agent to provide a service, the user inputs a user command to the electronic device. The user command may be input by voice or as text.

The electronic device collects visual information of the external environment through a sensor or a camera. The external environment is the environment in which the targets operated on by the agent are located. The visual information may be captured images, video, and the like.

S12: Perform target detection on the visual information to obtain target detection information.

The target detection information includes the detection box of each object and the position of each object.

For example, the visual information is an image. YOLOv5 (You Only Look Once, a single-stage target detection algorithm) performs target detection on the image through attention and convolution and outputs the target detection information as text. The target detection information is expressed as {apple (x1, y1), banana (x2, y2), plate (x3, y3)}.

S13: Concatenate the user command and the target detection information to obtain the first instruction.

For example, the user command is {Make me a salad}, the target detection information is {apple (x1, y1), banana (x2, y2), plate (x3, y3)}, and the first instruction after concatenation is {Make me a salad. The following are the positions of the objects in the image: apple (x1, y1), banana (x2, y2), plate (x3, y3)}.
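Steps S12 and S13 can be illustrated with a minimal Python sketch. It assumes the detector output has already been reduced to (label, x, y) tuples; the function names `format_detections` and `build_first_instruction` and the exact connecting sentence are illustrative assumptions, not part of the application.

```python
def format_detections(detections):
    """Render (label, x, y) detections as text, e.g. 'apple (10, 20)'.

    The tuple layout is an assumption made for this sketch.
    """
    return ", ".join(f"{label} ({x}, {y})" for label, x, y in detections)


def build_first_instruction(user_command, detections):
    """Concatenate the user command with the textual detection information."""
    return (
        f"{user_command}. The following are the positions of the objects "
        f"in the image: {format_detections(detections)}"
    )


if __name__ == "__main__":
    detections = [("apple", 10, 20), ("banana", 30, 40), ("plate", 50, 60)]
    print(build_first_instruction("Make me a salad", detections))
```

Because the detections are turned into plain text before they reach the LLM, the LLM itself needs no image input.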

S14: Use the first instruction to instruct the trained LLM to select an action to be executed from the primitive actions in the primitive action set.

In one possible implementation, the input format of the LLM is {user command + target detection information + primitive actions in the primitive action set}, and the first instruction is {user command + target detection information}.

For example, following this input format, the input of the LLM is {Make me a salad. The following are the positions of the objects in the image: apple (x1, y1), banana (x2, y2), plate (x3, y3), take the plate, …, take the lettuce, put down the plate}.

In application, on the electronic device, the LLM selects, based on the first instruction, the action to be executed from the primitive actions of the input primitive action set.
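As a rough sketch of step S14, the LLM input can be assembled by appending the primitive actions to the first instruction, and the reply can be checked against the primitive action set. The `llm` argument is a hypothetical callable that maps a prompt string to a reply string; the application does not specify how the trained LLM is invoked, and the validation step is an added safeguard rather than something the source describes.

```python
def build_llm_input(first_instruction, primitive_actions):
    """Append the candidate primitive actions to the first instruction."""
    return f"{first_instruction}, {', '.join(primitive_actions)}"


def select_action(first_instruction, primitive_actions, llm):
    """Ask the (hypothetical) LLM callable for one action and validate it."""
    reply = llm(build_llm_input(first_instruction, primitive_actions)).strip()
    if reply not in primitive_actions:
        # Safeguard assumed for this sketch: reject replies outside the set.
        raise ValueError(f"LLM returned an action outside the primitive set: {reply!r}")
    return reply


if __name__ == "__main__":
    actions = ["take the plate", "take the lettuce", "put down the plate"]
    fake_llm = lambda prompt: "take the plate"  # stand-in for a trained LLM
    print(select_action("Make me a salad", actions, fake_llm))
```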

S15: Send the action to be executed to the agent, so that the agent executes the action to be executed.

In application, the agent is an intelligent device that can perform actions, such as a robotic arm or a robot.

It can be understood that the visual information of the environment is first processed to obtain target detection information in the form of a text description, and the target detection information is then added to the user command to obtain the first instruction. Based on the first instruction, the LLM then produces the instruction that is most executable in the current environment in order to complete the user command. The LLM therefore does not need vision-language alignment, and its reasoning ability is not impaired.

The embodiments of the present application obtain a user command and visual information of the external environment; perform target detection on the visual information to obtain target detection information; concatenate the user command and the target detection information to obtain a first instruction; use the first instruction to instruct a trained LLM to select an action to be executed from the primitive actions in a primitive action set; and send the action to be executed to the agent so that the agent executes it. Visual information is thereby added to the reasoning process of the LLM, so that the LLM interacts with the external environment and can select, based on the environment, the decision most beneficial to the agent.

Figure 2 is a second schematic flowchart of the agent control method provided by an embodiment of the present application. As shown in Figure 2, the method further includes:

S21: Input the first instruction to the trained action pre-screening model to obtain the relevant action set output by the trained action pre-screening model.

The relevant action set includes the primitive actions related to the user command.

In application, on the electronic device, the trained action pre-screening model selects, according to the first instruction, the primitive actions that may occur during execution of the user command from among all primitive actions, and excludes irrelevant primitive actions. Screening out the primitive actions related to the user command in this way reduces the influence of irrelevant primitive actions on the reasoning of the trained LLM and reduces the chance that the trained LLM selects an irrelevant primitive action for some reason.

In one possible implementation, the primitive actions screened in by the LLM and the primitive actions it excluded are used as samples to train the action pre-screening model.

Before step S21, the method further includes:

S31: Obtain positive samples and negative samples. The positive samples include actions selected by the trained LLM for a task sample, and the negative samples include actions not selected by the trained LLM for the task sample.

In application, the trained LLM is used in advance to screen the primitive actions that occur during execution of the task sample. The primitive actions selected by the trained LLM are used as positive samples, and the primitive actions not selected by the trained LLM are used as negative samples.

S32: Train the action pre-screening model using the positive samples and the negative samples to obtain the trained action pre-screening model.

In application, the trained action pre-screening model is obtained when the loss value of the model falls below a preset loss value.
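Steps S31 and S32 can be sketched as follows. The application does not specify the architecture of the action pre-screening model, so this sketch substitutes a plain logistic-regression classifier trained by gradient descent; the feature vectors, the learning rate, the `max_epochs` cap, and the function names are all assumptions made for illustration. Only the labeling rule (selected actions are positive, unselected actions are negative) and the stopping rule (loss below a preset value) come from the text.

```python
import math


def build_samples(primitive_actions, selected_by_llm):
    """Split actions into positives (selected by the LLM) and negatives."""
    selected = set(selected_by_llm)
    positives = [a for a in primitive_actions if a in selected]
    negatives = [a for a in primitive_actions if a not in selected]
    return positives, negatives


def train_until_threshold(features, labels, preset_loss=0.2, lr=1.0, max_epochs=10_000):
    """Logistic regression (a stand-in model) trained until loss < preset_loss."""
    weights = [0.0] * len(features[0])
    loss = float("inf")
    eps = 1e-12  # keeps log() away from zero
    for _ in range(max_epochs):
        probs = []
        for row in features:
            z = sum(w * x for w, x in zip(weights, row))
            p = 1.0 / (1.0 + math.exp(-z))
            probs.append(min(max(p, eps), 1.0 - eps))
        # Mean binary cross-entropy over all samples.
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for p, y in zip(probs, labels)
        ) / len(labels)
        if loss < preset_loss:
            break  # stopping rule described in the text
        for j in range(len(weights)):
            grad = sum((p - y) * row[j] for p, y, row in zip(probs, labels, features))
            weights[j] -= lr * grad / len(labels)
    return weights, loss
```

In practice the features would encode the task sample together with the action; how that encoding is done is left open by the source.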

S22: Score each primitive action in the relevant action set using the trained evaluation model to obtain the first scoring result.

In one possible implementation, step S22 includes:

inputting each primitive action of the relevant action set to the trained evaluation model to obtain a probability value, output by the trained evaluation model, for each primitive action in the relevant action set, where the probability value represents the probability that the primitive action is executed successfully in the external environment, and the first scoring result includes the probability value of each primitive action in the relevant action set.

For example, the trained evaluation model is a model trained with reinforcement learning. Specifically, primitive actions are evaluated in a given environment. If a primitive action is executed successfully in that environment, the evaluation model is given a reward; otherwise it is given a penalty. The trained evaluation model is obtained when the loss value of the evaluation model falls below a preset loss value.
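The reward and penalty signal described above can be illustrated with a deliberately simplified stand-in. The sketch below is not the reinforcement-learning model of the application: it is a tabular running average that encodes a reward as 1 and a penalty as 0, so that the stored value is the empirical success rate of an action in an environment. The class name, the string environment key, and the 0.5 default for unseen pairs are assumptions.

```python
from collections import defaultdict


class SuccessEstimator:
    """Tabular estimate of P(success | environment, action); a stand-in only."""

    def __init__(self):
        self._counts = defaultdict(int)
        self._estimates = defaultdict(lambda: 0.5)  # assumed default for unseen pairs

    def update(self, environment, action, succeeded):
        """Fold one outcome into the running mean (reward 1.0, penalty 0.0)."""
        key = (environment, action)
        self._counts[key] += 1
        reward = 1.0 if succeeded else 0.0
        self._estimates[key] += (reward - self._estimates[key]) / self._counts[key]

    def probability(self, environment, action):
        """Return the current success-probability estimate."""
        return self._estimates[(environment, action)]
```

A learned model would generalize across environments, which a lookup table cannot do; the sketch only shows how success and failure outcomes translate into a probability-like first score.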

S23: Use the first instruction to instruct the trained LLM to score each primitive action in the relevant action set to obtain the second scoring result.

The score of a primitive action in the second scoring result indicates how beneficial the primitive action is to the execution of the user command.

In one possible implementation, step S23 includes:

S231: Input the first instruction and each primitive action in the relevant action set to the trained LLM to obtain the second scoring result output by the trained LLM.

The input format of the LLM is {user command + target detection information + primitive actions in the relevant action set}, and the first instruction is {user command + target detection information}.

In application, the trained LLM scores each primitive action in the relevant action set according to the first instruction to obtain the second scoring result.

S232: After the agent executes the action to be executed, execute the following step in a loop until the user command is completed.

S233: Input the first instruction, the actions already executed by the agent, and each primitive action in the relevant action set to the trained LLM to obtain the second scoring result output by the trained LLM.

In application, if the user command is completed after the agent performs one action, steps S232 and S233 are not executed. If the user command is not completed after the agent performs one action, steps S232 and S233 are executed.

Specifically, inputting the first instruction, the actions already executed by the agent, and each primitive action in the relevant action set to the trained LLM includes:

S41: Concatenate the first instruction, the actions already executed by the agent, and each primitive action in the relevant action set to obtain a reconstructed instruction.

The reconstructed instruction is expressed as {user command + target detection information + 1. action executed by the agent + 2. action executed by the agent + … + primitive actions in the relevant action set}, and the first instruction is {user command + target detection information}.

S42: Input the reconstructed instruction to the trained LLM.

In application, the reconstructed instruction is input to the trained LLM, and the trained LLM scores each primitive action in the relevant action set based on the reconstructed instruction and the executed actions to obtain the second scoring result.
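The reconstructed instruction of S41 and the loop of S232 and S233 might be sketched as below. The `choose`, `execute`, and `is_done` arguments are hypothetical callables standing in for action selection (scoring plus picking), the agent, and the completion check, none of which the application specifies at code level; `max_steps` is a safety cap added for this sketch.

```python
def build_reconstructed_instruction(first_instruction, executed_actions, relevant_actions):
    """Concatenate the first instruction, the numbered history, and the candidates."""
    parts = [first_instruction]
    if executed_actions:
        parts.append(
            ", ".join(f"{i}. {a}" for i, a in enumerate(executed_actions, start=1))
        )
    parts.append(", ".join(relevant_actions))
    return ", ".join(parts)


def run_until_done(first_instruction, relevant_actions, choose, execute, is_done,
                   max_steps=20):
    """Repeat: build the prompt, choose an action, execute it, check completion."""
    executed = []
    for _ in range(max_steps):  # max_steps is an assumed safeguard
        prompt = build_reconstructed_instruction(
            first_instruction, executed, relevant_actions
        )
        action = choose(prompt)
        execute(action)
        executed.append(action)
        if is_done(executed):
            break
    return executed
```

With an empty history the prompt reduces to the S231 input format, so the same builder serves both the first step and the later iterations.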

S24: Determine the action to be executed based on the first scoring result and the second scoring result.

In one possible implementation, step S24 includes:

S241: Determine a score for each primitive action in the relevant action set according to the first scoring result and the second scoring result.

S242: Select the primitive action corresponding to the target score.

The primitive action corresponding to the target score is the action to be executed, and the target score is the highest of the scores of the primitive actions in the relevant action set.

In application, after an integration model receives the first scoring result and the second scoring result, it combines the two results using the model's hyperparameters and then selects the primitive action corresponding to the target score. The hyperparameters may be set manually or obtained through training.
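The application does not state how the hyperparameters combine the two scoring results. One simple possibility, shown here purely as an assumption, is a weighted sum with a single hyperparameter `weight`, followed by picking the highest combined score as in S242. The function names are illustrative.

```python
def fuse_scores(first_scores, second_scores, weight=0.5):
    """Weighted sum of the two scores per action (assumed combination rule)."""
    return {
        action: weight * first_scores[action] + (1.0 - weight) * second_scores[action]
        for action in first_scores
    }


def pick_action(first_scores, second_scores, weight=0.5):
    """Return the action whose fused score is highest (the target score)."""
    fused = fuse_scores(first_scores, second_scores, weight)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    first = {"take the plate": 0.2, "take the bowl": 0.9}   # evaluation model
    second = {"take the plate": 0.8, "take the bowl": 0.6}  # LLM
    print(pick_action(first, second))
```

A larger `weight` favors feasibility in the current environment, while a smaller one favors the LLM's judgment of usefulness to the user command.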

For example, the acquired user command is {I want to eat a sandwich}. The image is processed to obtain the target detection information {ham (x1, y1), plate (x2, y2), lettuce (x3, y3), fried egg (x4, y4), human (x5, y5)}. The user command and the target detection information are spliced to obtain the first instruction {I want to eat a sandwich, the following are the positions of objects in the image, ham (x1, y1), plate (x2, y2), lettuce (x3, y3), fried egg (x4, y4), human (x5, y5)}.
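
The splicing in this example can be sketched as below. The helper name `build_first_instruction`, the tuple representation of a detection and the exact wording of the connecting phrase are assumptions made for illustration; the application only states that the user command and the target detection information are spliced.

```python
from typing import List, Tuple

# A detection is assumed to be (label, (x, y)) with image coordinates.
Detection = Tuple[str, Tuple[float, float]]


def build_first_instruction(user_command: str, detections: List[Detection]) -> str:
    """Splice the user command with the detected objects and their positions."""
    objects = ", ".join(f"{label}({x}, {y})" for label, (x, y) in detections)
    return (
        f"{user_command}, the following are the positions of objects "
        f"in the image: {objects}"
    )
```

The coordinates passed in would come from whatever target detector is used; the symbolic values (x1, y1) in the example stand in for such detector outputs.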

Assume there are 1,000 primitive actions. The trained action preliminary screening model receives the input primitive actions and assigns each one a probability of occurring during execution of the user command. The relevant action set is obtained by selecting the primitive actions whose probability is greater than a threshold.
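
The threshold step can be sketched as follows. The probabilities are assumed to have already been produced by the trained action preliminary screening model; the function name `prescreen` and the default threshold of 0.5 are hypothetical.

```python
from typing import Dict, List


def prescreen(probabilities: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep the primitive actions whose probability of occurring during the
    execution of the user command is strictly greater than the threshold."""
    return [action for action, p in probabilities.items() if p > threshold]
```

A higher threshold yields a smaller relevant action set and a shorter prompt for the LLM, at the risk of filtering out an action that is actually needed.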

The first instruction and the primitive actions of the relevant action set are spliced to obtain {I want to eat a sandwich, the following are the positions of objects in the image, ham (x1, y1), plate (x2, y2), lettuce (x3, y3), fried egg (x4, y4), human (x5, y5), please score the following action primitives [take the plate, take the sandwich, take the ham, take the bowl, …]}.

The spliced instruction is input into the trained evaluation model to obtain the first scoring result output by the trained evaluation model. An example of the scoring process is as follows: in the current environment, the trained evaluation model judges that taking the plate may cause a collision or that the robot arm cannot reach the position of the plate, whereas taking the bowl is feasible. The trained evaluation model therefore scores taking the bowl higher than taking the plate, that is, the probability value of taking the bowl is higher than that of taking the plate.

The spliced instruction is also input into the trained LLM to obtain the second scoring result output by the trained LLM. An example of the scoring process is as follows: in the current environment, because there is no sandwich in the environment, the trained LLM gives taking the sandwich a low score, gives taking the plate a high score, and scores taking the bowl lower than taking the plate.

Combining the first scoring result and the second scoring result, the action to be executed is determined to be taking the bowl. The action to be executed, taking the bowl, is sent to the agent.

Because there is no sandwich in the environment, multiple actions must be executed to complete the user command, so the step of selecting the action to be executed is repeated until the user command is completed:

The first instruction, the executed actions of the agent and the primitive actions of the relevant action set are spliced to obtain {I want to eat a sandwich, the following are the positions of objects in the image, ham (x1, y1), plate (x2, y2), lettuce (x3, y3), fried egg (x4, y4), human (x5, y5), executed 1. take the bowl, please score the following action primitives [take the plate, take the sandwich, take the ham, take the bowl, …]}. This is input into the trained LLM to obtain the second scoring result output by the trained LLM, in which putting down the bowl receives the highest score.

According to this second scoring result and the first scoring result, the action to be executed is determined to be putting down the bowl.

The first instruction, the executed actions of the agent and the primitive actions of the relevant action set are spliced to obtain {I want to eat a sandwich, the following are the positions of objects in the image, ham (x1, y1), plate (x2, y2), lettuce (x3, y3), fried egg (x4, y4), human (x5, y5), executed 1. take the bowl, 2. put down the bowl, please score the following action primitives [take the plate, take the sandwich, take the ham, take the bowl, …]}. This is input into the trained LLM to obtain the second scoring result output by the trained LLM, in which taking the lettuce receives the highest score.

According to this second scoring result and the first scoring result, the action to be executed is determined to be taking the lettuce. The process continues in this way until the task succeeds.
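
The repeated selection described in this example can be sketched as a loop. Everything model-specific is passed in as a callable, because the application does not fix these interfaces: `score_feasibility` stands in for the trained evaluation model, `score_with_llm` for the trained LLM, `is_done` for the check that the user command is completed, and `send_to_agent` for the transport to the agent. The weighted sum and the `max_steps` safety bound are likewise assumptions.

```python
from typing import Callable, Dict, List


def run_until_done(
    first_instruction: str,
    candidates: List[str],
    score_feasibility: Callable[[List[str]], Dict[str, float]],
    score_with_llm: Callable[[str, List[str], List[str]], Dict[str, float]],
    is_done: Callable[[List[str]], bool],
    send_to_agent: Callable[[str], None],
    weight: float = 0.5,
    max_steps: int = 20,
) -> List[str]:
    """Repeatedly score the candidates, pick the best one, send it to the
    agent and record it, until the task is reported as done."""
    if not candidates:
        raise ValueError("the relevant action set is empty")
    executed: List[str] = []
    for _ in range(max_steps):
        if is_done(executed):
            break
        first = score_feasibility(candidates)  # first scoring result
        # Second scoring result, conditioned on the actions executed so far.
        second = score_with_llm(first_instruction, executed, candidates)
        fused = {a: weight * first[a] + (1.0 - weight) * second[a] for a in candidates}
        action = max(fused, key=fused.get)
        send_to_agent(action)
        executed.append(action)
    return executed
```

Passing the executed actions back to the LLM scorer on every iteration is what allows the score of, for example, putting down the bowl to rise only after the bowl has been taken.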

In the embodiments of the present application, the first instruction is input into the trained action preliminary screening model to obtain the relevant action set output by that model. The relevant action set includes the primitive actions related to the user command, which reduces the influence of irrelevant primitive actions on the reasoning of the trained LLM and lowers the probability that the trained LLM selects an irrelevant primitive action.

In addition, the trained evaluation model is used to score each primitive action in the relevant action set to obtain the first scoring result; the first instruction is used to instruct the trained LLM to score each primitive action in the relevant action set to obtain the second scoring result; and the action to be executed is determined according to the first scoring result and the second scoring result. Combining the two scoring results allows the action to be executed to be selected accurately.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

Corresponding to the method described in the above embodiments, and for convenience of explanation, only the parts related to the embodiments of the present application are shown.

Figure 3 is a schematic structural diagram of an agent control device provided by an embodiment of the present application. As shown in Figure 3, the device includes:

an acquisition module 10, configured to acquire a user command and visual information of the external environment;

a target detection module 11, configured to perform target detection on the visual information to obtain target detection information;

a splicing module 12, configured to splice the user command and the target detection information to obtain a first instruction;

an execution module 13, configured to use the first instruction to instruct the trained LLM to select an action to be executed from the primitive actions in the primitive action set;

and further configured to send the action to be executed to the agent so that the agent executes the action to be executed.

In one embodiment, the device further includes:

an action preliminary screening module, configured to input the first instruction into the trained action preliminary screening model to obtain the relevant action set output by the trained action preliminary screening model, the relevant action set including the primitive actions related to the user command;

an evaluation module, configured to use the trained evaluation model to score each primitive action in the relevant action set to obtain the first scoring result;

the execution module, further configured to use the first instruction to instruct the trained LLM to score each primitive action in the relevant action set to obtain the second scoring result, and to determine the action to be executed according to the first scoring result and the second scoring result;

a training module, configured to obtain positive samples and negative samples, where the positive samples include the actions selected by the trained LLM for a task sample and the negative samples include the actions not selected by the trained LLM for the task sample, and to train an action preliminary screening model with the positive samples and the negative samples to obtain the trained action preliminary screening model.
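
The construction of positive and negative samples can be sketched as follows. Only the labelling is shown; the architecture and training procedure of the action preliminary screening model are not specified in the application. The function name `build_prescreening_samples` and the (task, action, label) tuple layout are hypothetical.

```python
from typing import List, Tuple

# (task sample, primitive action, label); label 1 = positive, 0 = negative.
Sample = Tuple[str, str, int]


def build_prescreening_samples(
    task: str,
    selected_by_llm: List[str],
    primitive_actions: List[str],
) -> List[Sample]:
    """Label every primitive action for one task sample: positive if the
    trained LLM selected it for the task, negative otherwise."""
    selected = set(selected_by_llm)
    return [
        (task, action, 1 if action in selected else 0)
        for action in primitive_actions
    ]
```

Samples collected over many task samples could then be fed to any binary classifier that outputs, for a (task, action) pair, the probability that the action occurs during execution of the task.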

In one embodiment, the evaluation module is specifically configured to input each primitive action of the relevant action set into the trained evaluation model to obtain the probability value of each primitive action in the relevant action set output by the trained evaluation model. The probability value represents the probability that the primitive action is executed successfully in the external environment, and the first scoring result includes the probability value of each primitive action in the relevant action set.

In one embodiment, the execution module is specifically configured to input the first instruction and each primitive action in the relevant action set into the trained LLM to obtain the second scoring result output by the trained LLM, and, after the agent executes the action to be executed, to repeat the following step until the user command is completed: input the first instruction, the executed actions of the agent and each primitive action in the relevant action set into the trained LLM to obtain the second scoring result output by the trained LLM.

In one embodiment, the execution module is specifically configured to determine the score of each primitive action in the relevant action set according to the first scoring result and the second scoring result, and to select the primitive action corresponding to the target score, where the primitive action corresponding to the target score is the action to be executed and the target score is the highest score among the scores of the primitive actions in the relevant action set.

In one embodiment, the execution module is specifically configured to splice the first instruction, the executed actions of the agent and each primitive action in the relevant action set to obtain the reconstruction instruction, and to input the reconstruction instruction into the trained LLM.

Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in Figure 4, the electronic device 2 of this embodiment includes at least one processor 20 (only one is shown in Figure 4), a memory 21, and a computer program 22 stored in the memory 21 and executable on the at least one processor 20. The processor 20 implements the steps in any of the above method embodiments when executing the computer program 22.

The electronic device 2 may be a computing device such as a desktop computer or a cloud server. The electronic device 2 may include, but is not limited to, the processor 20 and the memory 21. Those skilled in the art will understand that Figure 4 is only an example of the electronic device 2 and does not limit it; the electronic device 2 may include more or fewer components than shown, combine certain components, or use different components, and may for example also include input/output devices, network access devices and the like.

The processor 20 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.

In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as a hard disk or internal memory of the electronic device 2. In other embodiments, the memory 21 may be an external storage device of the electronic device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the electronic device 2. Further, the memory 21 may include both an internal storage unit and an external storage device of the electronic device 2. The memory 21 is used to store the operating system, application programs, the boot loader, data and other programs, such as the program code of the computer program. The memory 21 may also be used to temporarily store data that has been output or is to be output.

It should be noted that the information exchange and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application. For their specific functions and technical effects, refer to the method embodiments; they are not repeated here.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is only an example. In practice, the above functions may be assigned to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, may each exist physically on their own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working processes of the units and modules in the above system, refer to the corresponding processes in the foregoing method embodiments; they are not repeated here.

Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in each of the above method embodiments.

Embodiments of the present application provide a computer program product which, when run on an electronic device, causes the electronic device to implement the steps in each of the above method embodiments.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a camera device/terminal device, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals and software distribution media, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.

In the above embodiments, each embodiment is described with its own emphasis. For parts not detailed or recorded in one embodiment, refer to the relevant descriptions of the other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed devices/network devices and methods may be implemented in other ways. For example, the device/network device embodiments described above are only illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The embodiments described above are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.

Claims

1. An agent control method, comprising:

acquiring a user command and visual information of an external environment;

performing target detection on the visual information to obtain target detection information;

splicing the user command and the target detection information to obtain a first instruction;

using the first instruction to instruct a trained LLM to select an action to be executed from the primitive actions in a primitive action set;

and sending the action to be executed to the intelligent agent so that the intelligent agent executes the action to be executed.

2. The method as recited in claim 1, further comprising:

inputting the first instruction into a trained action prescreening model to obtain a related action set output by the trained action prescreening model, wherein the related action set comprises the primitive actions related to the user command;

scoring each primitive action in the related action set by using a trained evaluation model to obtain a first scoring result;

using the first instruction to instruct the trained LLM to score each primitive action in the related action set to obtain a second scoring result;

and determining the action to be executed according to the first scoring result and the second scoring result.

3. The method of claim 2, wherein before inputting the first instruction into the trained action prescreening model, the method further comprises:

obtaining a positive sample and a negative sample, wherein the positive sample comprises actions selected by the trained LLM for a task sample, and the negative sample comprises actions not selected by the trained LLM for the task sample;

and training an action prescreening model by using the positive sample and the negative sample to obtain the trained action prescreening model.

4. The method of claim 2, wherein the using the first instruction to instruct the trained LLM to score each primitive action in the related action set to obtain a second scoring result comprises:

inputting the first instruction and each primitive action in the related action set to the trained LLM to obtain a second scoring result output by the trained LLM;

after the intelligent agent executes the action to be executed, cyclically performing the following step until the user command is completed:

inputting the first instruction, the executed actions of the intelligent agent and the primitive actions in the related action set to the trained LLM to obtain the second scoring result output by the trained LLM.

5. The method according to any of claims 2 to 4, wherein said determining said action to be performed based on said first scoring result and said second scoring result comprises:

determining the score of each primitive action in the related action set according to the first scoring result and the second scoring result;

and selecting the primitive actions corresponding to the target scores, wherein the primitive actions corresponding to the target scores are the actions to be executed, and the target scores are the highest scores among the scores of the primitive actions in the related action set.

6. The method of claim 4, wherein inputting the first instruction, the executed actions of the agent, and the respective primitive actions in the set of related actions into the trained LLM comprises:

splicing the first instruction, the executed actions of the intelligent agent and the primitive actions in the related action set to obtain a reconstruction instruction;

and inputting the reconstruction instruction to the trained LLM.

7. The method of claim 2, wherein scoring each of the primitive actions in the set of related actions using a trained evaluation model to obtain a first scoring result comprises:

inputting each primitive action of the related action set into the trained evaluation model to obtain a probability value of each primitive action in the related action set output by the trained evaluation model, wherein the probability value is used for representing the probability of successful execution of the primitive action in the external environment, and the first scoring result comprises the probability value of each primitive action in the related action set.

8. An agent control device, comprising:

the acquisition module is used for acquiring a user command and visual information of an external environment;

the target detection module is used for carrying out target detection on the visual information to obtain target detection information;

the splicing module is used for splicing the user command and the target detection information to obtain a first instruction;

the execution module is used for instructing a trained LLM, by using the first instruction, to select an action to be executed from the primitive actions in a primitive action set;

and the execution module is also used for sending the action to be executed to the intelligent agent so that the intelligent agent executes the action to be executed.

9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.