CN117690002A - Information interaction method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117690002A
CN117690002A
Authority
CN
China
Prior art keywords
image
image processing
description text
subtask
processing result
Prior art date
Legal status
Pending
Application number
CN202311694607.XA
Other languages
Chinese (zh)
Inventor
岳海潇
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311694607.XA
Publication of CN117690002A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/945 User interactive design; Environments; Toolboxes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides an information interaction method, an information interaction apparatus, an electronic device, and a storage medium. It relates to the field of artificial intelligence, in particular to computer vision, deep learning, and large models, and can be applied to artificial-intelligence scenarios such as content generation and human-computer interaction. The specific implementation scheme is as follows: in response to obtaining a demand description text, the demand description text is processed with a large language model to obtain a visual task attribute matching the image processing intention characterized by the demand description text, where the demand description text is associated with an image to be processed; an image processing result related to the image to be processed is determined according to the visual task attribute; feedback information is generated according to the image processing result; and the feedback information is displayed on an interactive interface.

Description

Information interaction method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, deep learning, large models and the like, and can be applied to scenes such as content generation, man-machine interaction and the like of artificial intelligence.
Background
With the rapid development of computer vision technology, images such as photographs and videos can be processed by computer vision techniques, for example based on computer vision functions such as object detection and image classification. Computer vision technology is widely applied in scenarios such as film and television production and intelligent security.
Disclosure of Invention
The disclosure provides an information interaction method, an information interaction device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided an information interaction method, including: in response to obtaining a demand description text, processing the demand description text with a large language model to obtain a visual task attribute matching the image processing intention characterized by the demand description text, where the demand description text is associated with an image to be processed; determining an image processing result related to the image to be processed according to the visual task attribute; generating feedback information according to the image processing result; and displaying the feedback information on an interactive interface.
According to another aspect of the present disclosure, there is provided an information interaction apparatus, including: a visual task attribute obtaining module configured to, in response to obtaining a demand description text, process the demand description text with a large language model to obtain a visual task attribute matching the image processing intention characterized by the demand description text, where the demand description text is associated with an image to be processed; an image processing result determining module configured to determine an image processing result related to the image to be processed according to the visual task attribute; a feedback information generating module configured to generate feedback information according to the image processing result; and a display module configured to display the feedback information on an interactive interface.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method provided according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method provided according to embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which information interaction methods and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method of information interaction according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a schematic diagram of an information interaction method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates an application scenario diagram of an information interaction method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates an application scenario diagram of an information interaction method according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of an information interaction device according to an embodiment of the disclosure; and
FIG. 7 schematically illustrates a block diagram of an electronic device adapted to implement the information interaction method according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solutions of the disclosure, the acquisition, storage, and application of the user personal information involved all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
The embodiments of the disclosure provide an information interaction method, an information interaction apparatus, an electronic device, and a storage medium. The information interaction method includes: in response to obtaining a demand description text, processing the demand description text with a large language model to obtain a visual task attribute matching the image processing intention characterized by the demand description text, where the demand description text is associated with an image to be processed; determining an image processing result related to the image to be processed according to the visual task attribute; generating feedback information according to the image processing result; and displaying the feedback information on an interactive interface.
According to embodiments of the disclosure, the demand description text is acquired and the semantics of its natural-language representation are understood with a large language model, so that the obtained visual task attribute matches the demand intention the text characterizes and the corresponding visual task can be executed through that attribute. This makes it convenient to determine the image processing result and spares the target object from having to obtain a result corresponding to the demand intention through diverse and complex visual processing resources, eliminating the complicated operations involved in selecting such resources and improving image processing efficiency. Generating and displaying feedback information according to the image processing result also improves the timeliness with which the target object obtains the result, improving user experience.
A large language model (LLM) is a deep learning model trained on a large amount of text data; it can be used to understand the meaning of language text and to generate natural language text. Large language models can handle a variety of natural language tasks, such as text classification, question answering, and dialogue. Because large language models typically contain billions of parameters, this large parameter scale helps them learn complex patterns in natural language data, giving them significant performance on natural language processing (NLP) tasks. In addition, embodiments of the disclosure may develop extension plug-in tools (service resources) on top of the large language model, combining it with computer vision service resources so that the model can carry out computer vision tasks such as detection, classification, and understanding of pictures and videos.
Fig. 1 schematically illustrates an exemplary system architecture to which information interaction methods and apparatuses may be applied according to embodiments of the present disclosure.
It should be noted that FIG. 1 is only an example of a system architecture to which embodiments of the disclosure may be applied, intended to help those skilled in the art understand the technical content of the disclosure; it does not mean that the embodiments cannot be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the information interaction method and apparatus may be applied may include a terminal device alone, and the terminal device may implement the method and apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, etc. (as examples only).
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
Alternatively, the server may be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the drawbacks of high management difficulty and weak service extensibility found in traditional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be noted that, the information interaction method provided by the embodiments of the present disclosure may also be generally executed by the server 105. Accordingly, the information interaction device provided by the embodiments of the present disclosure may be generally disposed in the server 105. The information interaction method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the information interaction device provided by the embodiments of the present disclosure may also be provided in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
Alternatively, the information interaction method provided by the embodiments of the present disclosure may be generally performed by the terminal device 101, 102, or 103. Accordingly, the information interaction device provided by the embodiment of the present disclosure may also be provided in the terminal device 101, 102, or 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of an information interaction method according to an embodiment of the disclosure.
As shown in fig. 2, the information interaction method includes operations S210 to S240.
In operation S210, in response to obtaining the demand description text, the demand description text is processed with a large language model to obtain a visual task attribute matching the image processing intention characterized by the demand description text, where the demand description text is associated with the image to be processed.
In operation S220, an image processing result related to the image to be processed is determined according to the visual task attribute.
In operation S230, feedback information is generated according to the image processing result.
In operation S240, feedback information is presented at the interactive interface.
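As a rough illustration only, the four operations above can be sketched as a pipeline; the function names are invented here, and the large language model and task executor are stubbed with trivial stand-ins, since the disclosure does not fix their interfaces:

```python
def parse_task_attribute(demand_text: str) -> dict:
    """S210: stand-in for the large language model that maps a demand
    description text to a matching visual task attribute."""
    if "classify" in demand_text:
        return {"task_type": "image_classification"}
    return {"task_type": "object_detection"}

def run_visual_task(attribute: dict, image: str) -> dict:
    """S220: determine an image processing result from the attribute."""
    return {"task": attribute["task_type"], "image": image, "result": "ok"}

def make_feedback(processing_result: dict) -> str:
    """S230: generate feedback information from the processing result."""
    return (f"{processing_result['task']} on {processing_result['image']}: "
            f"{processing_result['result']}")

def interact(demand_text: str, image: str) -> str:
    """S240 would render the returned string on the interactive interface."""
    attribute = parse_task_attribute(demand_text)
    result = run_visual_task(attribute, image)
    return make_feedback(result)

print(interact("please classify this photo", "photo.jpg"))
# → image_classification on photo.jpg: ok
```

The point of the sketch is only the data flow: text in, attribute, result, feedback out; each stage would be replaced by the model and service resources described below.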
According to an embodiment of the present disclosure, the demand description text may be natural language information characterizing the image processing demand of the target object, and may be acquired based on an input operation of the target object in an information input box of the interactive interface. The disclosure is not limited to this, however; the demand description text may be obtained in other ways. For example, based on sound information in which the target object expresses the demand, collected by a sound acquisition device, the demand description text characterized by that sound information may be recognized using a speech recognition algorithm.
According to an embodiment of the present disclosure, the image to be processed may be the image characterized by the demand description text. It may be acquired through an image input operation of the target object, or queried from an associated storage device based on the semantics characterized by the demand description text. Embodiments of the disclosure do not limit the specific manner of acquiring the image to be processed, as long as the image characterized by the demand description text can be acquired.
According to an embodiment of the disclosure, the large language model may be a pre-trained model. Based on its natural language understanding capability, the image processing intention characterized by the demand description text can be identified, and a visual task attribute matching that intention can then be generated. This spares the target object the time-consuming selection among a large number of image processing service resources, reduces operation time and complexity, and improves overall image processing efficiency.
According to embodiments of the present disclosure, visual task attributes may include attribute information related to performing an image processing task (or visual task), such as task configuration parameters of the visual task, visual task type, and so forth.
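For illustration, a visual task attribute as described above, a visual task type plus its task configuration parameters, might be modeled by a small data shape like the following; the field names are assumptions, not terms fixed by the disclosure:

```python
from dataclasses import dataclass, field

@dataclass
class VisualTaskAttribute:
    """Hypothetical container for a visual task attribute: the type of
    the visual task plus its task configuration parameters."""
    task_type: str                               # e.g. "object_detection"
    config: dict = field(default_factory=dict)   # task configuration parameters

attribute = VisualTaskAttribute("object_detection",
                                {"confidence_threshold": 0.5})
print(attribute.task_type, attribute.config["confidence_threshold"])
# → object_detection 0.5
```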
According to the embodiment of the disclosure, determining the image processing result related to the image to be processed according to the visual task attribute may include executing the visual task generated based on the visual task attribute, thereby realizing the image processing of the image to be processed and obtaining the image processing result.
According to an embodiment of the present disclosure, the image processing result may include any type of information, for example, may include text information describing an image to be processed, or may further include an image or an image block obtained after performing any visual task of object detection, image classification, image clipping, or the like on the image to be processed. The embodiment of the present disclosure does not limit the specific information type of the image processing result.
According to embodiments of the present disclosure, the feedback information may be used to characterize the image processing results, e.g., the feedback information may include any type of information, such as text, logos, image blocks, etc., that characterize the image processing results. The embodiment of the present disclosure does not limit the specific information type of the feedback information, and those skilled in the art may select according to actual requirements.
According to embodiments of the present disclosure, the interactive interface may include an interface for a target object to browse information, such as a display screen of a smart phone. Displaying the feedback information on the interactive interface may include displaying the feedback information by rendering the feedback information on the interactive interface, or may further include playing the feedback information in an audio format by an audio playing device associated with the interactive interface, so that the feedback information in the audio format may assist the target object to conveniently obtain an image processing result for the image to be processed.
It should be noted that the information processing operations in any embodiment of the disclosure, including but not limited to information acquisition and image processing operations, are performed only after authorization from the relevant user has been obtained. After the information is acquired, necessary encryption measures are taken to protect its security and avoid information leakage. The data obtained by the methods provided in the embodiments of the disclosure, including but not limited to image processing results and feedback information, are displayed only after being reviewed in accordance with relevant laws, regulations, or specifications and passing that review.
According to an embodiment of the present disclosure, the information interaction method may further include: and responding to the received image to be processed input by the target object, and carrying out image processing intention detection on the image to be processed to obtain a requirement description text.
According to the embodiment of the disclosure, the target detection can be performed on the image to be processed based on the target detection algorithm, and the image processing intention detection of the image to be processed can be realized according to the obtained detection result, so that the requirement description text is obtained.
For example, the image to be processed may contain bridge A. Performing target detection on the image may yield the detection result "bridge A". From this detection result, it can be determined that the demand description text characterizing the image processing intention is "building identification".
According to an embodiment of the disclosure, image processing intention detection may also be performed on the image to be processed in other ways; for example, text in the image may be recognized based on OCR technology and the recognized text used as the demand description text.
According to an embodiment of the disclosure, performing image processing intention detection on the image input by the target object to obtain the demand description text allows the image processing intention characterized by that image to be preliminarily identified automatically, without the target object performing an input operation to enter the demand description text. This saves operation steps in image processing and improves subsequent image processing efficiency and information interaction efficiency.
According to an embodiment of the present disclosure, the information interaction method may further include: and updating the received demand description text according to a preset demand prompt template to obtain a new demand description text.
According to embodiments of the present disclosure, a demand prompt template (or prompt template) may be a prompt token sequence that helps the large language model understand the intent semantic attributes characterized by the demand description text and steers the model to accurately predict the visual task attribute matching the image processing intention. The prompt token sequence may include any type of prompt tokens, characters, fields, words, and so on.
According to an embodiment of the disclosure, updating the received demand description text according to the preset demand prompt template may include adding keywords related to the image processing intention in the demand description text into the demand prompt template. The resulting new demand description text can contain the prompt token sequence that steers the large language model toward accurate prediction, so the model can more accurately understand the image processing intention characterized by the demand description text and accurately predict the visual task attribute matching that intention. In turn, an image processing result matching the demand description text can be determined from the accurately predicted visual task attribute, achieving accurate and efficient processing of the image to be processed.
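One simple way such an update could work is to splice the demand description text into a fixed template before it is handed to the model. The template wording below is invented for illustration; the disclosure does not specify the template's content:

```python
# Hypothetical demand prompt template; the wording is an assumption.
DEMAND_PROMPT_TEMPLATE = (
    "You are a vision-task planner. Given the user demand below, "
    "output the matching visual task attribute.\n"
    "User demand: {demand}\n"
    "Visual task attribute:"
)

def build_new_demand_text(demand_text: str) -> str:
    """Update the received demand description text with the preset
    template, producing the new demand description text fed to the LLM."""
    return DEMAND_PROMPT_TEMPLATE.format(demand=demand_text)

prompt = build_new_demand_text("crop out the traffic light in this photo")
print(prompt)
```

The template supplies the steering context while the user's own keywords carry the image processing intention.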
According to an embodiment of the present disclosure, determining an image processing result related to an image to be processed according to a visual task attribute may include: generating a visual task according to the visual task attribute and the image to be processed; executing the visual task according to at least one service resource associated with the visual task attribute to obtain a task execution result; and obtaining an image processing result according to the task execution result.
According to an embodiment of the disclosure, the visual task may include a visual task type and visual task configuration parameters for performing image processing on the image to be processed. The visual task is executed by calling the service resource associated with the visual task attribute, so that it is carried out on the image to be processed according to the image processing intention of the demand description text. This avoids problems such as overly long operation sequences and service resource call errors that arise when the target object calls service resources through manual selection operations or manually sends control instructions to a service resource interface, saves the learning cost the target object would otherwise incur for processing the image, reduces the complexity of the operation flow, and improves image processing efficiency and accuracy.
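A plausible mechanism for "calling the service resource associated with the visual task attribute" is a registry that maps task types to handler callables. This is only a sketch under that assumption; the registry, decorator, and handler names are not from the disclosure:

```python
SERVICE_RESOURCES = {}

def service_resource(task_type):
    """Register a callable as the service resource for one task type."""
    def register(fn):
        SERVICE_RESOURCES[task_type] = fn
        return fn
    return register

@service_resource("object_detection")
def detect(image, **config):
    # Placeholder handler: a real resource would run a detector here.
    return {"boxes": [], "image": image}

def execute_visual_task(attribute, image):
    """Look up and invoke the service resource matching the attribute,
    passing along the task configuration parameters."""
    handler = SERVICE_RESOURCES[attribute["task_type"]]
    return handler(image, **attribute.get("config", {}))

result = execute_visual_task({"task_type": "object_detection"}, "img.png")
print(result)
# → {'boxes': [], 'image': 'img.png'}
```

Dispatching through a registry keeps the target object out of the loop: the attribute predicted by the model selects the resource, rather than a manual selection operation.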
According to an embodiment of the present disclosure, the visual task attributes include a plurality of subtask attributes, and execution dependencies among the plurality of subtask attributes, the visual task includes a subtask corresponding to the subtask attributes, and the service resource is associated with the subtask attributes.
According to an embodiment of the disclosure, a subtask attribute can characterize properties of a subtask of the visual task, such as its task configuration parameters, the subtask object, the visual processing type of the subtask, and the service resource identifier of the service resource that must be called to execute it; the execution dependencies among the plurality of subtasks characterize the logical order in which the visual task is executed. Processing the demand description text with the large language model to obtain a visual task attribute comprising a plurality of subtask attributes and their execution dependencies allows the visual task to be finely decomposed, clearly characterizing its execution process. By calling the service resource associated with each subtask attribute to execute the corresponding subtask, subtasks can be executed at fine granularity, improving execution precision.
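The execution dependencies among subtasks form a small dependency graph, and running subtasks in an order consistent with it can be sketched with the standard library's topological sorter. The subtask names here are invented, echoing the traffic-light example used later in the description:

```python
from graphlib import TopologicalSorter

# Each subtask maps to the set of subtasks it depends on.
dependencies = {
    "segment_traffic_light": set(),                      # runs on the raw image
    "recognize_light_state": {"segment_traffic_light"},  # needs the segment
}

def run_subtasks(deps):
    """Execute subtasks in dependency order, feeding each subtask the
    execution results of its predecessors."""
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        inputs = {d: results[d] for d in deps[name]}
        results[name] = f"result_of_{name}"  # placeholder execution
    return order, results

order, results = run_subtasks(dependencies)
print(order)
# → ['segment_traffic_light', 'recognize_light_state']
```

`static_order` guarantees every subtask appears after all of its dependencies, which is exactly the logical relationship the execution dependencies characterize.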
According to an embodiment of the present disclosure, executing the visual task according to at least one service resource associated with the visual task attribute to obtain a task execution result includes: according to the execution dependencies, calling the kth service resource associated with the kth subtask attribute to execute the kth subtask, obtaining the kth subtask execution result, where k > 1 and k is an integer.
According to an embodiment of the present disclosure, the kth subtask may be determined according to the kth subtask attribute, and the task execution result may include one or more subtask execution results. The kth subtask attribute may include the kth service resource identifier of a service resource suitable for executing the kth subtask; the kth service resource is called through this identifier, and the kth subtask may be executed based on the configuration parameters in the kth subtask attribute, obtaining a kth subtask execution result matching the image processing intention characterized by the demand description text.
For example, where the kth subtask attribute indicates an image segmentation task and specifies the image region parameters for segmenting the image to be processed, the kth subtask may be generated based on those image region parameters and the subtask type. The kth service resource is then called through the kth service resource identifier to execute the kth subtask, obtaining the kth subtask execution result.
According to an embodiment of the present disclosure, the kth subtask may also be determined based on the (k-1)th subtask execution result and the kth subtask attribute.
For example, the kth subtask attribute may characterize a requirement to recognize the signal light state of a traffic signal light in the image to be processed, the (k-1)th subtask execution result may be a signal light image block obtained by image segmentation of the traffic signal light image region in the image to be processed (an image segmentation subtask execution result), and the signal light state recognition subtask (the kth sub-visual task) may be generated based on the segmented signal light image block and the subtask attribute parameter characterized by the kth subtask attribute.
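The dependency-driven execution described above — the kth service resource is looked up by its identifier and consumes the (k-1)th subtask execution result — can be pictured as a simple dispatch loop. The sketch below is only illustrative: the registry, `service_id`, and `params` names are assumptions, not identifiers from the disclosure.

```python
# Illustrative service-resource registry: identifier -> callable. Each callable
# receives the (k-1)th result and the configuration parameters taken from the
# kth subtask attribute. All names here are assumptions for the sketch.
SERVICE_REGISTRY = {
    "segment": lambda prev, cfg: f"segmented({prev}, region={cfg['region']})",
    "recognize_light": lambda prev, cfg: f"light_state({prev})",
}

def run_subtasks(ordered_attrs, image):
    """Call the kth service resource with the (k-1)th subtask execution result."""
    result, results = image, []
    for attrs in ordered_attrs:  # already ordered by the execution dependency
        service = SERVICE_REGISTRY[attrs["service_id"]]
        result = service(result, attrs.get("params", {}))
        results.append(result)
    return results

plan = [
    {"service_id": "segment", "params": {"region": "traffic_light"}},
    {"service_id": "recognize_light"},
]
subtask_results = run_subtasks(plan, "raw_image")
```

In the traffic-light example, the first call yields the segmented signal light image block, and the second call consumes that block to produce the signal light state.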
According to an embodiment of the present disclosure, the sub-visual task includes at least one of: target detection subtask, image editing subtask, image description subtask.
According to embodiments of the present disclosure, the target detection subtask may include a subtask that detects targets such as a person, a building, or a signal light in the image to be processed, and may include, for example, a signal light state detection subtask, a building name detection subtask, and the like.
It should be noted that, the subtask execution result of the target detection subtask may include a recognition result of an attribute such as a type of the target to be detected, and may further include description information related to the target to be detected. For example, in the case that the object detection subtask is to detect a building in an image block, the subtask execution result of the object detection subtask may include information about the name of the building, building technology introduction information related to the building, and the like.
According to embodiments of the present disclosure, the image editing subtask may include a subtask that clips consecutive multi-frame video frame images, and may further include a subtask that performs clipping, contrast adjustment, and the like on any single frame image.
In one embodiment of the present disclosure, the image to be processed may be a sequence of consecutive video frame images, and the image editing subtask may be a subtask that clips the consecutive video frame images. The subtask execution result of the image editing subtask may include the clipped multi-frame video frame images, and may further include description information of the clipping operation process.
According to embodiments of the present disclosure, the image description subtask may include a subtask that describes text in an image to be processed, or may further include a subtask that describes a layout of objects contained in the image to be processed.
In one embodiment of the present disclosure, the image to be processed may include a scanned image for the ticket, and the image description subtask may include a subtask that describes the ticket type of the ticket.
According to embodiments of the present disclosure, the service resources suitable for performing the sub-visual tasks may be software service resources such as plug-ins with visual sub-task service capabilities, or may also include hardware service resources such as chips with image processing capabilities.
According to an embodiment of the present disclosure, obtaining an image processing result according to a task execution result may include: and fusing a plurality of subtask execution results based on the execution dependency relationship to obtain an image processing result.
According to the embodiment of the disclosure, the execution dependency relationship may represent the logical execution order of the plurality of sub-visual tasks. Some or all of the subtask execution results may be arranged based on the execution dependency relationship, thereby fusing the subtask execution results, and the arranged subtask execution results are taken as the image processing result.
According to the embodiment of the disclosure, when the plurality of subtask execution results are fused based on the execution dependency relationship, the execution dependency relationship among them may be processed based on a neural network model, so that the plurality of subtask execution results are fully fused according to the execution dependency relationship. For example, the neural network model may fuse the image blocks, image identifications, and building names respectively contained in the plurality of subtask execution results according to the execution dependency relationship, so that the image identifications and the corresponding building names are rendered in the image to be processed, obtaining a fused image processing result. The target object can thus quickly learn the image processing result corresponding to the image to be processed from feedback information generated from the fused image processing result.
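As a toy illustration of the arrangement step in the fusion described above: the disclosure contemplates a neural network model for the fusion itself, but the ordering-and-merging of subtask execution results can be sketched with a plain dictionary merge. All keys and values here are hypothetical.

```python
def fuse_results(subtask_results, dependency_order):
    """Arrange per-subtask outputs by the execution dependency and merge them."""
    fused = {}
    for key in dependency_order:
        fused.update(subtask_results[key])  # later subtasks refine earlier ones
    return fused

# Hypothetical subtask execution results: detection boxes plus a building name.
subtask_results = {
    "target_detection": {"boxes": [(10, 20, 50, 80)]},
    "name_recognition": {"building_name": "Tower A"},
}
image_processing_result = fuse_results(
    subtask_results, ["target_detection", "name_recognition"]
)
```

The fused result carries both the image annotations and the recognized names, from which feedback information can be generated.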
Fig. 3 schematically illustrates a schematic diagram of an information interaction method according to an embodiment of the present disclosure.
As shown in fig. 3, the input information 301 may be pushed to the service module 300, and the input information 301 may include a requirement description text: "clothing brand of right person 1", and image to be processed 3011. The service module 300 may include a large language model 310 and a plurality of service resources. The visual task attributes may be output by entering the demand description text into the large language model 310. The visual task attributes may include a plurality of subtask attributes. The plurality of subtask attributes are a target detection subtask attribute 311, an image block determination subtask attribute 312, an image segmentation subtask attribute 313, and a garment identification subtask attribute 314, respectively. The execution dependency relationship between the plurality of subtask attributes may be characterized based on the order of the object detection subtask attribute 311, the image block determination subtask attribute 312, the image segmentation subtask attribute 313, and the garment identification subtask attribute 314.
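The visual task attribute in Fig. 3 — four subtask attributes plus the execution dependency relationship among them — can be pictured as a structured plan emitted by the large language model. A minimal sketch under assumed field names (`service_id`, `depends_on` are illustrative, not from the disclosure), including a helper that derives a dependency-respecting execution order:

```python
# Hypothetical representation of the visual task attribute for Fig. 3.
visual_task_attributes = {
    "subtasks": [
        {"id": 1, "type": "target_detection", "service_id": "det-001", "depends_on": []},
        {"id": 2, "type": "image_block_determination", "service_id": "blk-001", "depends_on": [1]},
        {"id": 3, "type": "image_segmentation", "service_id": "seg-001", "depends_on": [2]},
        {"id": 4, "type": "garment_identification", "service_id": "cls-001", "depends_on": [3]},
    ]
}

def execution_order(attrs):
    """Derive a subtask order that respects the execution dependency relationship."""
    order, done = [], set()
    pending = list(attrs["subtasks"])
    while pending:
        for sub in pending:
            if all(dep in done for dep in sub["depends_on"]):
                order.append(sub["id"])
                done.add(sub["id"])
                pending.remove(sub)
                break
        else:
            raise ValueError("cyclic execution dependency")
    return order
```

For the chain in Fig. 3 this yields the order 1 → 2 → 3 → 4, matching the sequence walked through below.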
As shown in fig. 3, for the 1st sub-visual task, a target detection subtask R311 (the 1st sub-visual task) may be generated based on the target detection subtask attribute 311 and the image to be processed 3011. Based on the target detection subtask attribute 311, the service resource associated with it (target detection resource 321) may be invoked to execute the target detection subtask R311, obtaining the 1st subtask execution result. The 1st subtask execution result may be, for example, a detection frame corresponding to each target, obtained by performing target detection on the image to be processed.
As shown in fig. 3, for the 2nd sub-visual task, an image block determination subtask R312 (the 2nd sub-visual task) may be generated based on the image block determination subtask attribute 312 and the 1st subtask execution result. Based on the image block determination subtask attribute 312, the service resource associated with it (image block determination resource 322) may be invoked to execute the image block determination subtask R312, obtaining the 2nd subtask execution result. The 2nd subtask execution result may be, for example, the detection frame corresponding to the rightmost person in the 1st subtask execution result, determined as the detection frame for which image segmentation is required.
As shown in fig. 3, for the 3rd sub-visual task, an image segmentation subtask R313 (the 3rd sub-visual task) may be generated based on the image segmentation subtask attribute 313 and the 2nd subtask execution result. Based on the image segmentation subtask attribute 313, the service resource associated with it (image segmentation resource 323) may be called to execute the image segmentation subtask R313, obtaining the 3rd subtask execution result. The 3rd subtask execution result may be, for example, the rightmost person image block in the image to be processed 3011, obtained by performing image segmentation according to the detection frame determined in the 2nd subtask execution result.
As shown in fig. 3, for the 4th sub-visual task, a garment recognition subtask R314 (the 4th sub-visual task) may be generated based on the garment recognition subtask attribute 314 and the 3rd subtask execution result. Based on the garment recognition subtask attribute 314, the service resource associated with it (garment recognition resource 324) may be invoked to execute the garment recognition subtask R314, obtaining the 4th subtask execution result. The 4th subtask execution result may be, for example, information related to the clothing brand, obtained by performing clothing detection on the rightmost person image block segmented in the 3rd subtask execution result.
It should be noted that, the information processing operations in the embodiments of the present disclosure, including but not limited to, the information acquisition operation and the image processing operation, are performed after the authorization of the relevant user is acquired. And after the information is acquired, the security of the information is protected by adopting necessary encryption measures, so that information leakage is avoided. The data obtained by the method provided by the embodiment of the disclosure, including but not limited to image processing results and feedback information, are displayed after being checked according to relevant laws and regulations or specifications and being checked to be qualified.
According to an embodiment of the present disclosure, generating feedback information according to an image processing result may include: processing the image processing result by using the large language model to obtain a processing result description text; and generating feedback information according to the result description text.
According to the embodiment of the disclosure, by processing the image processing result with the large language model, a result description text describing the image processing result can be generated based on the analysis and text prediction capabilities of the large language model. The feedback information generated from the result description text represents the image processing result in natural language, which reduces the difficulty of understanding the image processing result and improves the browsing efficiency of the target object.
According to an embodiment of the present disclosure, processing an image processing result using a large language model, obtaining a processing result description text may include: updating a preset feedback prompt template based on an image processing result to obtain feedback prompt information; and processing the feedback prompt information by using the large language model to obtain a processing result description text.
According to embodiments of the present disclosure, the feedback hint template may include a feedback hint flag sequence that may be used to control the large language model to accurately predict natural language for describing the image processing result, thereby generating processing result description text that describes the image processing result in natural language.
In one embodiment of the present disclosure, the feedback hint information may include the following paragraphs enclosed with "//":
// Please try to compile a description of the results of multiple sub-visual task executions as follows.
The current sub-visual task execution results are respectively:
"{picture1}", "{picture2}";
Please describe the sub-execution results respectively, and fuse the description contents to obtain the description text of the image processing result. //
Note that "{picture1}" and "{picture2}" in the feedback prompt message may each be a subtask execution result, and the feedback prompt template may be the feedback prompt tag sequence formed by the fields and characters in the feedback prompt message other than "{picture1}" and "{picture2}".
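Updating a preset feedback prompt template based on the image processing result can be plain placeholder substitution. The sketch below paraphrases the "//"-delimited template above; the placeholder names {picture1}/{picture2} come from that example, while the function name and keyword arguments are assumptions.

```python
# The template paraphrases the example above; the rest of the wording is assumed.
FEEDBACK_PROMPT_TEMPLATE = (
    "Please try to compile a description of the results of multiple "
    "sub-visual task executions.\n"
    "The current sub-visual task execution results are respectively:\n"
    '"{picture1}", "{picture2}"\n'
    "Please describe each sub-result and fuse the descriptions into one "
    "processing result description text."
)

def build_feedback_prompt(template, **subtask_results):
    """Update the preset feedback prompt template with the execution results."""
    return template.format(**subtask_results)

prompt = build_feedback_prompt(
    FEEDBACK_PROMPT_TEMPLATE,
    picture1="rightmost person image block",
    picture2="clothing brand: XXX",
)
# `prompt` would then be fed to the large language model service.
```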
According to an embodiment of the present disclosure, the requirement description text is acquired according to an input operation of the target object with respect to the interactive interface.
According to an embodiment of the present disclosure, the information interaction method may further include: a demand description floating window representing demand description text is generated on the interactive interface.
According to the embodiment of the disclosure, the target object can generate the requirement description text by inputting text, voice, or other information into the requirement description floating window, so that the target object can conveniently process the image to be processed in a chat-style interaction, avoiding having to select a visual processing module to execute the visual task for the image to be processed, and saving operation steps.
According to an embodiment of the present disclosure, displaying feedback information at the interactive interface may include: and generating a feedback information floating window suitable for displaying feedback information at a second position which is separated from the first position of the demand description floating window by a preset distance range in the interactive interface.
Fig. 4 schematically illustrates an application scenario diagram of an information interaction method according to an embodiment of the present disclosure.
As shown in fig. 4, the interactive interface 400 may include a demand description floating window 410 that characterizes the demand description text; the demand description floating window 410 may include the demand description text "person 1's clothing brand" and an image to be processed 411 associated with the demand description text. According to the requirement description text, the information interaction method provided by the embodiment of the disclosure can be executed to obtain feedback information corresponding to the requirement description text. The feedback information may be presented in a feedback information floating window 420 and may include the result description text: "person 1 on the right side in the drawing below is the person in the dashed-line box; the clothing brand is XXX", as well as the image processing result 421. The image processing result 421 can annotate, based on the dashed-line box, the object to be detected that the requirement description text requires to be identified, facilitating browsing and viewing by the target object.
As shown in fig. 4, the interactive interface 400 may further include an input box 430, and the target object may input the required description text by inputting the text in the input box 430, or may further upload the image to be processed by dragging the image to be processed to the input box 430. It should be appreciated that the target object may implement the natural language based request for visual task services by entering the demand description text and the image to be processed at input box 430.
The information interaction method provided by the embodiment of the disclosure can integrate diversified computer vision functions and generate instructions for controlling computer vision service resources based on a dialogue interaction mode, thereby realizing fine control of the service resources to decompose visual tasks on images at fine granularity, and effectively expanding the capability range of the large language model.
According to an embodiment of the present disclosure, determining an image processing result related to the image to be processed according to the visual task attribute may further include: generating a service call request according to the visual task attribute; and sending the service call request to a cloud server, wherein the cloud server is configured to call a cloud service resource corresponding to the visual task attribute according to the service call request, and process the image to be processed according to the called cloud service resource to obtain the image processing result.
According to embodiments of the present disclosure, the service call request may include a task attribute parameter of the visual task and the service resource identifier of the service resource required to execute the visual task. By sending the service call request to the cloud, the cloud server can generate the visual task according to the service call request and call the cloud service resource corresponding to the visual task to execute it, thereby processing the image to be processed and obtaining the image processing result. Obtaining the image processing result from the cloud server realizes cloud deployment of the visual task, reduces the computing overhead of executing the visual task with local service resources, and improves the execution efficiency of the visual task.
According to an embodiment of the disclosure, the visual task attribute may include a plurality of subtask attributes and execution dependency relationships among them, and the service call request may correspondingly include the plurality of subtask attributes and the execution dependency relationships. After receiving the service call request, the cloud server can generate a plurality of sub-visual tasks based on the plurality of subtask attributes and call the cloud service resources respectively corresponding to the subtask attributes. The sub-visual tasks can then be executed at the cloud server according to the execution dependency relationship, so that the cloud server generates the image processing result according to the method provided by the embodiment of the disclosure.
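A service call request carrying the plurality of subtask attributes and their execution dependency relationships might be serialized as JSON along these lines. The payload schema and all field names are assumptions for illustration, not defined by the disclosure.

```python
import json

def build_service_call_request(subtask_attrs, dependencies):
    """Package the subtask attributes and execution dependencies for the cloud."""
    return json.dumps({
        "subtasks": subtask_attrs,
        # execution dependency expressed as (from_id, to_id) edges
        "execution_dependencies": dependencies,
    })

request = build_service_call_request(
    [{"id": 1, "type": "detection", "resource_id": "cloud-det-01"},
     {"id": 2, "type": "segmentation", "resource_id": "cloud-seg-01"}],
    [[1, 2]],
)
payload = json.loads(request)  # what the cloud server would receive
```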
According to the embodiment of the disclosure, the service call request may also include only part of the subtask attributes in the visual task attribute, so that the cloud service resources of the cloud server and the service resources of the local server can be called to jointly execute the visual task, broadening the application range of the information interaction method.
According to an embodiment of the present disclosure, the information interaction method may further include: sending an image to be processed to a cloud server; and receiving an image processing result sent by the cloud server.
According to an embodiment of the disclosure, sending the image to be processed to the cloud server may include generating an image packet based on the image to be processed, and asynchronously sending the image packet to the cloud server through a message queue, so as to send the image to be processed to the cloud server.
According to an embodiment of the present disclosure, receiving the image processing result transmitted from the cloud server may include asynchronously acquiring the image processing result from the cloud server through a message queue.
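The asynchronous exchange of the image packet and the image processing result through a message queue can be sketched with Python's standard `queue` and `threading` modules standing in for a real message broker; the packet fields are illustrative.

```python
import queue
import threading

to_cloud = queue.Queue()    # carries the image packet to the cloud server
from_cloud = queue.Queue()  # carries the image processing result back

def cloud_worker():
    """Stand-in cloud server: consume one image packet, emit one result."""
    packet = to_cloud.get()
    from_cloud.put({"image_id": packet["image_id"], "result": "processed"})
    to_cloud.task_done()

threading.Thread(target=cloud_worker, daemon=True).start()
to_cloud.put({"image_id": "img-42", "data": b"..."})   # asynchronous send
result = from_cloud.get(timeout=5)                     # asynchronous receive
```

Because the send and receive go through queues, the local server is not blocked while the cloud side processes the image.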
Fig. 5 schematically illustrates an application scenario diagram of an information interaction method according to another embodiment of the present disclosure.
As shown in fig. 5, the application scenario 500 may include a client 510, a local server 520, and a cloud server 530. The target object may enter the demand description text 501 and the image to be processed 502 via the client 510. The client 510 may send the demand description text 501 to the local server 520 and the image to be processed 502 to the cloud server 530.
The local server 520 may include a large language model service module and a plug-in service module. The large language model service module can be constructed based on a pre-trained large language model; in the case that the local server 520 acquires the requirement description text 501, the large language model service module processes the requirement description text 501 based on the large language model and generates the visual task attribute. The visual task attribute may include a plurality of subtask attributes and execution dependency relationships among the plurality of subtask attributes, and a subtask attribute may include the cloud service resource identifier of the cloud service resource to be called. The plurality of subtask attributes and the execution dependency relationships among them may be stored in a visual task list, facilitating structured storage of the subtask attributes and the execution dependency relationships. The plug-in service module may generate a service call request 521 based on the plurality of subtask attributes and the execution dependency relationships generated by the large language model service module. The local server 520 sends the service call request 521 to the cloud server 530. On receiving the service call request 521, the cloud server 530 may generate a plurality of sub-visual tasks based on the plurality of subtask attributes and the execution dependency relationships in the service call request 521, and call the cloud service resources corresponding to the subtask attributes according to the execution dependency relationships to execute the plurality of sub-visual tasks in sequence, obtaining a plurality of subtask execution results. The cloud server 530 may generate an image processing result message 531 based on the plurality of subtask execution results.
The local server 520 may push the plurality of subtask execution results in the image processing result message 531 to the plug-in service module. The plug-in service module fuses the subtask execution results according to the execution dependency relationship and pushes the fused image processing result to the large language model service module. The large language model service module may process the image processing result based on the large language model and output the result description text. The local server 520 may generate feedback information 522 based on the image processing result and the result description text, and send the feedback information 522 to the client 510, so as to present the feedback information 522 on the interactive interface of the client 510. With the information interaction method provided by the embodiment of the disclosure, the target object can request execution of a visual task in a chat-like natural language interaction mode, without selecting visual services from a plurality of visual function lists or accessing a number of complicated visual service interfaces, improving the overall efficiency of executing visual tasks.
In another embodiment of the present disclosure, the local server may generate service call requests corresponding to the subtask attributes one to one for the plurality of subtask attributes, and send the plurality of service call requests to the cloud server. The cloud server may also send the execution results of the multiple subtasks to the local server, respectively.
It should be noted that, the local server shown in fig. 5 may be a server or a server cluster, and the large language model service module and the plug-in service module may be deployed in any server or server cluster of the local server.
In another embodiment of the present disclosure, the cloud server may include an object server corresponding to the permission of the target object, and the object server may deploy private service resources related to the target object, so that the target object can conveniently and rapidly execute visual tasks by calling the private service resources, realizing personalized processing of the image to be processed.
According to an embodiment of the present disclosure, the information interaction method may further include: performing authority authentication on a target object related to the input demand description text to obtain an authority authentication result; and determining an object server corresponding to the authority of the target object according to the authority authentication result, wherein the cloud server comprises the object server.
According to the embodiment of the disclosure, the target object can be subjected to permission authentication according to a permission token related to the target object to obtain the permission authentication result. By determining the object server corresponding to the permission authentication result, the service call request can be sent to the object server related to the permission of the target object, and the image processing result can be obtained from the object server. The object server can thus be adapted according to the permission attribute of the target object, so that the target object executes the visual task by calling cloud service resources in the object server, which avoids information leakage caused by distributing the image to be processed to other cloud servers and guarantees the information security of the target object.
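Permission authentication followed by object-server selection can be reduced to a token check plus a lookup. The token values and server names below are purely illustrative assumptions.

```python
# Illustrative permission table mapping a permission token to the target
# object's dedicated object server; tokens and hostnames are made up.
OBJECT_SERVERS = {
    "token-alice": "object-server-a.internal",
    "token-bob": "object-server-b.internal",
}

def authenticate_and_route(permission_token):
    """Authenticate the permission token and select the matching object server."""
    server = OBJECT_SERVERS.get(permission_token)
    if server is None:
        raise PermissionError("permission authentication failed")
    return server

target_server = authenticate_and_route("token-alice")
```

A request bearing an unknown token is rejected, so the image to be processed is never distributed to a server the target object is not entitled to use.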
Fig. 6 schematically shows a block diagram of an information interaction device according to an embodiment of the disclosure.
As shown in fig. 6, the information interaction device 600 includes: a visual task attribute obtaining module 610, an image processing result determining module 620, a feedback information generating module 630, and a display module 640.
The visual task attribute obtaining module 610 is configured to process, in response to obtaining the requirement description text, the requirement description text with a large language model, and obtain a visual task attribute matched with an image processing intention represented by the requirement description text, where the requirement description text is associated with an image to be processed.
The image processing result determining module 620 is configured to determine an image processing result related to the image to be processed according to the visual task attribute.
The feedback information generating module 630 is configured to generate feedback information according to the image processing result.
And the display module 640 is used for displaying the feedback information on the interactive interface.
According to an embodiment of the present disclosure, the image processing result determining module includes: the system comprises a visual task generating sub-module, a task execution result determining sub-module and an image processing result obtaining sub-module.
And the visual task generating sub-module is used for generating a visual task according to the visual task attribute and the image to be processed.
And the task execution result determining sub-module is used for executing the visual task according to at least one service resource associated with the visual task attribute to obtain a task execution result.
And the image processing result obtaining sub-module is used for obtaining an image processing result according to the task execution result.
According to an embodiment of the present disclosure, the visual task attributes include a plurality of subtask attributes, and execution dependencies among the plurality of subtask attributes, the visual task includes a subtask corresponding to the subtask attributes, and the service resource is associated with the subtask attributes.
According to an embodiment of the present disclosure, the task execution result determination submodule includes a sub-visual task execution unit.
The sub-visual task execution unit is used for calling, according to the execution dependency relationship, the kth service resource associated with the kth subtask attribute to execute the kth sub-visual task to obtain the kth subtask execution result, wherein k ≥ 1, k is an integer, the kth sub-visual task is determined according to the kth subtask attribute, and the task execution result includes the subtask execution results.
According to an embodiment of the present disclosure, the image processing result obtaining submodule includes an image processing result obtaining unit.
And the image processing result obtaining unit is used for fusing a plurality of subtask execution results based on the execution dependency relationship to obtain an image processing result.
According to an embodiment of the present disclosure, the sub-visual task includes at least one of: target detection subtask, image editing subtask, image description subtask.
According to an embodiment of the present disclosure, a feedback information generation module includes: the processing result describes a text obtaining sub-module and a feedback information generating sub-module.
And the processing result description text obtaining sub-module is used for processing the image processing result by using the large language model to obtain the processing result description text.
And the feedback information generation sub-module is used for generating feedback information according to the result description text.
According to an embodiment of the present disclosure, a processing result description text obtaining submodule includes: and the feedback prompt information obtaining unit and the processing result description text obtaining unit.
The feedback prompt information obtaining unit is used for updating a preset feedback prompt template based on the image processing result to obtain feedback prompt information.
And the processing result description text obtaining unit is used for processing the feedback prompt information by using the large language model to obtain the processing result description text.
According to an embodiment of the present disclosure, the requirement description text is acquired according to an input operation of the target object with respect to the interactive interface.
The information interaction device further includes: the requirements describe the floating window generation module.
And the demand description floating window generation module is used for generating a demand description floating window representing the demand description text on the interactive interface.
According to an embodiment of the disclosure, the presentation module includes a feedback information floating window generation sub-module.
And the feedback information floating window generation sub-module is used for generating a feedback information floating window suitable for displaying feedback information at a second position which is away from the first position of the demand description floating window by a preset distance range in the interactive interface.
According to the embodiment of the disclosure, the information interaction device further comprises a demand description text determining module.
The demand description text determining module is used for, in response to the image to be processed input by the target object, performing image processing intention detection on the image to be processed to obtain the demand description text.
According to an embodiment of the disclosure, the information interaction device further comprises an updating module.
And the updating module is used for updating the received demand description text according to the preset demand prompt template to obtain a new demand description text.
According to an embodiment of the present disclosure, the image processing result determining module includes: a request generation module and a first sending module.
And the request generation module is used for generating a service call request according to the visual task attribute.
And the first sending module is used for sending the service call request to a cloud server, wherein the cloud server is configured to call a cloud service resource corresponding to the visual task attribute according to the service call request, and to process the image to be processed according to the called cloud service resource to obtain the image processing result.
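A service call request built from the visual task attributes might look like the following sketch. The request schema, field names, and endpoint name are illustrative assumptions; the patent does not define a wire format.

```python
# Hedged sketch: serialize a service call request from visual task
# attributes so a cloud server can select the matching cloud service
# resource. The JSON schema and "cloud-vision-service" name are assumed.

import json

def build_service_call_request(task_attributes: dict, image_id: str) -> str:
    """Return a JSON service call request for the given task attributes."""
    request = {
        "endpoint": "cloud-vision-service",  # assumed service identifier
        "image_id": image_id,
        "subtasks": task_attributes.get("subtasks", []),
        "dependencies": task_attributes.get("dependencies", {}),
    }
    return json.dumps(request)
```

The cloud server would parse this request, route each subtask attribute to its service resource, and return the image processing result.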
According to an embodiment of the present disclosure, the information interaction device further includes: and the second sending module and the receiving module.
And the second sending module is used for sending the image to be processed to the cloud server.
And the receiving module is used for receiving the image processing result sent by the cloud server.
According to an embodiment of the present disclosure, the information interaction device further includes: and the permission authentication result acquisition module and the object server side determination module.
And the permission authentication result obtaining module is used for performing permission authentication on the target object that input the demand description text to obtain a permission authentication result.
And the object server determining module is used for determining an object server corresponding to the authority of the target object according to the authority authentication result, wherein the cloud server comprises the object server.
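The mapping from a permission authentication result to an object server can be sketched as a simple tier lookup. The tier names, hostnames, and result fields below are illustrative assumptions.

```python
# Illustrative sketch: determine the object server corresponding to the
# target object's permission from an authentication result. Tiers and
# hostnames are assumed; a real deployment would define its own.

PERMISSION_TO_SERVER = {
    "basic": "https://basic.vision.example.com",
    "pro": "https://pro.vision.example.com",
}

def select_object_server(auth_result: dict) -> str:
    """Pick the object server matching the authenticated permission tier."""
    if not auth_result.get("authenticated", False):
        raise PermissionError("target object failed permission authentication")
    tier = auth_result.get("tier", "basic")
    return PERMISSION_TO_SERVER.get(tier, PERMISSION_TO_SERVER["basic"])
```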
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method as described above.
According to an embodiment of the present disclosure, a computer program product comprises a computer program which, when executed by a processor, implements the method as described above.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement the information interaction method according to an embodiment of the disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as an information interaction method. For example, in some embodiments, the information interaction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, e.g., storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the information interaction method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the information interaction method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (29)

1. An information interaction method, comprising:
in response to obtaining a demand description text, processing the demand description text by using a large language model to obtain a visual task attribute matched with an image processing intention represented by the demand description text, wherein the demand description text is associated with an image to be processed;
determining an image processing result related to the image to be processed according to the visual task attribute;
Generating feedback information according to the image processing result; and
and displaying the feedback information on an interactive interface.
2. The method of claim 1, wherein the determining, according to the visual task attribute, an image processing result related to the image to be processed comprises:
generating a visual task according to the visual task attribute and the image to be processed;
executing the visual task according to at least one service resource associated with the visual task attribute to obtain a task execution result; and
and obtaining the image processing result according to the task execution result.
3. The method of claim 2, wherein the visual task attributes comprise a plurality of subtask attributes, and execution dependencies among the plurality of subtask attributes, the visual task comprising a subtask corresponding to the subtask attributes, the service resource being associated with the subtask attributes;
wherein the executing the visual task according to the at least one service resource associated with the visual task attribute, and obtaining a task execution result includes:
and according to the execution dependency relationship, invoking a kth service resource associated with a kth subtask attribute to execute a kth subtask to obtain a kth subtask execution result, wherein k is greater than 1 and k is an integer, the kth subtask is determined according to the kth subtask attribute, and the task execution result comprises the subtask execution result.
4. A method according to claim 3, wherein said obtaining said image processing result according to said task execution result comprises:
and fusing a plurality of subtask execution results based on the execution dependency relationship to obtain the image processing result.
5. A method according to claim 3, wherein the subtask comprises at least one of:
a target detection subtask, an image editing subtask, and an image description subtask.
6. The method of claim 1, wherein the generating feedback information from the image processing result comprises:
processing the image processing result by using the large language model to obtain a processing result description text; and
and generating the feedback information according to the processing result description text.
7. The method of claim 6, wherein said processing said image processing results using said large language model to obtain processing result description text comprises:
updating a preset feedback prompt template based on the image processing result to obtain feedback prompt information; and
and processing the feedback prompt information by using the large language model to obtain the processing result description text.
8. The method of claim 6, wherein the demand description text is obtained from an input operation of a target object with respect to the interactive interface;
the method further comprises the steps of:
generating a demand description floating window representing the demand description text on the interactive interface;
wherein, the displaying the feedback information on the interactive interface includes:
and generating, at a second position in the interactive interface that is spaced from a first position of the demand description floating window by a distance within a preset range, a feedback information floating window for displaying the feedback information.
9. The method of claim 1, further comprising:
and in response to receiving the image to be processed input by the target object, performing image processing intention detection on the image to be processed to obtain the demand description text.
10. The method of claim 1, further comprising:
and updating the received demand description text according to a preset demand prompt template to obtain a new demand description text.
11. The method of claim 1, wherein the determining, according to the visual task attribute, an image processing result related to the image to be processed comprises:
generating a service call request according to the visual task attribute; and
sending the service call request to a cloud server, wherein the cloud server is configured to call a cloud service resource corresponding to the visual task attribute according to the service call request, and to process the image to be processed according to the called cloud service resource to obtain the image processing result.
12. The method of claim 11, further comprising:
sending the image to be processed to the cloud server; and
and receiving the image processing result sent by the cloud server.
13. The method of claim 11, further comprising:
performing permission authentication on a target object that input the requirement description text to obtain a permission authentication result; and
and determining an object server corresponding to the permission of the target object according to the permission authentication result, wherein the cloud server comprises the object server.
14. An information interaction device, comprising:
the visual task attribute obtaining module is used for responding to the acquired demand description text, processing the demand description text by utilizing a large language model to obtain visual task attributes matched with image processing intents represented by the demand description text, wherein the demand description text is associated with an image to be processed;
The image processing result determining module is used for determining an image processing result related to the image to be processed according to the visual task attribute;
the feedback information generation module is used for generating feedback information according to the image processing result; and
and the display module is used for displaying the feedback information on the interactive interface.
15. The apparatus of claim 14, wherein the image processing result determination module comprises:
the visual task generating sub-module is used for generating a visual task according to the visual task attribute and the image to be processed;
the task execution result determining sub-module is used for executing the visual task according to at least one service resource associated with the visual task attribute to obtain a task execution result; and
and the image processing result obtaining sub-module is used for obtaining the image processing result according to the task execution result.
16. The apparatus of claim 15, wherein the visual task attributes comprise a plurality of subtask attributes, and execution dependencies among the plurality of subtask attributes, the visual task comprising a subtask corresponding to the subtask attributes, the service resource being associated with the subtask attributes;
Wherein, the task execution result determining submodule comprises:
the sub-visual task execution unit is used for calling a kth service resource associated with a kth subtask attribute to execute the kth sub-visual task according to the execution dependency relationship to obtain a kth subtask execution result, wherein k is greater than 1 and k is an integer, the kth sub-visual task is determined according to the kth subtask attribute, and the task execution result comprises the subtask execution result.
17. The apparatus of claim 16, wherein the image processing result obtaining submodule comprises:
and the image processing result obtaining unit is used for fusing a plurality of subtask execution results based on the execution dependency relationship to obtain the image processing result.
18. The apparatus of claim 16, wherein the sub-visual task comprises at least one of:
target detection subtask, image editing subtask, image description subtask.
19. The apparatus of claim 14, wherein the feedback information generation module comprises:
the processing result description text obtaining sub-module is used for processing the image processing result by using the large language model to obtain a processing result description text; and
And the feedback information generation sub-module is used for generating the feedback information according to the processing result description text.
20. The apparatus of claim 19, wherein the processing result description text obtaining submodule comprises:
the feedback prompt information obtaining unit is used for updating a preset feedback prompt template based on the image processing result to obtain feedback prompt information; and
and the processing result description text obtaining unit is used for processing the feedback prompt information by using the large language model to obtain the processing result description text.
21. The apparatus of claim 19, wherein the demand description text is obtained from an input operation of a target object for the interactive interface;
the apparatus further comprises:
the demand description floating window generation module is used for generating a demand description floating window representing the demand description text on the interactive interface;
wherein, the show module includes:
and the feedback information floating window generation sub-module is used for generating a feedback information floating window suitable for displaying the feedback information at a second position which is distant from the first position of the demand description floating window by a preset distance range in the interactive interface.
22. The apparatus of claim 14, further comprising:
and the demand description text determining module is used for responding to the to-be-processed image input by the target object, and carrying out image processing intention detection on the to-be-processed image to obtain the demand description text.
23. The apparatus of claim 14, wherein the apparatus further comprises:
and the updating module is used for updating the received demand description text according to the preset demand prompt template to obtain a new demand description text.
24. The apparatus of claim 14, wherein the image processing result determination module comprises:
the request generation module is used for generating a service call request according to the visual task attribute; and
the first sending module is used for sending the service call request to a cloud server, wherein the cloud server is configured to call a cloud service resource corresponding to the visual task attribute according to the service call request, and to process the image to be processed according to the called cloud service resource to obtain the image processing result.
25. The apparatus of claim 24, further comprising:
the second sending module is used for sending the image to be processed to the cloud server; and
And the receiving module is used for receiving the image processing result sent by the cloud server.
26. The apparatus of claim 24, further comprising:
the permission authentication result obtaining module is used for performing permission authentication on the target object that input the demand description text to obtain a permission authentication result; and
and the object server determining module is used for determining an object server corresponding to the authority of the target object according to the authority authentication result, wherein the cloud server comprises the object server.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 13.
28. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 13.
29. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 13.
CN202311694607.XA 2023-12-11 2023-12-11 Information interaction method, device, electronic equipment and storage medium Pending CN117690002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311694607.XA CN117690002A (en) 2023-12-11 2023-12-11 Information interaction method, device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117690002A true CN117690002A (en) 2024-03-12

Family

ID=90134717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311694607.XA Pending CN117690002A (en) 2023-12-11 2023-12-11 Information interaction method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117690002A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118642970A (en) * 2024-08-14 2024-09-13 浙江大华技术股份有限公司 Visual verification method, electronic device, and computer-readable storage medium


Similar Documents

Publication Publication Date Title
US11164004B2 (en) Keyframe scheduling method and apparatus, electronic device, program and medium
JP7261732B2 (en) Method and apparatus for determining character color
CN114201278B (en) Task processing method, task processing device, electronic equipment and storage medium
CN117690002A (en) Information interaction method, device, electronic equipment and storage medium
CN115205925A (en) Expression coefficient determining method and device, electronic equipment and storage medium
CN107291774B (en) Error sample identification method and device
CN111782850B (en) Object searching method and device based on hand drawing
CN113837194B (en) Image processing method, image processing apparatus, electronic device, and storage medium
KR102205686B1 (en) Method and apparatus for ranking candiate character and method and device for inputting character
CN108509442B (en) Search method and apparatus, server, and computer-readable storage medium
CN112433713A (en) Application program design graph processing method and device
CN114880498B (en) Event information display method and device, equipment and medium
US20240104808A1 (en) Method and system for creating stickers from user-generated content
CN108664535B (en) Information output method and device
CN109299223B (en) Method and device for inquiring instruction
CN113127058A (en) Data annotation method, related device and computer program product
CN110942306A (en) Data processing method and device and electronic equipment
CN110879868A (en) Consultant scheme generation method, device, system, electronic equipment and medium
CN110796137A (en) Method and device for identifying image
CN112579080A (en) Method and device for generating user interface code
CN111324244A (en) Method and device for switching picture display types
CN113111177B (en) Text data labeling method, device, electronic equipment and storage medium
US20230386109A1 (en) Content layout systems and processes
CN114564133A (en) Application program display method, device, equipment and medium
CN113569092A (en) Video classification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination