CN116975336A - Image processing method, device, equipment and storage medium based on artificial intelligence - Google Patents


Info

Publication number
CN116975336A
Authority
CN
China
Prior art keywords
target
description information
session
artificial intelligence
round
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310898516.1A
Other languages
Chinese (zh)
Inventor
赖仕凡
陈玉娴
李美慧
陈志杰
陈爽
邵领
李强强
刘晓芬
刘艳红
孙赫
徐梓茹
马可欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu International Technology Shenzhen Co ltd
Original Assignee
Baidu International Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu International Technology Shenzhen Co ltd
Priority to CN202310898516.1A
Publication of CN116975336A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/5866 Retrieval characterised by using manually generated metadata, e.g. tags, keywords, comments, manually generated location and time information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides an image processing method, device, equipment and storage medium based on artificial intelligence, and relates to technical fields such as deep learning, natural language processing and intelligent search. The image processing method based on artificial intelligence comprises the following steps: processing a target image input by a user to obtain target description information; in response to a follow-up request for the target description information, initiating a current session according to a follow-up question in the follow-up request; and constructing a dialogue context for the current session according to the target description information, calling an artificial-intelligence-based text generation model, and answering the follow-up question in the current session according to the dialogue context. The technical scheme can improve the accuracy of image processing.

Description

Image processing method, device, equipment and storage medium based on artificial intelligence
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to technical fields such as deep learning, natural language processing and intelligent search, and more particularly to an image processing method, device, equipment and storage medium based on artificial intelligence.
Background
Search is an important way for users to find information and a quick path from recognizing an object to asking extended questions about it. However, traditional search requires users to express their query in a standardized way. For example, a user may see a Tatarian aster in daily life without knowing what it is; the user cannot search for it or ask about it directly and instead has to keep describing it. Even if the search is completed by photographing, or by typing supplementary text after photographing, only content that already exists on the Internet can be retrieved; the results cannot be summarized and no new content can be generated.
Therefore, how to look up information using an image is an important problem.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, device and storage medium based on artificial intelligence.
According to an aspect of the present disclosure, there is provided an image processing method based on artificial intelligence, including:
processing a target image input by a user to obtain target description information;
in response to a follow-up request for the target description information, initiating a current session according to a follow-up question in the follow-up request;
and constructing a dialogue context for the current session according to the target description information, calling an artificial-intelligence-based text generation model, and answering the follow-up question in the current session according to the dialogue context.
According to an aspect of the present disclosure, there is provided an artificial intelligence based image processing apparatus including:
the image description module is used for processing the target image input by the user to obtain target description information;
the current session module is used for responding to a follow-up request for the target description information and initiating a current session according to a follow-up question in the follow-up request;
and the session processing module is used for constructing a dialogue context for the current session according to the target description information, calling an artificial-intelligence-based text generation model, and answering the follow-up question in the current session according to the dialogue context.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods provided by any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method provided by any of the embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1a is a flow chart of an artificial intelligence based image processing method provided in accordance with an embodiment of the present disclosure;
FIG. 1b is a schematic illustration of an artificial intelligence dialog container provided in accordance with an embodiment of the disclosure;
FIG. 2a is a flow chart of another artificial intelligence based image processing method provided in accordance with an embodiment of the present disclosure;
FIGS. 2b-2c are schematic diagrams of artificial intelligence dialogs provided in accordance with an embodiment of the present disclosure;
FIG. 3a is a flow chart of yet another artificial intelligence based image processing method provided in accordance with an embodiment of the present disclosure;
FIG. 3b is a schematic diagram of processing a target image provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an image processing apparatus based on artificial intelligence provided according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing an artificial intelligence based image processing method of an embodiment of the present disclosure.
Detailed Description
FIG. 1a is a flow chart of an artificial intelligence based image processing method provided in accordance with an embodiment of the present disclosure. The method is applicable to scenarios in which images are used to search for information. The method may be performed by an artificial intelligence based image processing apparatus, which may be implemented in software and/or hardware and may be integrated in an electronic device. As shown in fig. 1a, the image processing method based on artificial intelligence of the present embodiment may include:
s101, processing a target image input by a user to obtain target description information;
s102, in response to a follow-up request for the target description information, initiating a current session according to a follow-up question in the follow-up request;
s103, constructing a dialogue context for the current session according to the target description information, calling an artificial-intelligence-based text generation model, and answering the follow-up question in the current session according to the dialogue context.
The target description information describes and identifies the target image, that is, it states what the content of the target image is, so as to meet the user's need to recognize the image. Specifically, the user side may input the target image by photographing, uploading from an album, or the like; the content of the target image is then analyzed with image techniques such as image recognition and image search, and the target description information is generated, thereby meeting the need to recognize the target image.
The user may also ask divergent, extended questions. For example, where the target image shows a certain plant, the user may continue to ask how to care for it, what it is used for, or what it symbolizes; where the target image shows a person, the user may continue to ask for more information, what the person represents, and so on. Specifically, the user may initiate a follow-up request by continuing to type a follow-up question as text.
The artificial intelligence (Artificial Intelligence, AI) based text generation model is a knowledge-enhanced large language model. It can be obtained by fusing and learning from massive data and large-scale knowledge, has the technical characteristics of knowledge enhancement, retrieval enhancement and dialogue enhancement, and can be built on a deep learning platform with knowledge enhancement techniques. Both the amount of sample data and the number of model parameters can be at massive scale, giving the model strong text generation capability. The dialogue context, which characterizes the conversational context of the current session, may include the historical sessions preceding the current session, the question and answer sentences in those historical sessions, and other related information.
Specifically, in response to a follow-up request for the target description information, the follow-up question is extracted from the follow-up request and used as the question sentence of the current session, and a dialogue context is constructed for the current session according to the target description information; the artificial-intelligence-based text generation model then answers the follow-up question in the current session according to the dialogue context and generates the answer sentence of the current session, thereby meeting the need for extended questions about the target image.
That is, after the user inputs a follow-up question about the target description information, a dialogue context may be generated according to the target description information, and the dialogue context together with the follow-up question of the current session may be passed to the artificial-intelligence-based text generation model, which then carries out the artificial intelligence dialogue. The user can ask follow-up questions repeatedly until the extended need is met, at which point the artificial intelligence dialogue ends.
When the user wants to ask follow-up questions about the target description information, the follow-up question is used as the question sentence of the current session, the dialogue context is constructed based on the target description information, the artificial-intelligence-based text generation model is invoked, and the follow-up question in the current session is answered based on the dialogue context. In other words, the target description information is supplied to the artificial-intelligence-based text generation model as dialogue context for it to understand, which reduces interference with the model and thereby improves the accuracy of searching for information with an image. It should be noted that if the user has no follow-up need regarding the target description information, the operation ends, and there is no need to invoke the artificial-intelligence-based text generation model for an artificial intelligence dialogue.
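As a concrete illustration of this flow, the following sketch (in Python, under an assumed chat-style message format) shows how a follow-up question might be passed to the text generation model together with a dialogue context built from the target description information; call_text_generation_model is a placeholder stub, not an actual interface of the disclosure.

```python
# Minimal sketch of s101-s103 under assumed interfaces.
def call_text_generation_model(messages: list[dict]) -> str:
    ...  # placeholder for the AI-based text generation model; returns the answer sentence


def answer_follow_up(target_description: str, follow_up_question: str) -> str:
    # The target description supplies the dialogue context of the current session;
    # here it is presented as a prior turn (detailed in the multi-round session
    # embodiment below).
    messages = [
        {"role": "user", "content": "Please briefly describe the image content."},
        {"role": "assistant", "content": target_description},
        # The follow-up question becomes the question sentence of the current session.
        {"role": "user", "content": follow_up_question},
    ]
    return call_text_generation_model(messages)
```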
According to the technical scheme provided by the embodiments of the present disclosure, when there is a follow-up need regarding the target description information of the target image, the follow-up question is used as the question sentence of the current session, the dialogue context is constructed according to the target description information, the artificial-intelligence-based text generation model is called, and the follow-up question in the current session is answered according to the dialogue context. This reduces interference with the text generation model and improves the accuracy of image processing.
In an alternative embodiment, the method further comprises: acquiring, as the target image, an image input by the user through the artificial intelligence dialogue container.
Referring to fig. 1b, embodiments of the present disclosure may provide an image input interface 11 in the artificial intelligence dialogue container. In response to an operation on the image input interface 11, a photographing control and an album control may be presented in the artificial intelligence dialogue container, and the user may upload the target image by clicking the photographing control or the album control. Providing the image upload interface in the artificial intelligence dialogue container makes image uploading more convenient and improves the efficiency of searching for information with images.
In addition, image search tools in the related art can only meet the user's need to recognize an image, namely to answer 'what is in the image'. If the search result is unsatisfactory, or the user wishes to keep exploring, the only option is to describe the object again and search anew; the search cannot continue directly on the current page, so the flow is too cumbersome. In other words, the result page lacks content expansion, its information diversity is poor, and user needs are not well met. In the embodiments of the present disclosure, the AI dialogue not only meets the user's need to recognize the target image but also meets follow-up needs regarding the target description information; supporting follow-up questions and divergent extended information improves the user experience.
In an alternative embodiment, the method further comprises: replacing the general description information in a preset standard description sentence with the target description information to obtain a target description sentence.
The preset standard description sentence includes not only general description information but also dialogue guidance information. Specifically, the preset standard description sentence may be: this may be an image of '[general description information]'; what can I help you with? Where the target description information is 'plush toy', the target description sentence may be: this may be an image of a 'plush toy'; what can I help you with? Replacing the general description information in the preset standard description sentence with the target description information yields the target description sentence; displaying the target description sentence in the artificial intelligence dialogue container both meets the need to recognize the target image and guides the user to keep entering extended requests about the target image.
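A minimal sketch of this substitution, assuming a simple string template (the template wording and placeholder name are illustrative, not the disclosure's actual format):

```python
# Replace the general description information in a preset standard description
# sentence with the target description information.
STANDARD_DESCRIPTION_SENTENCE = "This may be an image of '{general_description}'; what can I help you with?"


def build_target_description_sentence(target_description: str) -> str:
    return STANDARD_DESCRIPTION_SENTENCE.format(general_description=target_description)


# Example: build_target_description_sentence("plush toy")
# -> "This may be an image of 'plush toy'; what can I help you with?"
```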
In addition, the functions that the artificial intelligence dialogue container supports and those it does not support may be displayed in the artificial intelligence dialogue container as follow-up auxiliary prompts, which lowers the threshold for asking follow-up questions and raises the follow-up success rate. For example, together with the target description sentence, the following information may be displayed: I currently support asking about an image, extracting text from an image, continuous conversations over multiple images, drawing, and so on; I do not currently support finding the source of an image, and so on. Moreover, the follow-up auxiliary prompt may be displayed only the first time the artificial intelligence dialogue container is used on a given day and not repeated in later uses that day, reducing the disturbance it causes to the user.
FIG. 2a is a flow chart of another artificial intelligence based image processing method provided in accordance with an embodiment of the present disclosure. Referring to fig. 2a, the artificial intelligence based image processing method of the present embodiment may include:
s201, processing a target image input by a user to obtain target description information;
s202, in response to a follow-up request for the target description information, initiating a current session according to a follow-up question in the follow-up request;
s203, constructing a history session according to the target description information to obtain a multi-round session comprising the history session and the current session;
s204, inputting the multi-round session into the artificial-intelligence-based text generation model, and answering the follow-up question in the current session according to the output of the model.
Specifically, when the user inputs a target image, the target image may be processed to obtain the target description information, and the target description information may be displayed in the artificial intelligence dialogue container to meet the need to recognize the target image. When the user then inputs a follow-up question about the target description information, the follow-up question may be used as the question sentence of the current session, and a history session is constructed according to the target description information, thereby obtaining a multi-round session comprising the history session and the current session. The multi-round session is input into the artificial-intelligence-based text generation model, and the follow-up question in the current session is answered according to the output of the model; that is, an AI multi-round dialogue is carried out based on the artificial-intelligence-based text generation model.
The multi-round session is constructed from the target description information and the follow-up question, and the AI multi-round dialogue is carried out based on the artificial-intelligence-based text generation model. Compared with splicing the target description information and the follow-up question together and feeding the spliced content into the model, presenting the target description information as a previous output of the model within the multi-round session lets the model understand the target description information more natively, which can further improve the accuracy of the AI multi-round dialogue. Moreover, because the history session is generated from the target description information and the current session is generated from the follow-up question, the AI multi-round dialogue supports sending the image and the text separately: the user can send the image alone, or send the image first and then ask a follow-up question. Compared with related AI dialogue products, in which the image and the text must be sent together, this further improves convenience for the user.
In an alternative embodiment, the constructing a history session according to the target description information includes: taking a preset image query sentence as the question sentence of the history session; and taking the target description information as the answer sentence of the history session.
The image query sentence expresses the need to recognize the target image and can be predetermined, independent of the target image input by the user; for example, 'please briefly describe the image content' may serve as the preset image query sentence. Referring to fig. 2b, where the target image is an image of an azelaic acid acne-removing gel, the question sentence of the history session may be 'please briefly describe the image content', and the answer sentence of the history session may be 'azelaic acid acne-removing gel'. Using the preset image query sentence as the question sentence of the history session and the target description information as its answer sentence keeps the history session simple, avoids introducing interfering information into the subsequent AI multi-round dialogue, and thus further improves the accuracy of the AI multi-round dialogue.
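For concreteness, the multi-round session built for the example above might look like the following; the chat-style role names and the wording of the follow-up question are assumptions, not the disclosure's exact format.

```python
# Multi-round session: a history session built from the target description
# information, followed by the current session initiated by the follow-up.
multi_round_session = [
    # history session
    {"role": "user", "content": "Please briefly describe the image content."},  # preset image query sentence
    {"role": "assistant", "content": "azelaic acid acne-removing gel"},          # target description information
    # current session
    {"role": "user", "content": "What is it used for?"},                         # follow-up question (assumed)
]
```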
According to the technical scheme provided by the embodiments of the present disclosure, in response to a follow-up request for the target description information, a multi-round session is constructed from the target description information and the follow-up question, and the AI multi-round dialogue is carried out based on the artificial-intelligence-based text generation model. This lets the model understand the target description information more natively and can further improve the accuracy of the AI multi-round dialogue.
In an optional embodiment, before the processing of the target image input by the user, the method further includes: processing another image input by the user to obtain other description information. The constructing a history session according to the target description information then includes: selecting, according to input behavior data, the target description information or the other description information to construct the first-round history session of the multi-round session.
The input behavior data may include the user's input mode, input content, input time, and so on. If, before inputting the target image in the artificial intelligence dialogue container, the user also input another image there, that other image is likewise processed to obtain other description information. Specifically, when the user inputs the other image, the other image is processed to obtain the other description information; when the user inputs the target image, the target image is processed to obtain the target description information.
In response to the follow-up request for the target description information, either the other description information or the target description information is selected, according to the input behavior data, to determine the first-round history session; that is, whether the other description information belongs to the multi-round session is decided from the input behavior data. This improves the accuracy of the multi-round session and, in turn, the accuracy of the answers to follow-up questions.
In an alternative embodiment, the selecting, according to the input behavior data, the target description information or the other description information to construct the first-round history session of the multi-round session includes: determining whether a text question sentence input by the user exists between the other image and the target image; if such a sentence exists, filtering out the other description information corresponding to the other image and constructing the first-round history session of the multi-round session with the target description information corresponding to the target image; and if no such sentence exists, retaining the other description information corresponding to the other image and constructing the first-round history session of the multi-round session with the other description information.
Specifically, if a text question sentence was also input between the other image and the target image, it is determined that the other description information does not belong to the multi-round session, and the first-round history session of the multi-round session is constructed with the target description information corresponding to the target image. In this case, the other description information and the text question sentence are grouped into one multi-round session, while the target description information and the current session are grouped into another. If no text question sentence was input between the other image and the target image, it is determined that the other description information belongs to the multi-round session, and the first-round history session of the multi-round session is constructed with the other description information. When the first-round history session is constructed with the other description information, the multi-round session also includes a history session constructed with the target description information; that is, the other description information, the target description information and the current session are grouped into the same multi-round session.
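A sketch of this selection logic, under the same assumed message format (the preset query wording and the boolean input-behavior signal are illustrative placeholders):

```python
# Build the history part of the multi-round session, depending on whether a text
# question sentence was input between the other image and the target image.
PRESET_IMAGE_QUERY = "Please briefly describe the image content."


def build_history_sessions(other_description: str,
                           target_description: str,
                           text_question_between_images: bool) -> list[dict]:
    history: list[dict] = []
    if not text_question_between_images:
        # No text question in between: the other image belongs to this multi-round
        # session, so its description forms the first-round history session.
        history += [
            {"role": "user", "content": PRESET_IMAGE_QUERY},
            {"role": "assistant", "content": other_description},
        ]
    # The target description always contributes a history session.
    history += [
        {"role": "user", "content": PRESET_IMAGE_QUERY},
        {"role": "assistant", "content": target_description},
    ]
    return history
```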
Referring to fig. 2c, where the user uploads a peach blossom image and then a plum blossom image in sequence, the first-round history session may be constructed with 'please briefly describe the image content' as the question sentence and the other description information 'peach blossom' as the answer sentence; the previous-round history session may be constructed with 'please briefly describe the image content' as the question sentence and the target description information 'plum blossom' as the answer sentence, yielding a multi-round session comprising the first-round history session, the previous-round history session and the current session. The multi-round session is input into the artificial-intelligence-based text generation model, and the follow-up question in the current session is answered according to the model output. This processing enables continuous dialogue over the other image and the target image, that is, a multi-image continuous dialogue capability, which further enriches the follow-up options.
FIG. 3a is a flow chart of yet another artificial intelligence based image processing method provided in accordance with an embodiment of the present disclosure. Referring to fig. 3a, the artificial intelligence based image processing method of the present embodiment may include:
s301, carrying out intention recognition on a target image input by a user, and obtaining a target scene of a subject in the target image according to an intention recognition result;
S302, selecting a target description model for a target scene from the candidate description models according to the association relation between the candidate scene and the candidate description models;
s303, processing the target image by adopting a target description model to obtain target description information;
s304, in response to a follow-up request for the target description information, initiating a current session according to a follow-up question in the follow-up request;
s305, constructing a dialogue context for the current session according to the target description information, calling an artificial-intelligence-based text generation model, and answering the follow-up question in the current session according to the dialogue context.
In the embodiments of the present disclosure, images may be divided into multiple candidate scenes, and an association relationship between candidate scenes and candidate description models is established; that is, different candidate scenes correspond to different description paths. Specifically, intention recognition is performed on the target image input by the user, and the target scene of the subject in the target image is obtained from the intention recognition result; according to the association relationship between candidate scenes and candidate description models, the candidate description model associated with the target scene is used as the target description model, the target image is input into the target description model, and the target description information is obtained from the output of the target description model.
By determining the target scene to which the subject of the target image belongs and selecting the target description model associated with that scene, that is, by generating the target description information for the target image along the description path matched to the target scene, the accuracy of the target description information can be improved, and with it the accuracy of answering follow-up questions. Moreover, the description of the target image focuses on its subject; compared with describing the subject and the background separately, or even producing a divergent, extended description of the target image, this reduces interference with the artificial-intelligence-based text generation model during the subsequent AI dialogue.
Referring to fig. 3b, where the target scene is text or a title, an OCR (Optical Character Recognition) model is used as the target description model; where the target scene is a person or a plant, the image-recognition search model of the image search tool may be used as the target description model; where the target scene is a material, a face, an animal, an expression, a commodity, or anything else, an MLLM (Multimodal Large Language Model) may be used as the target description model. Materials may include posters, cartoons, and the like.
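A sketch of this scene-to-model dispatch; the scene labels are assumptions, and the three model handles are placeholder stubs standing in for the OCR model, the image-recognition search model and the multimodal large language model:

```python
# Map the recognized target scene of the image subject to a description model.
def ocr_model(image): ...            # stub: OCR model
def image_search_model(image): ...   # stub: image-recognition search model
def mllm(image): ...                 # stub: multimodal large language model


def select_description_model(target_scene: str):
    if target_scene in {"text", "title"}:
        return ocr_model
    if target_scene in {"person", "plant"}:
        return image_search_model
    # material, face, animal, expression, commodity, or others
    return mllm
```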
Referring to fig. 3b, after the target image input by the user is obtained, it may also be determined whether the target image is a graphic-code image, for example whether it contains a two-dimensional code or a bar code; if so, a code recognition model is used as the target description model; otherwise, the intention recognition operation is performed on the target image. In addition, before the target image is recognized, it may be determined whether the target image is blurry, and an image replacement prompt is generated if the sharpness does not meet the requirement. Sensitive-information screening may also be performed on the target image or the target description information, and target images involving sensitive information are filtered out to ensure risk-control safety.
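These pre-checks could sit in front of the dispatch above, for example as follows; the predicate helpers are placeholder stubs, and select_description_model refers to the sketch above:

```python
# Route the target image: graphic codes go to a code recognition model, blurry
# images trigger a replacement prompt, everything else follows intent recognition.
def code_recognition_model(image): ...          # stub: QR/bar code recognition
def contains_graphic_code(image) -> bool: ...   # stub: graphic-code detector
def is_too_blurry(image) -> bool: ...           # stub: sharpness check


def route_target_image(image, target_scene: str):
    if contains_graphic_code(image):
        return code_recognition_model
    if is_too_blurry(image):
        raise ValueError("Please upload a clearer image")  # image replacement prompt
    return select_description_model(target_scene)  # see the dispatch sketch above
```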
According to the technical scheme provided by the embodiments of the present disclosure, generating the target description information for the target image along the description path matched to the target scene improves the accuracy of the target description information; and when the user wants to ask follow-up questions about the target description information, carrying out the AI multi-round dialogue based on the artificial-intelligence-based text generation model improves the accuracy of the dialogue.
In an optional implementation, after the target image input by the user is processed to obtain the target description information, the method further includes: acquiring, from a search engine, candidate questions that ask about the target description information; screening the candidate questions based on artificial intelligence dialogue rules, and selecting a preset number of target questions from the remaining candidate questions after screening; and displaying, in the artificial intelligence dialogue container, the target questions as follow-up guidance information for the target description information.
In the embodiments of the present disclosure, candidate questions asking about the target description information are obtained from a search engine and may be filtered with blacklist vocabulary in the artificial intelligence dialogue rules; for example, the blacklist vocabulary may include videos, files, links, documents, forms, or punctuation marks other than question marks. Filtering the candidate questions prevents extended requests that cannot be satisfied from being passed into the artificial-intelligence-based text generation model. Candidate questions may also be recalled with whitelist vocabulary in the artificial intelligence dialogue rules; for example, the whitelist vocabulary may include how, what, whether, how much, how long, which, how best, good or not, method, price, cost, difference, efficacy, role, comparison, quotation, variety, classification, implication, feature, reason, introduction, parameter, advantage and disadvantage, configuration, usage, strategy, symptom, and so on. Recalling candidate questions with the whitelist vocabulary further improves the hit rate of the follow-up guidance information. It should be noted that if the number of candidate questions remaining after filtering is less than the preset number, configured questions for the vertical category to which the target description information belongs may be added as a supplement.
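A sketch of this screening step; the vocabularies are abbreviated examples and the preset number of target questions is an assumption:

```python
# Screen candidate questions from the search engine with blacklist/whitelist
# vocabulary, then keep a preset number of target questions.
BLACKLIST = ("video", "file", "link", "document", "form")
WHITELIST = ("how", "what", "which", "method", "price", "difference", "efficacy")


def select_target_questions(candidates: list[str], preset_count: int = 3) -> list[str]:
    # Filter out candidates that hit the blacklist (requests the model cannot meet).
    remaining = [q for q in candidates if not any(w in q.lower() for w in BLACKLIST)]
    # Prefer candidates recalled by the whitelist to raise the guidance hit rate.
    recalled = [q for q in remaining if any(w in q.lower() for w in WHITELIST)]
    return (recalled or remaining)[:preset_count]
```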
For example, where the target scene is an animal, the follow-up guidance information may be: how to raise this animal, what its living habits are, what breed it is. Where the target scene is a plant, it may be: how to cultivate this plant, what effects it has, whether it is edible. Where the target scene is a commodity, it may be: what brand this is, how much it costs, write a piece of copy in a certain style. Where the target scene is something else, it may be: how this is made, what it is used for, write a piece of copy in a certain style. Where the target scene is a face, it may be: write a social-sharing post, write a piece of copy in a certain style. Where the target scene is a person, it may be: more information about the person, the date of birth, what the person represents. Where the target scene is a material, it may be: what this means, where this comes from, how to draw this. Where the target scene is an expression, it may be: what this means, where this comes from, how to draw this. Where the target scene is text, it may be: what this is, how to solve this problem, translate this passage.
Fig. 4 is a schematic structural diagram of an image processing apparatus based on artificial intelligence according to an embodiment of the present disclosure. The embodiment is applicable to scenarios in which images are used to search for information. The apparatus may be implemented in software and/or hardware. As shown in fig. 4, the artificial intelligence based image processing apparatus 400 of the present embodiment may include:
the target image description module 410 is configured to process a target image input by a user to obtain target description information;
a current session module 420, configured to respond to a follow-up request for the target description information and initiate a current session according to a follow-up question in the follow-up request;
and the session processing module 430, configured to construct a dialogue context for the current session according to the target description information, call an artificial-intelligence-based text generation model, and answer the follow-up question in the current session according to the dialogue context.
In an alternative embodiment, the session processing module 430 includes:
a history session unit, configured to construct a history session according to the target description information, so as to obtain a multi-round session including the history session and the current session;
and a session answering unit, configured to input the multi-round session into the artificial-intelligence-based text generation model and answer the follow-up question in the current session according to the output of the model.
In an alternative embodiment, the history session unit includes:
a question subunit, configured to take a preset image query statement as a question statement in a history session;
and the answer subunit is used for taking the target description information as an answer sentence in the history session.
In an alternative embodiment, the artificial intelligence based image processing apparatus 400 further includes an other-description module, configured to process another image input by the user to obtain other description information;
the history session unit is specifically configured to:
select, according to the input behavior data, the target description information or the other description information to construct the first-round history session of the multi-round session.
In an alternative embodiment, the history session unit is specifically configured to:
determine whether a text question sentence input by the user exists between the other image and the target image;
if such a sentence exists, filter out the other description information corresponding to the other image and construct the first-round history session of the multi-round session with the target description information corresponding to the target image;
and if no such sentence exists, retain the other description information corresponding to the other image and construct the first-round history session of the multi-round session with the other description information.
In an alternative embodiment, the image processing apparatus 400 based on artificial intelligence further includes:
and the image input module is used for acquiring an image input by a user through the artificial intelligence dialogue container as the target image.
In an alternative embodiment, the image processing apparatus 400 based on artificial intelligence further includes:
and the description statement module is used for replacing general description information in the preset standard description statement by adopting the target description information to obtain a target description statement.
In an alternative embodiment, the artificial intelligence based image processing apparatus 400 further includes a follow-up guidance module, which includes:
a question acquisition unit, configured to acquire, from a search engine, candidate questions that ask about the target description information;
a question screening unit, configured to screen the candidate questions based on artificial intelligence dialogue rules and select a preset number of target questions from the remaining candidate questions after screening;
and a follow-up guidance unit, configured to display, in the artificial intelligence dialogue container, the target questions as follow-up guidance information for the target description information.
In an alternative embodiment, the target image description module 410 includes:
a scene recognition unit, configured to perform intention recognition on the target image input by the user and obtain the target scene of the subject in the target image according to the intention recognition result;
a model selection unit, configured to select a target description model for the target scene from candidate description models according to the association relationship between candidate scenes and candidate description models;
and an image description unit, configured to process the target image with the target description model to obtain the target description information.
The technical scheme of the present disclosure combines several core technologies of intelligent search, pairing an artificial-intelligence-based text generation model with multi-path image description, to deliver an AI dialogue product with a stronger sense of technology. While using the AI dialogue container, after photographing the user can both recognize the object and ask extended questions through a one-on-one question-and-answer dialogue with the AI dialogue container, which improves the quality of the answers and the user experience and meets user needs in a more diverse and natural way.
In the technical scheme of the present disclosure, the collection, storage and use of the user personal information involved all comply with the relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 5 is a block diagram of an electronic device for implementing an artificial intelligence based image processing method of an embodiment of the present disclosure.
Fig. 5 illustrates a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 may also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as the artificial intelligence based image processing method. For example, in some embodiments, the artificial intelligence based image processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When a computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the artificial intelligence based image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the artificial intelligence based image processing method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here can be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
Artificial intelligence is the discipline that studies how to make computers mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning, etc.), covering both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies, among others.
Cloud computing refers to a technical system in which an elastically extensible pool of shared physical or virtual resources is accessed through a network; the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and they can be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capabilities for technical applications such as artificial intelligence and blockchain, and for model training.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. An image processing method based on artificial intelligence, comprising:
processing a target image input by a user to obtain target description information;
responding to a follow-up request for the target description information, and initiating a current session according to a follow-up question in the follow-up request;
and constructing a dialogue context for the current session according to the target description information, calling an artificial-intelligence-based text generation model, and answering the follow-up question in the current session according to the dialogue context.
2. The method of claim 1, wherein the constructing a dialogue context for the current session according to the target description information, calling an artificial-intelligence-based text generation model, and answering the follow-up question in the current session according to the dialogue context comprises:
constructing a history session according to the target description information to obtain a multi-round session comprising the history session and the current session;
and inputting the multi-round session into the artificial-intelligence-based text generation model, and answering the follow-up question in the current session according to the output of the model.
3. The method of claim 2, wherein the constructing a history session from the target description information comprises:
taking a preset image query statement as a question statement in a history session;
and taking the target description information as an answer sentence in the history session.
4. The method of claim 2, further comprising, prior to the processing of the target image input by the user: processing another image input by the user to obtain other description information;
wherein the constructing a history session according to the target description information comprises:
selecting, according to input behavior data, the target description information or the other description information to construct a first-round history session of the multi-round session.
5. The method of claim 4, wherein the selecting the target description information or the other description information according to input behavior data to construct a first-round history session in the multi-round session comprises:
determining whether a text question statement input by the user exists between the other images and the target image;
if such a statement exists, filtering out the other description information corresponding to the other images, and constructing the first-round history session in the multi-round session by using the target description information corresponding to the target image;
and if no such statement exists, retaining the other description information corresponding to the other images, and constructing the first-round history session in the multi-round session by using the other description information.
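For illustration of the selection rule in claims 4 and 5 (not part of the claims), the sketch below assumes a chronological log of user inputs; the event format and field names are assumptions, not the disclosure's data model.

```python
# Illustrative sketch of the selection rule in claims 4-5; the event log format is an assumption.
from typing import Dict, List, Optional


def pick_first_round_description(events: List[Dict[str, str]],
                                 other_image_id: str,
                                 target_image_id: str) -> Optional[str]:
    """events is a chronological log of user inputs, e.g.
    {"type": "image", "id": "...", "description": "..."} or {"type": "text", "content": "..."}.
    If a text question was typed between the other image and the target image, the other
    description is filtered out and the target description seeds the first history round;
    otherwise the other description is retained and used instead."""
    other_idx = next(i for i, e in enumerate(events) if e.get("id") == other_image_id)
    target_idx = next(i for i, e in enumerate(events) if e.get("id") == target_image_id)
    text_in_between = any(e["type"] == "text" for e in events[other_idx + 1:target_idx])
    if text_in_between:
        return events[target_idx].get("description")   # filter out the other description
    return events[other_idx].get("description")        # retain the other description


log = [
    {"type": "image", "id": "img1", "description": "A beach at sunset."},
    {"type": "text", "content": "Where might this be?"},
    {"type": "image", "id": "img2", "description": "A mountain trail."},
]
print(pick_first_round_description(log, "img1", "img2"))  # -> "A mountain trail."
```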
6. The method of claim 1, further comprising:
acquiring an image input by the user through an artificial intelligence dialogue container as the target image.
7. The method of claim 1, further comprising:
replacing general description information in a preset standard description statement with the target description information to obtain a target description statement.
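A minimal sketch of the substitution in claim 7 (illustrative only); the preset standard description statement shown here is an assumed example.

```python
# Minimal sketch of claim 7's template substitution; the standard statement is an assumed example.
STANDARD_DESCRIPTION_STATEMENT = "I can see the following in your picture: {description}"


def build_target_description_statement(target_description: str) -> str:
    """Replace the general placeholder in the preset standard statement with the target description."""
    return STANDARD_DESCRIPTION_STATEMENT.format(description=target_description)


print(build_target_description_statement("a golden retriever playing in a park"))
```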
8. The method of claim 1, wherein after the processing the target image input by the user to obtain the target description information, the method further comprises:
acquiring, from a search engine, candidate questions that ask about the target description information;
screening the candidate questions based on artificial intelligence dialogue rules, and selecting a preset number of target questions from the candidate questions remaining after screening;
and displaying, in the artificial intelligence dialogue container, the target questions as follow-up guidance information for the target description information.
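As an illustration of claim 8 (not part of the claims), candidate questions retrieved for the description could be filtered by simple dialogue rules and truncated to a preset number; the search lookup and the concrete rules below are assumptions, not the disclosure's implementation.

```python
# Illustrative sketch of claim 8: screen search-engine candidate questions with simple
# dialogue rules and keep a preset number as follow-up guidance.
from typing import List


def fetch_candidate_questions(description: str) -> List[str]:
    """Placeholder for a search-engine lookup of questions related to the description."""
    return [
        f"What breed is the dog in '{description}'?",
        "Click here to buy dog food now!!!",          # promotional, should be screened out
        "How do I take better pet photos?",
    ]


def screen_questions(candidates: List[str], preset_count: int = 3) -> List[str]:
    """Apply simple dialogue rules (no ads, must be a question, reasonable length), then truncate."""
    def passes_rules(q: str) -> bool:
        return q.endswith("?") and "buy" not in q.lower() and 10 <= len(q) <= 120
    return [q for q in candidates if passes_rules(q)][:preset_count]


description = "A golden retriever playing in a park."
guidance = screen_questions(fetch_candidate_questions(description), preset_count=2)
for question in guidance:
    print(question)  # displayed in the dialogue container as follow-up guidance
```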
9. The method of claim 1, wherein the processing the target image input by the user to obtain the target description information comprises:
performing intention recognition on the target image input by the user, and obtaining a target scene of a subject in the target image according to an intention recognition result;
selecting a target description model for the target scene from candidate description models according to an association relationship between candidate scenes and the candidate description models;
and processing the target image by using the target description model to obtain the target description information.
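As an illustration of claim 9 (not part of the claims), intention recognition could map the image's subject to a scene label, and an association table could then select the description model for that scene; the scene labels and model functions below are assumptions.

```python
# Illustrative sketch of claim 9: intention recognition yields a scene label, and a
# scene-to-model association selects the description model for that scene.
from typing import Callable, Dict


def recognize_scene(image_path: str) -> str:
    """Placeholder for intention recognition on the image's subject; returns a scene label."""
    return "animal" if "dog" in image_path or "cat" in image_path else "general"


def describe_animal(image_path: str) -> str:
    return f"An animal-focused description of {image_path}"


def describe_general(image_path: str) -> str:
    return f"A general description of {image_path}"


# Association between candidate scenes and candidate description models.
SCENE_TO_MODEL: Dict[str, Callable[[str], str]] = {
    "animal": describe_animal,
    "general": describe_general,
}


def describe_image(image_path: str) -> str:
    scene = recognize_scene(image_path)                   # intention recognition -> target scene
    model = SCENE_TO_MODEL.get(scene, describe_general)   # select the target description model
    return model(image_path)                              # target description information


print(describe_image("dog_in_park.jpg"))
```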
10. An artificial intelligence based image processing apparatus comprising:
a target image description module, configured to process a target image input by a user to obtain target description information;
a current session module, configured to initiate, in response to a follow-up request for the target description information, a current session according to a follow-up statement in the follow-up request;
and a session processing module, configured to construct a dialogue context for the current session according to the target description information, invoke an artificial-intelligence-based text generation model, and answer the follow-up statement in the current session according to the dialogue context.
11. The apparatus of claim 10, wherein the session processing module comprises:
a history session unit, configured to construct a history session according to the target description information to obtain a multi-round session comprising the history session and the current session;
and a session answering unit, configured to input the multi-round session into the artificial-intelligence-based text generation model, and answer the follow-up statement in the current session according to an output result of the model.
12. The apparatus of claim 11, wherein the history session unit comprises:
a question subunit, configured to take a preset image query statement as a question statement in the history session;
and an answer subunit, configured to take the target description information as an answer statement in the history session.
13. The apparatus of claim 11, further comprising an other-description module, configured to process other images input by the user to obtain other description information;
wherein the history session unit is specifically configured to:
select the target description information or the other description information according to input behavior data to construct a first-round history session in the multi-round session.
14. The apparatus of claim 13, wherein the history session unit is specifically configured to:
determine whether a text question statement input by the user exists between the other images and the target image;
if such a statement exists, filter out the other description information corresponding to the other images, and construct the first-round history session in the multi-round session by using the target description information corresponding to the target image;
and if no such statement exists, retain the other description information corresponding to the other images, and construct the first-round history session in the multi-round session by using the other description information.
15. The apparatus of claim 10, further comprising:
an image input module, configured to acquire an image input by the user through an artificial intelligence dialogue container as the target image.
16. The apparatus of claim 10, further comprising:
a description statement module, configured to replace general description information in a preset standard description statement with the target description information to obtain a target description statement.
17. The apparatus of claim 10, further comprising a follow-up guidance module, the follow-up guidance module comprising:
a question acquisition unit, configured to acquire, from a search engine, candidate questions that ask about the target description information;
a question screening unit, configured to screen the candidate questions based on artificial intelligence dialogue rules, and select a preset number of target questions from the candidate questions remaining after screening;
and a follow-up guidance unit, configured to display, in the artificial intelligence dialogue container, the target questions as follow-up guidance information for the target description information.
18. The apparatus of claim 10, wherein the target image description module comprises:
a scene recognition unit, configured to perform intention recognition on the target image input by the user, and obtain a target scene of a subject in the target image according to an intention recognition result;
a model selection unit, configured to select a target description model for the target scene from candidate description models according to an association relationship between candidate scenes and the candidate description models;
and an image description unit, configured to process the target image by using the target description model to obtain the target description information.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-9.
CN202310898516.1A 2023-07-20 2023-07-20 Image processing method, device, equipment and storage medium based on artificial intelligence Pending CN116975336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310898516.1A CN116975336A (en) 2023-07-20 2023-07-20 Image processing method, device, equipment and storage medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310898516.1A CN116975336A (en) 2023-07-20 2023-07-20 Image processing method, device, equipment and storage medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN116975336A true CN116975336A (en) 2023-10-31

Family

ID=88472416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310898516.1A Pending CN116975336A (en) 2023-07-20 2023-07-20 Image processing method, device, equipment and storage medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN116975336A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117725190A (en) * 2024-02-18 2024-03-19 粤港澳大湾区数字经济研究院(福田) Multi-round question-answering method, system, terminal and storage medium based on large language model
CN117725190B (en) * 2024-02-18 2024-06-04 粤港澳大湾区数字经济研究院(福田) Multi-round question-answering method, system, terminal and storage medium based on large language model

Similar Documents

Publication Publication Date Title
JP7127106B2 (en) Question answering process, language model training method, apparatus, equipment and storage medium
US9305050B2 (en) Aggregator, filter and delivery system for online context dependent interaction, systems and methods
EP4044045A1 (en) Vector representation generation method, apparatus and device for knowledge graph
EP3872652B1 (en) Method and apparatus for processing video, electronic device, medium and product
CN106202301A (en) A kind of intelligent response system based on degree of depth study
US11423235B2 (en) Cognitive orchestration of multi-task dialogue system
WO2017186050A1 (en) Segmented sentence recognition method and device for human-machine intelligent question-answer system
CN111680517B (en) Method, apparatus, device and storage medium for training model
CN113657100B (en) Entity identification method, entity identification device, electronic equipment and storage medium
CN116737908A (en) Knowledge question-answering method, device, equipment and storage medium
CN113836925B (en) Training method and device for pre-training language model, electronic equipment and storage medium
KR20190075277A (en) Method for searching content and electronic device thereof
CN116975336A (en) Image processing method, device, equipment and storage medium based on artificial intelligence
CN112579733A (en) Rule matching method, rule matching device, storage medium and electronic equipment
CN108306813B (en) Session message processing method, server and client
CN116303962A (en) Dialogue generation method, training method, device and equipment for deep learning model
CN112506359B (en) Method and device for providing candidate long sentences in input method and electronic equipment
CN112527127B (en) Training method and device for input method long sentence prediction model, electronic equipment and medium
CN117112065B (en) Large model plug-in calling method, device, equipment and medium
CN117539975A (en) Method, device, equipment and medium for generating prompt word information of large language model
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN116257690A (en) Resource recommendation method and device, electronic equipment and storage medium
CN109002498B (en) Man-machine conversation method, device, equipment and storage medium
CN116521832A (en) Dialogue interaction method, device and system, electronic equipment and storage medium
CN115577106A (en) Text classification method, device, equipment and medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination