CN115934987A - Sample text generation method and device and electronic equipment - Google Patents

Sample text generation method and device and electronic equipment

Info

Publication number
CN115934987A
CN115934987A (application CN202211490695.7A)
Authority
CN
China
Prior art keywords
description information
sample image
image
sample
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211490695.7A
Other languages
Chinese (zh)
Inventor
余欣彤
刘佳祥
冯仕堃
冯智达
陈徐屹
方晔玮
李岚欣
张振宇
尹维冲
孙宇
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority claimed from CN202211490695.7A
Publication of CN115934987A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The present disclosure provides a sample text generation method and apparatus, and an electronic device, relating to the technical field of artificial intelligence, in particular to the technical fields of deep learning, image processing, and computer vision, and applicable to scenes such as image generation. The method includes: acquiring an initial text corresponding to a sample image; determining image description information of the sample image according to the initial text, where the image description information describes the sample image based on a content dimension and/or an attribute dimension; and generating a sample text corresponding to the sample image according to the image description information.

Description

Sample text generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to the fields of deep learning, image processing, and computer vision technologies, which can be applied to scenes such as image generation, and specifically to a sample text generation method and apparatus, and an electronic device.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), covering both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
In the related art, when a text-to-image model is pre-trained, the sample texts used do not describe their corresponding sample images accurately enough, resulting in a poor description effect.
Disclosure of Invention
The disclosure provides a sample text generation method, a sample text generation device, an electronic device, a storage medium and a computer program product.
According to a first aspect of the present disclosure, there is provided a sample text generation method, including: acquiring an initial text corresponding to the sample image; determining image description information of the sample image according to the initial text, wherein the image description information is used for describing the sample image based on a content dimension and/or an attribute dimension; and generating a sample text corresponding to the sample image according to the image description information.
According to a second aspect of the present disclosure, there is provided a sample text generation apparatus including: the acquisition module is used for acquiring an initial text corresponding to the sample image; a determining module, configured to determine image description information of the sample image according to the initial text, where the image description information is used to describe the sample image based on a content dimension and/or an attribute dimension; and the generating module is used for generating a sample text corresponding to the sample image according to the image description information.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the sample text generation method according to the first aspect of the disclosure.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the sample text generation method according to the first aspect of the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the sample text generation method according to the first aspect of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic illustration according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a process for generating sample text according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure;
fig. 8 shows a schematic block diagram of an example electronic device that may be used to implement the sample text generation method of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure.
It should be noted that an execution subject of the sample text generation method of this embodiment is a sample text generation apparatus, which may be implemented in a software and/or hardware manner, and the apparatus may be configured in an electronic device, where the electronic device may include, but is not limited to, a terminal, a server, and the like.
The embodiment of the disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, computer vision and the like, and can be applied to scenes such as image generation and the like.
Artificial Intelligence (AI) is a new technical science that researches and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
Deep learning learns the intrinsic laws and representation levels of sample data; the information obtained in the learning process helps interpret data such as text, images, and sounds. Its ultimate goal is to enable machines to analyze and learn like humans, and to recognize data such as text, images, and sounds.
Computer vision uses cameras and computers instead of human eyes to identify, track, and measure targets, and performs further image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user personal information involved all comply with relevant laws and regulations and do not violate public order and good customs.
As shown in fig. 1, the sample text generation method includes:
s101: an initial text corresponding to the sample image is obtained.
The sample image is an image used as training data in the training process of a text-to-image model. The initial text is the text, in its initial state, used to describe information related to the sample image.
In the embodiment of the present disclosure, when the initial text corresponding to the sample image is obtained, a web crawler may automatically crawl a picture from an internet page as the sample image and take the corresponding description text in the web page (such as its Alt-text) as the initial text; alternatively, a communication link between the execution subject of this embodiment and a big data server may be established in advance, and the sample image and its corresponding initial text then obtained from the big data server. This is not limited herein.
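As an illustration of the web-crawling option above, the following is a minimal sketch using only Python's standard library; the page snippet is hypothetical, and a real crawler would fetch pages over the network and filter low-quality Alt-text:

```python
from html.parser import HTMLParser

class AltTextCollector(HTMLParser):
    """Collect (image URL, Alt-text) pairs from a crawled HTML page."""
    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            # Keep only images that carry a non-empty alt description.
            if a.get("src") and a.get("alt", "").strip():
                self.pairs.append((a["src"], a["alt"].strip()))

# Hypothetical page snippet: the image becomes the sample image,
# its Alt-text becomes the initial text.
page = '<p><img src="sunflowers.jpg" alt="Oil painting of sunflowers"></p>'
collector = AltTextCollector()
collector.feed(page)
print(collector.pairs)  # [('sunflowers.jpg', 'Oil painting of sunflowers')]
```

Each collected pair then serves as one (sample image, initial text) training example.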
It can be understood that the original purpose of the obtained initial text may differ greatly from that of the training process of the text-to-image model, which results in a poor training effect when the initial text is used directly.
S102: and determining image description information of the sample image according to the initial text, wherein the image description information is used for describing the sample image based on the content dimension and/or the attribute dimension.
The content dimension refers to a dimension describing the content of the sample image, such as describing its subject and its details. The attribute dimension refers to a dimension describing the attributes of the sample image, such as its genre, type, and style.
The image description information refers to description information of a specified dimension for the sample image in the initial text, for example, subject A, detail description B, artist C, painting type, or impressionist style, without limitation.
In this embodiment of the present disclosure, when determining the image description information of the sample image according to the initial text, the initial text may be input into a pre-trained artificial intelligence AI model to obtain corresponding image description information, or the initial text may be processed by a third-party image description information determining apparatus to obtain the image description information, which is not limited thereto.
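Since the paragraph above leaves the extraction mechanism open, the following sketch shows one simple keyword-based alternative to a trained AI model; the patterns, attribute names, and example text are all hypothetical:

```python
import re

# Hypothetical keyword patterns for the attribute dimension; the disclosure
# would instead use a pre-trained AI model or a third-party apparatus.
ATTRIBUTE_PATTERNS = {
    "artist": re.compile(r"by ([A-Z][\w. ]+)"),
    "type": re.compile(r"(oil painting|watercolor|sketch|photograph)", re.I),
    "style": re.compile(r"(impressionist|cubist|hyperrealistic)", re.I),
}

def describe(initial_text):
    """Derive image description information from the initial text: the whole
    text serves as the content dimension, and matched keywords populate the
    attribute dimensions."""
    info = {"content": initial_text}
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(initial_text)
        if match:
            info[name] = match.group(1)
    return info

print(describe("Impressionist oil painting of a garden by Claude Monet"))
```

The output dictionary carries one content entry plus whichever attribute entries were found.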
In the implementation of the disclosure, when the image description information of the sample image is determined according to the initial text, the description accuracy of the obtained image description information on the sample image can be ensured.
S103: and generating a sample text corresponding to the sample image according to the image description information.
The sample text refers to text generated based on the image description information, and can be used as training data for a text-to-image model.
In the embodiments of the present disclosure, there may be multiple pieces of image description information. When a sample text corresponding to a sample image is generated according to the image description information, priority information of each piece of image description information may be determined, and the image description information whose priority meets a preset condition is then used as the sample text. Alternatively, identification information corresponding to each piece of image description information may be generated separately, and the image description information together with its identification information used as the sample text, where the identification information allows a text generation model to identify the image description information. Of course, any other possible method may be used to generate the sample text corresponding to the sample image according to the image description information, which is not limited herein.
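The priority-filtering strategy described above can be sketched as follows; the threshold value and the joining of kept entries into one text are illustrative assumptions, not specified by the disclosure:

```python
def build_sample_text(description_infos, threshold=0.5):
    """Keep the description entries whose priority meets a preset condition
    (here: a minimum threshold), then join them into one sample text; both
    choices are illustrative."""
    kept = [text for text, priority in description_infos if priority >= threshold]
    return ", ".join(kept)

# Hypothetical (description, priority) pairs for one sample image.
infos = [("a garden in bloom", 0.9),
         ("blurry corner detail", 0.2),
         ("oil painting", 0.7)]
print(build_sample_text(infos))  # a garden in bloom, oil painting
```

Raising the threshold keeps fewer, higher-priority descriptions in the resulting sample text.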
In this embodiment, the initial text corresponding to the sample image is obtained, the image description information of the sample image is determined according to the initial text, where the image description information describes the sample image based on the content dimension and/or the attribute dimension, and the sample text corresponding to the sample image is generated according to the image description information. The description accuracy of the obtained sample text with respect to the sample image can thus be ensured, the description richness of the obtained sample text is effectively improved based on the image description information of different dimensions, and the description effect of the sample text on the sample image is effectively improved.
Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure.
As shown in fig. 2, the sample text generating method includes:
s201: an initial text corresponding to the sample image is obtained.
For the description of S201, reference may be made to the foregoing embodiments, which are not described herein again.
S202: at least one description granularity of the content dimension is determined, wherein the description granularity represents a level of fineness of a description of the sample image.
The description granularity may be used to determine the level of fineness at which the sample image is described; for example, subject description and detail description are two granularities, where the subject description is coarser-grained than the detail description.
In the embodiment of the disclosure, the content dimension of the sample image may be described in the initial text based on a plurality of fine degree levels, and when at least one description granularity of the content dimension is determined, reliable execution basis can be provided for the subsequent determination of the first description information of the sample image.
S203: the initial text is processed according to at least one description granularity to determine first description information for the sample image.
The first description information may be description information describing the sample image based on the content dimension.
In this embodiment of the disclosure, when the initial text is processed according to at least one description granularity to determine the first description information of the sample image, multiple pieces of content description information for the content dimension may be obtained from the initial text, the reference description granularity corresponding to each piece determined, and the content description information whose reference description granularity matches the description granularity used as the first description information. Alternatively, a description information identification model corresponding to the description granularity may be obtained to determine the first description information from the initial text. This is not limited herein.
Optionally, in some embodiments, when the initial text is processed according to the at least one description granularity to determine the first description information of the sample image, a first matching result between the description granularity and at least one piece of content description information may be determined, and the first description information determined from the at least one piece of content description information according to the first matching result. Based on this first matching result, the degree of matching between the obtained first description information and the corresponding description granularity can be effectively improved, ensuring the reliability of the obtained first description information.
The first matching result is the matching result between the description granularity and the level of fineness at which the content description information describes the sample image.
In the embodiment of the present disclosure, when determining the first description information from the at least one piece of content description information according to the first matching result, a matching condition may be preset and the first description information determined according to the matching condition and the first matching result; alternatively, the first description information may be determined from the at least one piece of content description information by combining numerical and graphical analysis, which is not limited herein.
Optionally, in some embodiments, the first matching result includes at least one matching value. When the first description information is determined from the at least one piece of content description information according to the first matching result, the content description information corresponding to the largest of the at least one matching value may be used as the first description information, so the accuracy of the obtained first description information can be effectively improved.
The matching value may be used to indicate a matching degree level between the description granularity and the content description information, and a value range of the matching degree level may be, for example, 0 to 1.
For example, the at least one piece of content description information may include content z, content x, and content c, and the matching values of the description granularity 1 and the content z, the content x, and the content c are 0.5, 0.8, and 0.2, respectively, then the matching value corresponding to the content x is the largest, and the content x may be taken as the first description information.
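The selection by maximum matching value in this example can be sketched directly; the content names and matching values are taken from the example above:

```python
def pick_first_description(matching_values):
    """Return the content description whose matching value with the
    description granularity is the largest."""
    return max(matching_values, key=matching_values.get)

# Matching values of description granularity 1 against the three contents.
matching_values = {"content z": 0.5, "content x": 0.8, "content c": 0.2}
print(pick_first_description(matching_values))  # content x
```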
That is, after the initial text corresponding to the sample image is obtained, at least one description granularity of the content dimension may be determined, where the description granularity represents the level of fineness of the description of the sample image, and the initial text is processed according to the at least one description granularity to determine the first description information of the sample image. The first description information corresponding to each description granularity can thus be accurately obtained, ensuring the applicability of the obtained first description information.
S204: and generating a sample text corresponding to the sample image according to the image description information.
For the description of S204, reference may be made to the foregoing embodiments specifically, and details are not repeated here.
In this embodiment, at least one description granularity of the content dimension is determined, where the description granularity represents the level of fineness of the description of the sample image, and the initial text is processed according to the at least one description granularity to determine the first description information of the sample image; the first description information corresponding to each description granularity can thus be accurately obtained, ensuring its applicability. By determining the first matching result between the description granularity and the at least one piece of content description information, and determining the first description information according to that result, the degree of matching between the obtained first description information and the corresponding description granularity can be effectively improved, ensuring the reliability of the obtained first description information. By using the content description information corresponding to the maximum matching value as the first description information, the accuracy of the obtained first description information can be effectively improved.
Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure.
As shown in fig. 3, the sample text generating method includes:
s301: an initial text corresponding to the sample image is obtained.
For the description of S301, reference may be made to the above embodiments, which are not described herein again.
S302: and determining the attribute to be described of the sample image.
The attribute to be described refers to the attribute of the sample image required to be described in the sample text. Such as, without limitation, the genre, type, artist, etc. of the sample image.
It can be understood that the sample image has multiple attributes, and when the attribute to be described of the sample image is determined, an accurate execution basis can be provided for subsequently acquiring the second description information.
S303: and acquiring second description information according to the attribute to be described.
The second description information is description information for describing the sample image based on the attribute dimension. For example, may be an L-type hyperrealistic drawing of artist H.
Optionally, in some embodiments, when second description information is obtained according to the attribute to be described, at least one reference image may be obtained, where the attribute to be described of the reference image has a corresponding reference attribute value, a second matching result between the reference image and the sample image is determined according to the attribute to be described, if the second matching result satisfies a preset condition, the reference attribute value is used as the second description information, and if the second matching result does not satisfy the preset condition, text information corresponding to the attribute to be described of the sample image is generated as the second description information, so that the obtaining efficiency of the second description information can be improved by effectively combining the reference attribute value of the reference image, and when the second matching result does not satisfy the preset condition, text information corresponding to the attribute to be described of the sample image is generated as the second description information, so that the generation effect of the sample text is effectively prevented from being affected by the absence of the corresponding second description information of the sample image, and the robustness of the sample text generation process can be effectively improved.
The reference image may be a sample image corresponding to the initial text and containing a reference attribute value of the attribute to be described.
That is to say, the number of the sample images in the embodiment of the present disclosure may be multiple, and when the second description information is obtained, the sample image having the second description information may be used as a reference image, and the reference attribute value may be used as the second description information according to a matching result between the reference image and the sample image.
Optionally, in some embodiments, when determining the second matching result between the reference image and the sample image according to the attribute to be described, a first feature vector corresponding to the reference image and a second feature vector corresponding to the sample image may be generated according to the attribute to be described, a clustering distance value between the two determined, and the clustering distance value used as the second matching result. The clustering distance value is the distance between the first feature vector and the second feature vector when the first feature vector is taken as the cluster center, so the obtained clustering distance value can accurately and vividly represent the matching result between the reference image and the sample image with respect to the attribute to be described, effectively improving the reliability of the obtained second matching result.
That is to say, after the initial text corresponding to the sample image is acquired, the attribute to be described of the sample image may be determined and the second description information acquired according to it. The second description information required by the sample text can thus be accurately obtained based on the attribute to be described, and acquisition of redundant information avoided, effectively improving the practicability of the acquired second description information.
S304: and generating a sample text corresponding to the sample image according to the image description information.
For the description of S304, reference may be made to the foregoing embodiments, and details are not repeated herein.
In this embodiment, the attribute to be described of the sample image is determined and the second description information acquired according to it, so the second description information required by the sample text can be accurately obtained and redundant information avoided, effectively improving the practicability of the obtained second description information. At least one reference image is obtained, where the attribute to be described of the reference image has a corresponding reference attribute value, and a second matching result between the reference image and the sample image is determined according to the attribute to be described; if the second matching result meets a preset condition, the reference attribute value is used as the second description information, and otherwise text information corresponding to the attribute to be described of the sample image is generated as the second description information. The reference attribute values of reference images can thus be effectively reused to improve the acquisition efficiency of the second description information, while generating text information in the other case prevents a missing second description from degrading sample text generation, effectively improving the robustness of the process.
A first feature vector corresponding to the reference image and a second feature vector corresponding to the sample image are generated according to the attribute to be described, a clustering distance value between them is determined, and the clustering distance value is used as the second matching result; the clustering distance value is the distance between the two vectors when the first feature vector is taken as the cluster center, so it can accurately represent the matching result with respect to the attribute to be described, effectively improving the reliability of the obtained second matching result.
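The clustering-distance comparison described above can be sketched as follows; the feature vectors, the Euclidean distance metric, and the threshold are illustrative assumptions (the disclosure does not fix a particular distance function):

```python
import math

def cluster_distance(center, vector):
    """Euclidean distance from a feature vector to the cluster center
    (here, the reference image's first feature vector)."""
    return math.sqrt(sum((c - v) ** 2 for c, v in zip(center, vector)))

def second_match(reference_vec, sample_vec, max_distance=1.0):
    """Use the clustering distance value as the second matching result; a
    distance within the threshold means the preset condition is met and the
    reference attribute value can be reused as the second description."""
    distance = cluster_distance(reference_vec, sample_vec)
    return distance, distance <= max_distance

distance, reuse_reference_value = second_match([0.2, 0.4, 0.9], [0.1, 0.5, 0.8])
print(round(distance, 3), reuse_reference_value)  # 0.173 True
```

When the second value is False, text information for the attribute would be generated instead of reusing the reference attribute value.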
Fig. 4 is a schematic diagram according to a fourth embodiment of the present disclosure.
As shown in fig. 4, the sample text generating method includes:
s401: an initial text corresponding to the sample image is obtained.
S402: determining image description information of the sample image according to the initial text, wherein the image description information comprises: the image processing device comprises first description information and/or second description information, wherein the first description information describes the sample image based on the content dimension, and the second description information describes the sample image based on the attribute dimension.
For the description of S401 and S402, reference may be made to the above embodiments, which are not described herein again.
S403: a structured template is obtained.
The structured template refers to a template used for distinguishing functions of different parts in the text.
For example, the structured template uses "[ ]" to mark fillable contents and "( )" to mark optional contents. The structured template may be a plain-text template, for example:
[ picture contents subject description ], [ picture contents detail description ] (, artist: [ name ]) (, type: [ name ]) (, genre: [ name ])
The plain-text template is intuitive and easy to understand in use.
Alternatively, the structured template may also be a special mark template comprising special marks:
[ picture contents subject description ], [ picture contents detail description ] (, < a > [ name ]) (, < t > [ name ]) (, < s > [ name ])
Wherein < a >, < t >, and < s > respectively represent special tokens that mark the artist, genre, and style in the text.
The special-mark template leverages the language model's sensitivity to special tokens, which makes it easier for the model to learn. Besides the above tokens, further special tokens can be added for other additional information, which is not limited herein.
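One way to fill the special-mark template above can be sketched as follows; the field names, token spelling, and example values are hypothetical:

```python
def fill_template(info):
    """Fill the special-mark template
    [subject], [details](, <a>[name])(, <t>[name])(, <s>[name]),
    dropping the optional parts whose values are absent."""
    parts = [info["subject"], info["details"]]
    for token, key in (("<a>", "artist"), ("<t>", "type"), ("<s>", "style")):
        if info.get(key):
            parts.append(token + info[key])
    return ", ".join(parts)

# Hypothetical description information; style is absent here, so the
# optional <s> part is dropped.
info = {"subject": "a garden in bloom", "details": "sunlit flower beds",
        "artist": "Claude Monet", "type": "oil painting", "style": None}
print(fill_template(info))
```

Here the output would read "a garden in bloom, sunlit flower beds, <a>Claude Monet, <t>oil painting", with each token marking its field for the model.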
S404: and identifying the sample text from the first description information and/or the second description information according to the structured template.
In the embodiment of the present disclosure, when the sample text is identified from the first description information and/or the second description information according to the structured template, the first description information and/or the second description information may be input into a pre-trained machine learning model to obtain the sample text. Alternatively, the sample text may be identified from the first description information and/or the second description information according to the structured template by an engineering or mathematical method; this is not limited here.
Optionally, in some embodiments, the structured template includes at least one target identifier. When the sample text is identified from the first description information and/or the second description information according to the structured template, the target description information corresponding to each target identifier is identified from the first description information and/or the second description information, the at least one piece of target description information is aggregated, and the aggregated description information is used as the sample text. In this way, the target description information corresponding to a target identifier can be identified accurately and quickly based on that identifier, and aggregating the obtained target description information effectively improves the degree of structuring of the resulting sample text.
The target identifier refers to an identifier in the structured template that is used to identify target description information.
The target description information refers to the description information, in the first description information and/or the second description information, that corresponds to the target identifier.
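Extracting and aggregating target description information by target identifier could look like the following sketch; the identifier-to-next-comma segmentation rule is an assumption made for illustration.

```python
import re

def extract_and_aggregate(description: str, target_ids: list) -> str:
    """For each target identifier (e.g. the special tokens <a>, <t>, <s>),
    pick out the description segment it marks, then aggregate all matches
    into one structured sample text."""
    found = []
    for tid in target_ids:
        # A segment runs from the identifier to the next comma (or end of string).
        m = re.search(re.escape(tid) + r"([^,]+)", description)
        if m:
            found.append(tid + m.group(1).strip())
    return ", ".join(found)

sample_text = extract_and_aggregate(
    "a starry sky, <a>van Gogh, <s>impressionism", ["<a>", "<s>"])
# → "<a>van Gogh, <s>impressionism"
```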
That is, in the embodiment of the present disclosure, after the image description information of the sample image is determined according to the initial text, a structured template may be obtained, and the sample text is identified from the first description information and/or the second description information according to the structured template, so that the acquisition efficiency and the degree of structuring of the sample text can be effectively improved on the basis of the structured template, thereby effectively improving the practicability of the obtained sample text.
In this embodiment, a structured template is obtained and the sample text is identified from the first description information and/or the second description information according to it, so that the acquisition efficiency and the degree of structuring of the sample text are effectively improved. By identifying the target description information corresponding to each target identifier from the first description information and/or the second description information, aggregating the target description information, and using the aggregated description information as the sample text, the target description information can be identified accurately and quickly based on the target identifier, and the degree of structuring of the obtained sample text is effectively improved.
For example, as shown in fig. 5, fig. 5 is a schematic diagram of a generation process of a sample text according to an embodiment of the present disclosure, and a generation flow of the sample text is as follows:
acquiring original training data, wherein the original training data includes sample images with style information (e.g., impressionism and hyperrealism) and sample images without style information;
taking a sample image with style information as a clustering center and clustering the representation vectors of the images; images whose distance to the clustering center is within a certain threshold are regarded as having that style, and if an image's text description lacks the style, the style keyword is added to its text description;
acquiring the original description of a sample image ("starry sky"), namely the initial text;
adding the cluster-center style word ("impressionism") to the original description;
nesting "starry sky" and "impressionism" into a structured template (either the plain-text template or the special-mark template) to obtain the sample text.
In the embodiment of the present disclosure, text analysis can be performed automatically on the picture description input by the user: the description of the picture content and the style information are split out of the text description, structured with the template, and then input into the model.
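The generation flow above can be condensed into a small sketch. Everything here (the helper name and the `style:` joining convention) is illustrative rather than the disclosure's actual template logic.

```python
def build_sample_text(initial_text: str, style_word: str) -> str:
    """Keep the content description from the initial text and append the
    cluster-center style word when the description lacks it, mimicking the
    "starry sky" + "impressionism" example above."""
    content = initial_text.strip()
    if style_word and style_word not in content:
        return f"{content}, style: {style_word}"
    return content

sample_text = build_sample_text("starry sky", "impressionism")
# → "starry sky, style: impressionism"
```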
Fig. 6 is a schematic diagram according to a fifth embodiment of the present disclosure.
As shown in fig. 6, the sample text generating apparatus 60 includes:
an obtaining module 601, configured to obtain an initial text corresponding to a sample image;
a determining module 602, configured to determine image description information of the sample image according to the initial text, where the image description information is used to describe the sample image based on the content dimension and/or the attribute dimension; and
a generating module 603, configured to generate a sample text corresponding to the sample image according to the image description information.
In some embodiments of the present disclosure, as shown in fig. 7, fig. 7 is a schematic diagram according to a sixth embodiment of the present disclosure, and the sample text generating apparatus 70 includes: an obtaining module 701, a determining module 702, and a generating module 703, where the image description information includes: first description information describing the sample image based on the content dimension;
the determining module 702 includes:
a first determining sub-module 7021 configured to determine at least one description granularity of the content dimension, where the description granularity represents a level of fineness of a description of the sample image;
a second determining sub-module 7022 is configured to process the initial text according to the at least one description granularity to determine the first description information of the sample image.
In some embodiments of the present disclosure, the initial text comprises: at least one content description information; the second determining sub-module 7022 is specifically configured to:
determining a first matching result of the description granularity and the at least one content description information;
and determining first description information from at least one content description information according to the first matching result.
In some embodiments of the present disclosure, the first matching result comprises: at least one matching value indicating a level of matching between the description granularity and the content description information; wherein the second determining submodule 7022 is further configured to:
and taking the content description information corresponding to the maximum matching value in the at least one matching value as the first description information.
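Selecting the content description with the maximum matching value, as described above, reduces to an argmax over the candidates; the scores below are hypothetical.

```python
def pick_first_description(candidates: dict) -> str:
    """Given candidate content descriptions mapped to their matching values
    against a description granularity, return the one with the highest
    matching value as the first description information."""
    return max(candidates, key=candidates.get)

scores = {"a night sky": 0.42, "a swirling starry night over a village": 0.87}
first_description = pick_first_description(scores)
# → "a swirling starry night over a village"
```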
In some embodiments of the disclosure, the description information includes: second description information describing the sample image based on the attribute dimension;
the determining module 702 further includes:
a third determining submodule 7023, configured to determine an attribute to be described of the sample image;
the first obtaining sub-module 7024 is configured to obtain the second description information according to the attribute to be described.
In some embodiments of the present disclosure, the first obtaining sub-module 7024 is specifically configured to:
acquiring at least one reference image, wherein the attribute to be described of the reference image has a corresponding reference attribute value;
determining a second matching result of the reference image and the sample image according to the attribute to be described;
if the second matching result meets the preset condition, the reference attribute value is used as second description information;
and if the second matching result does not meet the preset condition, generating text information corresponding to the attribute to be described of the sample image as second description information.
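The branch above (reuse the reference attribute value when the match satisfies the preset condition, otherwise generate text) can be sketched as follows; the threshold value and the `generate_text` callback are placeholders, not parts of the disclosure.

```python
def second_description(distance: float, reference_value: str,
                       generate_text, threshold: float = 0.5) -> str:
    """If the clustering distance (the second matching result) satisfies the
    preset condition, reuse the reference image's attribute value; otherwise
    fall back to generating text for the sample image's attribute."""
    if distance <= threshold:
        return reference_value
    return generate_text()

desc = second_description(0.2, "impressionism", lambda: "generated description")
# → "impressionism"
```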
In some embodiments of the present disclosure, the first obtaining sub-module 7024 is further configured to:
generating a first feature vector corresponding to the reference image according to the attribute to be described;
generating a second feature vector corresponding to the sample image according to the attribute to be described;
determining a clustering distance value between the first characteristic vector and the second characteristic vector, and taking the clustering distance value as a second matching result; and the clustering distance value is the distance value between the first characteristic vector and the second characteristic vector under the condition that the first characteristic vector is the clustering center.
In some embodiments of the present disclosure, the image description information includes: the first description information describes the sample image based on the content dimension, and/or the second description information describes the sample image based on the attribute dimension;
the generating module 703 includes:
a second obtaining submodule 7031, configured to obtain a structured template;
and the identifying sub-module 7032 is configured to identify the sample text from the first description information and/or the second description information according to the structured template.
In some embodiments of the present disclosure, the structured template comprises: at least one target identification; the identifier module 7032 is specifically configured to:
identifying target description information corresponding to the target identification from the first description information and/or the second description information;
and aggregating at least one target description information, and taking the description information obtained by aggregation as a sample text.
It is understood that the sample text generating apparatus 70 in fig. 7 of this embodiment, with its obtaining module 701, determining module 702, and generating module 703, may have the same functions and structure as the sample text generating apparatus 60, obtaining module 601, determining module 602, and generating module 603 in the above embodiments.
It should be noted that the foregoing explanation of the sample text generation method is also applicable to the sample text generation apparatus of the present embodiment.
In this embodiment, the initial text corresponding to the sample image is acquired, the image description information of the sample image is determined according to the initial text, wherein the image description information is used for describing the sample image based on the content dimension and/or the attribute dimension, and the sample text corresponding to the sample image is generated according to the image description information.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 8 shows a schematic block diagram of an example electronic device that may be used to implement the sample text generation method of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to one another by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 801 performs the methods and processes described above, for example the sample text generation method. In some embodiments, the sample text generation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the sample text generation method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the sample text generation method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and addresses the defects of difficult management and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A sample text generation method, comprising:
acquiring an initial text corresponding to the sample image;
determining image description information of the sample image according to the initial text, wherein the image description information is used for describing the sample image based on a content dimension and/or an attribute dimension; and
and generating a sample text corresponding to the sample image according to the image description information.
2. The method of claim 1, the image description information comprising: first description information describing the sample image based on the content dimension;
wherein the determining image description information of the sample image according to the initial text comprises:
determining at least one description granularity of the content dimension, wherein the description granularity represents a level of fineness at which the sample image is described;
processing the initial text according to the at least one description granularity to determine first description information of the sample image.
3. The method of claim 2, the initial text comprising: at least one content description information; wherein the processing the initial text according to the at least one description granularity to determine first description information of the sample image comprises:
determining a first matching result of the description granularity and the at least one content description information;
and determining the first description information from the at least one content description information according to the first matching result.
4. The method of claim 3, the first match result comprising: at least one match value indicating a level of match between the description granularity and the content description information; wherein the determining the first description information from the at least one content description information according to the first matching result includes:
and taking the content description information corresponding to the maximum matching value in the at least one matching value as the first description information.
5. The method of claim 1, the description information comprising: second description information describing the sample image based on the attribute dimension;
wherein the determining image description information of the sample image according to the initial text comprises:
determining the attribute to be described of the sample image;
and acquiring the second description information according to the attribute to be described.
6. The method according to claim 5, wherein the obtaining the second description information according to the attribute to be described includes:
acquiring at least one reference image, wherein the attribute to be described of the reference image has a corresponding reference attribute value;
determining a second matching result of the reference image and the sample image according to the attribute to be described;
if the second matching result meets a preset condition, taking the reference attribute value as the second description information;
and if the second matching result does not meet the preset condition, generating text information corresponding to the attribute to be described of the sample image as the second description information.
7. The method of claim 6, wherein the determining a second matching result of the reference image and the sample image according to the attribute to be described comprises:
generating a first feature vector corresponding to the reference image according to the attribute to be described;
generating a second feature vector corresponding to the sample image according to the attribute to be described;
determining a clustering distance value between the first feature vector and the second feature vector, and taking the clustering distance value as the second matching result; wherein the clustering distance value is a distance value between the first feature vector and the second feature vector when the first feature vector is a clustering center.
8. The method of claim 1, the image description information comprising: first description information and/or second description information, the first description information describing the sample image based on the content dimension, the second description information describing the sample image based on the attribute dimension;
wherein the generating a sample text corresponding to the sample image according to the image description information includes:
obtaining a structured template;
and identifying the sample text from the first description information and/or the second description information according to the structured template.
9. The method of claim 8, the structured template comprising: at least one target identification; wherein the identifying the sample text from the first description information and/or the second description information according to the structured template includes:
identifying target description information corresponding to the target identification from the first description information and/or the second description information;
and aggregating at least one piece of target description information, and taking the description information obtained by aggregation as the sample text.
10. A sample text generation apparatus, comprising:
the acquisition module is used for acquiring an initial text corresponding to the sample image;
a determining module, configured to determine image description information of the sample image according to the initial text, where the image description information is used to describe the sample image based on a content dimension and/or an attribute dimension; and
and the generating module is used for generating a sample text corresponding to the sample image according to the image description information.
11. The apparatus of claim 10, the image description information comprising: first description information describing the sample image based on the content dimension;
wherein the determining module comprises:
a first determining sub-module for determining at least one description granularity of the content dimension, wherein the description granularity represents a level of fineness at which the sample image is described;
a second determining sub-module, configured to process the initial text according to the at least one description granularity to determine first description information of the sample image.
12. The apparatus of claim 11, the initial text comprising: at least one content description information; wherein the second determining submodule is specifically configured to:
determining a first matching result of the description granularity and the at least one content description information;
and determining the first description information from the at least one content description information according to the first matching result.
13. The apparatus of claim 12, the first match result comprising: at least one match value indicating a level of match between the description granularity and the content description information; wherein the second determining submodule is further configured to:
and taking the content description information corresponding to the maximum matching value in the at least one matching value as the first description information.
14. The apparatus of claim 10, the description information comprising: second description information describing the sample image based on the attribute dimension;
wherein the determining module further comprises:
the third determining submodule is used for determining the attribute to be described of the sample image;
and the first obtaining submodule is used for obtaining the second description information according to the attribute to be described.
15. The apparatus according to claim 14, wherein the first obtaining submodule is specifically configured to:
acquiring at least one reference image, wherein the attribute to be described of the reference image has a corresponding reference attribute value;
determining a second matching result of the reference image and the sample image according to the attribute to be described;
if the second matching result meets a preset condition, taking the reference attribute value as the second description information;
and if the second matching result does not meet the preset condition, generating text information corresponding to the attribute to be described of the sample image as the second description information.
16. The apparatus of claim 15, wherein the first acquisition submodule is further configured to:
generating a first feature vector corresponding to the reference image according to the attribute to be described;
generating a second feature vector corresponding to the sample image according to the attribute to be described;
determining a clustering distance value between the first feature vector and the second feature vector, and taking the clustering distance value as the second matching result; wherein the clustering distance value is a distance value between the first feature vector and the second feature vector when the first feature vector is a clustering center.
17. The apparatus of claim 10, the image description information comprising: first description information and/or second description information, the first description information describing the sample image based on the content dimension, the second description information describing the sample image based on the attribute dimension;
wherein the generating module comprises:
the second acquisition submodule is used for acquiring the structured template;
and the identification submodule is used for identifying and obtaining the sample text from the first description information and/or the second description information according to the structured template.
18. The apparatus of claim 17, the structured template comprising: at least one target identification; wherein, the identification submodule is specifically configured to:
identifying target description information corresponding to the target identification from the first description information and/or the second description information;
and aggregating at least one piece of target description information, and taking the aggregated description information as the sample text.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when being executed by a processor, carries out the steps of the method according to any one of claims 1-9.
CN202211490695.7A 2022-11-25 2022-11-25 Sample text generation method and device and electronic equipment Pending CN115934987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211490695.7A CN115934987A (en) 2022-11-25 2022-11-25 Sample text generation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211490695.7A CN115934987A (en) 2022-11-25 2022-11-25 Sample text generation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115934987A true CN115934987A (en) 2023-04-07

Family

ID=86696986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211490695.7A Pending CN115934987A (en) 2022-11-25 2022-11-25 Sample text generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115934987A (en)

Similar Documents

Publication Publication Date Title
CN113407850B (en) Method and device for determining and acquiring virtual image and electronic equipment
CN112861885B (en) Image recognition method, device, electronic equipment and storage medium
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN115358392B (en) Training method of deep learning network, text detection method and device
CN113947188A (en) Training method of target detection network and vehicle detection method
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN112560461A (en) News clue generation method and device, electronic equipment and storage medium
CN113360711A (en) Model training and executing method, device, equipment and medium for video understanding task
CN113657395A (en) Text recognition method, and training method and device of visual feature extraction model
CN114881129A (en) Model training method and device, electronic equipment and storage medium
CN113887615A (en) Image processing method, apparatus, device and medium
CN112784102B (en) Video retrieval method and device and electronic equipment
CN112949433B (en) Method, device and equipment for generating video classification model and storage medium
CN113537192A (en) Image detection method, image detection device, electronic equipment and storage medium
CN113641829A (en) Method and device for training neural network of graph and complementing knowledge graph
CN114863450B (en) Image processing method, device, electronic equipment and storage medium
CN113032251B (en) Method, device and storage medium for determining service quality of application program
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN114445668A (en) Image recognition method and device, electronic equipment and storage medium
CN113886543A (en) Method, apparatus, medium, and program product for generating an intent recognition model
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN115934987A (en) Sample text generation method and device and electronic equipment
CN116628167B (en) Response determination method and device, electronic equipment and storage medium
CN112560481B (en) Statement processing method, device and storage medium
CN116310682A (en) Event aggregation method, device and equipment based on multi-mode data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination