CN111897950A - Method and apparatus for generating information

Method and apparatus for generating information

Info

Publication number
CN111897950A
Authority
CN
China
Prior art keywords
text
target
sample
information
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010744729.5A
Other languages
Chinese (zh)
Inventor
周旻平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010744729.5A
Publication of CN111897950A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/538 Presentation of query results
    • G06F16/54 Browsing; Visualisation therefor
    • G06F16/55 Clustering; Classification
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/5866 Retrieval characterised by using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Abstract

The embodiments of the present disclosure disclose a method and apparatus for generating information. One embodiment of the method comprises: acquiring text and an image associated with target presentation information as a target text and a target image, respectively; inputting the target text and the target image into a pre-trained feature extraction model to obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image; and generating a target feature vector corresponding to the target presentation information based on the text feature vector and the image feature vector. This embodiment provides support for subsequent processing of the presentation information; moreover, more information features can be extracted, and the extracted features help represent the content of the presentation information more accurately and effectively.

Description

Method and apparatus for generating information
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for generating information.
Background
With the development of science and technology, users can browse information for presentation by using electronic devices such as mobile phones and computers. The information for presentation may be information for presentation to the user (e.g., advertisement, news, etc.), and may specifically include text, images, and the like.
In practice, there are many scenarios where it is necessary to extract the features of the information for presentation, for example, in order to implement personalized information recommendation, it is necessary to extract the features of the information for presentation so as to match the features of the user.
Conventionally, the features of keywords in the text included in the presentation information are typically extracted and used as the features of the presentation information.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatuses for generating information.
In a first aspect, an embodiment of the present disclosure provides a method for generating information, the method including: acquiring text and an image associated with target presentation information as a target text and a target image, respectively; inputting the target text and the target image into a pre-trained feature extraction model to obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image; and generating a target feature vector corresponding to the target presentation information based on the text feature vector and the image feature vector.
In some embodiments, generating the target feature vector corresponding to the target presentation information based on the text feature vector and the image feature vector comprises: combining the text feature vector and the image feature vector to obtain the target feature vector corresponding to the target presentation information.
In some embodiments, the feature extraction model is obtained by training as follows: acquiring a preset sample presentation information set; for a piece of sample presentation information in the sample presentation information set, extracting a sample text and a sample image associated with that sample presentation information to form a first sample image-text group; determining, from the sample presentation information other than that sample presentation information, candidate sample presentation information corresponding to that sample presentation information; forming a second sample image-text group from the sample image associated with that sample presentation information and the sample text associated with the corresponding candidate sample presentation information; acquiring an initial image-text matching model, wherein the initial image-text matching model comprises an initial feature extraction model and an initial similarity calculation model, the initial feature extraction model is used for extracting a text feature vector of an input text and an image feature vector of an input image, and the initial similarity calculation model is used for performing similarity calculation on the extracted text feature vector and image feature vector; and training the initial image-text matching model using the formed first sample image-text groups as positive samples and the formed second sample image-text groups as negative samples to obtain a trained image-text matching model, wherein the trained image-text matching model comprises a trained feature extraction model.
In some embodiments, training the initial image-text matching model using the formed first sample image-text groups as positive samples and the formed second sample image-text groups as negative samples comprises: inputting a positive sample and a negative sample into the initial image-text matching model respectively to obtain a first similarity estimate corresponding to the positive sample and a second similarity estimate corresponding to the negative sample; calculating the difference between the first similarity estimate and the second similarity estimate to obtain an estimated positive-negative sample difference; determining a loss value of the estimated positive-negative sample difference relative to a preset positive-negative sample difference; and training the initial image-text matching model using the determined loss value.
In some embodiments, determining, from the sample presentation information other than the given sample presentation information, the candidate sample presentation information corresponding to the given sample presentation information includes: determining, for each piece of other sample presentation information, the similarity between its associated sample text and the sample text associated with the given sample presentation information; extracting the minimum similarity from the determined similarities as a target similarity; and determining the other sample presentation information corresponding to the target similarity as the candidate sample presentation information corresponding to the given sample presentation information.
In some embodiments, the text associated with the target presentation information includes at least one of: title text, body text, landing page text.
In some embodiments, the method further comprises: acquiring user information of a target user; determining whether the user information matches the target presentation information based on the obtained target feature vector; in response to determining that the user information matches the target presentation information, pushing the target presentation information to the target user.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating information, the apparatus including: a first acquisition unit configured to acquire a text and an image associated with the target presentation information as a target text and a target image, respectively; the input unit is configured to input the target text and the target image into a pre-trained feature extraction model, and obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image; and a generating unit configured to generate a target feature vector corresponding to the information for target presentation based on the text feature vector and the image feature vector.
In some embodiments, the generating unit is further configured to: and combining the text feature vector and the image feature vector to obtain a target feature vector corresponding to the information for target presentation.
In some embodiments, the feature extraction model is obtained by training as follows: acquiring a preset sample presentation information set; for a piece of sample presentation information in the sample presentation information set, extracting a sample text and a sample image associated with that sample presentation information to form a first sample image-text group; determining, from the sample presentation information other than that sample presentation information, candidate sample presentation information corresponding to that sample presentation information; forming a second sample image-text group from the sample image associated with that sample presentation information and the sample text associated with the corresponding candidate sample presentation information; acquiring an initial image-text matching model, wherein the initial image-text matching model comprises an initial feature extraction model and an initial similarity calculation model, the initial feature extraction model is used for extracting a text feature vector of an input text and an image feature vector of an input image, and the initial similarity calculation model is used for performing similarity calculation on the extracted text feature vector and image feature vector; and training the initial image-text matching model using the formed first sample image-text groups as positive samples and the formed second sample image-text groups as negative samples to obtain a trained image-text matching model, wherein the trained image-text matching model comprises a trained feature extraction model.
In some embodiments, training the initial image-text matching model using the formed first sample image-text groups as positive samples and the formed second sample image-text groups as negative samples comprises: inputting a positive sample and a negative sample into the initial image-text matching model respectively to obtain a first similarity estimate corresponding to the positive sample and a second similarity estimate corresponding to the negative sample; calculating the difference between the first similarity estimate and the second similarity estimate to obtain an estimated positive-negative sample difference; determining a loss value of the estimated positive-negative sample difference relative to a preset positive-negative sample difference; and training the initial image-text matching model using the determined loss value.
In some embodiments, determining, from the sample presentation information other than the given sample presentation information, the candidate sample presentation information corresponding to the given sample presentation information includes: determining, for each piece of other sample presentation information, the similarity between its associated sample text and the sample text associated with the given sample presentation information; extracting the minimum similarity from the determined similarities as a target similarity; and determining the other sample presentation information corresponding to the target similarity as the candidate sample presentation information corresponding to the given sample presentation information.
In some embodiments, the text associated with the target presentation information includes at least one of: title text, body text, landing page text.
In some embodiments, the apparatus further comprises: a second acquisition unit configured to acquire user information of a target user; a determination unit configured to determine whether the user information matches the target presentation information based on the obtained target feature vector; a pushing unit configured to push the target presentation information to the target user in response to determining that the user information matches the target presentation information.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement the method of any of the embodiments of the method for generating information described above.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which when executed by a processor, implements the method of any of the above-described methods for generating information.
According to the method and apparatus for generating information provided by the present disclosure, the text and the image associated with the target presentation information are used as the target text and the target image, respectively; the target text and the target image are input into a pre-trained feature extraction model to obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image; and finally a target feature vector corresponding to the target presentation information is generated based on the text feature vector and the image feature vector. The features of the presentation information can thus be effectively extracted, which facilitates subsequent processing such as classifying and recommending the presentation information using the extracted features and provides support for such subsequent processing. Moreover, the features of the presentation information are determined based on both the text features of the associated text and the image features of the associated image. Compared with the prior-art scheme of extracting the features of keywords in the text included in the presentation information as the features of the presentation information, more information features can be extracted, which helps represent the content of the presentation information more accurately and effectively, and in turn helps perform more accurate and effective subsequent information processing on the presentation information.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for generating information, according to the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a method for generating information in accordance with an embodiment of the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method for generating information according to the present disclosure;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the disclosed method for generating information or apparatus for generating information may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various client applications installed thereon, such as a web browser application, a shopping-like application, a search-like application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. They may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server that provides various services, such as an information push server that pushes information for presentation to the terminal apparatuses 101, 102, 103. The information push server may obtain the target presentation information, analyze data such as the target presentation information, and obtain a processing result (e.g., a target feature vector corresponding to the target presentation information). In addition, after the processing result is obtained, the information pushing server can also push the target presenting information to the terminal equipment based on the processing result.
It should be noted that the method for generating information provided by the embodiment of the present disclosure may be executed by the server 105, and may also be executed by the terminal devices 101, 102, and 103 (for example, may be executed by a terminal device that presents information for target presentation or information for target presentation to be presented); accordingly, the means for generating information may be provided in the server 105, or may be provided in the terminal devices 101, 102, 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where data used in generating the target feature vector corresponding to the target presentation information does not need to be acquired from a remote location, the system architecture may include only a terminal device or a server without a network.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating information in accordance with the present disclosure is shown. The method for generating information comprises the following steps:
step 201, acquiring a text and an image associated with the target presenting information as a target text and a target image respectively.
In this embodiment, an execution subject (e.g., a server shown in fig. 1) of the method for generating information may acquire a text and an image associated with target presentation information from a remote or local place as a target text and a target image, respectively, by a wired connection manner or a wireless connection manner. Specifically, the execution body may acquire a text associated with the target presentation information as a target text, and acquire an image associated with the target presentation information as a target image. The target presentation information may be presentation information whose features are to be extracted. The presentation information may be information for presentation to the user, e.g., the targeted presentation information may be advertisements, news, etc. for presentation to the user. Specifically, the information for target presentation may include various types of information, for example, the information for target presentation may include text, images, audio, video, and the like.
In this embodiment, the target presentation information may be associated in advance with at least one text and at least one image; the execution body may then acquire the text associated with the target presentation information as the target text and the image associated with the target presentation information as the target image.
In this embodiment, the text associated with the target presentation information may be any of various texts having an association relationship with the target presentation information. As an example, the text associated with the target presentation information may be a request text corresponding to the target presentation information. The request text may be text input by the user for requesting the target presentation information, for example, "what to do about insomnia".
In some optional implementations of this embodiment, the text associated with the target presentation information may include, but is not limited to, at least one of: title text, body text, landing page text.
In this implementation, the target presentation information may include text; the title text may be the text corresponding to the title of the text included in the target presentation information, and the body text may be the text corresponding to the body of that text.
In this implementation, the target presentation information may also be clicked by the user. The landing page text is text included in the page presented to the user after the user clicks the target presentation information.
It can be understood that the texts in this implementation can indicate the content presented by the target presentation information; the associated texts in this implementation therefore help extract the features of the target presentation information more accurately and effectively in the subsequent steps.
In the present embodiment, the image associated with the target presentation information may be various images having an association relationship with the target presentation information. As an example, the image associated with the target presentation information may be an image included in the target presentation information, or may be an image included in a landing page obtained by clicking the target presentation information.
Step 202, inputting the target text and the target image into a pre-trained feature extraction model, and obtaining a text feature vector corresponding to the target text and an image feature vector corresponding to the target image.
In this embodiment, based on the target text and the target image obtained in step 201, the executing entity may input the target text and the target image into a feature extraction model trained in advance to obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image. The text feature vector corresponding to the target text can be used to indicate the features of the target text. The image feature vector corresponding to the target image can be used to indicate the features of the target image. The feature extraction model can be used to characterize the correspondence between a text and its text feature vector and the correspondence between an image and its image feature vector.
In this embodiment, the feature extraction model may be a machine learning model, and may be specifically obtained by training in various ways. As an example, a sample text and a sample text feature vector pre-labeled for the sample text may be obtained, a sample image and a sample image feature vector pre-labeled for the sample image may be obtained, and then a supervised learning manner is adopted to train and obtain the feature extraction model by using the sample text, the labeled sample text feature vector, and the sample image and the labeled sample image feature vector.
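As a non-limiting illustration of the input-output interface described above, the following is a minimal sketch of a dual-branch feature extraction model that maps a tokenized target text and a preprocessed target image to a text feature vector and an image feature vector. Python/PyTorch is an assumption (the disclosure does not name a framework), and `TextEncoder`, `ImageEncoder`, `tokenize`, and `preprocess` are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class FeatureExtractionModel(nn.Module):
    """Dual-branch model: one branch encodes text, the other encodes images."""

    def __init__(self, text_encoder: nn.Module, image_encoder: nn.Module):
        super().__init__()
        self.text_encoder = text_encoder    # maps token ids to a text feature vector
        self.image_encoder = image_encoder  # maps pixel tensors to an image feature vector

    def forward(self, text_tokens: torch.Tensor, image: torch.Tensor):
        text_vec = self.text_encoder(text_tokens)   # shape: (batch, d_text)
        image_vec = self.image_encoder(image)       # shape: (batch, d_image)
        return text_vec, image_vec

# Usage, assuming `tokenize` and `preprocess` preprocessing helpers exist:
# model = FeatureExtractionModel(TextEncoder(), ImageEncoder())
# text_vec, image_vec = model(tokenize(target_text), preprocess(target_image))
```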
In some optional implementations of this embodiment, the feature extraction model may be obtained by the execution subject or other electronic device through the following training steps:
first, a preset sample presentation information set is acquired.
The sample presentation information set may include a plurality of pieces of sample presentation information. Sample presentation information is presentation information used to train the model. Each piece of sample presentation information in the set may be associated with a sample text and a sample image. The sample text associated with a piece of sample presentation information may be any of various texts having an association relationship with it, such as title text, body text, or landing page text. The sample image associated with a piece of sample presentation information may be any of various images having an association relationship with it, for example, an image included in the sample presentation information or an image included in the landing page corresponding to the sample presentation information.
Then, for each (or a given) piece of sample presentation information in the sample presentation information set: extract the sample text and sample image associated with that sample presentation information to form a first sample image-text group; determine, from the other sample presentation information, candidate sample presentation information corresponding to that sample presentation information; and form a second sample image-text group from the sample image associated with that sample presentation information and the sample text associated with the corresponding candidate sample presentation information.
Wherein the sample teletext group refers to a combination of a sample image and a sample text.
Here, the candidate sample presentation information corresponding to a piece of sample presentation information may be determined from the other sample presentation information by various methods. For example, one piece may be randomly selected, from the other sample presentation information in the set, as the candidate sample presentation information corresponding to the given sample presentation information.
In some optional implementations of this embodiment, the executing entity or other electronic device may determine the candidate sample presentation information corresponding to a piece of sample presentation information as follows. First, for each piece of other sample presentation information, determine the similarity between its associated sample text and the sample text associated with the given sample presentation information. Then, extract the minimum similarity from the determined similarities as the target similarity. Finally, determine the other sample presentation information corresponding to the target similarity as the candidate sample presentation information corresponding to the given sample presentation information.
As an example, the sample presentation information set includes sample presentation information a, sample presentation information B, and sample presentation information C. For the information a for sample presentation, it may be determined that the sample text a1 associated with the information a for sample presentation has a similarity of 0.8 to the sample text B1 associated with the information B for sample presentation; it is also determined that the sample text a1 associated with the sample presentation information a has a similarity of 0.9 to the sample text C1 associated with the sample presentation information C. Since "0.8" is smaller than "0.9", the sample presentation information B corresponding to "0.8" can be used as the candidate sample presentation information corresponding to the sample presentation information a.
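The minimum-similarity selection rule above can be sketched as follows; `text_similarity` is a hypothetical helper (for example, cosine similarity over text embeddings), and the dictionary layout of the sample presentation information is assumed purely for illustration.

```python
def pick_candidate(sample_info, other_infos, text_similarity):
    """Return the other sample presentation information whose associated sample
    text is least similar to the sample text of `sample_info`."""
    scored = [
        (text_similarity(sample_info["text"], other["text"]), other)
        for other in other_infos
    ]
    target_similarity, candidate = min(scored, key=lambda pair: pair[0])
    return candidate

# For the example above (A vs. B -> 0.8, A vs. C -> 0.9), B would be returned:
# pick_candidate({"text": a1}, [{"text": b1}, {"text": c1}], text_similarity)
```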
In this implementation, the other sample presentation information whose associated sample text differs most from the sample text associated with the given sample presentation information is used as the corresponding candidate sample presentation information. Since the content characterized by the sample text associated with a piece of sample presentation information is similar to the content characterized by its associated sample image, the greater the difference between the sample text of the other sample presentation information and the sample text of the given sample presentation information, the greater the difference between the content it characterizes and the content characterized by the sample image of the given sample presentation information. This increases the difference between the content characterized by the sample text and the sample image in the second sample image-text group, which helps obtain more effective negative samples, and in turn helps use those more effective negative samples to train a more accurate and effective feature extraction model.
Then, an initial image-text matching model is obtained.
Specifically, the initial image-text matching model may be an untrained model, or a model whose training has started but has not yet finished. The image-text matching model is obtained by training the initial image-text matching model. An image-text matching model can be used to determine whether an input text and an input image match. Here, matching may mean that the contents they characterize are similar or identical.
In this implementation, the initial image-text matching model may comprise various types of models, for example, a deep learning model (e.g., a residual network), a language model (e.g., a GPT (Generative Pre-Training) model), or both a deep learning model and a language model. The initial image-text matching model may comprise an initial feature extraction model and an initial similarity calculation model, with the initial feature extraction model connected to the initial similarity calculation model. The initial feature extraction model may be used to extract text feature vectors of input texts and image feature vectors of input images. The initial similarity calculation model may be used to receive the text feature vectors and image feature vectors extracted by the initial feature extraction model and to calculate the similarity between them.
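A minimal sketch of this structure is given below: an initial feature extraction model whose two outputs feed an initial similarity calculation model. The cosine-similarity head rescaled to [0, 1] is an assumption (the disclosure only states that a similarity is computed), as is the requirement that the text and image feature vectors share the same dimensionality.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityModel(nn.Module):
    """Initial similarity calculation model: scores a (text vector, image vector) pair."""

    def forward(self, text_vec: torch.Tensor, image_vec: torch.Tensor) -> torch.Tensor:
        # Cosine similarity rescaled to [0, 1]; assumes both vectors have the
        # same dimensionality (e.g., via projection layers in the encoders).
        return (F.cosine_similarity(text_vec, image_vec, dim=-1) + 1.0) / 2.0

class ImageTextMatchingModel(nn.Module):
    """Initial image-text matching model: feature extraction followed by similarity."""

    def __init__(self, feature_extractor: nn.Module, similarity: nn.Module):
        super().__init__()
        self.feature_extractor = feature_extractor  # e.g., the FeatureExtractionModel sketched earlier
        self.similarity = similarity

    def forward(self, text_tokens: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        text_vec, image_vec = self.feature_extractor(text_tokens, image)
        return self.similarity(text_vec, image_vec)
```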
And finally, training the initial image-text matching model by taking the formed first sample image-text group as a positive sample and the formed second sample image-text group as a negative sample to obtain a trained image-text matching model, wherein the trained image-text matching model comprises a trained feature extraction model.
It will be appreciated that the content characterized by the sample text and the sample image associated with the same sample presentation information is generally similar, while the content characterized by a sample text and a sample image associated with different sample presentation information generally differs. Therefore, this implementation can train the initial image-text matching model by using, as positive samples, first sample image-text groups consisting of a sample text and a sample image associated with the same sample presentation information, and using, as negative samples, second sample image-text groups consisting of a sample text and a sample image associated with different sample presentation information.
It should be noted that the trained image-text matching model obtained by training the initial image-text matching model may include a trained feature extraction model and a trained similarity calculation model. The trained feature extraction model can therefore be obtained by training the image-text matching model.
Specifically, in this implementation, based on the composed positive and negative samples, the execution subject may train the initial image-text matching model using various methods. As an example, the expected similarity for positive samples may be set to 1 and the expected similarity for negative samples may be set to 0. When training with a positive sample, the positive sample can be input into the initial feature extraction model of the initial image-text matching model to obtain the text feature vector corresponding to the sample text in the positive sample and the image feature vector corresponding to the sample image; the text feature vector and the image feature vector are then input into the initial similarity calculation model to obtain the actual similarity corresponding to the positive sample; the difference between the actual similarity corresponding to the positive sample and the expected similarity "1" is taken as the loss value corresponding to the positive sample, and the initial image-text matching model is trained using that loss value. Similarly, when training with a negative sample, the negative sample can be input into the initial image-text matching model to obtain the actual similarity corresponding to the negative sample; the difference between the actual similarity corresponding to the negative sample and the expected similarity "0" is taken as the loss value corresponding to the negative sample, and the initial image-text matching model is trained using that loss value.
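A minimal sketch of this per-sample training scheme follows, assuming the matching model sketched earlier and a squared-error loss between the predicted similarity and the expected similarity (1 for a positive sample, 0 for a negative sample); the choice of squared error is an assumption, since the disclosure only speaks of a difference.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, text_tokens, image, is_positive: bool) -> float:
    """One training pass on a single positive or negative sample image-text group."""
    expected = torch.tensor(1.0 if is_positive else 0.0)  # expected similarity
    predicted = model(text_tokens, image).squeeze()       # actual similarity
    loss = F.mse_loss(predicted, expected)                # gap between actual and expected
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # adjust parameters in the direction that reduces the loss
    return loss.item()
```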
Here, training the initial image-text matching model using the loss value means adjusting the parameters of the initial image-text matching model using the loss value. The direction of parameter adjustment may be the direction in which the loss value decreases. It should be noted that adjusting model parameters using a loss value is a well-known technique that is widely researched and applied at present, and is not described here again.
This implementation can generate positive and negative samples by constructing image-text groups, and the feature extraction model can then be obtained by training on these positive and negative samples in an unsupervised manner, which increases the variety of ways in which the feature extraction model can be generated. Compared with training the model using text feature vectors and image feature vectors labeled by annotators, this reduces the influence of human factors and helps train a more accurate feature extraction model.
In some optional implementations of this embodiment, based on the composed positive and negative samples, the executing subject or other electronic device may train the initial image-text matching model as follows. First, input a positive sample and a negative sample into the initial image-text matching model respectively to obtain a first similarity estimate corresponding to the positive sample and a second similarity estimate corresponding to the negative sample. Then, calculate the difference between the first similarity estimate and the second similarity estimate to obtain an estimated positive-negative sample difference. Then, determine the loss value of the estimated positive-negative sample difference relative to a preset positive-negative sample difference. Finally, train the initial image-text matching model using the determined loss value.
Specifically, the difference between the estimated positive-negative sample difference and the preset positive-negative sample difference can be computed to obtain the loss value of the estimated positive-negative sample difference relative to the preset positive-negative sample difference.
By contrast, in the above example the difference between the expected similarity corresponding to the positive sample and the expected similarity corresponding to the negative sample is 1 (1 = 1 - 0), and the goal of model training may be to make the actual similarity corresponding to the positive sample approach the expected similarity corresponding to the positive sample, and the actual similarity corresponding to the negative sample approach the expected similarity corresponding to the negative sample. In this implementation, the goal of model training may be to make the difference between the actual similarity corresponding to the positive sample (i.e., the first similarity estimate) and the actual similarity corresponding to the negative sample (i.e., the second similarity estimate), that is, the estimated positive-negative sample difference, approach the expected difference (i.e., the preset positive-negative sample difference).
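A minimal sketch of this pairwise variant, assuming a squared-error loss between the estimated positive-negative difference and the preset positive-negative difference (the squared-error form and the preset value of 1.0 are assumptions used only for illustration):

```python
import torch
import torch.nn.functional as F

def train_pair_step(model, optimizer, pos_text, pos_image, neg_text, neg_image,
                    preset_diff: float = 1.0) -> float:
    """One training pass on a (positive sample, negative sample) pair."""
    s_pos = model(pos_text, pos_image)   # first similarity estimate (positive sample)
    s_neg = model(neg_text, neg_image)   # second similarity estimate (negative sample)
    estimated_diff = s_pos - s_neg       # estimated positive-negative sample difference
    target = torch.full_like(estimated_diff, preset_diff)  # preset positive-negative difference
    loss = F.mse_loss(estimated_diff, target)  # loss of estimated difference vs. preset difference
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```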
This implementation provides another method for training the initial image-text matching model based on positive and negative samples, which helps improve the flexibility and diversity of model training. In addition, this implementation performs one training pass of the initial image-text matching model for each pair of positive and negative samples; compared with performing one training pass for each individual positive or negative sample, this reduces the number of training iterations and helps improve the efficiency of model training.
And step 203, generating a target feature vector corresponding to the information for target presentation based on the text feature vector and the image feature vector.
In this embodiment, based on the text feature vector and the image feature vector obtained in step 202, the execution subject may generate a target feature vector corresponding to the target presentation information. The target feature vector corresponding to the target presentation information can be used for indicating the feature of the target presentation information.
Specifically, based on the text feature vector and the image feature vector, the execution body may generate a target feature vector corresponding to the target presentation information by using various methods. For example, the execution subject may select one feature vector from the text feature vector and the image feature vector as a target feature vector corresponding to the target presentation information.
In some optional implementation manners of this embodiment, the execution main body may combine the text feature vector and the image feature vector to obtain a target feature vector corresponding to the target presentation information.
The target feature vector in this implementation can include both the text features of the text associated with the target presentation information and the image features of the associated image, so it can represent the content of the target presentation information more effectively and comprehensively, which in turn helps improve the accuracy and effectiveness of subsequent processing of the presentation information.
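One possible way to "combine" the two vectors is simple concatenation; the disclosure does not fix the combination method, so the following sketch is only one assumption:

```python
import torch

def combine(text_vec: torch.Tensor, image_vec: torch.Tensor) -> torch.Tensor:
    # Target feature vector = [text features | image features]
    return torch.cat([text_vec, image_vec], dim=-1)
```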
It is understood that, when the feature of the target presentation information is obtained, the target presentation information can be classified, recommended, and the like by using the feature of the target presentation information.
As an example, the target feature vector corresponding to the information for target presentation obtained by the present disclosure may be used in a click rate prediction scenario of the information for presentation, a training learning scenario of a click rate prediction model, a classification scenario of the information for presentation, and the like, which is not limited in this application.
It can be understood that the target feature vector of the present disclosure can more accurately and effectively represent the content of the target presentation information, and further, for any scene that needs to use the feature of the target presentation information, a more accurate and effective result can be obtained by using the target feature vector of the present disclosure.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of fig. 3, server 301 may first obtain text (e.g., landing page text) 303 and an image (e.g., included image) 304 associated with information (e.g., advertisement information) 302 for targeted presentation as targeted text 305 and targeted image 306, respectively. Then, the server 301 may input the target text 305 and the target image 306 into the pre-trained feature extraction model 307, and obtain a text feature vector 308 corresponding to the target text 305 and an image feature vector 309 corresponding to the target image 306. Finally, the server 301 may generate a target feature vector 310 corresponding to the target presentation information 302 based on the text feature vector 308 and the image feature vector 309, for example, the server 301 may directly combine the text feature vector 308 and the image feature vector 309 into the target feature vector 310 corresponding to the target presentation information 302.
The method provided by the embodiments of the present disclosure can effectively extract the features of the presentation information, which facilitates subsequent processing such as classifying and recommending the presentation information using the extracted features, and provides support for such subsequent processing. Moreover, the features of the presentation information are determined based on both the text features of the associated text and the image features of the associated image; compared with the prior-art scheme of extracting the features of keywords in the text included in the presentation information as the features of the presentation information, more information features can be extracted, which helps represent the content of the presentation information more accurately and effectively, and in turn helps perform more accurate and effective subsequent information processing on the presentation information.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for generating information is shown. The flow 400 of the method for generating information comprises the steps of:
step 401, acquiring a text and an image associated with the target presenting information as a target text and a target image, respectively.
In this embodiment, an execution subject (e.g., a server shown in fig. 1) of the method for generating information may acquire a text and an image associated with target presentation information from a remote or local place as a target text and a target image, respectively, by a wired connection manner or a wireless connection manner. Specifically, the execution body may acquire a text associated with the target presentation information as a target text, and acquire an image associated with the target presentation information as a target image. The target presentation information may be presentation information whose features are to be extracted. The presentation information may be information for presentation to a user.
In this embodiment, the information for target presentation may previously associate at least one text and at least one image, and further, the execution main body may acquire the text associated with the information for target presentation as the target text and the image associated with the information for target presentation as the target image.
In this embodiment, the text associated with the target presentation information may be various texts having an association relationship with the target presentation information. The image associated with the target presentation information may be any of various images having an association relationship with the target presentation information.
Step 402, inputting the target text and the target image into a pre-trained feature extraction model, and obtaining a text feature vector corresponding to the target text and an image feature vector corresponding to the target image.
In this embodiment, based on the target text and the target image obtained in step 401, the executing entity may input the target text and the target image into a feature extraction model trained in advance to obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image. The text feature vector corresponding to the target text can be used to indicate the features of the target text. The image feature vector corresponding to the target image can be used to indicate the features of the target image. The feature extraction model can be used to characterize the correspondence between a text and its text feature vector and the correspondence between an image and its image feature vector.
And step 403, generating a target feature vector corresponding to the information for target presentation based on the text feature vector and the image feature vector.
In this embodiment, based on the text feature vector and the image feature vector obtained in step 402, the execution subject may generate a target feature vector corresponding to the target presentation information. The target feature vector corresponding to the target presentation information can be used for indicating the feature of the target presentation information.
Step 401, step 402, and step 403 may be performed in a manner similar to that of step 201, step 202, and step 203 in the foregoing embodiment, respectively, and the above description for step 201, step 202, and step 203 also applies to step 401, step 402, and step 403, and is not described herein again.
Step 404, acquiring user information of a target user.
In this embodiment, the execution subject may obtain the user information of the target user. The target user may be a user to whom information is to be pushed. The user information may be various information about the user, for example, attribute information (e.g., age, gender, hobbies) and behavior information (e.g., web browsing records).
It should be noted that this step may be executed before or after steps 401, 402, and 403; only one execution order is given here, and the present application is not limited thereto.
Step 405, based on the obtained target feature vector, determines whether the user information matches the target presentation information.
In this embodiment, based on the target feature vector obtained in step 403 and the user information obtained in step 404, the execution subject may determine whether the user information matches the target presentation information.
Specifically, the execution agent may match the user information with the target presentation information by using various methods. As an example, the executing body may first extract a user feature vector for characterizing a user feature from the user information, then perform similarity calculation on the user feature vector and the target feature vector, and if the similarity obtained through calculation is greater than or equal to a preset similarity threshold, it may be determined that the user information matches the target presentation information.
It should be noted that the method for extracting the user features from the user information is a well-known technology widely studied and used at present, and is not described herein again.
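A minimal sketch of the matching decision described in step 405, assuming that the user feature vector and the target feature vector share the same dimensionality and that cosine similarity against a preset threshold is used (both are assumptions, and the threshold value is only an example):

```python
import torch
import torch.nn.functional as F

def matches(user_vec: torch.Tensor, target_vec: torch.Tensor,
            threshold: float = 0.5) -> bool:
    """Decide whether the user information matches the target presentation information."""
    similarity = F.cosine_similarity(user_vec, target_vec, dim=-1)
    return bool(similarity >= threshold)
```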
Step 406, in response to determining that the user information matches the target presentation information, pushing the target presentation information to the target user.
In this embodiment, the execution subject may push the target presentation information to the target user in response to determining that the user information matches the target presentation information.
It should be noted that, besides the above-mentioned contents, the embodiment of the present disclosure may also include the same or similar features and effects as the embodiment corresponding to fig. 2, and no further description is provided herein.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for generating information in this embodiment highlights the steps of determining, after the target feature vector corresponding to the target presentation information is obtained, whether the user information of the target user matches the target presentation information based on that target feature vector, and pushing the target presentation information to the target user in response to determining that the user information matches it. The scheme described in this embodiment can therefore achieve more accurate and effective information pushing based on the more accurate and effective target feature vector.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 5, the apparatus 500 for generating information of the present embodiment includes: a first acquisition unit 501, an input unit 502, and a generation unit 503. Wherein the first obtaining unit 501 is configured to obtain a text and an image associated with the target presentation information as a target text and a target image, respectively; the input unit 502 is configured to input a target text and a target image into a pre-trained feature extraction model, and obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image; the generating unit 503 is configured to generate a target feature vector corresponding to the target presentation information based on the text feature vector and the image feature vector.
In this embodiment, the first obtaining unit 501 of the apparatus 500 for generating information may obtain, from a remote or local source via a wired or wireless connection, the text and the image associated with the target presentation information as the target text and the target image, respectively. Specifically, the first obtaining unit 501 may obtain a text associated with the target presentation information as the target text, and obtain an image associated with the target presentation information as the target image. The target presentation information may be presentation information whose features are to be extracted, and presentation information may be information to be presented to a user.
In this embodiment, the target presentation information may be associated in advance with at least one text and at least one image; accordingly, the first obtaining unit 501 may obtain a text associated with the target presentation information as the target text and an image associated with the target presentation information as the target image.
In this embodiment, the text associated with the target presentation information may be various texts having an association relationship with the target presentation information. The image associated with the target presentation information may be any of various images having an association relationship with the target presentation information.
In this embodiment, based on the target text and the target image obtained by the first obtaining unit 501, the input unit 502 may input the target text and the target image into a pre-trained feature extraction model to obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image. The text feature vector corresponding to the target text can be used to indicate features of the target text, and the image feature vector corresponding to the target image can be used to indicate features of the target image. The feature extraction model can be used to characterize the correspondence between texts and their text feature vectors and between images and their image feature vectors.
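A feature extraction model of this kind can be pictured as two independent encoders, one for text and one for images, that output feature vectors of a common dimensionality. The minimal sketch below assumes PyTorch and an arbitrary toy architecture; the concrete network structure, layer sizes, and names are assumptions and are not specified by this embodiment.

```python
import torch
from torch import nn

class FeatureExtractionModel(nn.Module):
    """Illustrative dual encoder: maps a text and an image to feature vectors."""

    def __init__(self, vocab_size: int = 30000, embed_dim: int = 128, out_dim: int = 256):
        super().__init__()
        # Text branch: bag-of-tokens embedding followed by a projection.
        self.text_embedding = nn.EmbeddingBag(vocab_size, embed_dim)
        self.text_proj = nn.Linear(embed_dim, out_dim)
        # Image branch: a tiny CNN followed by a projection.
        self.image_conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.image_proj = nn.Linear(16, out_dim)

    def forward(self, token_ids: torch.Tensor, image: torch.Tensor):
        text_vec = self.text_proj(self.text_embedding(token_ids))   # text feature vector
        image_vec = self.image_proj(self.image_conv(image))         # image feature vector
        return text_vec, image_vec
```

With this sketch, calling the model on token ids of shape (batch, sequence length) and images of shape (batch, 3, height, width) returns the text feature vector and the image feature vector referred to above.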
In this embodiment, based on the text feature vector and the image feature vector obtained by the input unit 502, the generation unit 503 may generate a target feature vector corresponding to the target presentation information. The target feature vector corresponding to the target presentation information can be used for indicating the feature of the target presentation information.
In some optional implementations of this embodiment, the generating unit 503 may be further configured to: combine the text feature vector and the image feature vector to obtain the target feature vector corresponding to the target presentation information.
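The "combining" mentioned here can be as simple as splicing the two vectors end to end; the sketch below assumes that interpretation, although other fusion schemes are equally possible.

```python
import numpy as np

def combine(text_vec: np.ndarray, image_vec: np.ndarray) -> np.ndarray:
    # Concatenate the text feature vector and the image feature vector
    # to obtain the target feature vector.
    return np.concatenate([text_vec, image_vec], axis=-1)
```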
In some optional implementations of this embodiment, the feature extraction model may be obtained by training through the following steps: acquiring a preset set of sample presentation information; for each piece of sample presentation information in the set, extracting a sample text and a sample image associated with that sample presentation information to form a first sample image-text group, determining, from the other sample presentation information, candidate sample presentation information corresponding to that sample presentation information, and forming a second sample image-text group from the sample image associated with that sample presentation information and the sample text associated with its candidate sample presentation information; acquiring an initial image-text matching model, where the initial image-text matching model includes an initial feature extraction model for extracting a text feature vector of an input text and an image feature vector of an input image, and an initial similarity calculation model for calculating the similarity between the input text feature vector and the input image feature vector; and training the initial image-text matching model with the formed first sample image-text groups as positive samples and the formed second sample image-text groups as negative samples to obtain a trained image-text matching model, where the trained image-text matching model includes the trained feature extraction model.
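As a rough illustration of how the positive and negative samples might be assembled, the following sketch assumes each piece of sample presentation information is represented by a dict with hypothetical "text" and "image" keys, and that a pick_candidate helper (one possible form of which is sketched further below) selects the candidate sample presentation information.

```python
def build_sample_groups(samples, pick_candidate):
    """Hypothetical sketch of assembling training pairs for the
    image-text matching model."""
    positives, negatives = [], []
    for i, sample in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        candidate = pick_candidate(sample, others)
        # First sample image-text group: the sample's own text and image.
        positives.append((sample["text"], sample["image"]))
        # Second sample image-text group: the sample's image paired with
        # the text of its candidate sample presentation information.
        negatives.append((candidate["text"], sample["image"]))
    return positives, negatives
```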
In some optional implementations of this embodiment, training the initial image-text matching model with the formed first sample image-text group as a positive sample and the formed second sample image-text group as a negative sample includes: inputting the positive sample and the negative sample into the initial image-text matching model, respectively, to obtain a first estimated similarity corresponding to the positive sample and a second estimated similarity corresponding to the negative sample; calculating the difference between the first estimated similarity and the second estimated similarity to obtain an estimated positive-negative sample difference; determining a loss value of the estimated positive-negative sample difference relative to a preset positive-negative sample difference; and training the initial image-text matching model by using the determined loss value.
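One possible reading of this loss, with a margin standing in for the preset positive-negative sample difference, is sketched below; the hinge form is an assumption, since the embodiment only requires measuring the estimated difference against a preset difference.

```python
import torch

def pairwise_loss(pos_score: torch.Tensor, neg_score: torch.Tensor,
                  preset_diff: float = 1.0) -> torch.Tensor:
    """Sketch: penalize positive/negative similarity differences that
    fall short of the preset difference value."""
    estimated_diff = pos_score - neg_score
    return torch.clamp(preset_diff - estimated_diff, min=0.0).mean()
```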
In some optional implementations of this embodiment, determining, from the sample presentation information other than a given piece of sample presentation information, the candidate sample presentation information corresponding to that piece of sample presentation information includes: determining, respectively, the similarity between the sample text associated with each piece of the other sample presentation information and the sample text associated with the given sample presentation information; extracting the minimum similarity from the determined similarities as a target similarity; and determining the other sample presentation information corresponding to the target similarity as the candidate sample presentation information corresponding to the given sample presentation information.
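Under the same hypothetical data layout as above, selecting the candidate by minimum text similarity might look like the sketch below; the text_similarity callable and its token-overlap fallback are placeholders, and any text-similarity measure could be plugged in.

```python
def pick_candidate(sample, others, text_similarity=None):
    """Sketch: choose the other sample whose associated text has the
    minimum similarity to this sample's text; its text is then used to
    form the second (negative) sample image-text group."""
    if text_similarity is None:
        # Placeholder: token-overlap ratio as a trivial stand-in similarity.
        def text_similarity(a, b):
            ta, tb = set(a.split()), set(b.split())
            return len(ta & tb) / max(len(ta | tb), 1)
    return min(others, key=lambda other: text_similarity(other["text"], sample["text"]))
```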
In some optional implementations of this embodiment, the text associated with the target presentation information includes at least one of: title text, body text, landing page text.
In some optional implementations of this embodiment, the apparatus 500 may further include: a second acquisition unit (not shown in the figure) configured to acquire user information of a target user; a determination unit (not shown in the figure) configured to determine whether the user information matches the target presentation information based on the obtained target feature vector; a pushing unit (not shown in the figure) configured to push the target presentation information to the target user in response to determining that the user information matches the target presentation information.
It will be understood that the elements described in the apparatus 500 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.
The apparatus 500 provided by the above embodiment of the present disclosure can effectively extract the features of presentation information, which facilitates classifying, recommending, and otherwise processing the presentation information by using the extracted features and provides support for subsequent processing of the presentation information. Moreover, the apparatus determines the features of the presentation information based on both the text features of the associated text and the image features of the associated image. Compared with the prior-art scheme of extracting only the features of keywords in the text included in the presentation information, more information features can be extracted, so that the extracted features characterize the content of the presentation information more accurately and effectively, which in turn facilitates more accurate and effective subsequent processing of the presentation information.
Referring now to FIG. 6, a block diagram of an electronic device (e.g., the server of FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a text and an image associated with the information for target presentation as a target text and a target image respectively; inputting a target text and a target image into a pre-trained feature extraction model to obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image; and generating a target feature vector corresponding to the information for target presentation based on the text feature vector and the image feature vector.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first acquisition unit, an input unit, and a generation unit. Here, the names of these units do not constitute a limitation to the unit itself in some cases, and for example, the generation unit may also be described as a "unit that generates a target feature vector corresponding to target presentation information".
The foregoing description is only an illustration of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to the particular combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (10)

1. A method for generating information, comprising:
acquiring a text and an image associated with target presentation information as a target text and a target image, respectively;
inputting the target text and the target image into a pre-trained feature extraction model to obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image;
and generating a target feature vector corresponding to the target presentation information based on the text feature vector and the image feature vector.
2. The method of claim 1, wherein the generating, based on the text feature vector and the image feature vector, a target feature vector to which the target presentation information corresponds comprises:
combining the text feature vector and the image feature vector to obtain the target feature vector corresponding to the target presentation information.
3. The method of claim 1, wherein the feature extraction model is obtained by training:
acquiring a preset sample presentation information set;
for sample presentation information in the sample presentation information set, extracting a sample text and a sample image which are associated with the sample presentation information to form a first sample image-text group; determining sample presentation information corresponding to the sample presentation information as candidate sample presentation information from other sample presentation information except the sample presentation information; forming a second sample image-text group by the sample image related to the sample presenting information and the sample text related to the candidate sample presenting information corresponding to the sample presenting information;
acquiring an initial image-text matching model, wherein the initial image-text matching model comprises an initial feature extraction model and an initial similarity calculation model, the initial feature extraction model is used for extracting a text feature vector of an input text and an image feature vector of an input image, and the initial similarity calculation model is used for performing similarity calculation on the input text feature vector and the input image feature vector;
and training the initial image-text matching model by taking the formed first sample image-text group as a positive sample and the formed second sample image-text group as a negative sample to obtain a trained image-text matching model, wherein the trained image-text matching model comprises a trained feature extraction model.
4. The method of claim 3, wherein the training the initial image-text matching model by taking the formed first sample image-text group as a positive sample and the formed second sample image-text group as a negative sample comprises:
respectively inputting the positive sample and the negative sample into the initial image-text matching model to obtain a first similarity pre-estimated value corresponding to the positive sample and a second similarity pre-estimated value corresponding to the negative sample;
performing difference calculation on the first similarity estimated value and the second similarity estimated value to obtain estimated positive and negative sample difference values;
determining a loss value of the estimated positive and negative sample difference value relative to a preset positive and negative sample difference value;
and training the initial image-text matching model by using the determined loss value.
5. The method according to claim 3, wherein the determining, as the candidate sample presentation information, the sample presentation information corresponding to the sample presentation information from the other sample presentation information except the sample presentation information comprises:
respectively determining the similarity between the sample text associated with each piece of the other sample presentation information and the sample text associated with the sample presentation information;
extracting the minimum similarity from the determined similarities as a target similarity;
and determining the other sample presentation information corresponding to the target similarity as the candidate sample presentation information corresponding to the sample presentation information.
6. The method of claim 1, wherein the text associated with the target presentation information comprises at least one of:
title text, body text, landing page text.
7. The method according to one of claims 1-6, wherein the method further comprises:
acquiring user information of a target user;
determining whether the user information matches the target presentation information based on the obtained target feature vector;
in response to determining that the user information matches the target presentation information, pushing the target presentation information to the target user.
8. An apparatus for generating information, comprising:
a first acquisition unit configured to acquire a text and an image associated with the target presentation information as a target text and a target image, respectively;
the input unit is configured to input the target text and the target image into a pre-trained feature extraction model, and obtain a text feature vector corresponding to the target text and an image feature vector corresponding to the target image;
a generating unit configured to generate a target feature vector corresponding to the target presentation information based on the text feature vector and the image feature vector.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010744729.5A 2020-07-29 2020-07-29 Method and apparatus for generating information Pending CN111897950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010744729.5A CN111897950A (en) 2020-07-29 2020-07-29 Method and apparatus for generating information

Publications (1)

Publication Number Publication Date
CN111897950A true CN111897950A (en) 2020-11-06

Family

ID=73182447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010744729.5A Pending CN111897950A (en) 2020-07-29 2020-07-29 Method and apparatus for generating information

Country Status (1)

Country Link
CN (1) CN111897950A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180322206A1 (en) * 2017-05-05 2018-11-08 Microsoft Technology Licensing, Llc Personalized user-categorized recommendations
CN110019889A (en) * 2017-12-01 2019-07-16 北京搜狗科技发展有限公司 Training characteristics extract model and calculate the method and relevant apparatus of picture and query word relative coefficient
CN108647350A (en) * 2018-05-16 2018-10-12 中国人民解放军陆军工程大学 A kind of picture and text associative search method based on binary channels network
US20200125574A1 (en) * 2018-10-18 2020-04-23 Oracle International Corporation Smart content recommendations for content authors
CN109947919A (en) * 2019-03-12 2019-06-28 北京字节跳动网络技术有限公司 Method and apparatus for generating text matches model
CN111461174A (en) * 2020-03-06 2020-07-28 西北大学 Multi-mode label recommendation model construction method and device based on multi-level attention mechanism

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270163A (en) * 2020-12-07 2021-01-26 北京沃东天骏信息技术有限公司 Text generation method and device and storage medium
CN112270163B (en) * 2020-12-07 2021-09-10 北京沃东天骏信息技术有限公司 Text generation method and device and storage medium
CN113591839A (en) * 2021-06-28 2021-11-02 北京有竹居网络技术有限公司 Feature extraction model construction method, target detection method and device
WO2023273572A1 (en) * 2021-06-28 2023-01-05 北京有竹居网络技术有限公司 Feature extraction model construction method and target detection method, and device therefor
CN113591839B (en) * 2021-06-28 2023-05-09 北京有竹居网络技术有限公司 Feature extraction model construction method, target detection method and device
CN113343664A (en) * 2021-06-29 2021-09-03 京东数科海益信息科技有限公司 Method and device for determining matching degree between image texts
CN113343664B (en) * 2021-06-29 2023-08-08 京东科技信息技术有限公司 Method and device for determining matching degree between image texts
CN115292455A (en) * 2022-10-08 2022-11-04 有米科技股份有限公司 Training method and device of image-text matching model

Similar Documents

Publication Publication Date Title
CN110162670B (en) Method and device for generating expression package
WO2020156389A1 (en) Information pushing method and device
CN109993150B (en) Method and device for identifying age
US11758088B2 (en) Method and apparatus for aligning paragraph and video
CN111897950A (en) Method and apparatus for generating information
CN109981787B (en) Method and device for displaying information
CN110633423B (en) Target account identification method, device, equipment and storage medium
CN109961032B (en) Method and apparatus for generating classification model
CN109255037B (en) Method and apparatus for outputting information
CN110619078B (en) Method and device for pushing information
CN109862100B (en) Method and device for pushing information
CN108038172B (en) Search method and device based on artificial intelligence
CN110046571B (en) Method and device for identifying age
CN110008926B (en) Method and device for identifying age
CN112182255A (en) Method and apparatus for storing media files and for retrieving media files
CN112148962B (en) Method and device for pushing information
CN112446214A (en) Method, device and equipment for generating advertisement keywords and storage medium
CN109472028B (en) Method and device for generating information
CN111782933A (en) Method and device for recommending book list
CN111026849A (en) Data processing method and device
CN111899747B (en) Method and apparatus for synthesizing audio
CN111859970B (en) Method, apparatus, device and medium for processing information
CN113420723A (en) Method and device for acquiring video hotspot, readable medium and electronic equipment
CN111125501B (en) Method and device for processing information
CN112990176A (en) Writing quality evaluation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201106