CN117633212A - Topic generation method, device and equipment - Google Patents

Topic generation method, device and equipment

Info

Publication number: CN117633212A
Application number: CN202311605965.9A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: real-time, feature vector, text, target
Inventors: 赵旭, 高龙成, 梁广东
Assignee (original and current): Beijing Momo Information Technology Co., Ltd.
Filing and priority date: 2023-11-29
Publication date: 2024-03-01
Legal status: Pending

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present application provides a topic generation method, device and equipment. The topic generation method comprises: for each piece of real-time release content published by a user in real time, performing feature extraction on the real-time release content with a pre-trained cross-modal aligned image-text model to obtain a real-time text feature vector and/or a real-time image feature vector of the content; constructing a feature vector library based on the real-time text feature vectors and/or real-time image feature vectors of the plurality of pieces of real-time release content; determining topics for the real-time text feature vectors and real-time image feature vectors in the feature vector library; for target real-time release content whose topic is to be determined, searching the feature vector library for a matching feature vector that matches the target real-time release content; and determining the topic of the target real-time release content according to the topic of the matching feature vector. The topic generation method, device and equipment require no manual participation, reduce cost, and are convenient to popularize and apply.

Description

Topic generation method, device and equipment
Technical Field
The present application relates to the field of internet technologies, and in particular to a topic generation method, device and equipment.
Background
With the popularity of the mobile internet, people increasingly rely on mobile phones for social and recreational activities. The rise of social platforms provides users with a convenient social network, allowing them to stay in touch with others anytime and anywhere and to share moments of their lives.
For a social platform, the content published by users on the platform is the main entry point for building social relationships, and high-quality published content can attract content consumers to stay on the platform longer, thereby increasing their willingness to pay. To make content consumers interested in published content, the platform needs to perform content understanding on that content, extract its subject matter, and generate topics, so as to increase the consumption willingness of content consumers who are interested in those topics.
Existing topic generation methods generally require manual participation, which consumes considerable human resources and results in high development costs. Therefore, how to reduce labor and cost is an urgent problem to be solved.
Disclosure of Invention
In view of this, the present application provides a topic generation method, device and equipment, so as to solve the problems that existing topic generation methods consume a great deal of human resources and incur high development costs.
Specifically, the application is realized by the following technical scheme:
a first aspect of the present application provides a topic generation method, the method including:
aiming at each piece of real-time release content released by a user in real time, carrying out feature extraction on the real-time release content by utilizing a pre-trained cross-modal aligned image-text model to obtain a real-time text feature vector and/or a real-time image feature vector of the real-time release content;
constructing a feature vector library based on the real-time text feature vectors and/or the real-time image feature vectors of the plurality of real-time release contents;
determining topics of real-time text feature vectors and real-time image feature vectors in the feature vector library;
aiming at the target real-time release content of the topic to be determined, searching a matched feature vector matched with the target real-time release content from the feature vector library;
and determining the topics of the target real-time release content according to the topics of the matching feature vector.
A second aspect of the present application provides a topic generation device, which includes an extraction module, a construction module, a determining module and a searching module; wherein,
the extraction module is configured to, for each piece of real-time release content published by a user in real time, perform feature extraction on the real-time release content using a pre-trained cross-modal aligned image-text model to obtain a real-time text feature vector and/or a real-time image feature vector of the real-time release content;
the construction module is configured to construct a feature vector library based on the real-time text feature vectors and/or real-time image feature vectors of the plurality of pieces of real-time release content;
the determining module is configured to determine topics for the real-time text feature vectors and real-time image feature vectors in the feature vector library;
the searching module is configured to search the feature vector library for a matching feature vector that matches target real-time release content whose topic is to be determined;
the determining module is further configured to determine the topic of the target real-time release content according to the topic of the matching feature vector.
A third aspect of the present application provides topic generation equipment comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods provided in the first aspect of the present application when executing the program.
A fourth aspect of the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the methods provided in the first aspect of the present application.
According to the topic generation method, device and equipment provided by the present application, for each piece of real-time release content published by a user in real time, feature extraction is performed on the real-time release content using a pre-trained cross-modal aligned image-text model to obtain a real-time text feature vector and/or a real-time image feature vector of the content. A feature vector library is constructed based on the real-time text feature vectors and/or real-time image feature vectors of the plurality of pieces of real-time release content, and topics are then determined for the real-time text feature vectors and real-time image feature vectors in the library. For target real-time release content whose topic is to be determined, a matching feature vector that matches the target real-time release content is searched from the feature vector library, and the topic of the target real-time release content is determined according to the topic of the matching feature vector. In this way, a feature vector library can be constructed from the plurality of pieces of real-time release content, and the topic of target real-time release content can be determined automatically based on the library, so that no manual participation is needed, cost is reduced, and the method is easy to popularize and apply.
Drawings
Fig. 1 is a flowchart of a first embodiment of a topic generation method provided in the present application;
fig. 2 is a flowchart of a second embodiment of a topic generation method provided in the present application;
fig. 3 is a flowchart of a third embodiment of a topic generation method provided in the present application;
fig. 4 is a flowchart of a fourth embodiment of a topic generation method provided in the present application;
fig. 5 is a flowchart of a fifth embodiment of a topic generation method provided in the present application;
fig. 6 is a hardware structure diagram of the topic generation equipment in which the topic generation device provided in the present application is located;
fig. 7 is a schematic structural diagram of a topic generation device according to a first embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of the present application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
Specific examples are given below to describe the technical solutions of the present application in detail.
Fig. 1 is a flowchart of a topic generation method according to an embodiment of the present application. Referring to fig. 1, the method provided in this embodiment includes:
s101, aiming at each piece of real-time release content released by a user in real time, performing feature extraction on the real-time release content by utilizing a pre-trained cross-modal aligned image-text model to obtain a real-time text feature vector and/or a real-time image feature vector of the real-time release content.
Specifically, each piece of real-time distribution content includes text content and/or pictures/videos. For example, a real-time distribution content includes text and pictures, and for another example, a real-time distribution content includes only text, pictures, or video; for another example, a real-time distribution includes text and video.
It should be noted that the pre-trained cross-modal aligned graphics model is a feature extraction model, which is used for extracting features of text, pictures or videos to obtain corresponding feature vectors. In addition, the pre-trained cross-modality aligned teletext model can map data of different modalities (e.g., data of different modalities including text, pictures, and video) to a unified feature space to determine that their distance relationships reflect semantic similarity.
Optionally, in a possible implementation of the present application, the pre-training process of the pre-trained cross-modal aligned image-text model includes:
(1) Extracting sample data based on the historical release content historically published by users; each piece of sample data is a data pair of text and a picture/video.
Specifically, the historical release content published by users includes text content and/or pictures/videos.
In specific implementation, based on the historical release content published by users on the social platform, a data pair consisting of the text content and the pictures/videos in a piece of historical release content is extracted as one piece of sample data.
(2) Training a pre-constructed cross-modal image-text model with the text and picture/video data pairs to obtain the pre-trained cross-modal aligned image-text model; the pre-trained cross-modal aligned image-text model maps text, pictures and videos into a unified feature space.
Specifically, the pre-constructed cross-modal image-text model is selected according to actual needs and is not limited in this embodiment; it may be, for example, a CLIP model, an ALBEF model or a BLIP model.
During training, an appropriate loss function may be selected so that the model learns how to achieve cross-modal alignment. For example, in one possible implementation, the loss function of the pre-constructed cross-modal image-text model may be an MLM (Masked Language Modeling) loss, an image-text matching loss, a contrastive learning loss, etc.
The pre-trained cross-modal aligned image-text model thus maps text content and picture/video content into a unified feature space.
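As an illustration of the contrastive-learning option mentioned above, the following is a minimal sketch of a CLIP-style symmetric contrastive loss computed over a batch of paired text/image embeddings. It assumes PyTorch and pre-computed, L2-normalized 256-dimensional embeddings; the function name and the temperature value are illustrative assumptions rather than details specified by the present application.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(text_emb: torch.Tensor,
                               image_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of aligned text/image pairs.

    text_emb, image_emb: (B, 256) L2-normalized embeddings, where row i of
    both tensors comes from the same historical post (text paired with its
    picture or video frame).
    """
    logits = text_emb @ image_emb.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(text_emb.size(0), device=logits.device)
    loss_t2i = F.cross_entropy(logits, targets)                  # text -> matching image
    loss_i2t = F.cross_entropy(logits.t(), targets)              # image -> matching text
    return (loss_t2i + loss_i2t) / 2
```

Pulling matched pairs together and pushing mismatched pairs apart in this way is what gives the unified feature space the property that distance reflects semantic similarity.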
The dimensions of the real-time text feature vector and the real-time image feature vector are set according to actual needs and are not limited in this embodiment; they may be, for example, 128, 256, 512, 1024 or 2048 dimensions, and in specific implementation the dimension can be determined according to business requirements. The following description takes 256-dimensional real-time text feature vectors and real-time image feature vectors as an example.
Further, for the picture content in a piece of real-time release content, a real-time image feature vector can be extracted with the pre-trained cross-modal aligned image-text model; its dimension may be n×256, where n is the number of pictures contained in the real-time release content. For video content in real-time release content, equidistant frame extraction may be performed on the video to obtain a preset number of extracted frames (the preset number is set according to actual needs and is not limited in this embodiment; for example, in one possible implementation it may be 6), and an M×256-dimensional real-time image feature vector can then be extracted with the pre-trained cross-modal aligned image-text model, where M is the preset number. Similarly, for text content in real-time release content, a 1×256-dimensional real-time text feature vector can be extracted with the pre-trained cross-modal aligned image-text model.
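The following sketch illustrates how the three content types described above could be reduced to 256-dimensional vectors: text to a 1×256 vector, a post with n pictures to an n×256 matrix, and a video to an M×256 matrix via equidistant frame extraction (M = 6 here). The `encode_text`/`encode_image` callables stand in for the pre-trained cross-modal aligned image-text model; their names and the `post` structure are assumptions made for illustration.

```python
import numpy as np

M_FRAMES = 6  # preset number of frames sampled per video (example value)

def equidistant_frame_indices(num_frames: int, m: int = M_FRAMES) -> list[int]:
    """Pick m roughly equidistant frame indices from a video with num_frames frames."""
    step = max(num_frames // m, 1)
    return [min(i * step, num_frames - 1) for i in range(m)]

def extract_post_features(post: dict, encode_text, encode_image) -> dict:
    """Return the real-time feature vectors of one piece of release content.

    post: {"text": str | None, "images": list, "video_frames": list}
    encode_text / encode_image: callables returning a 256-d numpy vector.
    """
    features = {}
    if post.get("text"):
        features["text"] = encode_text(post["text"]).reshape(1, 256)                  # 1 x 256
    if post.get("images"):
        features["image"] = np.stack([encode_image(im) for im in post["images"]])     # n x 256
    if post.get("video_frames"):
        idx = equidistant_frame_indices(len(post["video_frames"]))
        features["video"] = np.stack([encode_image(post["video_frames"][i]) for i in idx])  # M x 256
    return features
```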
S102, constructing a feature vector library based on the real-time text feature vectors and/or the real-time image feature vectors of the real-time release contents.
Specifically, the real-time text feature vectors and/or real-time image feature vectors of the plurality of pieces of real-time release content are written into a database as a real-time stream, and this database is the feature vector library. The basic unit stored in the feature vector library is a 1×256-dimensional feature vector, and each basic unit carries corresponding meta information, which includes the URL of the text/picture content of the corresponding real-time release content, identification information of the corresponding real-time release content (for example, the ID of the user who published it), and so on.
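A minimal sketch of such a feature vector library is shown below, assuming a FAISS inner-product index over L2-normalized 1×256 vectors with a parallel list of meta information (content URL and publishing user ID). The use of FAISS and the field names are illustrative assumptions; any vector database offering similarity search would serve the same purpose.

```python
import faiss
import numpy as np

DIM = 256

class FeatureVectorLibrary:
    """Stores 1 x 256 feature vectors together with per-vector meta information."""

    def __init__(self):
        self.index = faiss.IndexFlatIP(DIM)   # inner product equals cosine for unit vectors
        self.meta = []                        # meta[i] describes the i-th stored vector

    def add(self, vectors: np.ndarray, metas: list) -> None:
        """vectors: (k, 256) float32, L2-normalized; metas: k dicts such as
        {"content_url": ..., "user_id": ...}."""
        assert vectors.shape == (len(metas), DIM)
        self.index.add(vectors.astype(np.float32))
        self.meta.extend(metas)

    def search(self, query: np.ndarray, top_k: int = 3):
        """Return (similarity, meta) pairs for the top_k most similar stored vectors."""
        scores, ids = self.index.search(query.astype(np.float32).reshape(1, DIM), top_k)
        return [(float(s), self.meta[i]) for s, i in zip(scores[0], ids[0]) if i != -1]
```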
As described above, the real-time release content published by a user includes text and pictures/videos, which have a natural alignment relationship. In this embodiment, text, pictures and videos can be mapped into a unified feature space through the pre-trained cross-modal aligned image-text model, and their spatial distance relationships directly correspond to their semantic proximity.
S103, determining topics of the real-time text feature vector and the real-time image feature vector in the feature vector library.
For example, in one possible implementation, topics may be determined based on a topic generation model.
Optionally, fig. 2 is a flowchart of a second embodiment of a topic generation method provided in the present application. Referring to fig. 2, in the method provided in this embodiment, on the basis of the foregoing embodiment, the determining topics of the real-time text feature vector and the real-time image feature vector in the feature vector library includes:
s201, searching target historical release contents carrying a designated mark from historical release contents of a user, and extracting features of the target historical release contents by using the image-text model to obtain historical text feature vectors of the target historical release contents; the target historical release content is text content.
Specifically, the designated mark is set according to actual needs, which is not limited in this embodiment. For example, in one embodiment, the designated mark is the "#" tag. For the specific implementation process and principle of performing feature extraction on the target historical release content with the image-text model to obtain its historical text feature vector, reference may be made to the description in the foregoing embodiment, which is not repeated here.
For example, in one embodiment, in this step, target historical release content carrying the "#" tag may be searched from the historical release content published by users on the social platform. Suppose the found target historical release content includes historical release content 1, historical release content 2 and historical release content 3; after feature extraction with the image-text model, the historical text feature vector 11 of historical release content 1, the historical text feature vector 21 of historical release content 2 and the historical text feature vector 31 of historical release content 3 are obtained.
S202, inserting the historical text feature vector into the feature vector library, and clustering the feature vectors in the feature vector library to obtain a first clustering result.
In specific implementation, the historical text feature vectors are inserted into the feature vector library in batches. The basic units inserted into the feature vector library are still 1×256-dimensional feature vectors, and each basic unit carries corresponding meta information, which includes the original content and identification information of the target historical release content corresponding to that basic unit.
Further, in specific implementation, the feature vectors in the feature vector library can be clustered using a clustering method such as Mini-Batch K-Means, so as to obtain the first clustering result.
For example, in one embodiment, the historical text feature vector 11 of the historical published content 1, the historical text feature vector 21 of the historical published content 2, and the historical text feature vector 31 of the historical published content 3 are inserted into a feature vector library, and after the feature vectors in the feature vector library are clustered, a clustering result 1 is obtained, where the clustering result 1 includes 10 classes, and the first two classes in the clustering result 1 are described below as an example.
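A minimal sketch of this first clustering pass, assuming scikit-learn's MiniBatchKMeans over the pooled library vectors (with the "#"-tagged historical text vectors already inserted); the number of clusters and batch size are illustrative parameters, not values fixed by the present application.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def first_clustering(library_vectors: np.ndarray, n_clusters: int = 10,
                     batch_size: int = 1024, seed: int = 0) -> np.ndarray:
    """Cluster all 256-d vectors currently in the feature vector library.

    Returns one cluster label per library vector; a label groups real-time
    text/image feature vectors together with the historical text feature
    vectors that fall into the same class.
    """
    km = MiniBatchKMeans(n_clusters=n_clusters, batch_size=batch_size, random_state=seed)
    return km.fit_predict(library_vectors)
```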
S203, determining topics of the real-time text feature vector and the real-time image feature vector in each class according to the text content corresponding to the historical text feature vector in the class.
Specifically, in one possible implementation manner, for each class in the first clustering result, text content corresponding to the historical text feature vector in the class may be directly determined as topics of the real-time text feature vector and the real-time image feature vector in the class.
For example, continuing the above example, the first two classes in clustering result 1 are taken as an example. Suppose the first class contains only the historical text feature vector 11, whose corresponding text content is that of historical release content 1; in this step, the topic of the real-time text feature vectors and real-time image feature vectors in the first class is determined as the text content of historical release content 1. As another example, suppose the second class contains the historical text feature vector 11, the historical text feature vector 31 and the historical text feature vector 51; in this step, the text content 1 corresponding to the historical text feature vector 11, the text content 3 corresponding to the historical text feature vector 31 and the text content 5 corresponding to the historical text feature vector 51 are determined as the topics of the second class, that is, the topics of the real-time text feature vectors and real-time image feature vectors in the second class are determined as text content 1, text content 3 and text content 5.
S104, for the target real-time release content whose topic is to be determined, searching the feature vector library for the matching feature vector that matches the target real-time release content.
Specifically, in one possible implementation, for target real-time release content whose topic is to be determined, feature extraction may first be performed on the target real-time release content with the image-text model to obtain its target feature vector; the similarity between the target feature vector and each feature vector in the feature vector library is then calculated, and finally the feature vectors whose similarity with the target feature vector is greater than a preset threshold are selected as the matching feature vectors. The preset threshold is set according to actual needs and is not limited in this embodiment.
For example, in one embodiment, for the real-time release content 1 of the topic to be determined, the matching feature vector matching the real-time release content 1 is found from the feature vector library to be the feature vector 3.
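A brief sketch of this threshold-based lookup, under the assumption that all vectors are unit-normalized so that a dot product gives cosine similarity; the threshold value is an illustrative placeholder.

```python
import numpy as np

def match_by_threshold(target_vec: np.ndarray, library_vectors: np.ndarray,
                       threshold: float = 0.8) -> list[int]:
    """Return the indices of library vectors whose cosine similarity with the
    target feature vector exceeds the preset threshold."""
    sims = library_vectors @ target_vec            # (N,) cosine similarities for unit vectors
    return np.flatnonzero(sims > threshold).tolist()
```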
S105, determining topics of the target real-time release content according to the topics of the matching feature vector.
Specifically, in one possible implementation manner, the topic of the matching feature vector may be directly determined as the topic of the target real-time release content.
For example, in the present step, the topic of the feature vector 3 is determined as the topic of the real-time distribution content 1 in combination with the above example.
It should be noted that, as described above, real-time release content includes text and/or pictures/videos, so after feature extraction is performed on the target real-time release content, at least one target feature vector may be obtained, for example a target text feature vector corresponding to the text and a target image feature vector corresponding to the pictures. When topics are determined separately from the target text feature vector and the target image feature vector, the resulting topics may not be consistent. Optionally, when determining the topic of the target real-time release content according to the topics of the matching feature vectors, the intersection or union of those topics may be computed and determined as the topic of the target real-time release content.
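When the text-derived and image-derived matches disagree, the final topic set can be formed as just described. A tiny sketch, with the merge mode as an illustrative parameter:

```python
def merge_topics(topics_per_target_vector: list, mode: str = "union") -> set:
    """Combine the topic sets obtained from each target feature vector
    (e.g. one set from the text vector and one from the image vector)."""
    if not topics_per_target_vector:
        return set()
    result = set(topics_per_target_vector[0])
    for topics in topics_per_target_vector[1:]:
        result = result | set(topics) if mode == "union" else result & set(topics)
    return result
```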
According to the topic generation method provided by this embodiment, for each piece of real-time release content published by a user in real time, feature extraction is performed on the real-time release content using a pre-trained cross-modal aligned image-text model to obtain its real-time text feature vector and/or real-time image feature vector; a feature vector library is constructed based on the real-time text feature vectors and/or real-time image feature vectors of the plurality of pieces of real-time release content; topics are then determined for the real-time text feature vectors and real-time image feature vectors in the library; for target real-time release content whose topic is to be determined, a matching feature vector is searched from the feature vector library, and the topic of the target real-time release content is determined according to the topic of the matching feature vector. In this way, a feature vector library can be constructed from the plurality of pieces of real-time release content, and the topic of target real-time release content can be determined automatically based on the library, so that no manual participation is needed, cost is reduced, and the method is easy to popularize and apply.
Fig. 3 is a flowchart of a third embodiment of a topic generation method provided in the present application. Referring to fig. 3, in the method provided in this embodiment, for each class in the first clustering result, determining topics of a real-time text feature vector and a real-time image feature vector in the class by using text content corresponding to a historical text feature vector in the class includes:
s301, determining text contents corresponding to the historical text feature vectors in a first class as topics of the real-time text feature vectors and the real-time image feature vectors in the first class aiming at the first class with the number of the historical text feature vectors smaller than or equal to a preset threshold value.
Specifically, the preset threshold is set according to actual needs, and in this embodiment, this is not limited. For example, in one embodiment, the preset threshold may be 10.
For example, when the preset threshold is 10, the number of the historical text feature vectors in the first class of the clustering result 1 is 1, and in this step, the historical text content 1 corresponding to the historical text feature vector 11 is determined as the topics of the real-time text feature vector and the real-time image feature vector in the first class.
S302, clustering the historical text feature vectors in a second class aiming at the second class with the number of the historical text feature vectors larger than a preset threshold value to obtain a second clustering result of the second class, and determining the class center of each class in the second clustering result.
In particular, the specific implementation process and implementation principle of the clustering may refer to the description in the foregoing embodiments, which is not repeated herein.
For example, in combination with the above example, for a certain class in the clustering result 1, if the number of the historical text feature vectors in the class is greater than a preset threshold after the first clustering, in this step, the historical text feature vectors in the class are clustered to obtain a second clustering result.
It should be noted that, when clustering is performed, the clustering parameters may be adjusted to control the number of classes in the second clustering result. For example, in one embodiment, the number of classes in the second clustering result is controlled to be less than or equal to 5 by adjusting the clustering parameters.
In addition, the class center is a representative point of all samples in a class; it marks the center position of the class and can represent the distribution of the class.
S303, determining the text content corresponding to the historical text feature vector at the class center of each class in the second clustering result of the second class as the topic of the real-time text feature vectors and real-time image feature vectors in the second class.
For example, continuing the above example, for each class in the second clustering result, the text content corresponding to the historical text feature vector at that class's center is determined as the topic of the real-time text feature vectors and real-time image feature vectors in the second class.
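A minimal sketch of S302-S303 under the same MiniBatchKMeans assumption: the oversized class is re-clustered into at most five sub-classes, and for each sub-class the historical text whose vector lies nearest to the class center supplies the topic. The function and variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def topics_from_class_centers(hist_vectors: np.ndarray, hist_texts: list,
                              max_subclasses: int = 5) -> list:
    """Second-stage clustering of the historical text feature vectors of one large class.

    Returns one topic per sub-class: the text whose vector is nearest to that
    sub-class's center.
    """
    k = min(max_subclasses, len(hist_texts))
    km = MiniBatchKMeans(n_clusters=k, random_state=0).fit(hist_vectors)
    topics = []
    for center in km.cluster_centers_:
        nearest = int(np.argmin(np.linalg.norm(hist_vectors - center, axis=1)))
        topics.append(hist_texts[nearest])
    return topics
```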
According to the method provided by this embodiment, for a first class in which the number of historical text feature vectors is less than or equal to the preset threshold, the text content corresponding to the historical text feature vectors in the class is directly determined as the topic of the real-time text feature vectors and real-time image feature vectors in that class; for a second class in which the number of historical text feature vectors is greater than the preset threshold, the historical text feature vectors in the class are clustered again to obtain a second clustering result, the class center of each class in the second clustering result is determined, and the text content corresponding to the historical text feature vector at each class center is determined as the topic of the real-time text feature vectors and real-time image feature vectors in that class. In this way, after the first clustering, different strategies are adopted depending on the number of historical text feature vectors in a class, so that more accurate topics can be obtained.
Fig. 4 is a flowchart of a fourth embodiment of a topic generation method provided in the present application. Referring to fig. 4, in the method provided in this embodiment, on the basis of the foregoing embodiments, searching the feature vector library for a matching feature vector that matches the target real-time release content includes:
s401, performing feature extraction on the target real-time release content by using the image-text model to obtain at least one target feature vector of the target real-time release content.
For example, in one embodiment, if the target real-time release content is real-time release content 4, in this step the image-text model is used to perform feature extraction on real-time release content 4, obtaining a real-time text feature vector 41 and a real-time image feature vector 42.
S402, for each target feature vector, calculating the similarity between the target feature vector and each feature vector in the feature vector library, and selecting a preset number of feature vectors from the feature vector library, in descending order of similarity, as the similar feature vectors of that target feature vector.
For the specific implementation process and principle of calculating the similarity, reference may be made to the related art, which is not repeated here. For example, cosine similarity may be used.
Specifically, the preset number is set according to actual needs and is not limited in this embodiment. For example, in one embodiment, the preset number is 3.
For example, continuing the above example, when the preset number is 3, the similar feature vectors of the real-time text feature vector 41 are determined to be text feature vector 1, text feature vector 2 and image feature vector 1.
S403, determining the similar feature vector of the at least one target feature vector as a matching feature vector matched with the target real-time release content.
For example, continuing the above example, in this step, the similar feature vectors of the real-time text feature vector 41 and the similar feature vectors of the real-time image feature vector 42 are determined as the matching feature vectors that match the target real-time release content 4.
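A compact sketch of S402-S403, assuming a library object that exposes a `search(query, top_k)` method (such as the FeatureVectorLibrary sketch given earlier): each target feature vector retrieves its preset number of most similar library vectors, and the pooled results form the matching feature vectors of the target real-time release content.

```python
def find_matching_vectors(target_vectors, library, top_k: int = 3):
    """target_vectors: 256-d query vectors extracted from the target real-time
    release content (its text and/or image feature vectors).
    library: any index exposing search(query, top_k) -> [(similarity, meta), ...].
    """
    matches = []
    for vec in target_vectors:
        matches.extend(library.search(vec, top_k))   # top_k similar vectors per target vector
    # pool the per-vector results and sort by similarity, highest first
    return sorted(matches, key=lambda pair: pair[0], reverse=True)
```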
According to the topic generation method provided by this embodiment, feature extraction is performed on the target real-time release content with the image-text model to obtain at least one target feature vector of the target real-time release content. For each target feature vector, the similarity between it and each feature vector in the feature vector library is calculated, and a preset number of feature vectors are selected from the library, in descending order of similarity, as the similar feature vectors of that target feature vector; the similar feature vectors of the at least one target feature vector are then determined as the matching feature vectors of the target real-time release content. In this way, a specified number of feature vectors can be selected as similar feature vectors in descending order of similarity and used to determine the topic of the target real-time release content, so that more accurate topics can be obtained.
Fig. 5 is a flowchart of a fifth embodiment of a topic generation method provided in the present application. Referring to fig. 5, the method provided in this embodiment includes:
s501, acquiring a customized topic, and carrying out feature extraction on the customized topic by utilizing the image-text model to obtain a customized feature vector of the customized topic; the customized topics comprise historical search records of users and specified topics selected by operators.
Specifically, the customized topics can meet some customized requirements, and are used for adapting the exploration of topics of interest by users and the distribution strategy of operation.
Alternatively, the customized topics may include a user's history search record and specified topics selected by the operator, where the user's history search record generally records topics of interest to the user. Further, the specified topics selected by operators are generally popular topics.
It should be noted that the customized topics are mainly high-discussion content and positive-energy content. Each customized topic includes text content.
In specific implementation, for a customized topic, a customized feature vector of the customized topic can be extracted by means of a graphic model, and the dimension of the customized feature vector can be a 1 x 256-dimensional vector.
S502, constructing a custom topic library based on custom feature vectors of a plurality of custom topics.
Specifically, the customized feature vectors of the customized topics are stored in a database, and this database is the custom topic library. The basic unit stored in the custom topic library is a 1×256-dimensional feature vector, and each basic unit carries corresponding meta information, which includes identification information of the text content corresponding to that basic unit, and so on.
S503, for the target real-time release content whose topic is to be determined, searching the custom topic library for a target customized feature vector that matches the target real-time release content.
Specifically, in one possible implementation, for the target real-time release content whose topic is to be determined, feature extraction may be performed on it with the image-text model to obtain its target feature vector; the similarity between the target feature vector and each customized feature vector in the custom topic library is then calculated, and finally the customized feature vector whose similarity with the target feature vector is greater than a preset threshold is selected as the target customized feature vector. The preset threshold is set according to actual needs and is not limited in this embodiment.
S504, determining the topics of the target customized feature vector as topics of the target real-time release content.
For example, in one possible implementation, the topic of the target customized feature vector is popular topic 1, and in this step, the topic of the target real-time release content is determined to be the popular topic 1.
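A brief sketch of the custom-topic path (S501-S504), reusing the same kind of vector store: each customized topic (a user search record or an operator-specified topic) is embedded once, and target content is matched against the library with a similarity threshold. The helper names, the `encode_text` callable and the threshold value are assumptions made for illustration.

```python
import numpy as np

def build_custom_topic_library(custom_topics: list, encode_text):
    """Embed each customized topic into a 1 x 256 vector; returns (matrix, topic texts)."""
    vectors = np.stack([encode_text(t) for t in custom_topics])   # (K, 256), unit-normalized
    return vectors, list(custom_topics)

def topic_from_custom_library(target_vec: np.ndarray, vectors: np.ndarray,
                              topics: list, threshold: float = 0.8):
    """Return the customized topic whose vector best matches the target content,
    provided its cosine similarity exceeds the preset threshold; otherwise None."""
    sims = vectors @ target_vec
    best = int(np.argmax(sims))
    return topics[best] if sims[best] > threshold else None
```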
According to the topic generation method provided by this embodiment, customized topics are acquired, and feature extraction is performed on them with the image-text model to obtain their customized feature vectors; the customized topics include users' historical search records and specified topics selected by operators. A custom topic library is constructed based on the customized feature vectors of the plurality of customized topics, so that topics can then be determined, based on this library, for real-time release content whose topic is to be determined. In this way, the method can meet users' customization needs and adapt to users' exploration of topics of interest and to operational distribution strategies, without manual participation, which reduces cost and makes the method easy to popularize and apply.
Corresponding to the foregoing embodiment of a topic generation method, the present application further provides an embodiment of a topic generation device.
The topic generation device provided by the present application can be applied to topic generation equipment. The device embodiments may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the device, as a logical device, is formed by the processor of the topic generation equipment in which it is located reading corresponding computer program instructions from a non-volatile memory into memory and running them. In terms of hardware, fig. 6 is a hardware structure diagram of the topic generation equipment in which the topic generation device provided in the present application is located; in addition to the processor, memory, network interface and non-volatile memory shown in fig. 6, the equipment may include other hardware according to its actual functions, which is not described here.
Fig. 7 is a schematic diagram of the structure of the topic generation device according to the first embodiment. Referring to fig. 7, the apparatus provided in this embodiment includes an extracting module 710, a constructing module 720, a determining module 730, and a searching module 740; wherein,
the extracting module 710 is configured to, for each piece of real-time release content published by a user in real time, perform feature extraction on the real-time release content using a pre-trained cross-modal aligned image-text model to obtain a real-time text feature vector and/or a real-time image feature vector of the real-time release content;
the constructing module 720 is configured to construct a feature vector library based on the real-time text feature vectors and/or real-time image feature vectors of the plurality of pieces of real-time release content;
the determining module 730 is configured to determine topics for the real-time text feature vectors and real-time image feature vectors in the feature vector library;
the searching module 740 is configured to search the feature vector library for a matching feature vector that matches target real-time release content whose topic is to be determined;
the determining module 730 is further configured to determine the topic of the target real-time release content according to the topic of the matching feature vector.
According to the topic generation device provided by this embodiment, for each piece of real-time release content published by a user in real time, feature extraction is performed on the real-time release content using a pre-trained cross-modal aligned image-text model to obtain its real-time text feature vector and/or real-time image feature vector; a feature vector library is constructed based on the real-time text feature vectors and/or real-time image feature vectors of the plurality of pieces of real-time release content; topics are then determined for the real-time text feature vectors and real-time image feature vectors in the library; for target real-time release content whose topic is to be determined, a matching feature vector is searched from the feature vector library, and the topic of the target real-time release content is determined according to the topic of the matching feature vector. In this way, a feature vector library can be constructed from the plurality of pieces of real-time release content, and the topic of target real-time release content can be determined automatically based on the library, so that no manual participation is needed, cost is reduced, and the device is easy to popularize and apply.
Optionally, the determining module 730 is specifically configured to:
searching, from the historical release content historically published by users, for target historical release content carrying a designated mark, and performing feature extraction on the target historical release content with the image-text model to obtain a historical text feature vector of the target historical release content; the target historical release content is text content;
inserting the historical text feature vector into the feature vector library, and clustering the feature vectors in the feature vector library to obtain a first clustering result;
and determining topics of the real-time text feature vector and the real-time image feature vector in each class by utilizing text content corresponding to the historical text feature vector in the class according to each class in the first clustering result.
Optionally, the determining module 730 is specifically configured to:
determining text contents corresponding to the historical text feature vectors in a first class as topics of real-time text feature vectors and real-time image feature vectors in the first class aiming at the first class with the number of the historical text feature vectors smaller than or equal to a preset threshold;
clustering the historical text feature vectors in a second class aiming at the second class with the number of the historical text feature vectors larger than a preset threshold value to obtain a second clustering result of the second class, and determining the class center of each class in the second clustering result;
and for each second class, determining the text content corresponding to the historical text feature vector at the class center of each class in the second clustering result of that second class as the topic of the real-time text feature vectors and real-time image feature vectors in the second class.
Optionally, the pre-training process of the pre-trained cross-modal aligned image-text model includes:
extracting sample data based on the historical release content historically published by users; each piece of sample data is a data pair of text and a picture/video;
training a pre-constructed cross-modal image-text model with the text and picture/video data pairs to obtain the pre-trained cross-modal aligned image-text model; the pre-trained cross-modal aligned image-text model maps text, pictures and videos into a unified feature space.
Optionally, the searching module 740 is specifically configured to:
extracting features of the target real-time release content by using the image-text model to obtain at least one target feature vector of the target real-time release content;
for each target feature vector, calculating the similarity between the target feature vector and each feature vector in the feature vector library, and selecting a preset number of feature vectors from the feature vector library as similar feature vectors of the target feature vector according to the sequence of the similarity from high to low;
And determining the similar feature vector of the at least one target feature vector as a matching feature vector matched with the target real-time release content.
Optionally, the determining module 730 is specifically configured to determine an intersection or a union of topics of the matching feature vector, and determine the intersection or the union as a topic of the target real-time release content.
With continued reference to fig. 6, the present application further provides topic generation equipment, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of any one of the methods provided in the first aspect of the present application when executing the program.
Further, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of any of the methods provided in the first aspect of the present application.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, since they essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solution of the present application. Those of ordinary skill in the art can understand and implement it without creative effort.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A topic generation method, the method comprising:
for each piece of real-time release content published by a user in real time, performing feature extraction on the real-time release content using a pre-trained cross-modal aligned image-text model to obtain a real-time text feature vector and/or a real-time image feature vector of the real-time release content;
constructing a feature vector library based on the real-time text feature vectors and/or the real-time image feature vectors of the plurality of real-time release contents;
determining topics of real-time text feature vectors and real-time image feature vectors in the feature vector library;
for target real-time release content whose topic is to be determined, searching the feature vector library for a matching feature vector that matches the target real-time release content;
and determining the topics of the target real-time release content according to the topics of the matching feature vector.
2. The method of claim 1, wherein determining topics for real-time text feature vectors and real-time image feature vectors in the feature vector library comprises:
searching, from the historical release content historically published by users, for target historical release content carrying a designated mark, and performing feature extraction on the target historical release content with the image-text model to obtain a historical text feature vector of the target historical release content; the target historical release content is text content;
inserting the historical text feature vector into the feature vector library, and clustering the feature vectors in the feature vector library to obtain a first clustering result;
and determining topics of the real-time text feature vector and the real-time image feature vector in each class by utilizing text content corresponding to the historical text feature vector in the class according to each class in the first clustering result.
3. The method of claim 2, wherein for each class in the first clustering result, determining topics for real-time text feature vectors and real-time image feature vectors in the class using text content corresponding to historical text feature vectors in the class, comprises:
for a first class in which the number of historical text feature vectors is less than or equal to a preset threshold, determining the text content corresponding to the historical text feature vectors in the first class as the topic of the real-time text feature vectors and real-time image feature vectors in that class;
for a second class in which the number of historical text feature vectors is greater than the preset threshold, clustering the historical text feature vectors in the second class to obtain a second clustering result of that class, and determining the class center of each class in the second clustering result;
and for each second class, determining the text content corresponding to the historical text feature vector at the class center of each class in the second clustering result of that second class as the topic of the real-time text feature vectors and real-time image feature vectors in the second class.
4. The method according to claim 1, wherein the method further comprises:
acquiring a customized topic, and extracting features of the customized topic by utilizing the image-text model to obtain a customized feature vector of the customized topic; the customized topics comprise historical retrieval records of users and appointed topics selected by operators;
constructing a custom topic library based on custom feature vectors of a plurality of custom topics;
for the target real-time release content whose topic is to be determined, searching the custom topic library for a target customized feature vector that matches the target real-time release content;
and determining the topic of the target customized feature vector as the topic of the target real-time release content.
5. The method of claim 1, wherein the pre-training process of the pre-trained cross-modal aligned image-text model comprises:
extracting sample data based on the historical release content historically published by users; wherein each piece of sample data is a data pair of text and a picture/video;
training a pre-constructed cross-modal image-text model with the text and picture/video data pairs to obtain the pre-trained cross-modal aligned image-text model; wherein the pre-trained cross-modal aligned image-text model maps text, pictures and videos into a unified feature space.
6. The method of claim 1, wherein searching the feature vector library for a matching feature vector that matches the target real-time release content comprises:
extracting features of the target real-time release content by using the image-text model to obtain at least one target feature vector of the target real-time release content;
for each target feature vector, calculating the similarity between the target feature vector and each feature vector in the feature vector library, and selecting a preset number of feature vectors from the feature vector library as similar feature vectors of the target feature vector according to the sequence of the similarity from high to low;
And determining the similar feature vector of the at least one target feature vector as a matching feature vector matched with the target real-time release content.
7. The method of claim 1, wherein the determining the topic of the target real-time publication content based on the topic of the matching feature vector comprises:
and determining an intersection or a union of topics of the matching feature vector, and determining the intersection or the union as the topics of the target real-time release content.
8. The topic generation device is characterized by comprising an extraction module, a construction module, a determination module and a search module; wherein,
the extraction module is used for extracting the characteristics of each piece of real-time release content released by the user in real time by utilizing a pre-trained cross-modal aligned image-text model to obtain a real-time text characteristic vector and/or a real-time image characteristic vector of the real-time release content;
the construction module is used for constructing a feature vector library based on the real-time text feature vectors and/or the real-time image feature vectors of the plurality of real-time release contents;
the determining module is used for determining topics of the real-time text feature vectors and the real-time image feature vectors in the feature vector library;
The searching module is used for searching matched feature vectors matched with the target real-time release content of the topic to be determined from the feature vector library;
the determining module is further configured to determine, according to the topic of the matching feature vector, a topic of the target real-time release content.
9. Topic generation equipment comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-7 when executing the program.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.
Priority and publication information

Application CN202311605965.9A, "Topic generation method, device and equipment", was filed on 2023-11-29 by Beijing Momo Information Technology Co., Ltd., with a priority date of 2023-11-29. It was published as CN117633212A on 2024-03-01 and is currently pending. Family ID: 90037070. Country of publication: China (CN).

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination