CN112800339B - Information stream searching method, device and equipment - Google Patents


Info

Publication number
CN112800339B
CN112800339B (application CN202110364974.8A)
Authority
CN
China
Prior art keywords
codec
text
sample
information flow
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110364974.8A
Other languages
Chinese (zh)
Other versions
CN112800339A (en)
Inventor
岳天驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110364974.8A priority Critical patent/CN112800339B/en
Publication of CN112800339A publication Critical patent/CN112800339A/en
Application granted granted Critical
Publication of CN112800339B publication Critical patent/CN112800339B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an information flow searching method, apparatus, and device, relating to the technical field of artificial intelligence and used for improving the speed of searching an information flow. In the information flow searching method, a plurality of information flow titles are clustered to obtain corresponding information flow description texts. When a search is needed, the corresponding information flow description text can be matched based on the search keyword, a corresponding information flow title set is obtained according to the information flow description text, and information flow content is obtained according to the information flow title set. Since the search keyword does not need to be matched against each information flow title one by one, the amount of matching in the search process is reduced and the speed of searching the information flow is increased. Moreover, because the information flow description text is generated based on a plurality of information flow titles, it can reflect the main content of the event more accurately, which improves the accuracy of the information flow content determined based on the search keyword.

Description

Information stream searching method, device and equipment
Technical Field
The application relates to the technical field of artificial intelligence, in particular to the technical field of machine learning, and provides an information stream searching method, device and equipment.
Background
With the continuous development of the internet, various content service platforms, for example, news service platforms, have gradually appeared, and some news service platforms can promptly push hot news to users according to currently occurring hot events. However, a great deal of hot news is released on a news service platform, and how to quickly provide, from such a mass of hot news, the hot news that meets a user's requirements is a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiments of the application provide an information flow searching method, apparatus, and device, which are used for improving the speed of searching information flow content.
In one aspect, an embodiment of the present application provides an information stream searching method, including:
acquiring a search keyword;
determining an information flow description text matched with the search keyword; the information flow description text is used for describing the information flow content corresponding to the information flow title in the corresponding information flow title set;
acquiring an information flow title set corresponding to the information flow description text;
pulling information stream content based on the information stream titles in the information stream title set;
and displaying the pulled information flow content as the search result matched with the search keyword.
In one aspect, an embodiment of the present application provides an information flow searching apparatus, including:
the keyword acquisition module is used for acquiring search keywords;
the matching module is used for determining an information flow description text matched with the search keyword; the information flow description text is used for describing the information flow content corresponding to the information flow title in the corresponding information flow title set;
the title acquisition module is used for acquiring an information flow title set corresponding to the information flow description text;
the pulling module is used for pulling information flow content based on the information flow titles in the information flow title set;
and the display module is used for displaying the pulled information flow content as the search result matched with the search keyword.
In a possible embodiment, the apparatus further comprises a text generation module, wherein:
the text generation module is used for respectively executing the following operations aiming at each target event: acquiring each information flow title corresponding to one target event in each target event; determining an information flow description text corresponding to the target event based on the information flow titles;
and the matching module is used for determining the information flow description texts matched with the search keywords from the information flow description texts corresponding to the target events.
In a possible embodiment, the text generation module is specifically configured to:
acquiring a target text sequence of a target event, wherein the target text sequence comprises a first target text subsequence of an information flow title set of the target event and a second target text subsequence comprising character mask marks;
performing, by a plurality of codecs in a trained target text generation model, a plurality of iterations of operations based on the target text sequence, wherein:
in a first round of iterative operations, an encoded output of a first codec of the plurality of codecs is obtained based on the first target text subsequence, and a decoded output of the first codec is obtained based on the second target text subsequence;
in each iteration operation except for a first iteration operation in the multiple iterations operations, the decoding output of the first codec is obtained based on the prediction result of a historical iteration operation, the historical iteration operation refers to the iteration operation performed before the current iteration operation, and the prediction result corresponding to each iteration operation is obtained based on the decoding output of the last codec in the multiple codecs;
in each of the plurality of rounds of iterative operations, a decoded output of each codec other than the first codec is obtained based on an encoded output and a decoded output of a previous codec;
and obtaining an information flow description text corresponding to the target event based on the prediction result corresponding to the multiple rounds of iterative operations.
In a possible embodiment, the prediction results of the historical round iteration operations comprise a plurality of prediction results of which the prediction probabilities meet a preset probability condition; the predicted result of each iteration operation except the first iteration operation in the multiple iteration operations is obtained by the following steps:
inputting each combination in a plurality of combinations into the target text generation model respectively to obtain a prediction result corresponding to each combination in the current iteration operation, wherein the plurality of combinations are results of the combination of the first target text subsequence and each prediction result in the plurality of prediction results of the historical iteration operation;
and in the prediction results corresponding to the combinations in the iteration operation of the current round, taking the prediction result meeting the prediction probability condition as the prediction result of the iteration operation of the current round.
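The round-by-round selection described above behaves like a beam search: in each round, every retained prediction is combined with the fixed first target text subsequence, re-scored by the model, and only the predictions meeting the probability condition are kept. A minimal sketch under the assumption that the condition is "top `beam_width` by cumulative probability"; `score_fn` is a hypothetical stand-in for running the target text generation model on one combination:

```python
import heapq

def iterative_predict(score_fn, beam_width, num_rounds):
    """Keep, each round, only the predictions whose cumulative probability
    meets the condition (here: the top `beam_width`), and extend each kept
    prediction in the next round."""
    beams = [((), 1.0)]  # (predicted tokens so far, cumulative probability)
    for _ in range(num_rounds):
        candidates = []
        for tokens, prob in beams:
            # score_fn stands in for running the model on the combination of
            # the first target text subsequence and this partial prediction.
            for tok, p in score_fn(tokens):
                candidates.append((tokens + (tok,), prob * p))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return beams

# Toy stand-in model (hypothetical): always prefers token "a" over "b".
def toy_score(tokens):
    return [("a", 0.6), ("b", 0.4)]

best = iterative_predict(toy_score, beam_width=2, num_rounds=2)
print(best[0][0])  # ('a', 'a') — the highest-probability two-token prediction
```

In a real run, `score_fn` would return next-token probabilities conditioned on the first target text subsequence plus the partial prediction, rather than a fixed toy distribution.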
In a possible embodiment, the information flow description text corresponding to the target event is obtained through a trained text generation model, and the apparatus further includes a model training module, where the model training module is specifically configured to:
acquiring a sample text sequence set, wherein each sample text sequence comprises a first sample text subsequence of a sample information flow title set of a sample event and a second sample text subsequence of a sample information flow mask description text of the sample event;
performing multiple rounds of iterative training on a plurality of codecs in the text generation model based on the sample text sequence set until a model convergence condition is met, to obtain a trained target text generation model, wherein each round of iterative training comprises the following processes:
inputting a sample text sequence selected based on the sample text sequence set into the plurality of codecs to obtain a decoded output of a last codec, wherein the encoded output of a first codec in the plurality of codecs is obtained based on a first sample text subsequence in the sample text sequence, the decoded output of the first codec is obtained based on a second sample text subsequence in the sample text sequence, and the decoded output of each codec except the first codec is obtained based on the encoded output and the decoded output of a previous codec;
adjusting model parameters of the plurality of codecs based on a decoded output of the last codec.
In a possible embodiment, the model training module is specifically configured to:
performing the following operations for the plurality of codecs, respectively:
if one of the codecs is a first codec, encoding a first sample text subsequence in the sample text sequence based on the first codec to obtain an encoded output of the first codec, and decoding a second sample text subsequence in the sample text sequence based on the first codec to obtain a decoded output of the first codec;
if one of the plurality of codecs is the ith codec, encoding the encoding output of the (i-1)th codec based on the ith codec to obtain the encoding output of the ith codec, and decoding the encoding output and the decoding output of the (i-1)th codec based on the ith codec to obtain the decoding output of the ith codec, wherein i is an integer greater than 1 and not greater than N, and N is the total number of the plurality of codecs;
the decoded output of the Nth codec is obtained.
In a possible embodiment, the model training module is specifically configured to:
obtaining a first weight matrix based on the first codec, wherein each row in the first weight matrix corresponds one-to-one to each first input in the sample text sequence, and each row in the first weight matrix comprises attention weight values between the corresponding first input and each first input in the sample text sequence;
based on the first codec, for each row in the first class of row set corresponding to the respective first input in the second sample text subsequence in the sample text sequence in the first weight matrix, respectively performing the following operations: setting an attention weight value between a first input corresponding to one line in the first-class line set and a first-class input to zero to obtain a second weight matrix, wherein the first-class input is other inputs positioned after the first input corresponding to the one line in a second sample text subsequence of the sample text sequence;
and obtaining the decoding output of the first codec according to the second weight matrix based on the first codec.
In a possible embodiment, the model training module is specifically configured to:
based on the first codec, for each row in the second-class row set corresponding to the respective first inputs in the first sample text subsequence in the first weight matrix, respectively performing the following operations: setting an attention weight value between the first input corresponding to one row in the second-class row set and a second-class input to zero to obtain a third weight matrix, wherein the second-class input is each first input in the second sample text subsequence of the sample text sequence;
based on the one codec, obtaining an encoded output of the first codec according to the third weight matrix.
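The two zeroing operations above — the causal mask within the second sample text subsequence, and the block preventing the first subsequence's rows from attending to the second subsequence — can be summarized as a single 0/1 attention mask over the concatenated sequence. A NumPy sketch (illustrative only; `src_len` and `tgt_len` are the lengths of the first and second subsequences):

```python
import numpy as np

def seq2seq_attention_mask(src_len, tgt_len):
    """0/1 attention mask over the concatenated sequence
    [first subsequence | second subsequence]; mask[i, j] = 0 means the
    attention weight from position i to position j is set to zero."""
    n = src_len + tgt_len
    mask = np.ones((n, n), dtype=int)
    # First-subsequence rows may not attend to the second subsequence
    # (the "third weight matrix" operation).
    mask[:src_len, src_len:] = 0
    # Second-subsequence rows may not attend to later positions within the
    # second subsequence (the "second weight matrix" operation).
    for i in range(tgt_len):
        mask[src_len + i, src_len + i + 1:] = 0
    return mask

print(seq2seq_attention_mask(3, 2))
```

With this mask the first subsequence is encoded bidirectionally over itself, while each position of the second subsequence sees the whole first subsequence but only earlier positions of the second.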
In a possible embodiment, the model training module is specifically configured to:
obtaining a fourth weight matrix based on the ith codec, wherein each row in the fourth weight matrix corresponds to each second input in the encoding output and the decoding output of the (i-1)th codec, and each row in the fourth weight matrix comprises attention weight values between the corresponding second input and each second input;
for a third class of row sets corresponding to respective second inputs in the decoding output of the i-1 th codec in the fourth weight matrix, performing the following operations, respectively: setting an attention weight value between a second input and a third input corresponding to a row in the third-class row set to zero to obtain a fifth weight matrix, wherein the third input is other second inputs located after the second input corresponding to the row in the decoding output of the i-1 th codec;
and obtaining the decoding output of the ith codec according to the fifth weight matrix based on the ith codec.
In a possible embodiment, the model training module is specifically configured to:
based on the ith codec, for each row in the fourth-class row set corresponding to the respective inputs in the coded output of the (i-1)th codec in the fourth weight matrix, performing the following operations: setting an attention weight value between a second input corresponding to each row in the fourth-class row set and a fourth-class input to zero to obtain a sixth weight matrix, wherein the fourth-class input is each second input in the decoding output of the (i-1)th codec;
and obtaining the coding output of the ith codec according to the sixth weight matrix based on the ith codec.
In one possible embodiment, each codec of the plurality of codecs comprises a codec layer, wherein:
the coded output and the decoded output of any codec are both output through a coding and decoding layer in the codec.
In a possible embodiment, each of the plurality of codecs comprises one coding layer and one decoding layer that share model parameters, wherein:
the coded output and decoded output of any codec are output through one coding layer and one decoding layer in the any codec, respectively.
In a possible embodiment, the model training module is specifically configured to:
for each output in the decoded outputs of the last codec, performing the following process:
obtaining, based on one output, a generation probability of a masked word in the sample information flow mask description text, wherein the generation probability represents the distribution probability of the masked word over the global word list;
obtaining, based on the one output and the encoded output of the last codec, a copy probability of the masked word in the sample information flow mask description text, wherein the copy probability is used to represent the correlation between the masked word and the sample text sequence;
performing weighted summation on the generation probability and the copy probability to obtain a prediction probability of the masked word;
adjusting the model parameters of the plurality of codecs based on the prediction probabilities of the respective masked words in the sample information flow mask description text.
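The weighted summation of the generation probability and the copy probability resembles a pointer-generator step. The following NumPy sketch is illustrative only: the tensor shapes, the fixed gating scalar, and the use of simple dot-product attention for the copy distribution are assumptions, not the patent's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_masked_word(decoder_state, vocab_proj, encoder_states,
                        src_token_ids, vocab_size, gate):
    """Weighted sum of a generation distribution over the global word list and
    a copy distribution derived from attention over the input sequence."""
    # Generation probability: distribution of the masked word over the vocabulary.
    p_gen = softmax(vocab_proj @ decoder_state)
    # Copy probability: attention of the masked position over the encoded input,
    # scattered onto the vocabulary ids of the input tokens.
    attn = softmax(encoder_states @ decoder_state)
    p_copy = np.zeros(vocab_size)
    np.add.at(p_copy, src_token_ids, attn)
    # Weighted summation of the two distributions.
    return gate * p_gen + (1.0 - gate) * p_copy

rng = np.random.default_rng(0)
p = predict_masked_word(rng.normal(size=4), rng.normal(size=(10, 4)),
                        rng.normal(size=(3, 4)), np.array([2, 5, 2]), 10, 0.6)
print(round(float(p.sum()), 6))  # 1.0 — the result is a valid distribution
```

Because both component distributions sum to one, any convex combination of them is again a probability distribution, which is what makes training on the combined prediction probability well defined.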
An embodiment of the present application provides a computer device, including:
at least one processor, and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements any of the information flow searching methods discussed above by executing the instructions stored in the memory.
Embodiments of the present application provide a computer storage medium having stored thereon computer instructions that, when executed on a computer, cause the computer to perform any of the information flow searching methods as previously discussed.
Due to the adoption of the technical scheme, the embodiment of the application has at least the following technical effects:
in the embodiment of the application, the information flow description text is obtained in advance by clustering according to the information flow title set; that is, the information flow description text summarizes the information flow content corresponding to the information flow titles in the information flow title set, thereby realizing the clustering of a large amount of information flow content. Therefore, after the search keyword is obtained, the information flow description text corresponding to the search keyword can be quickly and accurately matched according to the search keyword, and the information flow content related to the search keyword can then be quickly determined according to the information flow description text.
Drawings
Fig. 1A is a schematic diagram of a model structure of a Transformer model according to an embodiment of the present application;
fig. 1B is a first application scenario diagram applicable to an information stream searching method provided in the embodiment of the present application;
fig. 1C is a second application scenario diagram applicable to an information stream searching method according to an embodiment of the present application;
fig. 2 is a flowchart of an information flow searching method provided in an embodiment of the present application;
FIG. 3 is a schematic interface diagram of an information flow search process provided in an embodiment of the present application;
fig. 4 is a first flowchart of a text generation model training method according to an embodiment of the present application;
fig. 5 is a first schematic structural diagram of a text generation model according to an embodiment of the present application;
fig. 6 is a second schematic structural diagram of a text generation model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a codec according to an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating a text sequence processing sample by a text generation model according to an embodiment of the present disclosure;
fig. 9 is a third schematic structural diagram of a text generation model provided in an embodiment of the present application;
FIG. 10 is a schematic diagram illustrating a principle of calculating a predicted probability of a masked word according to an embodiment of the present application;
fig. 11 is a second flowchart of a text generation model training method according to an embodiment of the present application;
fig. 12 is a flowchart of a method for generating an information flow description text based on a trained target text generation model according to an embodiment of the present application;
fig. 13 is a schematic interaction diagram between a terminal and a first server according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of an information flow search apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to better understand the technical solutions provided by the embodiments of the present application, the following detailed description is made with reference to the drawings and specific embodiments.
To facilitate better understanding of the technical solutions of the present application for those skilled in the art, the following terms related to the present application are introduced.
1. Event and information stream content: events in this application refer to things that have occurred or are likely to occur, including news events. For ease of distinction, events involved in training a model are referred to as sample events, and events involved in using a model are referred to as target events. The information flow content is used to describe the corresponding event, and one event may correspond to one or more information flow contents.
2. Information flow title and information flow title set: an information flow title is a brief summary statement of an event; each event may have a plurality of corresponding titles, and the set of all information flow titles of an event is referred to as an information flow title set. For ease of distinction, a title used when training the model may be referred to as a sample information flow title, and a title used when using the model may be referred to as a target information flow title.
3. Information flow description text and information flow mask description text: the information flow description text is a refined phrase summary of the event. The length of the stream description text of the event may be shorter than the length of the stream header of the event. The information flow mask description text refers to a result of masking a part of words in the information flow description text of the event, the mark for masking the words is a character mask mark, the information flow mask description text has the character mask marks of the respective words to be masked, that is, the information flow mask description text does not have content information of the words to be masked but has position information of the words to be masked in the information flow mask description text, and correspondingly, the character mask mark does not retain the word information of the words to be masked itself but retains the position information of the words to be masked in the sentence. For the sake of distinction, the information flow mask description text when training the model is referred to as a sample information flow mask description text, and the information flow mask description text when using the model is referred to as a target information flow mask description text.
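As a small illustration of the character mask mark just described — the content of a masked word is dropped while its position in the sentence is kept — consider the following sketch (token names are hypothetical):

```python
def mask_words(description_tokens, positions, mask_token="[mask]"):
    """Replace the words at the given positions with a character mask mark:
    the word content is dropped while its position in the sentence is kept."""
    return [mask_token if i in positions else tok
            for i, tok in enumerate(description_tokens)]

# Hypothetical six-word description text with the 4th and 6th words masked.
print(mask_words(["w1", "w2", "w3", "w4", "w5", "w6"], {3, 5}))
# ['w1', 'w2', 'w3', '[mask]', 'w5', '[mask]']
```

The resulting list carries no information about the masked words themselves, only where in the description text they occur.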
4. Text sequence, first text subsequence and second text subsequence: in the application, the sequence representation of the information flow title set of the event is called a first text subsequence; the sequence representation of the information flow mask description text of the event is called a second text subsequence; and the splicing result of the first text subsequence and the second text subsequence is a text sequence. To facilitate distinguishing the relative positions of the first text sub-sequence and the second text sub-sequence in the text sequence, the first text sub-sequence further includes a sequence representation of a segmentation marker for indicating a segmentation position of the information stream mask description text from the information stream header set, the segmentation marker being, for example: sep. For the convenience of distinction, a text sequence used in training a model is called a sample text sequence, a first text subsequence is correspondingly called a first sample text subsequence, and a second text subsequence is correspondingly called a second sample text subsequence; the text sequence used when using the model is referred to as a target text sequence, the first text sub-sequence is correspondingly referred to as a first target text sub-sequence, and the second text sub-sequence is correspondingly referred to as a second target text sub-sequence. For clarity of the description of the stream header set, the stream description text and the stream mask description text, the text sequence, the first text subsequence and the second text subsequence, reference is made to the example shown in table 1 below:
TABLE 1
(Table 1 is reproduced as an image in the original publication; its contents are described below.)
In table 1, x1 corresponds to the sequence representation of the information flow title corresponding to D1, x2 to the title corresponding to D2, and x3 to the title corresponding to D3; t1 is the sequence representation of the segmentation marker "sep"; and c1, c2, c3, c4, c5, and c6 are respectively the sequence representations of the words in "55 kinds of [mask] price [mask]".
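The splicing described for Table 1 — title-set representations, then the segmentation marker, then the mask-description representations — can be sketched as follows (purely illustrative; the `x*`/`c*` tokens stand for sequence representations):

```python
def build_text_sequence(title_seq, mask_desc_seq, sep_marker="sep"):
    """Splice the first text subsequence (title-set representations plus the
    segmentation marker) with the second (mask description representations)."""
    first = list(title_seq) + [sep_marker]
    return first + list(mask_desc_seq), len(first)

seq, split = build_text_sequence(["x1", "x2", "x3"],
                                 ["c1", "c2", "c3", "c4", "c5", "c6"])
print(seq)    # ['x1', 'x2', 'x3', 'sep', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6']
print(split)  # 4 — the segmentation position between the two subsequences
```

The returned split position is what allows downstream processing to tell where the information flow title set ends and the mask description text begins.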
5. BERT model: the BERT model is a pre-trained bidirectional language model that obtains strong text sequence modeling capability by pre-training on large-scale text data. The BERT model is based on the Transformer encoder framework, constructs a bidirectional language model by means of a cloze-style (fill-in-the-blank) masking method, and achieves good results on many natural language processing tasks.
6. Transformer: the Transformer is a neural network framework and comprises an encoding part and a decoding part, wherein the encoding part comprises a plurality of encoding layers which are connected in sequence, the decoding part comprises a plurality of decoding layers which are connected in sequence, each encoding layer and each decoding layer comprise a self-attention layer, and the self-attention layer can adopt a multi-head attention mechanism. The model can fully interact with the input text by utilizing an attention mechanism, and can effectively capture the global information of the input text to obtain rich context expression.
Referring to fig. 1A, a model structure of a Transformer model is shown. The Transformer model includes an encoding portion and a decoding portion; the encoding portion includes a plurality of encoding layers and the decoding portion includes a plurality of decoding layers. Fig. 1A illustrates an example in which the encoding portion includes 3 encoding layers and the decoding portion includes 3 decoding layers, but in practice the number of encoding layers and decoding layers is not limited. The encoded output of the last layer of the encoding portion may be input to each decoding layer of the decoding portion, and each decoding layer decodes based on the encoded output of the last encoding layer and the decoded output of the previous decoding layer.
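The self-attention operation at the heart of each encoding and decoding layer can be sketched as single-head scaled dot-product attention. This is a generic sketch of the standard operation, not the specific patented model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Single-head scaled dot-product attention; positions where mask == 0
    receive (effectively) zero attention weight."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # pairwise attention scores
    if mask is not None:
        scores = np.where(mask == 1, scores, -1e9)  # suppress masked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                        # context-weighted values

X = np.array([[1.0, 0.0], [0.0, 1.0]])
out = scaled_dot_product_attention(X, X, X)
print(out.shape)
```

In a full multi-head layer this computation is repeated over several learned projections of Q, K, and V, and the per-head outputs are concatenated.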
7. Cloud technology: a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like applied in the cloud computing business model; it can form a resource pool that is used on demand, and is flexible and convenient. Cloud computing technology will become an important support for this model. Background services of technical network systems, such as video websites, picture websites, and other web portals, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each article may have its own identification mark that needs to be transmitted to a background system for logical processing; data at different levels are processed separately, and all kinds of industrial data need strong system background support, which can only be realized through cloud computing.
8. Cloud computing (cloud computing): the method is a computing mode, and distributes computing tasks on a resource pool formed by a large number of computers, so that various application systems can acquire computing power, storage space and information services according to needs. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user as being infinitely expandable and available at any time, available on demand, expandable at any time, and paid for on-demand.
9. Artificial Intelligence (AI): a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
10. Machine Learning (ML): a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior, so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.
It should be noted that "a plurality of" in the embodiments of the present application means two or more.
In order to quickly provide corresponding information stream content, embodiments of the present application provide an information stream search method, apparatus, and device. The design concept of the information stream search method in the embodiments of the present application is described below.
In the embodiments of the present application, an information stream description text is generated from an information stream title set, which is equivalent to clustering the information stream content under an event. When a search keyword is obtained, the corresponding information stream content can therefore be matched according to the search keyword without matching the keyword against each individual information stream title under the event, so the information stream content under the event can be determined quickly and fed back quickly. Moreover, because the information stream description text is generated from the full title set of an event, it can reflect the information stream content of that title set, so the information stream content searched based on the description text is more comprehensive.
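As an illustrative sketch (not part of the patent's disclosure; all function and data names here are hypothetical), the event-level lookup described above can be reduced to one comparison per event rather than one per title:

```python
def match_event(search_keyword, event_descriptions):
    """Return the events whose information stream description text
    contains the search keyword; one comparison per event, not per title."""
    return [event for event, description in event_descriptions.items()
            if search_keyword in description]

# One description text summarizes the many titles clustered under each event.
events = {
    "event_1": "price reduction of 55 medicines",
    "event_2": "newly discovered apple nutrition",
}
print(match_event("medicines", events))
```

A production system would use semantic matching rather than substring containment; the sketch only shows why matching against description texts scales with the number of events instead of the number of titles.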
Based on the above design concept, an application scenario of the information stream searching method, apparatus, and device in the embodiments of the present application is introduced below.
Referring to fig. 1B, a first application scenario diagram of an information stream searching method according to an embodiment of the present application is shown, where the first application scenario diagram includes a terminal 110, an application 111 running in the terminal 110, and a first server 120. In fig. 1B, two terminals 110 are taken as an example, and the number of terminals 110 is not limited in practice.
The first server 120 is a background server corresponding to the application 111. The application 111 may be, for example, an application pre-installed in the terminal 110, an applet, or a web application. Application programs here generally refer to software that can provide information stream content to a user, such as news applications or image-and-text applications.
The first server 120 may obtain the information stream title set of each target event, for example from network resources or from the application 111, and obtain the information stream description text corresponding to each target event based on the information stream title set of that target event. The first server 120 may feed these information stream description texts back to the terminal 110.
When the user initiates a search through the application 111, the terminal 110 may match a corresponding information stream description text based on the search keyword, and then feed back the corresponding information stream content according to that description text; the specific process of the information stream search method is discussed below.
Further, the first server 120 may generate the information stream description texts using a trained text generation model; the generation process is described below. The trained text generation model may be obtained by the first server 120 through its own training. For example, the first server 120 may obtain the sample information stream title set and the sample information stream description text of each sample event, for example from network resources or from the application 111, and obtain a sample text sequence set based on them. The first server 120 then trains a text generation model on the sample text sequence set to obtain the trained target text generation model; the training process is discussed below.
Referring to fig. 1C, a second application scenario diagram of an information stream searching method according to an embodiment of the present application is shown, where the second application scenario diagram includes a terminal 110, an application 111 running in the terminal 110, a first server 120, and a second server 130.
Unlike fig. 1B, the target text generation model used by the first server 120 in fig. 1C may be obtained from the second server 130.
After the second server 130 trains the target text generation model, it sends the model to the first server 120, and the first server 120 generates the information stream description text corresponding to each target event based on the trained target text generation model. The first server 120 may then feed back or recommend the corresponding information stream description text to the user according to the user's search keyword.
It should be noted that the information stream search method in the present application may be applied to various application scenarios. For example, it may be applied to a hot-topic search scenario, automatically matching the information stream content of a trending event for the user. It may also be applied to scenarios such as virtual reality, augmented reality, autonomous driving, smart home, smart office, smart wearables, intelligent transportation, smart cities, unmanned aerial vehicles, and robots; the specific application scenario is not limited by the present application.
It should be noted that the first server 120 and the second server 130 may be implemented by cloud computing.
In one possible application scenario, in order to reduce communication latency, the first server 120 and the second server 130 may each deploy servers in different regions. Alternatively, for load balancing, the first server 120 or the second server 130 may serve different terminals through different servers. The following takes the case where the second server 130 deploys a plurality of servers as an example:
the plurality of servers may share data through a blockchain, in which case they effectively form a data sharing system. For example, one terminal located at site a is communicatively connected to one of the servers, and another terminal located at site b is communicatively connected to a different server among the plurality of servers.
Each server in the data sharing system has a corresponding node identifier, and each server can store the node identifiers of the other servers in the system, so that a generated block can later be broadcast to the other servers according to their node identifiers. Each server may maintain a node identifier list such as the one shown below, storing server names and node identifiers in correspondence. A node identifier may be an Internet Protocol (IP) address or any other information that can identify the node; Table 2 uses IP addresses only as an example.
TABLE 2
(Table 2 is rendered as an image in the original document: a node identifier list mapping each server name to its IP address.)
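A node identifier list of this kind can be sketched as a simple mapping (the server names and addresses below are made-up illustrations, not values from the patent):

```python
# Hypothetical node identifier list: server name -> node identifier (an IP).
node_identifiers = {
    "server_a": "10.0.0.1",
    "server_b": "10.0.0.2",
    "server_c": "10.0.0.3",
}

def broadcast_targets(own_name):
    """Node identifiers to which a server would broadcast a newly
    generated block: every other server in the data sharing system."""
    return [ip for name, ip in node_identifiers.items() if name != own_name]

print(broadcast_targets("server_a"))
```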
The first server 120 and the second server 130 in the present application may each be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The terminal 110 may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, game device, smart television, or smart bracelet. The terminal and each server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.
It should be noted that the application scenario diagrams shown in fig. 1B to 1C are examples of application scenarios to which the information flow searching method in the embodiment of the present application is applied, but do not limit the application scenarios to which the embodiment of the present application is applied.
On the basis of the application scenarios discussed in fig. 1B to fig. 1C, the following describes an information flow search method by taking an example in which a terminal executes the information flow search method in the embodiment of the present application.
Referring to fig. 2, a flowchart of an information stream searching method according to an embodiment of the present application is shown, where the flowchart includes:
S21, a search keyword is obtained.
The terminal may obtain the search keyword according to an input operation of the user, for example a paste operation or a typing operation performed in an input box of the application. The search keyword is used to search for the corresponding information stream content. The terminal may obtain one or more search keywords; when it obtains a plurality of search keywords, an information stream content search may be performed based on each search keyword separately.
For example, referring to fig. 3, a schematic interface diagram of an information stream search process provided in an embodiment of the present application: when the user inputs a search keyword in the input box 301 shown in part (1) of fig. 3, the terminal obtains the search keyword according to the user's input operation, in this case the phrase "price reduction for medicines" shown in part (1) of fig. 3.
S22, an information stream description text matching the search keyword is determined, where the information stream description text describes the information stream content corresponding to the information stream titles in the corresponding information stream title set.
The terminal may obtain the information stream description texts of the target events from the first server, or generate them itself. After obtaining the search keyword, the terminal can match the corresponding information stream description text from among the description texts of the target events according to the search keyword. The meaning of the information stream description text is as discussed above and is not repeated here.
S23, an information flow title set corresponding to the information flow description text is obtained.
After matching the corresponding information flow description text, the terminal may obtain an information flow header set corresponding to the information flow description text.
S24, pulling the content of the information stream based on the information stream header in the information stream header set.
The terminal may pull the information stream content corresponding to every information stream title in the title set, or it may filter one or more information stream titles out of the set and pull only the content corresponding to the filtered titles. For example, the terminal may keep the information stream titles whose matching degree with the search keyword is greater than a preset matching degree, or keep the titles whose corresponding information stream content has a quality greater than a preset quality, so as to provide high-quality information stream content to the user.
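The filtering step can be sketched as follows (the threshold names and values are illustrative assumptions, not values from the patent):

```python
def filter_titles(titles, match_degree, quality, min_match=0.5, min_quality=0.6):
    """Keep only the titles whose keyword matching degree AND whose
    content quality both exceed the preset thresholds."""
    return [t for t in titles
            if match_degree[t] > min_match and quality[t] > min_quality]

titles = ["title_1", "title_2", "title_3"]
match_degree = {"title_1": 0.9, "title_2": 0.4, "title_3": 0.8}
quality = {"title_1": 0.7, "title_2": 0.9, "title_3": 0.5}
print(filter_titles(titles, match_degree, quality))
```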
And S25, displaying the pulled information flow content as the search result matched with the search keyword.
The terminal can take the pulled information flow content as a search result and display the search result.
Continuing with the example of fig. 3: after the user inputs the search keyword, the terminal presents the interface shown in part (2) of fig. 3, which shows the information stream titles 302 and the information stream content 303 corresponding to each title. Part (2) of fig. 3 shows two items of information stream content as an example; the number of items the terminal can present is not limited.
In the embodiments of the present application, the terminal can quickly match the search keyword against the information stream description texts, thereby quickly locating each item of information stream content corresponding to the matched description text and feeding that content back. Because the search keyword does not need to be matched against each information stream title separately, the search speed is improved, information stream content is provided to the user more quickly, and the user experience is improved.
As an embodiment, in executing S22, the terminal may filter, from the multiple information flow description texts corresponding to the multiple target events, an information flow description text matching the search keyword, where the multiple information flow description texts corresponding to the multiple target events may be obtained directly from the first server by the terminal, or the multiple information flow description texts corresponding to the multiple target events may be generated by the terminal itself.
For example, the terminal may obtain the information stream title set corresponding to a target event and extract keywords from it to generate the information stream description text of that target event. Alternatively, the terminal may generate the information stream description text of the target event using a trained target text generation model. The trained target text generation model may be obtained by the terminal from the first server, or may be trained by the terminal itself.
The following first introduces the training process of the target text generation model:
taking the case where the terminal trains the text generation model as an example, refer to the flowchart of the training method of the text generation model shown in fig. 4, which includes:
S41, a sample text sequence set is obtained, where each sample text sequence includes a first sample text subsequence obtained from the sample information stream title set of the corresponding sample event and a second sample text subsequence obtained from the sample information stream mask description text.
The terminal may obtain the sample information flow header set and the sample information flow description text of each sample event from a network resource or from an application program, and the number of sample headers included in the sample information flow header set may be one or more. The following describes a manner in which the terminal obtains a sample text sequence:
the terminal serializes the sample information stream title set together with the segmentation mark to obtain the first sample text subsequence. The meaning of the segmentation mark is as discussed above and is not repeated here. The terminal may mask part of the text in the sample information stream description text of a sample event to obtain the sample information stream mask description text, and serialize it to obtain the second sample text subsequence. The first and second sample text subsequences are then concatenated to obtain a sample text sequence.
Alternatively, the terminal may directly concatenate the sample information stream title set and the sample information stream description text to obtain a concatenated text, mask the part of the concatenated text belonging to the description text, and serialize the masked concatenated text to obtain the sample text sequence.
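The construction of a sample text sequence can be sketched as below (token-level masking with a fixed ratio; the marker strings and the ratio are assumptions, and a real implementation might operate on characters rather than whitespace tokens):

```python
import random

SEP, MASK = "[sep]", "[mask]"

def build_sample_sequence(title_set, description, mask_ratio=0.3, seed=0):
    """Concatenate the sample title set and the description text with
    segmentation marks, masking a fixed proportion of the description."""
    rng = random.Random(seed)
    tokens = description.split()
    n_mask = max(1, int(len(tokens) * mask_ratio))
    masked_pos = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in masked_pos else t for i, t in enumerate(tokens)]
    return f" {SEP} ".join(title_set) + f" {SEP} " + " ".join(masked)

seq = build_sample_sequence(
    ["55 medicines are reduced in price, by 53% on average"],
    "price reduction of 55 medicines",
)
print(seq)
```

The fixed `mask_ratio` reflects the fixed-proportion random extraction discussed later in the training-data section.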
By analogy, the terminal may obtain the sample text sequence corresponding to each sample event, thereby obtaining the sample text sequence set.
S42, based on the sample text sequence set, carrying out multiple rounds of iterative training on a plurality of codecs in the text generation model until a model convergence condition is met, and outputting a trained target text generation model, wherein each round of iterative training comprises the following processes:
s421, inputting a sample text sequence selected based on the sample text sequence set into a plurality of codecs to obtain a decoded output of a last codec, wherein the coded output of a first codec is obtained based on a first sample text subsequence, the decoded output of the first codec is obtained based on a second sample text subsequence, and the decoded output of each codec except the first codec is obtained based on the coded output and the decoded output of a previous codec;
s422, based on the decoding output of the last codec, the model parameters of a plurality of codecs are adjusted.
The terminal may select part of the sample text sequences from the sample text sequence set for each round of iterative training of the text generation model.
In each round of iterative training, the terminal inputs the selected sample text sequence into the plurality of codecs. The first sample text subsequence is encoded by the first codec to obtain the first codec's encoded output, which can be understood as the semantic features of the sample information stream title set. Similarly, the second sample text subsequence is decoded by the first codec to obtain the first codec's decoded output, which can be understood as the semantic features of the sample information stream mask description text.
The encoded output of the first codec is then encoded by the second codec to obtain the encoded output of the second codec, and the encoded output and decoded output of the first codec are decoded by the second codec to obtain the decoded output of the second codec. Here the second codec is the second of the plurality of codecs, and the first codec is its previous codec.
And so on, until the decoded output of the last codec is obtained; the decoded output of the last codec corresponds to the sequence of each masked word in the sample information stream mask description text.
After obtaining the decoded output of the last codec, the terminal can determine the prediction probability of each masked word in the sample information stream mask description text from that output, compute a loss function from these prediction probabilities, and adjust the model parameters of the text generation model according to the loss function, thereby completing the current round of iterative training.
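The patent does not specify the loss function; a standard choice consistent with the description is an average negative log-likelihood over the masked positions only, sketched here as an assumption:

```python
import math

def masked_loss(pred_probs, masked_positions, targets):
    """pred_probs[i] maps each candidate word at position i to its
    predicted probability; the loss averages -log p(target word)
    over the masked positions only."""
    nll = [-math.log(pred_probs[pos][targets[pos]]) for pos in masked_positions]
    return sum(nll) / len(nll)

# Position 1 is predicted perfectly; position 4 only with probability 0.5.
probs = {1: {"medicines": 1.0}, 4: {"price": 0.5, "cost": 0.5}}
targets = {1: "medicines", 4: "price"}
print(masked_loss(probs, [1, 4], targets))
```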
In the embodiment shown in fig. 4, when the terminal trains the text generation model, each codec can fully learn the semantics of both the sample information stream title set and the sample information stream mask description text, which improves the semantic completeness of the text output by the model and hence the accuracy of its subsequent output. Moreover, each codec other than the first decodes based on both the decoded output and the encoded output of the previous codec, so each codec makes full use of the encoded outputs during decoding. This reduces semantic omissions of the sample information stream title set in the encoded output, improves the correlation between the model's output and the title set, and improves the accuracy of the model's output.
In S41, the terminal needs the sample information stream description text and the sample information stream title set of each sample event in order to generate the sample text sequences. An exemplary way of obtaining them is described below:
the terminal may obtain a large number of sample information stream titles and sample information stream description texts from network resources or applications, match each sample title against each sample description text, and associate each matched title with its description text; by analogy, the terminal obtains the sample information stream title set and the sample information stream description text corresponding to each sample event. There are various matching modes, illustrated below:
the first matching mode: the trained model is used to match each sample information stream title against each sample information stream description text.
The trained model may be any text matching model, for example a BERT model: the terminal may directly input each sample information stream title and each sample information stream description text into the BERT model as sequence pairs to obtain the matching result output by the BERT model.
For example, the terminal obtains from network resources sample information stream title 1: "the price of 55 medicines is reduced, by 53% on average"; sample information stream title 2: "apples are found to contain a large amount of nutrients"; sample information stream title 3: "the secret recipe for longevity is ……"; and sample information stream title 4: "the 55 medicines are reduced by 53% on average! There is also good news ……". The terminal also obtains sample information stream description text A: "price reduction of 55 medicines"; sample information stream description text B: "newly discovered apple nutrition"; and sample information stream description text C: "the secret recipe of longevity".
The terminal may match these sample information stream description texts and sample information stream titles through the BERT model, thereby determining that sample information stream titles 1 and 4 match sample information stream description text A, title 2 matches description text B, and title 3 matches description text C.
The second matching mode: keyword matching is performed between each sample information stream title and each sample information stream description text.
Specifically, the terminal may extract the first keywords from each sample information stream title and the second keywords from each sample information stream description text; if the first keywords of a title match the second keywords of a description text, the terminal determines that the title and the description text correspond to the same sample event. Note that "first keywords" simply refers to keywords in a sample information stream title and "second keywords" to keywords in a sample information stream description text; the numbers of first and second keywords are not limited.
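A minimal sketch of the second matching mode, assuming the keywords have already been extracted (the decision rule below, keyword-set intersection, is one plausible reading of "match", not the patent's definition):

```python
def same_event(first_keywords, second_keywords):
    """A sample title and a sample description text are treated as
    belonging to the same sample event when their keywords intersect."""
    return bool(set(first_keywords) & set(second_keywords))

title_keywords = {"55", "medicines", "price"}
description_keywords = {"medicines", "price", "reduction"}
print(same_event(title_keywords, description_keywords))
```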
After obtaining the sample information stream description text, the terminal may mask the description text of a sample event to obtain the sample information stream mask description text. For example, the terminal may randomly extract part of the text from the description text and replace it with mask marks. Random extraction avoids always masking the same or similar text, which could cause the subsequent text generation model to overfit.
Further, during random extraction the terminal may extract a fixed proportion of the text in the sample information stream description text. This avoids both the overfitting described above and the situation where masking too much or too little text harms the training effect of the text generation model.
After obtaining the sample information stream title set and the sample information stream mask description text corresponding to a sample event, the terminal may serialize them to obtain the sample text sequence.
During serialization, the terminal obtains a sequence representation of each word and combines the sequence representations of the individual words to obtain the sample text sequence. Two ways in which the terminal may obtain the sequence representation of a word in the sample information stream title set and the sample information stream mask description text are described below:
the first method is as follows:
the word is encoded using one-hot encoding to obtain a sequence representation of the word.
For example, encoding the word "medicine" yields the sequence representation "0110".
The second method comprises the following steps:
and obtaining the sequence representation of the word according to the character vector, the position vector and the sentence break vector of the word.
The terminal may vectorize the word, for example using one-hot encoding, to obtain the character vector of the word; the character vector represents the content of the word.
The terminal may vectorize the position of the word within the sample information stream title set and the sample information stream mask description text to obtain the position vector; the position vector represents the position of the word.
The terminal vectorizes whether the word belongs to the sample information stream title set or to the sample information stream mask description text to obtain the sentence-break vector; the sentence-break vector indicates which of the two the word belongs to.
After the character vector, position vector, and sentence-break vector of the word are obtained, they may be weighted and summed to obtain the sequence representation of the word.
For example, consider the sequence "55 medicines are reduced in price, by 53% on average [sep] 55 [mask] reduced in [mask]", where "55 medicines are reduced in price, by 53% on average" is the sample title of the sample event, "[sep]" is the segmentation mark, and "55 [mask] reduced in [mask]" is the sample information stream mask description text obtained by masking the description text "55 medicines reduced in price".
The terminal encodes the word "medicine" to obtain the character vector "0110"; vectorizes the third position, where "medicine" is located, to obtain the position vector "1110"; and vectorizes the fact that "medicine" belongs to the sample information stream title set ("0001") rather than the sample information stream mask description text ("0000") to obtain the sentence-break vector "0001". The terminal then adds the character vector, position vector, and sentence-break vector to obtain the sequence representation "1221" of the word.
In this way, the sequence representation of each word carries not only the content information of the word but also its position information and sentence-break information, so richer word information is expressed and the text generation model can subsequently learn more from it.
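The worked example above can be reproduced with a digit-wise sum of the three vectors:

```python
def sequence_representation(char_vec, pos_vec, seg_vec):
    """Element-wise sum of the character, position, and sentence-break
    vectors, giving the sequence representation of one word."""
    return [c + p + s for c, p, s in zip(char_vec, pos_vec, seg_vec)]

# "0110" + "1110" + "0001" -> "1221", matching the patent's example
# for the word "medicine" (unit weights assumed).
print(sequence_representation([0, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]))
```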
After obtaining the sample text sequence set, the terminal may perform multiple rounds of iterative training on multiple codecs of the target text generation model based on the sample text sequence set. In each iteration training, the terminal can input the sample text sequence into the text generation model, and adjust the model parameters of the text generation model according to the decoding output of the last codec in the text generation model.
Specifically, the text generation model includes a plurality of codecs, for example 12. The terminal encodes the first sample text subsequence through the first codec in the text generation model to obtain the encoded output of the first codec, and decodes the second sample text subsequence through the first codec to obtain the decoded output of the first codec.
For convenience of description, each codec other than the first is referred to as the i-th codec, where i is an integer greater than 1 and less than or equal to N, and N is the total number of codecs in the text generation model.
After the first codec produces its encoded and decoded outputs, the terminal encodes the encoded output of the (i-1)-th codec through the i-th codec to obtain the encoded output of the i-th codec, and decodes the encoded output and decoded output of the (i-1)-th codec through the i-th codec to obtain the decoded output of the i-th codec, until the decoded output of the last codec is obtained.
In this embodiment, the text generation model includes a plurality of codecs connected in sequence, and the codecs other than the first codec can decode using the encoding output and the decoding output of the previous codec, so that when the text generation model decodes, the situation that the semantics of the sample information stream header set are omitted is reduced, the association between the output result of the text generation model and the sample information stream header set is increased, and the accuracy of the output of the text generation model is increased.
For example, please refer to fig. 5, which is a schematic structural diagram of a text generation model according to an embodiment of the present application, where the text generation model includes a first codec, a second codec, and a third codec. It should be noted that, in fig. 5, the text generation model includes 3 codecs as an example, but the number of codecs of the text generation model is only required to be greater than or equal to 2, and the application does not specifically limit this. The following illustrates the processing of the sample text sequence by the text generation model of fig. 5.
The first codec encodes the first sample text subsequence in the sample text sequence to obtain its encoded output, and decodes the second sample text subsequence to obtain its decoded output. In the embodiments of the present application, for convenience of description, the encoded output of the first codec is referred to as the first encoded output, the decoded output of the first codec as the first decoded output, and so on.
Both the first encoded output and the first decoded output are input to the second codec. The first encoded output is encoded by the second codec to obtain the second encoded output, and the first encoded output and first decoded output are decoded by the second codec to obtain the second decoded output.
Both the second encoded output and the second decoded output are input to a third codec via the second codec; the second encoded output may be encoded by the third codec to obtain a third encoded output, and the second encoded output and the second decoded output may be decoded by the third codec to obtain a third decoded output.
After obtaining the third decoded output, the terminal may adjust the model parameters of the text generation model according to the third decoded output.
When the structure of each codec in the text generation model is different, the process of processing a sample text sequence by each codec is different. The following description will first describe the structure of each codec provided in the embodiments of the present application by way of example:
the first structure is as follows:
the text generation model includes a plurality of codecs, each codec including a codec layer.
In the embodiment of the application, one codec has one codec layer, and each codec layer has a decoding function and an encoding function. Each codec layer may be implemented by, for example, a Long Short-Term Memory (LSTM) or a codec layer in a transform.
The following text generation model under the first structure exemplifies the processing procedure of the sample text sequence:
1. the coding and decoding process of the first coding and decoding layer in the first coder and decoder comprises the following steps:
coding the first sample text subsequence through the first coding and decoding layer to obtain a first coding output of the first coding and decoding layer; and decoding the second sample text subsequence by the first coding and decoding layer to obtain a first decoding output of the first coding and decoding layer.
Further, the first encoded output and the first decoded output are passed to a second codec through the first codec layer.
2. The coding and decoding process of the ith coding and decoding layer in the ith coder and decoder is as follows:
The value of i in the ith codec is an integer greater than 1 and not greater than N, where N is the number of the multiple codecs. The (i-1)th encoded output of the (i-1)th codec is encoded by the ith codec layer to obtain the ith encoded output of the ith codec layer. The (i-1)th encoded output and the (i-1)th decoded output of the (i-1)th codec are decoded by the ith codec layer to obtain the ith decoded output of the ith codec layer.
And so on until the decoding output of the Nth codec layer in the Nth codec is obtained.
It should be noted that the decoding process and the encoding process corresponding to each codec layer may be performed simultaneously.
For example, please refer to fig. 6, which is a structural diagram of a text generation model according to an embodiment of the present application, where fig. 6 includes a first codec, a second codec, and a third codec. The first codec, the second codec and the third codec respectively comprise a first codec layer, a second codec layer and a third codec layer. It should be noted that fig. 6 illustrates an example in which the text generation model includes three codecs, and the number of codecs in the text generation model is not limited in practice.
In the embodiment of the application, one coding and decoding layer simultaneously realizes the decoding and encoding functions, which simplifies the structure of the text generation model; because the structure is simpler, fewer model parameters need to be trained, and the training efficiency of the text generation model can be relatively improved.
As an example, each codec may employ an attention mechanism to achieve encoding and decoding. For example, the codec layer in each codec includes self attention.
In the embodiment of the application, each codec adopts an attention mechanism to realize the decoding and encoding functions at the same time, so that the structure of the text generation model is simpler, the training amount of the text generation model is reduced, and the training efficiency of the text generation model is relatively improved. And moreover, by adopting an attention mechanism, each coding and decoding layer can fully learn the semantics of the context during coding so as to improve the accuracy of the output of the text generation model.
As an embodiment, since each codec in the text generation model includes one codec layer, the structure of the text generation model may completely reuse the coding part in the BERT model, and correspondingly, the initial model parameters of the text generation model may adopt the pre-trained model parameters of the coding part in the BERT model. The initial model parameters refer to the model parameters that the text generation model had before the first training. In this embodiment, the initial model parameters of the text generation model may be model parameters pre-trained by the BERT model, instead of random model parameters, which may relatively improve the training efficiency of the text generation model.
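As an illustrative sketch of this initialization step, pre-trained parameters can be copied into the model wherever names and shapes match, leaving the remaining parameters at their random values. All parameter names and shapes below are assumptions for illustration, not the BERT model's actual layout:

```python
# Hedged sketch: initializing a text-generation model from pre-trained
# encoder parameters instead of random values. Names/shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized parameters (the state before the first training).
model_params = {
    "layer0.attn.W_Q": rng.normal(size=(8, 8)),
    "layer0.attn.W_K": rng.normal(size=(8, 8)),
    "head.W_out":      rng.normal(size=(8, 100)),  # no pre-trained counterpart
}

# Pre-trained encoder parameters with matching names and shapes.
pretrained_encoder = {
    "layer0.attn.W_Q": np.ones((8, 8)),
    "layer0.attn.W_K": np.ones((8, 8)),
}

def init_from_pretrained(params, pretrained):
    """Copy every pre-trained tensor whose name and shape both match."""
    for name, tensor in pretrained.items():
        if name in params and params[name].shape == tensor.shape:
            params[name] = tensor.copy()
    return params

model_params = init_from_pretrained(model_params, pretrained_encoder)
```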
Based on the above codec structure, the following describes a processing procedure of the sample text sequence by taking a codec procedure of the first codec and a codec procedure of the ith codec as examples.
1. The coding and decoding process of the first coding and decoding layer in the first coder and decoder comprises the following steps:
The input of the first codec is the complete sample text sequence; only in the subsequent processing does the first codec encode the first sample text subsequence in the sample text sequence and decode the second sample text subsequence in the sample text sequence. Whether performing the decoding process or the encoding process, the first codec may calculate a first weight matrix corresponding to the sample text sequence, encode the first sample text subsequence based on the first weight matrix, and decode the second sample text subsequence based on the first weight matrix. The process by which the first codec encodes and decodes the sample text sequence is described below in three parts:
the first part, the terminal inputs the sample text sequence into a first codec, and the first codec calculates a first weight matrix corresponding to the sample text sequence.
Each element of the sample text sequence corresponds to one input position of the first codec; that is, each element of the sample text sequence is used as the input of one input position of the first codec. For convenience of description, the element of the sample text sequence corresponding to each input position of the first codec is referred to as a first input. In the subsequent codec process, the terminal may calculate the first weight matrix of the sample text sequence with each input position as the minimum processing unit. The process of calculating the first weight matrix is described below:
S1.1, the first codec transforms each first input with a first transformation matrix $W^Q$, a second transformation matrix $W^K$, and a third transformation matrix $W^V$ in the first codec, respectively, to obtain a query vector Q, a key vector K, and a value vector V corresponding to each first input.
Wherein the calculation formulas for the query vector, the key vector, and the value vector are exemplified as follows:

$q_j = W^Q x_j, \quad k_j = W^K x_j, \quad v_j = W^V x_j$

wherein $x_j$ represents the jth first input in the sample text sequence.
S1.2, the first coder-decoder determines the dot product of each first input and each first input in the sample text sequence to obtain a first weight vector corresponding to each first input.
The dot product for a first input is the product between the query vector Q corresponding to that first input and the key vector K of each first input. Taking one first input as an example, there is a corresponding dot product between it and each first input, and each dot product is in fact an attention weight value representing the correlation between that first input and the corresponding first input; it may further be understood as the degree of attention given to the corresponding first input when the input position of that first input is processed. Each first input corresponds to a first weight vector, which consists of the attention weight values between that first input and the respective first inputs.
S1.3, obtaining a first weight matrix based on first weight vectors corresponding to each first input in the sample text sequence.
The terminal may combine the first weight vectors of each first input in the sample text sequence as rows of a matrix to obtain a first weight matrix. Since each first weight vector corresponds to one first input, accordingly, each line in the first weight matrix corresponds to each first input in the sample text sequence.
For example, the sample text sequence is [x1, x2, x3, x4, x5, x6], and the query vector Q, the key vector K, the value vector V, and the first weight vector corresponding to each first input in the sample text sequence shown in table 3 below can be obtained through steps S1.1 to S1.2:

TABLE 3

First input | Query vector | Key vector | Value vector | First weight vector
x1 | q1 | k1 | v1 | (a11, a12, a13, a14, a15, a16)
x2 | q2 | k2 | v2 | (a21, a22, a23, a24, a25, a26)
x3 | q3 | k3 | v3 | (a31, a32, a33, a34, a35, a36)
x4 | q4 | k4 | v4 | (a41, a42, a43, a44, a45, a46)
x5 | q5 | k5 | v5 | (a51, a52, a53, a54, a55, a56)
x6 | q6 | k6 | v6 | (a61, a62, a63, a64, a65, a66)

wherein $a_{ij}$ denotes the attention weight value between the ith first input and the jth first input.
The terminal may combine the respective first weight vectors to obtain a first weight matrix, which may be represented as $A = (a_{ij})_{6 \times 6}$, where the ith row of $A$ is the first weight vector of the ith first input. Since each first weight vector corresponds to one first input, accordingly, each row in the first weight matrix corresponds to one first input in the sample text sequence.
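As an illustrative sketch of steps S1.1 to S1.3, the query, key, and value vectors and the first weight matrix can be computed with plain matrix products; the embedding dimension and random values below are assumptions for illustration only:

```python
# Hedged sketch: projecting each input x_j into query/key/value vectors and
# forming the first weight matrix A, where A[i, j] = q_i . k_j.
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # embedding dimension (assumed)
X = rng.normal(size=(6, d))             # sample text sequence [x1..x6]
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V     # q_j, k_j, v_j for every first input
A = Q @ K.T                             # first weight matrix of dot products
```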
and a second part, namely the first codec, performs coding and decoding on the first sample text subsequence based on the first weight matrix:
When each first input in the first sample text subsequence is encoded, the terminal may encode the first sample text subsequence bidirectionally. Bidirectional encoding may be understood as meaning that all information of the first sample text subsequence is available during the encoding process, while information related to the second sample text subsequence must not be obtained. Therefore, in order to enable the first codec to encode the first sample text subsequence bidirectionally, in this embodiment of the present application, the terminal may control the attention weight values corresponding to the second sample text subsequence in the first weight matrix, thereby implementing the bidirectional encoding process.
In particular, as discussed above, each row in the first weight matrix actually corresponds to a first input, and thus the terminal can determine from the first weight matrix a second set of rows corresponding to respective first inputs in the first sample sub-sequence. The terminal performs the following operation for each row in the second type row set: and setting the attention weight value between the first input corresponding to one line in the second type of line set and the second type of input to be zero. The second type of input is the respective first input in the second sample text subsequence. After zeroing each row in the second set of rows, a third weight matrix may be obtained. The first codec obtains a coded output of the first codec based on the third weight matrix.
And setting the attention weight value between the first input corresponding to one line in the second-class line set and the second-class input to zero at the terminal, which is equivalent to forcing the correlation between the first input corresponding to the line and the first input in the second sample text subsequence to zero, so that the correlation information of the second sample text subsequence is not introduced in the encoding process, thereby realizing the bidirectional encoding process of the first sample text subsequence and obtaining a third weight matrix.
For example, continuing with the first weight matrix shown in table 3 as an example, the terminal determines that the second type row set is the first row to the fourth row in the first weight matrix corresponding to table 3, and the second type input is the fifth first input and the sixth first input in the sample text sequence, so the fifth attention weight value and the sixth attention weight value of each row in the second type row set can be set to 0, so as to obtain a third weight matrix shown as follows:
$A_3 = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & 0 & 0 \\ a_{21} & a_{22} & a_{23} & a_{24} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & a_{34} & 0 & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} & 0 & 0 \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \\ a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} \end{pmatrix}$
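The zeroing described above — blocking the first subsequence's attention to the second subsequence while leaving the rest of the matrix intact — can be sketched as a slice assignment; the 6x6 matrix and the subsequence split are illustrative:

```python
# Hedged sketch: zero the attention weights between the first subsequence
# (rows 0..3 here) and the second subsequence (columns 4..5), so encoding of
# x1..x4 is bidirectional but leaks nothing from x5, x6.
import numpy as np

A = np.arange(36, dtype=float).reshape(6, 6)  # stand-in first weight matrix
enc_len = 4                                   # length of first subsequence

A3 = A.copy()
A3[:enc_len, enc_len:] = 0.0                  # third weight matrix
```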
the first codec may calculate an attention representation of each first input in the first sample sub-sequence according to the third weight matrix; the coded output of the first codec is obtained from the attention representation of each first input in the first sample sub-sequence.
In a specific implementation, the terminal may normalize, for each first input in the first sample subsequence, a weight vector corresponding to the first input in the third weight matrix, and multiply the weight vector by the value vector, so as to obtain an attention representation of the first input, and so on, obtain an attention representation of each first input in the first sample subsequence, where a calculation formula of the attention representation of each input is as follows:
$z_i = \mathrm{softmax}\!\left(\frac{a_i}{\sqrt{d_k}}\right) V \qquad (4)$

wherein $z_i$ denotes the attention representation corresponding to the ith first input, $a_i$ represents the row corresponding to the ith first input in the third weight matrix, and $\sqrt{d_k}$ is the scaling factor, whose size can be set according to requirements.
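The scaled, normalized attention computation just described can be sketched directly; the sizes and random values are illustrative assumptions:

```python
# Hedged sketch: each input's attention representation is
# softmax(a_i / sqrt(d_k)) @ V, with a_i a row of the (masked) weight matrix.
import numpy as np

rng = np.random.default_rng(0)
n, d_k = 6, 4
A3 = rng.normal(size=(n, n))        # stand-in masked weight matrix
V = rng.normal(size=(n, d_k))       # value vectors

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

Z = softmax(A3 / np.sqrt(d_k)) @ V  # attention representation per input
```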
Further, in order to calculate a more accurate attention representation for each first input, each codec may employ a multi-headed self-attention mechanism to calculate the attention representation for each first input.
Specifically, each codec includes a self-attention structure comprising a plurality of head structures, and each head structure respectively calculates an attention representation of each first input; by analogy, a plurality of attention representations of each first input can be obtained. The process by which one head structure calculates an attention representation of a first input can refer to the foregoing discussion, and is not described herein again. When different head structures calculate the attention representations of a first input, the first transformation matrix $W^Q_m$, the second transformation matrix $W^K_m$, and the third transformation matrix $W^V_m$ adopted by each head structure m may be different.
After the plurality of attention representations of a first input are obtained, they may be combined to obtain the head attention representation of the first input:

$z_i = \mathrm{Concat}(z_i^1, z_i^2, \ldots, z_i^n)$

wherein $z_i^1, \ldots, z_i^n$ respectively denote the attention representations calculated by the first head structure to the nth head structure, and $z_i$ is the head attention representation of the first input.
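A minimal multi-head sketch of the combination described above follows; the head count, dimensions, and the final output projection `W_O` (a common practice in multi-head attention, not stated in the text above) are assumptions:

```python
# Hedged sketch: each head has its own W_Q/W_K/W_V; per-head attention
# representations are concatenated, then (commonly) mixed by a projection W_O.
import numpy as np

rng = np.random.default_rng(0)
n, d, heads = 6, 8, 2
d_h = d // heads
X = rng.normal(size=(n, d))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def head(X):
    # Each call draws fresh W_Q/W_K/W_V, i.e. the matrices differ per head.
    W_Q, W_K, W_V = (rng.normal(size=(d, d_h)) for _ in range(3))
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    return softmax(Q @ K.T / np.sqrt(d_h)) @ V

Z = np.concatenate([head(X) for _ in range(heads)], axis=-1)  # Concat(z^1..z^n)
W_O = rng.normal(size=(d, d))       # assumed output projection
Z = Z @ W_O
```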
After the terminal obtains the attention representation of each first input, the terminal may directly use the attention representation of each first input as the encoded output of the first codec.
The terminal may also residual-concatenate the attention representation of each first input with the first input to obtain the encoded output of the first codec, so that the encoded output of the first codec may retain as much information as possible with the first sample text sub-sequence. For example, the terminal performs an addition and normalization operation on the coded output of the first codec, thereby enabling the residual concatenation of the attention representation of each first input with that first input.
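The residual connection with the addition-and-normalization operation described above can be sketched as follows; the simplified layer normalization (no learned scale/shift) is an illustrative assumption:

```python
# Hedged sketch of "add & norm": the attention representation of each input
# is added to that input, then normalized, so the encoded output retains
# information from the original subsequence.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))         # first inputs x1..x4
Z = rng.normal(size=(4, 8))         # their attention representations

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

enc_out = layer_norm(Z + X)         # encoded output of the first codec
```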
For example, continuing with the sample text sequence shown in table 3, please refer to fig. 7, which is a schematic structural diagram of the first codec; fig. 7 illustrates the structure of the first codec as an example, and in practice the structure of each of the plurality of codecs may refer to fig. 7. The terminal processes the first sample text subsequence through the self-attention in the first codec, calculates the attention representations corresponding to x1 to x4 as (z1, z2, z3, z4), passes (z1, z2, z3, z4) through the feed-forward layer and the addition-and-normalization operation, and outputs (z1+x1, z2+x2, z3+x3, z4+x4), thereby obtaining the encoded output of the first codec.
And a third part, namely decoding the second sample text subsequence by the first codec based on the first weight matrix:
when decoding each first input in the second sample text subsequence, other first inputs after the currently decoded first input cannot be obtained in practice, in other words, the decoding process should be unidirectional in the text generation scenario, so in order to enable the first codec to decode the second sample text subsequence unidirectionally, in this embodiment of the present application, the terminal may control the attention weight value in the first weight matrix, thereby implementing the unidirectional decoding process.
Specifically, when the terminal decodes the second sample text subsequence, it may determine, from the first weight matrix, a first type row set corresponding to each first input in the second sample text sequence, and set an attention weight value between the first input corresponding to one row in the first type row set and the first type input to zero to obtain the second weight matrix. The terminal may obtain a decoded output of the first codec based on the second weight matrix. The first type of input is other first inputs in the second sample text subsequence, which are positioned after the first input corresponding to the line in the first type of line set.
Since each row in the first weight matrix corresponds to one first input in the sample text sequence, the terminal may determine the rows in the first weight matrix corresponding to the second sample text subsequence, i.e., the first set of rows. The terminal determines a first input corresponding to one line in the first-class line set, and the attention weight value between the first input and the first-class input is set to be zero, which is equivalent to forcing the correlation between the first input corresponding to the line and the first input behind the first input to be zero, thereby realizing the unidirectional decoding process.
For example, continuing with the first weight matrix shown in table 3 as an example, the terminal determines that the first type row set is the last two rows in the first weight matrix. The second-to-last row corresponds to the fifth first input in the sample text sequence; when the fifth first input is decoded, the first inputs after it are actually unknown, i.e., it is independent of the sixth first input, so that $a_{56}$ can be set to 0, thereby obtaining a second weight matrix shown as follows:

$A_2 = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & 0 \\ a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} \end{pmatrix}$
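The unidirectional masking just illustrated — zeroing, for each row of the second subsequence, the attention weights to later inputs — can be sketched as follows; the matrix values and the subsequence split are illustrative:

```python
# Hedged sketch: for rows belonging to the second subsequence (rows 4..5
# here, i.e. x5 and x6), zero the attention weights to all later positions,
# so decoding is unidirectional. Only A[4, 5] changes in this 6x6 example.
import numpy as np

A = np.ones((6, 6))                 # stand-in first weight matrix
dec_start = 4                       # second subsequence starts at x5

A2 = A.copy()
for i in range(dec_start, 6):
    A2[i, i + 1:] = 0.0             # mask inputs after position i
```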
after obtaining the second weight matrix, the terminal respectively calculates attention representation of each first input in the second sample text subsequence according to the second weight matrix through the first codec, and obtains decoding output of the first codec according to the attention representation of each first input in the second sample text subsequence.
In specific implementations, the terminal may use the attention representation of each first input in the second sample text subsequence as the decoded output of the first codec. The terminal may also residual-concatenate the attention representation of each first input with the corresponding first input to obtain the decoded output of the first codec.
For example, continuing with the sample text sequence shown in table 3 as an example, the attention representations (z5, z6) corresponding to x5 and x6 are calculated respectively, and the terminal may use (z5, z6) as the decoded output of the first codec, or (z5+x5, z6+x6) as the decoded output of the first codec.
2. The coding and decoding process of the ith coding and decoding layer in the ith codec includes:
the value of i in the ith codec can refer to the content discussed above, and is not described herein again. The following describes the codec process of other codecs except the first codec among a plurality of codecs, taking the ith codec as the second codec as an example.
The first codec may pass the first encoded output and the first decoded output to the second codec, i.e., the first encoded output and the first decoded output collectively serve as the input of the second codec.
Determining, by the second codec, a fourth weight matrix corresponding to the first encoded output and the first decoded output:
Correspondingly, each element of the first encoded output and the first decoded output corresponds to one input position of the second codec; for convenience of description, an element of the first encoded output and the first decoded output corresponding to an input position of the second codec is referred to as a second input. The second codec may determine a fourth weight matrix corresponding to the first encoded output and the first decoded output. Each row in the fourth weight matrix corresponds to one second input in the first encoded output and the first decoded output, and each row includes the attention weight values between that row's second input and the respective second inputs. For calculating the attention weight value between one second input and each second input, reference may be made to the foregoing content on calculating the attention weight value between a first input and each first input, which is not described herein again.
Encoding, by the second codec, the first encoded output based on the fourth weight matrix:
In the fourth weight matrix, for each row in the fourth type row set corresponding to each second input in the first encoded output, the attention weight value between the second input corresponding to that row and the fourth type input is set to zero, so as to obtain a sixth weight matrix. The second codec obtains the encoded output of the second codec based on the sixth weight matrix. Wherein the fourth type input is the respective second inputs in the first decoded output.
Specifically, each row in the fourth weight matrix corresponds to one second input of the first encoded output and the first decoded output, so the terminal may determine the rows in the fourth weight matrix corresponding to the first encoded output; for convenience of description, the determined rows are referred to as a fourth type row set. The terminal determines the second input corresponding to one row in the fourth type row set and sets the attention weight value between that second input and the fourth type input to zero; in other words, this is equivalent to forcing the correlation between that second input and the second inputs located in the first decoded output to zero, so as to implement the encoding of the first encoded output. After each row in the fourth type row set is zeroed in this way, a sixth weight matrix is obtained.
Decoding, by the second codec, the first decoded output and the first encoded output based on the fourth weight matrix:
In the fourth weight matrix, for each row in the third type row set corresponding to each second input in the first decoded output, the attention weight value between the second input corresponding to that row and the third type input is set to zero, so as to obtain a fifth weight matrix. The decoded output of the second codec is obtained based on the fifth weight matrix. Wherein the third type input is the other second inputs in the first decoded output that are positioned after the second input corresponding to that row.
Each row in the fourth weight matrix corresponds to one second input, so the terminal may determine the rows in the fourth weight matrix corresponding to the first decoded output; for convenience of description, the determined rows are referred to as a third type row set. The terminal determines the second input corresponding to one row in the third type row set and sets the attention weight value between that second input and the third type input to zero, which is equivalent to forcing the correlation between that second input and the second inputs after it to zero, thereby realizing the unidirectional decoding process.
Similarly, the terminal may use the attention representation of each second input as the encoded output of the second codec. The terminal may also residual concatenate the attention representation of each second input with the second input corresponding to the decoded output of the first codec to obtain the decoded output of the second codec.
For example, referring to fig. 8, a schematic diagram of a text generation model processing a sample text sequence is shown, where the text generation model includes a first codec, a second codec and a third codec, each codec can perform encoding to obtain an encoded output 810 of each codec, and each codec can perform decoding to obtain a decoded output 820 of each codec, where:
The first codec encodes the first sample text subsequence to obtain a first encoded output, and decodes the second sample text subsequence to obtain a first decoded output; the second codec bidirectionally encodes the first encoded output to obtain a second encoded output, and unidirectionally decodes the first decoded output together with the first encoded output to obtain a second decoded output, and so on.
The second structure is as follows:
each codec includes a decoding layer and an encoding layer.
Unlike the first structure discussed above, in which each codec uses one codec layer to simultaneously implement the encoding and decoding functions, each codec in this embodiment of the present application includes a decoding layer for implementing the decoding function and an encoding layer for implementing the encoding function.
For convenience of description, in the embodiments of the present application, the decoding layer in the first codec is referred to as a first decoding layer, the encoding layer in the first codec is referred to as a first encoding layer, the decoding layer in the ith codec is referred to as an ith decoding layer, and the encoding layer in the ith codec is referred to as an ith encoding layer.
For example, please refer to fig. 9, which is a schematic structural diagram of a text generation model according to an embodiment of the present disclosure, where the text generation model includes a first codec, a second codec and a third codec, the first codec includes a first coding layer and a first decoding layer, the second codec includes a second coding layer and a second decoding layer, and the third codec includes a third coding layer and a third decoding layer.
As an example, the encoding layer may include self-attention, and the decoding layer may include self-attention.
Specifically, the content of the coded representation obtained by self-attention of each coding layer can refer to the content discussed above, and is not described herein again. Each decoded layer may use self-attention to obtain a decoded representation as follows:
1. for a first decoding layer in a first codec:
the self-attention representation corresponding to the second sample text subsequence can be obtained through self-attention by the first decoding layer, and the decoding representation of the first decoding layer is obtained according to the self-attention representation corresponding to the second sample text subsequence. The manner of decoding the second sample text subsequence may refer to the content discussed in the foregoing, and is not described herein again.
2. For the ith decoding layer in the ith codec:
The ith decoding layer obtains its decoded output according to the decoded representation and the encoded representation of the (i-1)th codec.
For example, the ith decoding layer transforms the encoded representation of the (i-1)th encoding layer with a fifth transformation matrix and a sixth transformation matrix respectively to obtain the key matrix and value matrix of the ith decoding layer, transforms the decoded representation of the (i-1)th decoding layer with a fourth transformation matrix to obtain the query matrix of the ith decoding layer, and further calculates the attention representation of each third input in the ith decoding layer according to the query matrix, the key matrix, and the value matrix. When calculating the attention representation of one third input, in order to implement unidirectional decoding, the attention weight values between that third input and the other third inputs after it may be blocked, thereby implementing unidirectional decoding. The formula for calculating the attention representation of each third input may refer to the foregoing formula (4), and is not described herein again. After the attention representations of the respective third inputs are obtained, the decoded output of the ith decoding layer is thus obtained.
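As an illustrative sketch of the cross-attention step just described — queries from the previous decoded representation, keys and values from the previous encoded representation — the following uses illustrative sizes and random matrices; the unidirectional blocking of later positions is omitted here for brevity:

```python
# Hedged sketch: the i-th decoding layer builds Q from the previous decoded
# representation and K, V from the previous encoded representation.
import numpy as np

rng = np.random.default_rng(0)
d, n_enc, n_dec = 8, 4, 3
enc_prev = rng.normal(size=(n_enc, d))   # (i-1)-th encoded representation
dec_prev = rng.normal(size=(n_dec, d))   # (i-1)-th decoded representation
W4, W5, W6 = (rng.normal(size=(d, d)) for _ in range(3))  # 4th/5th/6th matrices

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

Q = dec_prev @ W4                        # query matrix (fourth matrix)
K, V = enc_prev @ W5, enc_prev @ W6      # key/value (fifth, sixth matrices)
dec_out = softmax(Q @ K.T / np.sqrt(d)) @ V
```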
In the embodiment of the application, each codec comprises a decoding layer and an encoding layer, and decoding can be performed by using the decoding layer, and encoding is performed by using the encoding layer, so that a weight matrix corresponding to the encoding layer does not need to be shielded in the encoding process, and the processing process is relatively simplified.
To further simplify the training process of the text generation model, as an embodiment, the model parameters of the encoding layer and the decoding layer in each codec may be shared; for example, the self-attention in the first encoding layer has the same model parameters as the self-attention in the first decoding layer, and the self-attention in the ith decoding layer has the same model parameters as the self-attention in the ith encoding layer.
Further, the initial model parameters of the coding layer and the decoding layer in each codec may employ pre-trained model parameters of the coding part in the BERT model.
In any of the above manners, the decoded output of the last codec in the text generation model is obtained. Each output element in the decoded output of each codec corresponds to the representation of one masked word in the sample information stream mask description text. The terminal may obtain the prediction probability of each masked word based on the decoded output of the last codec, calculate the loss function according to the prediction probability of each word, and adjust the model parameters of the text generation model according to the loss function.
The terminal may normalize each output element separately to obtain a predicted probability for each masked word. This approach allows for fast and simple prediction of the prediction probability of each masked word.
In order to obtain a more accurate prediction probability for each masked word, in the embodiment of the present application, the terminal introduces a pointer network. Through the pointer network, a copy probability of the masked word is obtained from the encoded output and the decoded output of the last codec, the copy probability being used to represent the degree of correlation between the masked word and the sample text sequence; a generation probability of the masked word is obtained based on the decoded output of the last codec, the generation probability being used to represent the distribution probability of the masked word over a global vocabulary, the global vocabulary including words generated by the text generation model in the previous training process. The prediction probability of the masked word is then obtained according to the generation probability and the copy probability.
For example, fig. 10 schematically illustrates the principle of obtaining the prediction probability of a word according to an embodiment of the present application: the terminal calculates the copy probability and the generation probability from the encoded output and the decoded output of the last codec, respectively, and then calculates the prediction probability of the masked word from the copy probability and the generation probability.
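The way a copy probability and a generation probability can be mixed into one prediction probability follows the general pattern of pointer-generator networks. The sketch below is a minimal, illustrative Python implementation under assumed shapes and interfaces (the patent does not fix the exact parameterization): the generation distribution is a softmax over the global vocabulary, the copy distribution scatters attention weights over the source tokens back onto the vocabulary, and a gate value `lam` mixes the two.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def predict_masked_word(gen_logits, copy_scores, source_token_ids, vocab_size, lam):
    """Mix a vocabulary 'generation' distribution with a pointer-style 'copy'
    distribution: p_i = lam * p_gen + (1 - lam) * p_copy (illustrative form)."""
    p_gen = softmax(gen_logits)                   # over the global vocabulary
    attn = softmax(copy_scores)                   # over the source positions
    p_copy = [0.0] * vocab_size
    for pos, tok in enumerate(source_token_ids):  # scatter attention mass onto
        p_copy[tok] += attn[pos]                  # tokens that appear in the input
    return [lam * g + (1 - lam) * c for g, c in zip(p_gen, p_copy)]
```

Because both component distributions sum to one, any gate value in [0, 1] yields a valid distribution over the vocabulary.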
An example of a formula for calculating the generation probability of each masked word is as follows:

$$p^{\mathrm{gen}}_i = \mathrm{softmax}(W h_i + b)$$

where $p^{\mathrm{gen}}_i$ represents the generation probability of the masked word, $W$ and $b$ both belong to the model parameters to be trained, and $h_i$ represents the $i$-th output element of the decoded output of the last codec.
An example of a formula for calculating the copy probability of each masked word is as follows:

$$p^{\mathrm{copy}}_i = \mathrm{softmax}(W_1 h_i + W_2 e + b)$$

where $p^{\mathrm{copy}}_i$ represents the copy probability of the masked word, $W_1$, $W_2$ and $b$ all belong to the model parameters to be trained, $h_i$ is the $i$-th output element of the decoded output of the last codec, and $e$ refers to the output at a predetermined position in the encoded output of the last codec, e.g., the first position in the encoded output of the last codec.
The terminal obtains the prediction probability of each masked word from its copy probability and generation probability, and the calculation formula of the prediction probability is as follows:

$$p_i = \lambda\, p^{\mathrm{gen}}_i + (1-\lambda)\, p^{\mathrm{copy}}_i$$

where $p_i$ represents the prediction probability of the $i$-th masked word, and $\lambda$ belongs to the model parameters to be trained.
An exemplary formula for computing the loss function of the $i$-th masked word is as follows:

$$\mathcal{L}_i = -\log p_i(w_i)$$

where $w_i$ is the true value of the $i$-th masked word.
Based on the loss function of each masked word, the loss function of the current round of iterative training is obtained, and the calculation formula is as follows:

$$\mathcal{L} = \sum_{i} \mathcal{L}_i$$
After obtaining the loss function of the current round of iterative training, the terminal may adjust the model parameters of the text generation model based on the loss function until the text generation model satisfies a model convergence condition, thereby obtaining the trained target text generation model. The model convergence condition is, for example, that the number of training iterations reaches a preset count, or that the loss function of the text generation model reaches a preset value; this is not limited in the present application.
For example, the terminal may train the text generation model by using a stochastic gradient descent method, and perform optimization training by using an Adam optimizer.
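The convergence logic above can be sketched as follows. This is illustrative only: the optimizer step is abstracted into a callback, and the stopping thresholds are assumptions, since the text leaves them open. The per-word negative log-likelihoods are summed into the round's loss, and training stops when either the preset iteration count or the preset loss value is reached.

```python
import math

def round_loss(pred_probs_of_true_words):
    """Negative log-likelihood summed over all masked words in one round."""
    return sum(-math.log(p) for p in pred_probs_of_true_words)

def train(step_fn, max_rounds=1000, target_loss=0.01):
    """step_fn performs one round of training (forward pass, loss, parameter
    update) and returns that round's loss. Stop when either convergence
    condition is met: iteration count or loss reaching its preset value."""
    loss = float("inf")
    for _ in range(max_rounds):
        loss = step_fn()
        if loss <= target_loss:
            break
    return loss
```

In practice `step_fn` would wrap a gradient step of the chosen optimizer (e.g. Adam, as the text suggests).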
To facilitate understanding of the text generation model training method in the embodiment of the present application, the following takes the method as executed by a terminal, with the structure of each codec being the first structure discussed above, as an example:
referring to fig. 11, a flowchart of a training method for a text generation model is shown, the flowchart including:
S111, acquiring a sample information flow title set and a sample information flow description text of a news event.
For example, the terminal acquires the sample information flow title set and the sample information flow description text of a sample event as: "[CLS] national volume-based purchase, 55 drugs reduced in price [SEP] 55 drugs reduced in price [SEP]", where "[CLS]" marks the beginning of the sample information flow title set, the first "[SEP]" marks the boundary between the sample information flow title set and the sample information flow description text, and the second "[SEP]" marks the end of the sample information flow description text. The terminal masks part of the text in the sample information flow title set and the sample information flow description text, obtaining, for example, "[CLS] national volume-based purchase, 55 drugs reduced in price [SEP] 55 [MASK] reduced in [MASK] [SEP]".
S112, converting the sample information flow title set and the sample information flow mask description text into a sequence to obtain a sample text sequence.
The terminal converts "[CLS] national volume-based purchase, 55 drugs reduced in price [SEP] 55 [MASK] reduced in [MASK] [SEP]" into a sequence to obtain the sample text sequence [A1 A2 A3 X1 B1 B2], where A1, A2, A3 and X1 are the serialization results corresponding to the sample information flow title set, and B1 and B2 are the serialization results corresponding to the sample information flow mask description text.
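The masking and serialization in steps S111 and S112 can be sketched as follows. The token-to-id mapping, the mask rate, and the random masking policy are illustrative assumptions; the patent only fixes the "[CLS] titles [SEP] masked-description [SEP]" layout.

```python
import random

def mask_and_serialize(title_tokens, desc_tokens, vocab, mask_rate=0.3, seed=0):
    """Build '[CLS] titles [SEP] masked-description [SEP]' and map it to ids."""
    rng = random.Random(seed)
    # Randomly replace description tokens with the mask marker.
    masked_desc = [t if rng.random() > mask_rate else "[MASK]" for t in desc_tokens]
    tokens = ["[CLS]"] + title_tokens + ["[SEP]"] + masked_desc + ["[SEP]"]
    return [vocab[t] for t in tokens], tokens
```

The id sequence returned here corresponds to the sample text sequence fed into the first codec.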
S113, training the text generation model based on each sample text sequence until the text generation model meets a model convergence condition, thereby obtaining the trained target text generation model.
The terminal inputs the sample text sequence into the text generation model, where a1, a2, A3, X1, B1, and B2 in the foregoing correspond to the input positions of the first codec in the text generation model, respectively.
A1, A2, A3 and X1 are encoded by the first codec in the text generation model, giving the encoded output of the first codec: Z1, Z2, Z3 and Z4. B1 and B2 are decoded by the first codec, giving the decoded output of the first codec: Z5 and Z6.

Next, Z1, Z2, Z3 and Z4 are encoded by the second codec in the text generation model, giving the encoded output of the second codec: P1, P2, P3 and P4. The second codec decodes Z5 and Z6, together with the encoded output of the first codec, giving the decoded output of the second codec: P5 and P6. The process continues in this manner until the decoded output of the last codec is obtained.
The terminal calculates the loss function of this round of training from the decoded output of the last codec and adjusts the model parameters of the text generation model based on the loss function, until the text generation model meets the model convergence condition, thereby obtaining the trained target text generation model.
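The data flow of the codec chain in this example — each codec after the first consuming the encoded and decoded outputs of the previous one — can be sketched abstractly. The codec internals are placeholders (any object exposing `encode` and `decode`); only the wiring follows the description, and the assumption that the first codec's decoding may also see its own encoded output is illustrative.

```python
def run_codec_stack(codecs, first_subseq, second_subseq):
    """Codec 1 consumes the raw subsequences; codec i > 1 encodes codec i-1's
    encoded output and decodes codec i-1's decoded output together with
    codec i-1's encoded output. Returns the last codec's decoded output."""
    enc = codecs[0].encode(first_subseq)
    dec = codecs[0].decode(second_subseq, enc)  # assumption: decoder sees enc
    for codec in codecs[1:]:
        new_enc = codec.encode(enc)
        dec = codec.decode(dec, enc)  # uses the PREVIOUS codec's encoded output
        enc = new_enc
    return dec
```

With toy codecs that add one on encode and sum their inputs on decode, the chain can be traced by hand to check the wiring.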
The above is an example of training the text generation model. After the terminal obtains the trained target text generation model, or obtains it from the first server or the second server, the target information flow description text corresponding to a target event may be generated based on the trained target text generation model.
Referring now to fig. 12, a flowchart of a method for generating an information flow description text based on a trained target text generation model is shown, the flowchart comprising:
S121, acquiring a target text sequence of the target event, wherein the target text sequence comprises a first target text subsequence of the information flow title set and a second target text subsequence of a character mask marker.
The terminal determines a target event for which an information flow description text is to be generated, obtains each information flow title of the target event from network resources, and serializes the information flow titles to obtain the first target text subsequence. A word to be generated is represented by a character mask marker, which is serialized to obtain the second target text subsequence.
S122, executing multiple rounds of iterative operations based on the target text sequence through the multiple codecs in the trained target text generation model.
Specifically, in the first iteration, the first codec of the multiple codecs encodes the first target text subsequence to obtain the encoded output of the first codec, and decodes the second target text subsequence to obtain the decoded output of the first codec. The decoded output of each codec other than the first codec is obtained based on the encoded output and the decoded output of the previous codec. The terminal obtains the prediction result of the current round of iteration based on the decoded output of the last codec.
In each iteration other than the first, the first codec encodes the first target text subsequence to obtain its encoded output, and decodes the prediction results of the historical rounds of iteration to obtain its decoded output; a historical round refers to an iteration performed before the current round (for example, if the current round is the third round, the historical rounds are the first and second rounds). The decoded output of each codec other than the first is again obtained based on the encoded output and the decoded output of the previous codec, and the terminal obtains the prediction result of the current round based on the decoded output of the last codec in the plurality of codecs.
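The word-by-word generation in step S122 reduces to a simple loop: each round feeds the title subsequence plus all previously predicted words (with a mask marker appended for the word being predicted) back into the model. The sketch below assumes an abstract model interface and an end token; both are illustrative, not fixed by the text.

```python
def generate(predict_next, title_subseq, end_token="[END]", max_len=50):
    """predict_next(title_subseq, generated) returns the next predicted word;
    generation stops at the end token or at a preset maximum length."""
    generated = []
    for _ in range(max_len):
        word = predict_next(title_subseq, generated)
        if word == end_token:
            break
        generated.append(word)
    return generated
```

Here `predict_next` stands in for one full pass through the codec stack on the title subsequence plus the current prefix.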
S123, obtaining the information flow description text based on the prediction results of the multiple rounds of iteration.
In this embodiment of the application, the first server inputs the target text sequence into the target text generation model and obtains the model's prediction result in the first iteration. The target text generation model may be obtained by the terminal based on the text generation model training method described above, or obtained directly from the first server.
The first target text subsequence and the first prediction result are then input into the target text generation model to obtain the prediction result of the second iteration. The first server inputs the first target text subsequence, the prediction result of the first iteration, and the prediction result of the second iteration into the target text generation model to obtain the prediction result of the third iteration, and so on, until the target text generation model has predicted the complete information flow description text.
The encoding process and the decoding process performed by each codec in the target text generation model may refer to the processing content of each codec in the text generation model discussed above, and are not described herein again.
To ensure a stronger correlation between the prediction results of successive iterations, in the embodiment of the application the first server may combine the first target text subsequence with each of the multiple prediction results of the historical iterations to obtain multiple combinations, input these combinations into the target text generation model to obtain multiple prediction results for the current iteration, and screen out the most probable prediction results from them. To illustrate the multi-round iteration process more clearly, an example follows:
specifically, in the first iteration, the terminal inputs X[mask] into the target text generation model and obtains the three words with the highest prediction probability, for example a, b and c. Here "[mask]" denotes the character mask marker used during the iteration.
The terminal can combine X with a, b and c respectively to obtain Xa, Xb and Xc. In the second iteration, the terminal may input the first target text subsequence together with each prediction result of the first iteration into the text generation model, i.e., input Xa[mask], Xb[mask] and Xc[mask], thereby obtaining the three words with the highest prediction probability in the second iteration: d, e and f.
The terminal may combine the prediction results of the first iteration with those of the second iteration to obtain multiple combinations, for example: ad, bd, cd, ae, be, ce, af, bf, cf. The terminal may calculate the prediction probability of each combination and select the combinations whose prediction probabilities satisfy the prediction probability condition as input for the next iteration, for example ad, ae and bd.
In the third iteration, the terminal may input the first target text subsequence combined with the selected prediction results of the first and second iterations into the text generation model, i.e., input Xad[mask], Xae[mask] and Xbd[mask], to obtain the prediction result of the third iteration. The prediction result of each subsequent iteration is obtained in the same way.
When the information flow description text reaches a preset length, or an end character is generated, the terminal obtains the information flow description text based on the prediction results of the multiple rounds of iteration.
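The combine-and-screen procedure in this example is essentially beam search. A minimal sketch under assumed interfaces (the candidate generator, scoring function, and beam width are illustrative):

```python
def beam_search(topk_fn, score_fn, beam_width=3, steps=3):
    """topk_fn(prefix) -> candidate next words for that prefix;
    score_fn(seq) -> score (e.g. log-probability) of a whole combination.
    After each step, keep only the beam_width highest-scoring combinations."""
    beams = [()]
    for _ in range(steps):
        candidates = [b + (w,) for b in beams for w in topk_fn(b)]
        candidates.sort(key=score_fn, reverse=True)
        beams = candidates[:beam_width]
    return beams
```

With beam width 3, step one yields Xa, Xb, Xc; step two scores the nine two-word combinations and keeps the top three, matching the ad/ae/bd example above.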
By analogy, the terminal can generate the information flow description text corresponding to each target event according to the process.
The structure of the target text generation model in the embodiment of the present application may refer to the structure of the text generation model discussed above, and details are not repeated here.
In the embodiment of the application, the target text generation model generates the information flow description text word by word. Each word to be predicted is generated based on its preceding words and the target information flow title set, which improves the accuracy of the generated word. Moreover, each codec decodes using both the decoded output and the encoded output of the previous codec, which strengthens the association between the output and the input of the target text generation model and improves the accuracy of the generated information flow description text, thereby helping to improve the accuracy of the search results recommended based on the information flow description text.
After the terminal generates the information flow description texts corresponding to the target events, the terminal may screen the information flow description texts matched with the search keyword from the generated information flow description texts, obtain an information flow title set corresponding to the information flow description texts, and pull corresponding information flow contents based on one or more information flow titles in the information flow title set.
To more clearly illustrate the information flow searching method, the following describes an interaction process between the terminal and the first server by taking the application scenario shown in fig. 1B or fig. 1C as an example:
referring to fig. 13, an interaction diagram between a terminal and a first server is shown, where the interaction diagram includes:
s131, the first server generates an information flow description text.
The first server may generate an information flow description text corresponding to each target event, and the content of the information flow description text generated by the first server may refer to the content discussed above, which is not described herein again.
S132, the first server obtains the search keyword from the terminal.
The order of steps S131 and S132 may be arbitrary, and the present application does not limit this.
S133, the first server determines an information flow description text matching the search keyword.
The first server determines an information flow description text matching the search keyword from a plurality of information flow description texts.
S134, the first server acquires an information flow title set corresponding to the information flow description text.
After matching the information flow description text corresponding to the keyword, the first server may correspondingly obtain an information flow title set of the information flow description text.
S135, the first server obtains the information flow content based on the information flow title in the information flow title set.
The first server may obtain the corresponding information stream content based on one or more information stream headers of the set of information stream headers.
S136, the first server sends the information flow content to the terminal.
After obtaining the corresponding information stream content, the first server may send the information stream content to the terminal for presentation by the terminal.
In the embodiment shown in fig. 13, the terminal and the first server cooperatively execute the information flow search method, which relatively reduces both the processing load and the amount of data stored on the terminal. Because the first server matches the information flow description text based on the search keyword, the information flow description text corresponding to the search keyword can be determined quickly, and the corresponding search result can be fed back to the terminal quickly.
Based on the same inventive concept, an embodiment of the present application provides an information flow searching apparatus, which can implement the functions of the terminal or the first server discussed above, referring to fig. 14, and the apparatus includes:
a keyword acquisition module 1401 for acquiring a search keyword;
a matching module 1402, configured to determine an information flow description text matching the search keyword; the information flow description text is used for describing the information flow content corresponding to the information flow title in the corresponding information flow title set;
a title obtaining module 1403, configured to obtain an information stream title set corresponding to the information stream description text;
a pull module 1404, configured to pull the content of the information stream based on the information stream header in the information stream header set;
and the display module 1405 is used for displaying the pulled information stream content as a search result matched with the search keyword.
In a possible embodiment, the apparatus further comprises a text generation module 1406, wherein:
a text generating module 1406, configured to perform the following operations for each target event: acquiring each information flow title corresponding to one target event in each target event; determining an information flow description text corresponding to a target event based on each information flow title;
a matching module 1402, configured to determine, from the information flow description texts corresponding to the target events, an information flow description text matching the search keyword.
In a possible embodiment, the text generation module 1406 is specifically configured to:
acquiring a target text sequence of a target event, wherein the target text sequence comprises a first target text subsequence of an information flow title set of the target event and a second target text subsequence of a character covering mark corresponding to a word to be predicted;
performing, by a plurality of codecs in the trained target text generation model, a plurality of iterations based on the target text sequence, wherein:
in a first iteration, an encoded output of a first codec of the plurality of codecs is obtained based on the first target text subsequence, and a decoded output of the first codec is obtained based on the second target text subsequence;
in each round of iteration operation except the first round of iteration operation in the plurality of rounds of iteration operation, the decoding output of the first codec is obtained based on the prediction result of the historical round of iteration operation, the historical round of iteration operation refers to the iteration operation performed before the current round of iteration operation, and the prediction result corresponding to each round of iteration operation is obtained based on the decoding output of the last codec in the plurality of codecs;
in each iteration operation of the multiple iterations, the decoded output of each codec except the first codec is obtained based on the encoded output and the decoded output of the previous codec;
and obtaining an information flow description text corresponding to a target event based on the prediction result corresponding to the multi-round iterative operation.
In one possible embodiment, the prediction results of the historical round iteration operation comprise a plurality of prediction results of which the prediction probabilities meet a preset probability condition; the predicted result of each iteration except the first iteration in the multiple iterations is obtained as follows:
respectively inputting each combination in the multiple combinations into a target text generation model to obtain a prediction result corresponding to each combination in the current iteration operation, wherein the multiple combinations are results of the combination of the first target text subsequence and each prediction result in the multiple prediction results of the historical iteration operation;
and in the prediction results corresponding to a plurality of combinations in the iteration operation of the current round, taking the prediction result meeting the prediction probability condition as the prediction result of the iteration operation of the current round.
In a possible embodiment, the information flow description text corresponding to a target event is obtained by a trained text generation model, and the apparatus further includes a model training module 1407, where the model training module 1407 is specifically configured to:
acquiring a sample text sequence set, wherein each sample text sequence comprises a first sample text subsequence of a sample information flow title set of a sample event and a second sample text subsequence of a sample information flow mask description text of the sample event;
performing multiple rounds of iterative training on a plurality of codecs in a text generation model based on a text sequence sample set until a model convergence condition is met to obtain a trained target text generation model, wherein each round of iterative training comprises the following processes:
inputting a sample text sequence selected based on the sample text sequence set into a plurality of codecs to obtain a decoded output of a last codec, wherein an encoded output of a first codec in the plurality of codecs is obtained based on a first sample text subsequence in the sample text sequence, a decoded output of the first codec is obtained based on a second sample text subsequence in the sample text sequence, and a decoded output of each codec except the first codec is obtained based on an encoded output and a decoded output of a previous codec;
model parameters of a plurality of codecs are adjusted based on a decoded output of a last codec.
In one possible embodiment, the model training module 1407 is specifically configured to:
for a plurality of codecs, the following operations are performed, respectively:
if one of the plurality of codecs is a first codec, encoding a first sample text subsequence in the sample text sequence based on the one codec to obtain an encoded output of the first codec, and decoding a second sample text subsequence in the sample text sequence based on the first codec to obtain a decoded output of the first codec;
if one of the plurality of codecs is the ith codec, encoding the encoded output of the (i-1)th codec based on the ith codec to obtain the encoded output of the ith codec, and decoding the encoded output and the decoded output of the (i-1)th codec based on the ith codec to obtain the decoded output of the ith codec, wherein i is an integer greater than 1 and not greater than N, and N is the total number of the plurality of codecs;
the decoded output of the nth codec is obtained.
In one possible embodiment, the model training module 1407 is specifically configured to:
obtaining a first weight matrix based on the first codec, wherein the rows of the first weight matrix correspond one-to-one to the first inputs in the sample text sequence, and each row of the first weight matrix comprises the attention weight values between that row's corresponding first input and each first input in the sample text sequence;
based on the first codec, for each row in the first-class row set in the first weight matrix, i.e., the rows corresponding to the first inputs in the second sample text subsequence of the sample text sequence, respectively: setting the attention weight value between the first input corresponding to that row and the first-class inputs to zero to obtain a second weight matrix, wherein the first-class inputs are the other inputs positioned after that row's first input in the second sample text subsequence of the sample text sequence;
and obtaining the decoding output of the first codec according to the second weight matrix based on the first codec.
In one possible embodiment, the model training module 1407 is specifically configured to:
based on the first codec, for each row in the second-class row set in the first weight matrix, i.e., the rows corresponding to the first inputs in the first sample text subsequence of the sample text sequence, respectively: setting the attention weight value between the first input corresponding to that row and the second-class inputs to zero to obtain a third weight matrix, wherein the second-class inputs are the first inputs in the second sample text subsequence of the sample text sequence;
based on one codec, the coded output of the first codec is obtained according to the third weight matrix.
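The weight-zeroing described in the two embodiments above amounts to a seq2seq-style attention mask: rows for the first (title) subsequence may not attend to the second subsequence at all, and rows for the second subsequence may not attend to later positions within it. A minimal sketch follows; the row/column convention (row = query, column = key, 1 = allowed) is an assumption of this illustration.

```python
def build_attention_mask(len_first, len_second):
    """mask[q][k] == 1 means query position q may attend to key position k.
    First-subsequence rows see only the first subsequence; second-subsequence
    rows see the first subsequence plus their own non-future positions."""
    n = len_first + len_second
    mask = [[0] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if q < len_first:
                mask[q][k] = 1 if k < len_first else 0
            else:
                mask[q][k] = 1 if k <= q else 0
    return mask
```

Multiplying the raw attention weight matrix elementwise by this mask (before renormalization) realizes both the second and third weight matrices described above.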
In one possible embodiment, the model training module 1407 is specifically configured to:
obtaining a fourth weight matrix based on the ith codec, wherein each row in the fourth weight matrix corresponds to each second input in the coding output and the decoding output of the (i-1) th codec respectively, and each row in the fourth weight matrix comprises one row of corresponding second inputs and attention weight values between each second input;
for the third class row sets corresponding to the second inputs in the decoding output of the i-1 th codec in the fourth weight matrix, respectively performing the following operations: setting an attention weight value between a second input corresponding to one row in a third-class row set and a third-class input to zero to obtain a fifth weight matrix, wherein the third-class input is other second inputs positioned after the second input corresponding to one row in the decoding output of the i-1 th codec;
and obtaining the decoding output of the ith codec according to the fifth weight matrix based on the ith codec.
In one possible embodiment, the model training module 1407 is specifically configured to:
based on the ith codec, for each row in the fourth-class row set in the fourth weight matrix, i.e., the rows corresponding to the second inputs in the encoded output of the (i-1)th codec, respectively: setting the attention weight value between the second input corresponding to that row and the fourth-class inputs to zero to obtain a sixth weight matrix, wherein the fourth-class inputs are the second inputs in the decoded output of the (i-1)th codec;
and obtaining the coded output of the ith codec according to the sixth weight matrix based on the ith codec.
In one possible embodiment, each codec of the plurality of codecs comprises a codec layer, wherein:
the coded output and the decoded output of any codec are both output through a coding and decoding layer in any codec.
In one possible embodiment, each of the plurality of codecs comprises one coding layer and one decoding layer shared by model parameters, wherein:
the coded output and the decoded output of any codec are output through one coding layer and one decoding layer in any codec, respectively.
In one possible embodiment, the model training module 1407 is specifically configured to:
for each output element in the decoded output of the last codec, the following procedure is performed:

obtaining a generation probability of the corresponding masked word in the sample information flow mask description text based on that output element, wherein the generation probability represents the distribution probability of the masked word over the global vocabulary;

obtaining a copy probability of the masked word based on that output element and the encoded output of the last codec, wherein the copy probability represents the degree of correlation between the masked word and the sample text sequence;

carrying out weighted summation of the generation probability and the copy probability to obtain the prediction probability of the masked word;

and adjusting the model parameters of the plurality of codecs based on the prediction probabilities of the masked words in the sample information flow mask description text.
As an example, model training module 1407 and text generation module 1406 in fig. 14 are optional components.
It should be noted that, the processing procedure of the text generation model on the sample text sequence may refer to the content discussed above, and is not described here again.
It should be noted that the apparatus shown in fig. 14 may also implement any of the information flow searching methods discussed above, and details thereof are not repeated here.
Based on the same inventive concept, the embodiment of the present application provides a computer device, which can implement the functions of the foregoing terminal or first server, please refer to fig. 15, and the computer device includes a processor 1501 and a memory 1502.
The processor 1501 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The specific connection medium between the memory 1502 and the processor 1501 is not limited in the embodiments of the present application. In the embodiment of the present application, the memory 1502 and the processor 1501 are connected by the bus 1503 in fig. 15, the bus 1503 is shown by a thick line in fig. 15, and the connection manner between other components is merely illustrative and not limited. The bus 1503 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 15, but this is not intended to represent only one bus or type of bus.
The memory 1502 may be a volatile memory, such as a random-access memory (RAM); the memory 1502 may also be a non-volatile memory, such as, but not limited to, a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or the memory 1502 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1502 may also be a combination of the above.
A processor 1501 is used to perform the information flow search method as previously discussed when invoking the computer program stored in the memory 1502.
Based on the same inventive concept, embodiments of the present application provide a computer storage medium storing computer instructions that, when executed on a computer, cause the computer to perform any of the information flow searching methods discussed above.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Based on the same inventive concept, the embodiments of the present application provide a computer program product, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the information flow searching method described above.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by program instructions controlling relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a removable memory device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, and an optical disk.
Alternatively, the integrated units described above may, if implemented in the form of software functional modules and sold or used as independent products, be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, or the portions thereof contributing to the prior art, may essentially be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (15)

1. An information stream searching method, comprising:
acquiring a search keyword;
determining an information flow description text matched with the search keyword; the information flow description text is used for describing information flow contents corresponding to information flow titles in a corresponding information flow title set, wherein the information flow description text is generated based on a trained text generation model, the text generation model comprises a plurality of codecs, and the text generation model is obtained by performing multiple rounds of iterative training based on a sample text sequence set, each sample text sequence comprising a first sample text subsequence of a sample information flow title set of a sample event and a second sample text subsequence of a sample information flow mask description text of the sample event, and each round of iterative training comprises: inputting a sample text sequence selected from the sample text sequence set into the plurality of codecs, and adjusting model parameters of the plurality of codecs based on a decoded output of a last codec of the plurality of codecs, wherein a decoded output of a first codec of the plurality of codecs is obtained based on the second sample text subsequence in the sample text sequence, a decoded output of each codec other than the first codec is obtained based on an encoded output and a decoded output of a previous codec, and the decoded output of the first codec is obtained by the first codec as follows: obtaining a first weight matrix, wherein the lines of the first weight matrix correspond one-to-one to the first inputs in the sample text sequence, and each line of the first weight matrix comprises attention weight values between its corresponding first input and the first inputs in the sample text sequence; for each line in a first-class line set, the first-class line set being the lines of the first weight matrix that correspond to the first inputs in the second sample text subsequence of the sample text sequence, respectively performing the following operation: setting an attention weight value between the first input corresponding to the line and a first-class input to zero to obtain a second weight matrix, wherein the first-class input is any other input located after the first input corresponding to the line in the second sample text subsequence of the sample text sequence; and obtaining the decoded output of the first codec according to the second weight matrix;
acquiring an information flow title set corresponding to the information flow description text;
pulling information stream content based on the information stream titles in the information stream title set;
and displaying the pulled information flow content as the search result matched with the search keyword.
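By way of illustration only, the masking step recited in claim 1 can be sketched in Python. All names and shapes here are hypothetical assumptions of this sketch (the claim prescribes no implementation): `prefix_lm_mask` takes a full attention weight matrix over the concatenated sample text sequence and the length of the title (first) subsequence, and zeroes, for each row belonging to the description (second) subsequence, the weights toward later description inputs, yielding the claim's "second weight matrix":

```python
import numpy as np

def prefix_lm_mask(weights: np.ndarray, prefix_len: int) -> np.ndarray:
    """Zero attention from each description-part row to later
    description-part inputs (a prefix-LM style mask).

    weights:    (L, L) first weight matrix; row i holds the attention
                weights between input i and every input in the sequence
    prefix_len: number of inputs in the first (title) subsequence; the
                remaining L - prefix_len inputs form the description part
    """
    masked = weights.copy()
    seq_len = weights.shape[0]
    for row in range(prefix_len, seq_len):   # rows of the "first-class" set
        # every input after this position lies in the second subsequence,
        # so zeroing the tail zeroes exactly the "first-class inputs"
        masked[row, row + 1:] = 0.0
    return masked
```

Under this decoding mask, each description position keeps weights only to the title part and its own left context, which is what allows masked description words to be predicted during training.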
2. The method of claim 1, wherein prior to determining the information flow description text that matches the search keyword, the method further comprises:
for each target event, the following operations are respectively executed:
acquiring each information flow title corresponding to one target event of the target events;
determining an information flow description text corresponding to the target event based on the information flow titles;
the determining of the information flow description text matching with the search keyword includes:
and determining the information flow description texts matched with the search keywords from the information flow description texts corresponding to the target events.
3. The method of claim 2, wherein said determining the information flow description text corresponding to the one target event based on the respective information flow titles comprises:
acquiring a target text sequence of a target event, wherein the target text sequence comprises a first target text subsequence of an information flow title set of the target event and a second target text subsequence comprising character mask marks;
performing, by a plurality of codecs in a trained target text generation model, a plurality of iterations of operations based on the target text sequence, wherein:
in a first round of iterative operations, an encoded output of a first codec of the plurality of codecs is obtained based on the first target text subsequence, and a decoded output of the first codec is obtained based on the second target text subsequence;
in each round of iterative operations other than the first round, the decoded output of the first codec is obtained based on the prediction result of a historical round of iterative operations, wherein a historical round of iterative operations refers to an iteration operation performed before the current round, and the prediction result corresponding to each round is obtained based on the decoded output of the last codec of the plurality of codecs;
in each of the plurality of rounds of iterative operations, a decoded output of each codec other than the first codec is obtained based on an encoded output and a decoded output of a previous codec;
and obtaining an information flow description text corresponding to the target event based on the prediction result corresponding to the multiple rounds of iterative operations.
4. The method of claim 3, wherein the prediction result of the historical round of iterative operations comprises a plurality of prediction results whose prediction probabilities satisfy a preset probability condition; and the prediction result of each round of iterative operations other than the first round is obtained by:
inputting each of a plurality of combinations into the target text generation model to obtain a prediction result corresponding to each combination in the current round of iterative operations, wherein the plurality of combinations are the results of combining the first target text subsequence with each of the plurality of prediction results of the historical round of iterative operations;
and in the prediction results corresponding to the combinations in the iteration operation of the current round, taking the prediction result meeting the prediction probability condition as the prediction result of the iteration operation of the current round.
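The round-by-round selection described in claim 4 amounts to a beam-search style procedure. The sketch below is a loose illustration under assumed interfaces (the `score_fn` callable standing in for the target text generation model is hypothetical): each retained prediction result is extended, and only the results whose probabilities best satisfy the condition, taken here as the top `beam_width` by cumulative log-probability, are kept for the next round:

```python
import math

def beam_search_step(candidates, score_fn, beam_width):
    """One round of the claim-4 iteration.

    candidates: list of (token_list, log_prob) kept from the previous round
    score_fn:   hypothetical stand-in for the text generation model; maps
                a token list to {next_token: probability}
    beam_width: how many prediction results satisfy the probability
                condition (modelled here as "top-k")
    """
    expanded = []
    for tokens, logp in candidates:
        # combine each kept prediction result with the model's next-step
        # distribution, accumulating log-probabilities
        for tok, p in score_fn(tokens).items():
            expanded.append((tokens + [tok], logp + math.log(p)))
    # keep only the results best satisfying the probability condition
    expanded.sort(key=lambda cand: cand[1], reverse=True)
    return expanded[:beam_width]
```

Repeating this step until an end condition, then taking the highest-scoring candidate, yields the description text of claim 3's final prediction.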
5. The method of claim 2, wherein the information flow description text corresponding to the target event is obtained through a trained text generation model, and the trained text generation model is obtained through training as follows:
acquiring a sample text sequence set, wherein each sample text sequence comprises a first sample text subsequence of a sample information flow title set of a sample event and a second sample text subsequence of a sample information flow mask description text of the sample event;
performing multiple rounds of iterative training on a plurality of codecs in the text generation model based on the sample text sequence set until a model convergence condition is met, to obtain the trained target text generation model, wherein each round of iterative training comprises the following process:
inputting a sample text sequence selected based on the sample text sequence set into the plurality of codecs to obtain a decoded output of a last codec, wherein the encoded output of a first codec in the plurality of codecs is obtained based on a first sample text subsequence in the sample text sequence, the decoded output of the first codec is obtained based on a second sample text subsequence in the sample text sequence, and the decoded output of each codec except the first codec is obtained based on the encoded output and the decoded output of a previous codec;
adjusting model parameters of the plurality of codecs based on a decoded output of the last codec.
6. The method of claim 5, wherein inputting a sample text sequence selected based on the set of sample text sequences into the plurality of codecs to obtain a decoded output of a last codec, comprises:
performing the following operations for the plurality of codecs, respectively:
if one of the plurality of codecs is the first codec, encoding the first sample text subsequence in the sample text sequence based on the first codec to obtain the encoded output of the first codec, and decoding the second sample text subsequence in the sample text sequence based on the first codec to obtain the decoded output of the first codec;
if one of the plurality of codecs is the i-th codec, encoding the encoded output of the (i-1)-th codec based on the i-th codec to obtain the encoded output of the i-th codec, and decoding the encoded output and the decoded output of the (i-1)-th codec based on the i-th codec to obtain the decoded output of the i-th codec, wherein i is an integer greater than 1 and not greater than N, and N is the total number of the plurality of codecs;
and obtaining the decoded output of the N-th codec.
7. The method of claim 6, wherein said encoding the first sample text subsequence in the sample text sequence based on the first codec to obtain the encoded output of the first codec comprises:
based on the first codec, for each row in a second-class row set, the second-class row set being the rows of the first weight matrix that correspond to the first inputs in the first sample text subsequence, respectively performing the following operation: setting an attention weight value between the first input corresponding to the row and a second-class input to zero to obtain a third weight matrix, wherein the second-class input is each first input in the second sample text subsequence of the sample text sequence;
and obtaining the encoded output of the first codec according to the third weight matrix based on the first codec.
8. The method of claim 6, wherein said decoding the encoded output and the decoded output of the (i-1)-th codec based on the i-th codec to obtain the decoded output of the i-th codec comprises:
obtaining a fourth weight matrix based on the i-th codec, wherein the rows of the fourth weight matrix correspond one-to-one to the second inputs in the encoded output and the decoded output of the (i-1)-th codec, and each row of the fourth weight matrix comprises attention weight values between its corresponding second input and the second inputs;
for each row in a third-class row set, the third-class row set being the rows of the fourth weight matrix that correspond to the second inputs in the decoded output of the (i-1)-th codec, respectively performing the following operation: setting an attention weight value between the second input corresponding to the row and a third-class input to zero to obtain a fifth weight matrix, wherein the third-class input is any other second input located after the second input corresponding to the row in the decoded output of the (i-1)-th codec;
and obtaining the decoded output of the i-th codec according to the fifth weight matrix based on the i-th codec.
9. The method of claim 8, wherein said encoding the encoded output of the (i-1)-th codec based on the i-th codec to obtain the encoded output of the i-th codec comprises:
based on the i-th codec, for each row in a fourth-class row set, the fourth-class row set being the rows of the fourth weight matrix that correspond to the second inputs in the encoded output of the (i-1)-th codec, respectively performing the following operation: setting an attention weight value between the second input corresponding to the row and a fourth-class input to zero to obtain a sixth weight matrix, wherein the fourth-class input is each second input in the decoded output of the (i-1)-th codec;
and obtaining the encoded output of the i-th codec according to the sixth weight matrix based on the i-th codec.
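Claims 7 to 9 together define which attention entries are zeroed for the different segments of a codec's input. Read jointly, encoder-part rows keep weights only within the encoder part, while decoder-part rows keep weights to the whole encoder part and to non-future decoder positions. The helper below is a hypothetical sketch of the resulting 0/1 pattern; multiplying a raw weight matrix elementwise by this mask zeroes the same entries the claims set to zero:

```python
import numpy as np

def codec_attention_mask(n_enc: int, n_dec: int) -> np.ndarray:
    """Build an (n_enc + n_dec) square 0/1 mask over the concatenated
    [encoded part | decoded part] inputs of one codec."""
    total = n_enc + n_dec
    mask = np.zeros((total, total))
    mask[:n_enc, :n_enc] = 1.0        # encoder rows: encoder inputs only
    mask[n_enc:, :n_enc] = 1.0        # decoder rows: all encoder inputs
    # decoder rows: only non-future decoder inputs (lower triangle)
    mask[n_enc:, n_enc:] = np.tril(np.ones((n_dec, n_dec)))
    return mask
```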
10. The method of any of claims 5 to 9, wherein each codec of the plurality of codecs comprises a codec layer, wherein:
the coded output and the decoded output of any codec are both output through a coding and decoding layer in the codec.
11. The method of claim 6, wherein each codec of the plurality of codecs comprises an encoding layer and a decoding layer whose model parameters are shared, wherein:
the encoded output and the decoded output of any codec are output through the encoding layer and the decoding layer of said any codec, respectively.
12. The method of any of claims 5-9 or 11, wherein said adjusting model parameters of said plurality of codecs based on a decoded output of said last codec comprises:
for each output in the decoded outputs of the last codec, performing the following process:
obtaining a generation probability of a masked word in the sample information flow mask description text based on the one output, wherein the generation probability represents a distribution probability of the masked word over the global word list;
obtaining a copy probability of the masked word in the sample information flow mask description text based on the one output and the encoded output of the last codec, wherein the copy probability is used to represent a correlation between the masked word and the sample text sequence;
performing weighted summation on the generation probability and the copy probability to obtain a prediction probability of the masked word;
and adjusting the model parameters of the plurality of codecs based on the prediction probabilities of the respective masked words in the sample information flow mask description text.
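Claim 12's combination of a generation probability and a copy probability follows the familiar pointer-generator pattern. The sketch below assumes both inputs are already probability vectors over the same vocabulary and that the weighting is a single scalar gate; the claim only recites a weighted summation, so both are assumptions of this illustration:

```python
import numpy as np

def predict_masked_word(gen_probs: np.ndarray,
                        copy_probs: np.ndarray,
                        gen_weight: float) -> np.ndarray:
    """Weighted sum of the generation distribution (over the global
    word list) and the copy distribution (source words mapped into the
    same word list). `gen_weight` in [0, 1] is an assumed scalar gate."""
    mixed = gen_weight * gen_probs + (1.0 - gen_weight) * copy_probs
    return mixed / mixed.sum()   # renormalize defensively
```

In a full model the gate would typically itself be predicted from the decoder state rather than fixed, and the training loss would be the negative log of each masked word's mixed probability.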
13. An information flow search apparatus, comprising:
the keyword acquisition module is used for acquiring search keywords;
the matching module is used for determining an information flow description text matched with the search keyword; the information flow description text is used for describing information flow contents corresponding to information flow titles in a corresponding information flow title set, wherein the information flow description text is generated based on a trained text generation model, the text generation model comprises a plurality of codecs, and the text generation model is obtained by performing multiple rounds of iterative training based on a sample text sequence set, each sample text sequence comprising a first sample text subsequence of a sample information flow title set of a sample event and a second sample text subsequence of a sample information flow mask description text of the sample event, and each round of iterative training comprises: inputting a sample text sequence selected from the sample text sequence set into the plurality of codecs, and adjusting model parameters of the plurality of codecs based on a decoded output of a last codec of the plurality of codecs, wherein a decoded output of a first codec of the plurality of codecs is obtained based on the second sample text subsequence in the sample text sequence, a decoded output of each codec other than the first codec is obtained based on an encoded output and a decoded output of a previous codec, and the decoded output of the first codec is obtained by the first codec as follows: obtaining a first weight matrix, wherein the lines of the first weight matrix correspond one-to-one to the first inputs in the sample text sequence, and each line of the first weight matrix comprises attention weight values between its corresponding first input and the first inputs in the sample text sequence; for each line in a first-class line set, the first-class line set being the lines of the first weight matrix that correspond to the first inputs in the second sample text subsequence of the sample text sequence, respectively performing the following operation: setting an attention weight value between the first input corresponding to the line and a first-class input to zero to obtain a second weight matrix, wherein the first-class input is any other input located after the first input corresponding to the line in the second sample text subsequence of the sample text sequence; and obtaining the decoded output of the first codec according to the second weight matrix;
the title acquisition module is used for acquiring an information flow title set corresponding to the information flow description text;
the pulling module is used for pulling information flow content based on the information flow titles in the information flow title set;
and the display module is used for displaying the pulled information flow content as the search result matched with the search keyword.
14. A computer device, comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of any one of claims 1-12 by executing the instructions stored by the memory.
15. A computer device, comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of any one of claims 1-12 by executing the instructions stored by the memory.
CN202110364974.8A 2021-04-06 2021-04-06 Information stream searching method, device and equipment Active CN112800339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110364974.8A CN112800339B (en) 2021-04-06 2021-04-06 Information stream searching method, device and equipment

Publications (2)

Publication Number Publication Date
CN112800339A CN112800339A (en) 2021-05-14
CN112800339B true CN112800339B (en) 2021-06-22

Family

ID=75816325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110364974.8A Active CN112800339B (en) 2021-04-06 2021-04-06 Information stream searching method, device and equipment

Country Status (1)

Country Link
CN (1) CN112800339B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674063B (en) * 2021-08-27 2024-01-12 卓尔智联(武汉)研究院有限公司 Shopping recommendation method, shopping recommendation device and electronic equipment
CN117037012A (en) * 2022-04-29 2023-11-10 腾讯科技(深圳)有限公司 Voting information generation method, voting information display method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077391A (en) * 2014-06-30 2014-10-01 北京奇虎科技有限公司 Method, server, client and system for providing special news search
CN104699694A (en) * 2013-12-04 2015-06-10 腾讯科技(深圳)有限公司 Prompt message acquiring method and device
CN106066862A (en) * 2016-05-25 2016-11-02 东软集团股份有限公司 Media event display packing and device
CN111401037A (en) * 2020-06-05 2020-07-10 平安国际智慧城市科技股份有限公司 Natural language generation method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10747833B2 (en) * 2017-10-30 2020-08-18 Nio Usa, Inc. Personalized news recommendation engine
CN107967262B (en) * 2017-11-02 2018-10-30 内蒙古工业大学 A kind of neural network illiteracy Chinese machine translation method
CN109472031B (en) * 2018-11-09 2021-05-04 电子科技大学 Aspect level emotion classification model and method based on double memory attention
CN112541119B (en) * 2020-12-08 2022-07-05 厦门诚创网络股份有限公司 Efficient and energy-saving small recommendation system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40043846

Country of ref document: HK