CN114218348A

CN114218348A - Method, device, equipment and medium for acquiring live broadcast segments based on question and answer text

Info

Publication number: CN114218348A
Application number: CN202111518552.8A
Authority: CN
Inventors: 彭锐
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-12-13
Filing date: 2021-12-13
Publication date: 2022-03-22

Abstract

The invention relates to the technical field of artificial intelligence and discloses a question and answer text-based live broadcast segment acquisition method, device, equipment and medium. The method comprises the following steps: receiving user comment data containing a user identifier, a live broadcast identifier and comment content; acquiring a historical comment clustering model corresponding to the live broadcast identifier, and filtering the comment content through an invalid content detection model to obtain an effective text; clustering the effective text through the historical comment clustering model to obtain a keyword clustering result of the effective text; acquiring question and answer texts matched with the keyword clustering result from a question and answer library, and constructing a semantic index vector space based on all the matched question and answer texts; acquiring the question and answer text adjacent to the effective text from the semantic index vector space by using a KNN algorithm; and acquiring the live broadcast segment by using a text matching algorithm. The invention thus allows a user to enter a comment directly and automatically obtain the corresponding live broadcast segment, which greatly improves user satisfaction.

Description

Method, device, equipment and medium for acquiring live broadcast segments based on question and answer text

Technical Field

The invention relates to the technical field of artificial intelligence semantic analysis, in particular to a question and answer text-based live broadcast segment acquisition method, device, equipment and medium.

Background

With the rapid development of internet technology, the live broadcast era has arrived. Many enterprises promote their brands through live broadcasts, and the traffic that live broadcasts bring has made them the preferred channel for many merchants. Live broadcast refers to a mode of network information distribution in which information is produced and distributed synchronously as an event occurs and unfolds, with a bidirectional flow of information on site. During a live broadcast, viewers can only watch the content as the broadcast progresses. Once a part of the content is missed, the viewer must wait until the broadcast ends and watch the playback; content such as interactive rewards cannot be reviewed quickly while the broadcast is still in progress, so it is missed.

Disclosure of Invention

The invention provides a question and answer text-based live broadcast segment acquisition method, device, equipment and medium. By entering a question directly in the comment content, a user can watch the video segment that answers it, which greatly improves user satisfaction and the timeliness of answering questions during the live broadcast.

A live broadcast segment obtaining method based on question and answer texts comprises the following steps:

receiving user comment data; the user comment data comprises a user identifier, a live broadcast identifier and comment content;

acquiring a historical comment clustering model corresponding to the live broadcast identification, and filtering the comment content through an invalid content detection model to obtain an effective text;

inputting the effective text into the historical comment clustering model, and clustering the effective text through the historical comment clustering model to obtain a keyword clustering result of the effective text;

obtaining a question-answer text matched with the keyword clustering result from a question-answer library, and constructing a semantic index vector space based on all matched question-answer texts;

acquiring the question-answering text adjacent to the effective text from the semantic index vector space by using a KNN algorithm;

acquiring the live broadcast segment corresponding to the question and answer text from a live broadcast segment library by using a text matching algorithm, and pushing the acquired live broadcast segment to the user identification; wherein one of the question and answer texts corresponds to one live broadcast segment.

A live broadcast segment acquisition device based on question and answer texts comprises:

the receiving module is used for receiving user comment data; the user comment data comprises a user identifier, a live broadcast identifier and comment content;

the filtering module is used for acquiring a historical comment clustering model corresponding to the live broadcast identification, and filtering the comment content through an invalid content detection model to obtain an effective text;

the clustering module is used for inputting the effective texts into the historical comment clustering model and clustering the effective texts through the historical comment clustering model to obtain keyword clustering results of the effective texts;

the building module is used for obtaining the question and answer texts matched with the keyword clustering result from the question and answer library and building a semantic index vector space based on all the matched question and answer texts;

the acquisition module is used for acquiring the question and answer text adjacent to the effective text from the semantic index vector space by using a KNN algorithm;

the push module is used for acquiring the live broadcast segment corresponding to the question and answer text from a live broadcast segment library by using a text matching algorithm and pushing the acquired live broadcast segment to the user identifier; wherein one of the question and answer texts corresponds to one live broadcast segment.

A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above method for obtaining a question and answer text-based live clip when executing the computer program.

A computer-readable storage medium, which stores a computer program, which, when executed by a processor, implements the steps of the above-mentioned question-answer text-based live clip acquiring method.

The invention provides a method, device, equipment and medium for acquiring a live broadcast segment based on a question and answer text. The method comprises: receiving user comment data containing a user identifier, a live broadcast identifier and comment content; acquiring a historical comment clustering model corresponding to the live broadcast identifier, and filtering the comment content through an invalid content detection model to obtain an effective text; inputting the effective text into the historical comment clustering model, and clustering the effective text through the historical comment clustering model to obtain a keyword clustering result of the effective text; obtaining question and answer texts matched with the keyword clustering result from a question and answer library, and constructing a semantic index vector space based on all the matched question and answer texts; acquiring the question and answer text adjacent to the effective text from the semantic index vector space by using a KNN algorithm; and acquiring the live broadcast segment corresponding to the question and answer text from the live broadcast segment library by using a text matching algorithm, and pushing the acquired live broadcast segment to the user identifier. In this way, user comment data can be received in real time and filtered automatically to obtain the effective text; the effective text is clustered by the historical comment clustering model; question and answer texts are matched automatically from the question and answer library and a semantic index vector space is constructed; and the nearest question and answer text is obtained quickly by using the KNN algorithm, so that the corresponding live broadcast segment is found automatically and pushed to the user. The user therefore does not need to wait until the live broadcast ends and watch the playback in order to locate the video segment containing the desired question and answer interaction: by entering a question directly in the comment content, the user can watch the video segment that answers it. This greatly improves user satisfaction and the timeliness of answering questions during the live broadcast.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.

Fig. 1 is a schematic view of an application environment of a question and answer text-based live broadcast segment acquisition method in an embodiment of the present invention;

Fig. 2 is a flowchart of a live clip acquisition method based on question and answer text in an embodiment of the present invention;

Fig. 3 is a flowchart of step S20 of the question and answer text-based live clip obtaining method according to an embodiment of the present invention;

Fig. 4 is a flowchart of step S30 of the question and answer text-based live clip obtaining method according to an embodiment of the present invention;

Fig. 5 is a flowchart of step S40 of the question and answer text-based live clip obtaining method according to an embodiment of the present invention;

Fig. 6 is a schematic block diagram of a device for acquiring a live clip based on a question and answer text according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a computer device in an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The method for acquiring the live broadcast segment based on the question and answer text can be applied to the application environment shown in figure 1, wherein a client (computer equipment or a terminal) is communicated with a server through a network. The client (computer device or terminal) includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.

In an embodiment, as shown in fig. 2, a method for obtaining a live clip based on a question and answer text is provided, which mainly includes the following steps S10-S60:

S10, receiving user comment data; the user comment data comprises a user identifier, a live broadcast identifier and comment content.

Understandably, the live broadcast corresponding to the live broadcast identifier may be carried out on a web page or in application software. When the user corresponding to the user identifier wants to ask a question about the live broadcast, or to learn about reward content or other live-related matters, on the live broadcast page or in the application software, the user enters the user comment data in a comment input box, so that the user comment data is received from the user identifier in real time. The user comment data is the comment entered by the user during the live broadcast and comprises the user identifier, the live broadcast identifier and the comment content. The user identifier is the unique identifier of the user on the live broadcast platform, the live broadcast identifier is a code that uniquely identifies the live broadcast on the live broadcast platform, and the comment content is the content entered by the user in the live broadcast comment area.
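As an illustration only, the three fields of the user comment data could be carried in a small record such as the one below. The field names `user_id`, `live_id` and `content`, and the helper `parse_comment`, are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserComment:
    user_id: str   # unique identifier of the user on the live platform
    live_id: str   # code that uniquely identifies the live broadcast
    content: str   # text entered in the live comment area


def parse_comment(payload: dict) -> UserComment:
    """Validate an incoming payload and build a UserComment (hypothetical helper)."""
    missing = [key for key in ("user_id", "live_id", "content") if key not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return UserComment(
        user_id=str(payload["user_id"]),
        live_id=str(payload["live_id"]),
        content=str(payload["content"]),
    )
```

The live broadcast identifier is what later selects the historical comment clustering model, so rejecting payloads without it early keeps the following steps simple.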

S20, obtaining a historical comment clustering model corresponding to the live broadcast identification, and filtering the comment content through an invalid content detection model to obtain an effective text.

Understandably, one live broadcast identifier corresponds to one historical comment clustering model. The corresponding historical comment clustering model is created when the live broadcast corresponding to the live broadcast identifier starts. Alternatively, during the preparation period before the live broadcast starts, an assistant can prepare the likely question and answer content, and the corresponding historical comment clustering model is then created based on that prepared content. The historical comment clustering model is a trained clustering model that learns from all historical comment content associated with the live broadcast identifier and/or the prepared question and answer content, and outputs the cluster label set of greatest concern. The cluster label set consists of the keywords that occur most frequently in each of the different categories. K-means text clustering works as follows: first, k objects are arbitrarily selected from n text objects as the initial cluster centers; each remaining text object is assigned to the most similar cluster (represented by its cluster center) according to its similarity (distance) to the cluster centers; the center of each new cluster (the mean of all objects in the cluster) is then calculated; and the process is repeated until the criterion function converges.

The invalid content detection model is a trained neural network model for identifying invalid content in input content. It is trained as follows: collect historical samples labeled with invalid content labels; extract invalid features from the samples; perform identification according to the extracted invalid features; compare the identification results with the invalid content labels and calculate the loss value between them by using a cross entropy loss function; and detect whether the loss value reaches the convergence condition. When the loss value has not converged, the parameters of the invalid content detection model are updated iteratively and the step of extracting invalid features from the samples is executed again; this cycle is repeated until the loss value reaches the convergence condition, at which point training stops and the trained invalid content detection model is obtained.

The filtering removes the invalid parts of the comment content and retains only the valid content, in three stages. First, the comment content is compared with the historical subtitles by using a text comparison algorithm, and content identical to the historical subtitles is removed to obtain the subtitle filtering content; content identical to the historical subtitles is merely quoted and is not a genuine question from the user. Second, website links unrelated to the live broadcast content are filtered out to obtain the link filtering content; such links are malicious advertisements or other malicious content intended to disrupt the live broadcast. Finally, content having invalid features in the link filtering content is identified through the invalid content detection model and filtered out, and the filtered content is recorded as the valid text. The valid text is thus the comment content after filtering: the historical subtitles are filtered from the comment content, website links are then removed, and invalid content such as emoticons, numbers and redundant punctuation marks is finally removed.

In an embodiment, as shown in fig. 3, the step S20, namely, the filtering the comment content to obtain valid text, includes:

S201, performing historical subtitle filtering on the comment content by using a text comparison algorithm to obtain subtitle filtering content.

Understandably, historical subtitle filtering removes from the comment content any subtitle content that has already appeared in the live broadcast. The text comparison algorithm splits the historical subtitles into individual sentences. For each sentence, it finds a position in the comment content whose character is the same as the first character of the sentence and then compares character by character. When a differing character is encountered, the comparison stops, a text query algorithm finds the next position that matches the first character of the sentence, and the comparison is repeated. In this way the text comparison algorithm determines whether the comment content contains content identical to a sentence in the historical subtitles. If it does, that content is removed, which achieves the historical subtitle filtering, and the remaining comment content is recorded as the subtitle filtering content. If it does not, the comment content is left unprocessed and is recorded as the subtitle filtering content.

The subtitle filtering content is the comment content after historical subtitle filtering.
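The sentence-by-sentence comparison described for step S201 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the punctuation used to split subtitles into sentences, and the whitespace normalization at the end are all hypothetical choices, not details given in the patent.

```python
import re


def split_sentences(subtitles: str) -> list[str]:
    """Split subtitle text into sentences on common end punctuation (assumed)."""
    parts = re.split(r"[.!?。！？\n]+", subtitles)
    return [part.strip() for part in parts if part.strip()]


def find_sentence(comment: str, sentence: str) -> int:
    """Return the index where `sentence` occurs in `comment`, or -1."""
    start = comment.find(sentence[0])
    while start != -1:
        k = 0
        # Compare character by character and stop at the first difference.
        while (k < len(sentence) and start + k < len(comment)
               and comment[start + k] == sentence[k]):
            k += 1
        if k == len(sentence):
            return start
        # Look for the next position that matches the first character.
        start = comment.find(sentence[0], start + 1)
    return -1


def filter_subtitles(comment: str, subtitles: str) -> str:
    """Remove every subtitle sentence that appears verbatim in the comment."""
    for sentence in split_sentences(subtitles):
        pos = find_sentence(comment, sentence)
        while pos != -1:
            comment = comment[:pos] + comment[pos + len(sentence):]
            pos = find_sentence(comment, sentence)
    return " ".join(comment.split())
```

For example, with the subtitles `"Welcome to the show. Today we talk about insurance."`, the comment `"Welcome to the show how do I get the coupon"` is reduced to `"how do I get the coupon"`, while a comment that quotes no subtitle sentence passes through unchanged.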

S202, performing link filtering on the subtitle filtering content to obtain link filtering content.

Understandably, link filtering is the process of removing website links from the input content. A character string matching algorithm is used to search the subtitle filtering content for the HTTP character string. If a string matching the HTTP character string is found, the content following it is examined character by character until a first delimiting character is detected, and the content from the found string up to that character is removed. Finally, the subtitle filtering content after link filtering is recorded as the link filtering content.
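A minimal Python sketch of this link removal is given below. It assumes that the character ending a link is whitespace and that the marker comparison is case-insensitive; neither detail is stated in the text, and the function name `filter_links` is hypothetical.

```python
def filter_links(text: str, marker: str = "http") -> str:
    """Drop every run that starts with `marker` and ends before the next whitespace."""
    kept = []
    i = 0
    while i < len(text):
        if text[i:i + len(marker)].lower() == marker:
            # Skip forward character by character until whitespace (assumed delimiter).
            while i < len(text) and not text[i].isspace():
                i += 1
        else:
            kept.append(text[i])
            i += 1
    # Collapse the gaps left behind by removed links.
    return " ".join("".join(kept).split())
```

Here the marker is located with a plain slice comparison for brevity; the pointer-based matching described in the text could be substituted without changing the result.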

The character string matching algorithm takes the HTTP character string as the pattern string and the subtitle filtering content as the main string. While the pattern string is matched against the main string, each has a pointer to its current character (pointer i for the main string, pointer j for the pattern string). Pointer i never backtracks; only pointer j backtracks. Here the main string is the string to be searched, i.e. the subtitle filtering content; the pattern string is the string searched for, i.e. the HTTP character string; i records the matching progress in the main string; and j records the position to which the pattern pointer backtracks after a character fails to match, which is equivalent to shifting the pattern string to the right.

For a given pattern string, a mismatch may occur at any character, and the j pointer must then backtrack. The backtrack position is determined by the pattern string alone and is independent of the main string. The backtrack position of j for each character of the pattern string can be computed in advance and stored in an array (named next by default). For a given character of the pattern string, take the substring that precedes it, find the length of the longest run of characters at the start of that substring that is identical to the run at its end (excluding the whole substring), and add 1; the result is the j value for that character. The value for the first character of every pattern string is 0, and the value for the second character is 1.

For example, to find next for the pattern string "abcabac": the values 0 and 1 for the first two characters are fixed. For the third character 'c', the preceding substring is "ab"; 'a' and 'b' are not equal, so the count is 0 and 0 + 1 = 1, giving a next value of 1. For the fourth character 'a', the preceding substring is "abc"; the first character 'a' and the last character 'c' are not equal, so the count is 0 and the next value is 1. For the fifth character 'b', the preceding substring is "abca"; the first 'a' and the last 'a' are the same, so the count is 1 and the next value is 2. For the sixth character 'a', the preceding substring is "abcab"; the first two characters "ab" and the last two characters "ab" are the same, so the count is 2 and the next value is 3. For the last character 'c', the preceding substring is "abcaba"; the first character 'a' and the last character 'a' are the same, so the count is 1 and the next value is 2. The next array for "abcabac" is therefore (0, 1, 1, 1, 2, 3, 2).

Both the fast matching algorithm and the ordinary matching algorithm start from the beginning of the main string, but the fast matching algorithm records some necessary information during matching and uses it to skip meaningless comparisons later. For example, with the main string ababccabab and the pattern string abcac, the next values of the pattern string are (0, 1, 1, 1, 2). When a match fails, the i pointer stays where it is and the j pointer moves back to the position given by its next value. In this example the fast matching algorithm needs only 3 matching passes, whereas the ordinary search algorithm needs 6, so the fast matching algorithm is faster and locates the HTTP character string more quickly.
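The matching scheme described here has the shape of the Knuth-Morris-Pratt (KMP) algorithm with a 1-based next table. The Python sketch below follows that reading: positions are kept 1-based to mirror the prose, and the function names `build_next` and `kmp_find` are hypothetical. For a pattern such as "abcabac" it yields the table (0, 1, 1, 1, 2, 3, 2).

```python
def build_next(pattern: str) -> list[int]:
    """Return the backtrack table; entry k holds next[] for character k + 1."""
    m = len(pattern)
    nxt = [0] * m
    i, j = 1, 0  # 1-based positions, as in the prose description
    while i < m:
        if j == 0 or pattern[i - 1] == pattern[j - 1]:
            i += 1
            j += 1
            nxt[i - 1] = j
        else:
            j = nxt[j - 1]  # only j moves back
    return nxt


def kmp_find(main: str, pattern: str) -> int:
    """Return the 0-based index of the first occurrence of pattern, or -1."""
    nxt = build_next(pattern)
    n, m = len(main), len(pattern)
    i = j = 1
    while i <= n and j <= m:
        if j == 0 or main[i - 1] == pattern[j - 1]:
            i += 1
            j += 1
        else:
            j = nxt[j - 1]  # i never backtracks
    return i - m - 1 if j > m else -1
```

As a usage example chosen for illustration, `kmp_find("ababcabcacbab", "abcac")` locates the pattern at index 5, and `kmp_find("see http://x", "http")` returns 4, the start of the link marker.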

S203, extracting invalid features of the link filtering contents through the invalid content detection model, and determining invalid contents according to the extracted invalid features.

Understandably, the invalid content detection model is a trained neural network model for identifying invalid content in input content. It is trained as follows: collect historical samples labeled with invalid content labels; extract invalid features from the samples; perform identification according to the extracted invalid features; compare the identification results with the invalid content labels and calculate the loss value between them by using a cross entropy loss function; and detect whether the loss value reaches the convergence condition. When the loss value has not converged, the parameters of the invalid content detection model are updated iteratively and the step of extracting invalid features from the samples is executed again; this cycle is repeated until the loss value reaches the convergence condition, at which point training stops and the trained invalid content detection model is obtained. The network structure of the invalid content detection model may be a CRNN (Convolutional Recurrent Neural Network). The invalid identification process extracts invalid features from the input text content through the invalid content detection model, classifies the content according to the extracted features, and outputs the content having invalid features as the invalid content.
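For reference, the cross entropy loss used in this training loop is commonly written as below. The notation is an assumption introduced here, not the patent's: N is the number of samples, C the number of classes, y the one-hot invalid content label, and p the probability predicted by the model. The stopping rule shown is only one possible convergence condition.

```latex
% Cross entropy between labels y and predicted probabilities p
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \, \log p_{i,c}

% One possible convergence condition between iterations t-1 and t (assumed)
\left| \mathcal{L}^{(t)} - \mathcal{L}^{(t-1)} \right| < \varepsilon
```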

The invalid features include all-digit content, all-letter content, emoticons and the like.

S204, removing the invalid content from the link filtering content to obtain the valid text.

Understandably, the invalid content identified in the link filtering content is removed to obtain the valid text, which is the genuine comment of the user corresponding to the user identifier.

According to the method, historical subtitle filtering is performed on the comment content by using a text comparison algorithm to obtain subtitle filtering content; performing link filtering on the subtitle filtering content to obtain link filtering content; extracting invalid features of the link filtering contents through the invalid content detection model, and determining invalid contents according to the extracted invalid features; and removing the invalid content from the link filtering content to obtain the valid text, so that invalid parts in the comment content can be automatically filtered, only useful information is reserved, interference of the invalid parts on subsequent live broadcast segment acquisition based on the question and answer text is avoided, and the accuracy of the subsequent live broadcast segment acquisition based on the question and answer text is improved.

S30, inputting the effective text into the historical comment clustering model, and clustering the effective text through the historical comment clustering model to obtain a keyword clustering result of the effective text.

Understandably, the clustering process is as follows: perform word segmentation on the effective text; remove stop words from the segmented words and record all remaining words as unit words; calculate the weight (TF-IDF value) of each unit word by applying the TF-IDF algorithm to the term frequency and inverse document frequency of the unit word in the historical comment data set of the historical comment clustering model; generate a two-dimensional effective matrix from all the unit words and their weights; and cluster the effective matrix to obtain the keyword clustering result.

Stop word removal is the process of removing words identical to stop words. The TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is a common weighting technique in information retrieval and text mining, used to evaluate how important a word is to one document in a document set or corpus. It comprises the term frequency (TF) and the inverse document frequency (IDF); multiplying the TF of a word by its IDF gives the TF-IDF value of the word. The clustering processing is a process of clustering adjacent similar classified regions and merging them by using a morphological operator.

In an embodiment, as shown in fig. 4, in step S30, that is, the inputting the valid text into the history comment clustering model, and performing clustering processing on the valid text through the history comment clustering model to obtain a keyword clustering result of the valid text includes:

S301, performing word segmentation and stop word removal processing on the effective text to obtain a plurality of unit words.

Understandably, word segmentation divides the effective text into words of the smallest unit. Stop word removal is the process of removing words identical to stop words: each word is compared with the stop words by a text comparison algorithm, and words identical to a stop word are removed, leaving the unit words. Preferably, the text comparison algorithm is the RK (Rabin-Karp) algorithm, which judges whether two strings are identical by comparing their hash values.
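The hash-based comparison can be sketched in Python as follows. This is a simplified illustration in the spirit of Rabin-Karp, comparing whole-word polynomial hashes and confirming each hit by direct comparison. The regular-expression tokenizer, the sample stop word list and all function names are assumptions; Chinese text would need a dedicated word segmenter instead of the tokenizer shown.

```python
import re

# Hypothetical stop word list, for illustration only.
STOP_WORDS = frozenset({"the", "a", "an", "is", "to", "of", "how", "do", "i"})


def poly_hash(word: str, base: int = 257, mod: int = 1_000_000_007) -> int:
    """Polynomial hash of a word, the kind of hash Rabin-Karp compares."""
    value = 0
    for ch in word:
        value = (value * base + ord(ch)) % mod
    return value


def segment(text: str) -> list[str]:
    """Split text into lowercase word tokens (stand-in for a real segmenter)."""
    return re.findall(r"\w+", text.lower())


def remove_stop_words(words: list[str], stop_words=STOP_WORDS) -> list[str]:
    """Keep only the words that do not match any stop word."""
    table: dict[int, set[str]] = {}
    for stop in stop_words:
        table.setdefault(poly_hash(stop), set()).add(stop)
    unit_words = []
    for word in words:
        candidates = table.get(poly_hash(word))
        # A hash hit is confirmed by direct comparison to rule out collisions.
        if candidates and word in candidates:
            continue
        unit_words.append(word)
    return unit_words
```

With these assumptions, `remove_stop_words(segment("How do I claim the reward coupon?"))` leaves the unit words `claim`, `reward` and `coupon`.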

S302, performing weight calculation on each unit word by using a TF-IDF algorithm based on the historical comment data set in the historical comment clustering model to obtain the weight of each unit word.

Understandably, the historical comment clustering model includes the historical comment data set, which is the set of all historical comment data (content) and/or the prepared likely questions. The TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is a common weighting technique in information retrieval and text mining, used to evaluate how important a word is to one document in a document set or corpus. It comprises the term frequency (TF) and the inverse document frequency (IDF); multiplying the TF of a word by its IDF gives the TF-IDF value of the word. The term frequency is the frequency with which a unit word occurs in the historical comment data set. The inverse document frequency is obtained by dividing the total number of historical comment data items in the data set by the number of items containing the unit word and taking the logarithm of the quotient. The weight calculation multiplies the TF and the IDF of a unit word to obtain its TF-IDF value, and the result is recorded as the weight of that unit word.
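The weight calculation can be sketched in Python as below. The sketch assumes that the term frequency is the number of occurrences of the word across all historical comments divided by the total number of tokens, and that a word absent from the history receives weight 0; the text does not fix either detail. The function name `tf_idf_weights` is hypothetical.

```python
import math


def tf_idf_weights(unit_words: list[str], history: list[list[str]]) -> dict[str, float]:
    """Weight each unit word by TF * IDF over tokenized historical comments."""
    total_tokens = sum(len(comment) for comment in history)
    n_comments = len(history)
    weights = {}
    for word in set(unit_words):
        occurrences = sum(comment.count(word) for comment in history)
        containing = sum(1 for comment in history if word in comment)
        if occurrences == 0:
            weights[word] = 0.0  # assumed handling of unseen words
            continue
        tf = occurrences / total_tokens
        idf = math.log(n_comments / containing)
        weights[word] = tf * idf
    return weights
```

A word that appears in every historical comment gets an IDF of log(1) = 0 and therefore a weight of 0, which is the usual behaviour of this unsmoothed form.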

S303, generating an effective matrix according to all the unit words and the weights of the unit words.

Understandably, all the unit words and their corresponding weights are assembled into a two-dimensional array, and this two-dimensional array is the effective matrix.

S304, performing K-means clustering processing on the effective matrix through the historical comment clustering model to obtain the keyword clustering result.

Understandably, the K-means clustering processing operates on the effective matrix. Taking the effective matrix W[i][j] as an example: k rows are randomly selected from rows 0 to i of W[i][j] as the initial cluster centers, giving k clusters with centers mean(cluster_k). For each row of W[i][j], the Euclidean distance dist(i, k) to each cluster center is calculated, and the nearest cluster center n(i) is determined. It is then judged whether the nearest cluster n(i) is the cluster to which the row currently belongs. If every row already belongs to its nearest cluster, the process ends; otherwise the row is added to its nearest cluster, the cluster centers are recalculated, and the process is repeated. Finally the keyword clustering result is output, which represents the unit words that are most concentrated in the historical comment data set.
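A compact Python sketch of this K-means loop follows, assuming each row of the effective matrix is a list of numbers. The seeded random initialization, the iteration cap and the rule of keeping the old center when a cluster becomes empty are illustrative assumptions, and the function name `kmeans` is hypothetical.

```python
import math
import random


def kmeans(rows: list[list[float]], k: int, max_iter: int = 100, seed: int = 0):
    """Cluster rows into k groups; return (labels, centers)."""
    rng = random.Random(seed)
    # Randomly pick k rows as the initial cluster centers.
    centers = [list(rows[idx]) for idx in rng.sample(range(len(rows)), k)]
    labels = [-1] * len(rows)
    for _ in range(max_iter):
        # Assign every row to its nearest center by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centers[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # every row already belongs to its nearest cluster
        labels = new_labels
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:  # keep the old center if a cluster is empty (assumed)
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

On four points forming two well-separated pairs, such as (0, 0), (0, 1), (10, 10) and (10, 11), the loop groups each pair together regardless of which rows are drawn as initial centers.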

The invention thus performs word segmentation and stop-word removal on the effective text to obtain a plurality of unit words; performs weight calculation on each unit word by using the TF-IDF algorithm based on the historical comment data set in the historical comment clustering model to obtain the weight of each unit word; generates an effective matrix according to all the unit words and their weights; and performs K-means clustering processing on the effective matrix through the historical comment clustering model to obtain a keyword clustering result. In this way, the keywords in the effective text can be identified automatically by applying the TF-IDF algorithm and the K-means clustering processing, which facilitates the subsequent extraction of the question and answer text.

S40, obtaining the question and answer texts matched with the keyword clustering result from the question-answer library, and constructing a semantic index vector space based on all the matched question and answer texts.

Understandably, the question-answer library stores all the question and answer texts of the live broadcast corresponding to the live broadcast identification. A question and answer text may be a question-and-answer pair designed in advance of the live broadcast, or the text of a frequently asked, much-concerned question identified during the live broadcast. A text similarity algorithm is applied to find, from the question-answer library, the question and answer texts matched with the keyword clustering result. Preferably, the text similarity algorithm is a text cosine similarity algorithm, which performs word embedding vector conversion on two texts to obtain two groups of word embedding vectors, calculates the cosine value between the two groups of vectors, and measures the similarity between the two texts by that cosine value. A vector index construction method is then used to construct the semantic index vector space containing all the matched question and answer texts. The semantic index vector space is a space in which index numbers corresponding to the question and answer texts are generated and mapping relations between the index numbers and the question and answer texts are established.
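As an illustration of the cosine similarity matching, the following sketch compares an embedding of the keyword clustering result with embeddings of the question and answer texts. The vectors, the helper `match_qa_texts` and the 0.8 threshold are assumptions made for the example; the text does not specify how the embeddings are produced for this step or which threshold is used.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_qa_texts(keyword_vec, qa_vectors, threshold=0.8):
    """Return the question and answer texts whose vectors are similar enough to keyword_vec."""
    return [
        text
        for text, vec in qa_vectors.items()
        if cosine_similarity(keyword_vec, vec) >= threshold
    ]

# Hypothetical embeddings of two question and answer texts.
qa_vectors = {
    "How long does shipping take?": [0.9, 0.1, 0.0],
    "What is the refund policy?": [0.1, 0.9, 0.2],
}
matched = match_qa_texts([1.0, 0.2, 0.0], qa_vectors)
```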

The question and answer text is a text pair, extracted during the recording corresponding to the live broadcast identification, that contains a question text and a reply text.

In an embodiment, before the step S40, that is, before the step of obtaining the question and answer text matching the keyword clustering result from the question and answer library, the method includes:

Obtaining a historical comment data set.

Understandably, the historical comment data set is a set of all historical comment data (content) and/or prepared questions that may arise.

Performing question identification on the historical comment data set through a problem identification model to obtain a concerned question result comprising at least one concerned question.

Understandably, the problem identification model is a trained neural network model for identifying the questions on which comments are concentrated. The question semantic features of each item of content in the historical comment data set are extracted through the problem identification model, the extracted question semantics are classified, and the content having question semantic features is identified and clustered, so that the several most concerned questions can be gathered; all of these concerned questions are recorded as the concerned question result. The problem identification model is trained as follows: historical text samples carrying question labels are collected; the question semantic features of each text sample are extracted; identification is performed according to the extracted question semantic features; the identification result is compared with the question label, and a loss value between them is calculated by using a cross entropy loss function; and whether the loss value reaches a convergence condition is detected. When the loss value does not reach the convergence condition, the initial parameters of the problem identification model are iteratively updated and the step of extracting the question semantic features of the text sample is executed again; when the loss value reaches the convergence condition, training is stopped and the trained problem identification model is obtained.

The network structure of the problem identification model can be set according to requirements; for example, it is a Bi-LSTM network structure. The question semantic features are features related to questions.

After the concerned question result is pushed to a terminal corresponding to the live broadcast identification, if a recording password for starting a live broadcast segment is detected, executing a recording action until a recording stop password is detected, to obtain the live broadcast segment.

Understandably, the concerned question result is pushed to the terminal corresponding to the live broadcast identification. During the live broadcast, this terminal is placed in front of the live broadcast user or an assistant, so that the live broadcast user or the assistant can see the questions the audience is concerned about. After the concerned question result has been pushed, if a recording password for starting a live broadcast segment is detected, the recording action is executed. The recording action is an execution instruction for recording, through a video acquisition device, the video of the live broadcast corresponding to the live broadcast identification, and it continues until a recording stop password is detected. The recording password and the recording stop password can be detected through the state of a start-stop key. Alternatively, the recording password can be triggered by a speech recognition model detecting a specific sentence (for example, "someone asks") or by an action recognition model detecting a gesture (for example, raising the index finger), and the recording stop password can be triggered by the speech recognition model detecting a specific sentence (for example, "the answer is finished") or by the action recognition model detecting a gesture (for example, an OK gesture made at the end). The video collected between the recording password and the recording stop password is recorded as the live broadcast segment.
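The start and stop logic can be pictured as a small state machine. In the sketch below, detection of the two passwords (by key state, speech recognition or gesture recognition) is assumed to have already happened and is represented by the hypothetical signals `"start"` and `"stop"` attached to timestamps; the function name `record_segments` is likewise illustrative.

```python
def record_segments(events):
    """Turn a stream of (timestamp_seconds, signal) into (start, stop) segment boundaries.

    signal is "start" (recording password), "stop" (recording stop password) or None.
    """
    segments = []
    started_at = None
    for timestamp, signal in events:
        if signal == "start" and started_at is None:
            started_at = timestamp            # recording password detected: begin recording
        elif signal == "stop" and started_at is not None:
            segments.append((started_at, timestamp))  # stop password: close the segment
            started_at = None
        # A "stop" without an open recording, or a second "start", is ignored.
    return segments

# Hypothetical detection results during a live broadcast.
events = [
    (0.0, None), (12.5, "start"), (30.0, None), (47.0, "stop"),
    (60.0, "stop"), (75.0, "start"), (90.0, "stop"),
]
segments = record_segments(events)
```

Ignoring a stray stop signal is a design choice of this sketch; the text does not say how such a case is handled.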

The speech recognition model is a trained model that extracts frequency domain features from an input speech segment and performs character prediction on the extracted features to obtain each character in the speech segment. It may be obtained by a distillation learning method: a Teacher network and a Student network are trained for frequency domain feature extraction and character prediction, and the trained Student network recognizes text from the input speech segment. The distillation learning method migrates the parameters of corresponding layers and trains a simple model (the Student network) by using the output of a pre-trained complex model (the Teacher network) as a supervision signal. For example, the student network is distilled at intervals of N layers, that is, the distilled layers are taken at intervals of a preset N layers: if the teacher network has 12 layers and the student network is set to 4 layers, a Transformer loss is calculated every 3 layers, and the mapping function is g(m) = 3 × m, where m is the index of an encoding layer in the student network. The specific correspondence is as follows: layer 1 of the student network corresponds to layer 3 of the teacher network, layer 2 to layer 6, layer 3 to layer 9, and layer 4 to layer 12. The speech recognition model may also be a model that recognizes text from input audio by Automatic Speech Recognition (ASR), a technology for converting human speech into text.

The action recognition model is trained as follows: image samples marked with action labels are collected and the recognition parameters of the action recognition model are preset; action features are extracted by the action recognition model, the action features being preset action-related features such as an upward index finger or an OK gesture made at the end; classification is performed according to the extracted action features to obtain an action recognition result; and an action loss value between the action label and the action recognition result is calculated by using a cross entropy loss function. When the action loss value is detected not to have reached a convergence condition, the recognition parameters of the action recognition model are iteratively updated and the step of extracting action features is executed again; when the action loss value is detected to have reached the convergence condition, training is stopped, and the converged action recognition model is recorded as the pre-trained action recognition model.
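The layer correspondence given above for the student and teacher networks follows directly from the mapping function g(m) = 3 × m. A short sketch generalizes it to any teacher depth that is a multiple of the student depth; the function name `layer_mapping` and the divisibility requirement are assumptions of the example.

```python
def layer_mapping(teacher_layers, student_layers):
    """Map each student layer m (1-based) to the teacher layer g(m) = step * m."""
    if teacher_layers % student_layers != 0:
        # Assumption: only evenly spaced distillation is handled in this sketch.
        raise ValueError("teacher depth must be a multiple of student depth")
    step = teacher_layers // student_layers
    return {m: step * m for m in range(1, student_layers + 1)}

mapping = layer_mapping(teacher_layers=12, student_layers=4)
# mapping == {1: 3, 2: 6, 3: 9, 4: 12}
```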

And performing text recognition and question-answer text division on the live broadcast segments by using an audio segmentation technology and a voice recognition technology to obtain a question text and a reply text.

Understandably, the audio segmentation technology separates the image part and the audio part of an input live broadcast recorded video, removes the image part, and keeps the audio part; it is applied to the live broadcast segment to obtain the audio segment of the live broadcast segment. The speech recognition technology (Automatic Speech Recognition, ASR) converts human speech into text: voiceprint features, namely Mel-frequency cepstral coefficients (MFCC), are extracted from the audio data, and the pronounced words corresponding to the voiceprint features are identified, so that the audio is converted into the corresponding text content. In this way the text contained in the audio segment is identified, giving the text content of the live broadcast segment. The question semantic features of this text content are then extracted through the problem identification model, question identification is performed according to the extracted question semantic features to identify the question text, and the remainder of the text content of the live broadcast segment, other than the question text, is recorded as the reply text.

Matching, from the concerned question result, the concerned question that matches the question text, and replacing the question text with the matched concerned question.

Understandably, the concerned questions whose similarity to the question text is greater than a preset threshold are found from the concerned question result by using a text cosine similarity algorithm. The preset threshold can be set according to requirements, for example 80%. A concerned question found in this way is recorded as the concerned question matched with the question text, and the question text is replaced with the matched concerned question. The replaced question text and the reply text are recorded as a new question and answer text, or are used to overwrite the original question and answer text. The correspondence between the question and answer text and the live broadcast segment is retained, and the live broadcast segment library stores all the live broadcast segments of the live broadcast identification.

The invention thus obtains the historical comment data set; performs question identification on the historical comment data set through the problem identification model to obtain a concerned question result comprising at least one concerned question; after the concerned question result is pushed to the terminal corresponding to the live broadcast identification, executes a recording action if a recording password for starting a live broadcast segment is detected, until a recording stop password is detected, to obtain the live broadcast segment; performs text recognition and question-answer text division on the live broadcast segment by using the speech recognition technology to obtain a question text and a reply text; and matches, from the concerned question result, the concerned question that matches the question text and replaces the question text with it to obtain a new question and answer text. In this way, the concerned questions can be identified automatically from the historical comment data set, the live broadcast segments answering them can be recorded automatically during the live broadcast, and a data basis is provided for subsequent playback.

In an embodiment, as shown in fig. 5, the step S40, namely, the constructing a semantic index vector space based on all the matched question and answer texts, includes:

S401, performing word embedding vector conversion on all the matched question and answer texts to obtain question-answer vectors corresponding to the matched question and answer texts.

Understandably, the word embedding vector conversion uses a Word2Vec model to convert a word into a fixed-length vector representation. The network structure of the Word2Vec model can be a Skip-gram structure, which is a probability model that predicts, from the current word, the words appearing in its context within a window: the current word is regarded as x, the other words in the window are regarded as y, and the probabilities of the other words are predicted through a hidden layer and a Softmax activation function. In this way, the question-answer vectors in one-to-one correspondence with the matched question and answer texts are obtained.
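To make the roles of x and y concrete, the sketch below enumerates the (current word, context word) training pairs that a Skip-gram model would be asked to predict for a given window. It covers only pair generation, not the hidden layer or the Softmax training; the function name `skipgram_pairs` and the window size are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (x, y) pairs: x is the current word, y is a word within `window` positions of it."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the current word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["how", "to", "get", "refund"], window=1)
```

With a window of 1, each word is paired only with its immediate neighbours, so the four-word example yields six pairs.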

S402, constructing a vector index according to each question-answer vector.

Understandably, the vector index can be constructed by using a FAISS index construction method. FAISS provides index classes encapsulated for different application scenarios, and the corresponding vector index is constructed according to the class into which each question-answer vector falls. Alternatively, a preset number of leading bits of the question-answer vector may be used as the vector index, so that a plurality of question-answer vectors may correspond to one vector index.

S403, establishing a mapping relation with each matched question and answer text based on the vector index, and constructing the semantic index vector space.

Understandably, the mapping relation between the vector indexes and the corresponding question and answer texts is established, so that the semantic index vector space is constructed, and the adjacent question and answer texts can be quickly searched through the semantic index vector space.
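A minimal sketch of such a space, assuming the simplest possible index (sequential index numbers) in place of a FAISS index; the class name `SemanticIndexSpace` and its methods are hypothetical and only illustrate the mapping between index numbers, question-answer vectors and question and answer texts.

```python
class SemanticIndexSpace:
    """Stores question-answer vectors and maps index numbers to question and answer texts."""

    def __init__(self):
        self.vectors = []  # position in this list is the index number
        self.texts = {}    # index number -> question and answer text

    def add(self, vector, qa_text):
        """Register a vector with its text and return the generated index number."""
        index = len(self.vectors)
        self.vectors.append(list(vector))
        self.texts[index] = qa_text
        return index

    def text_for(self, index):
        """Look up the question and answer text mapped to an index number."""
        return self.texts[index]

space = SemanticIndexSpace()
first = space.add([0.9, 0.1], "Q: How long does shipping take? A: About three days.")
second = space.add([0.1, 0.9], "Q: What is the refund policy? A: Seven days, no questions asked.")
```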

The invention thus performs word embedding vector conversion on all the matched question and answer texts to obtain the question-answer vector corresponding to each matched question and answer text; constructs a vector index according to each question-answer vector; and, based on the vector index, establishes a mapping relation with each matched question and answer text to construct the semantic index vector space. By constructing the vector index and the semantic index vector space, a vector space is provided for quickly acquiring adjacent question and answer texts later, and uncertainty in acquiring live broadcast segments based on question and answer texts is avoided.

S50, acquiring the question and answer text adjacent to the effective text from the semantic index vector space by using a KNN algorithm.

Understandably, the KNN algorithm is the K-Nearest Neighbor algorithm: if the majority of the K instances most similar to a given instance (that is, its nearest neighbors in the feature space) belong to a certain class, the instance is judged to belong to that class as well. An ANN (approximate nearest neighbor) search algorithm and the KNN algorithm are applied to a retrieval graph, a graph data structure built from all stored documents after embedding vector encoding. In the vector retrieval process, the retrieval graph is constructed, vectors are compared after vector quantization, and the distance to the center points is matched, so that the corresponding document is retrieved quickly. In this way, the question and answer text nearest to the effective text can be found quickly in the semantic index vector space, and the question and answer text adjacent to the effective text is obtained from the semantic index vector space.
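For illustration, the nearest-neighbor lookup can be sketched as an exact brute-force search by Euclidean distance. This deliberately swaps in exhaustive comparison for the graph-based ANN retrieval described above, which is adequate only for small vector sets; the function name `knn` and the sample vectors are assumptions.

```python
import heapq
import math

def knn(query, vectors, k):
    """Return the index numbers of the k vectors nearest to query, closest first."""
    return heapq.nsmallest(
        k,
        range(len(vectors)),
        key=lambda i: math.dist(query, vectors[i]),
    )

# Hypothetical question-answer vectors and an embedded effective text.
qa_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = knn([1.0, 1.0], qa_vectors, k=2)
```

The returned index numbers can then be mapped back to question and answer texts through the semantic index vector space.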

S60, acquiring the live broadcast segment corresponding to the question and answer text from a live broadcast segment library by using a text matching algorithm, and pushing the acquired live broadcast segment to the user identifier; wherein one of the question and answer texts corresponds to one live broadcast segment.

Understandably, the text matching algorithm matches similarity by using the word vectors of texts and measures the degree of matching according to the similarity. The live broadcast segment library stores all the live broadcast segments of the live broadcast corresponding to the live broadcast identification. The live broadcast segment that fully matches the question and answer text is searched for in the live broadcast segment library by using the text matching algorithm; a live broadcast segment is a video segment, intercepted from the live broadcast, that relates to the question and answer text. The live broadcast segment is pushed to the terminal corresponding to the user identification, where the received segment is played automatically in the live broadcast platform; after playback ends, the player exits automatically and the live broadcast view is restored. A user watching the live broadcast can therefore play back the segment related to a comment simply by entering the comment content.

The invention thus receives user comment data containing a user identification, a live broadcast identification and comment content; acquires the historical comment clustering model corresponding to the live broadcast identification, and filters the comment content through an invalid content detection model to obtain an effective text; inputs the effective text into the historical comment clustering model and clusters it to obtain the keyword clustering result of the effective text; obtains the question and answer texts matched with the keyword clustering result from the question-answer library, and constructs a semantic index vector space based on all the matched question and answer texts; acquires the question and answer text adjacent to the effective text from the semantic index vector space by using the KNN algorithm; and acquires the live broadcast segment corresponding to that question and answer text from the live broadcast segment library by using a text matching algorithm and pushes it to the user identification. In this way, user comment data can be received in real time and filtered automatically into effective text, the effective text is clustered by the historical comment clustering model, question and answer texts are matched automatically from the question-answer library, a semantic index vector space is constructed, and the nearest question and answer text is obtained quickly by the KNN algorithm, so that the corresponding live broadcast segment is found automatically and pushed to the user. The user therefore does not need to wait until the live broadcast ends and search the playback for the video segment containing the relevant question-and-answer interaction: by entering a question directly as comment content, the user can watch the video segment that answers it. This greatly improves user satisfaction and the timeliness with which questions are answered during the live broadcast.

In an embodiment, before the step S60, that is, before the acquiring the live segment corresponding to the question and answer text adjacent to the valid text, the method includes:

Acquiring a live broadcast recorded video.

Understandably, the live broadcast recorded video is a video whose recording starts when the live broadcast starts and which continuously records the live broadcast content as the live broadcast proceeds.

Separating audio data from the live broadcast recorded video by using an audio segmentation technology.

Understandably, the audio segmentation technology separates the image part and the audio part of the input live broadcast recorded video, removes the image part, and takes the remaining part as the audio data.

And performing text recognition on the audio data by using a voice recognition technology to obtain an audio text.

Understandably, the speech recognition technology (Automatic Speech Recognition, ASR) converts human speech into text: voiceprint features, namely Mel-frequency cepstral coefficients (MFCC), are extracted from the audio data, and the pronounced words corresponding to the voiceprint features are identified, so that the audio is converted into the corresponding text content. In this way the text contained in the audio data is identified, and the audio text is obtained.

And searching question and answer sentences matched with the question and answer text from the audio text by using a text matching algorithm and a context semantic algorithm, and intercepting the live broadcast segment based on the question and answer sentences.

Understandably, the text matching algorithm matches similarity by using the word vectors of texts and measures the degree of matching according to the similarity, and the context semantic algorithm performs semantic prediction on the context of an input text question sentence within the audio text to predict the content that answers the text question sentence. Question and answer sentences matched with the question and answer text are searched for in the audio text, and the live broadcast segment is intercepted based on them, as follows: a text question sentence matched with the question and answer text is found in the audio text by using the text matching algorithm; context recognition is performed on the text question sentence to obtain the following answer sentence that matches it; and the live broadcast segment corresponding to the text question sentence and the following answer sentence is intercepted from the live broadcast recorded video.
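The search for question and answer sentences can be sketched in a heavily simplified form. The example below assumes the audio text is already split into sentences, uses a bag-of-words cosine similarity in place of word embedding vectors, and simply takes the sentence after the matched question as the following answer sentence instead of running a trained semantic recognition model. All names and sample sentences are hypothetical.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between the word-count vectors of two sentences."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def find_question_answer(question_text, sentences):
    """Return (question index, answer index) within the audio text sentences."""
    # Text question sentence: the sentence most similar to the question text.
    q = max(range(len(sentences)), key=lambda i: bow_cosine(question_text, sentences[i]))
    # Stand-in for context recognition: take the next sentence as the answer.
    a = min(q + 1, len(sentences) - 1)
    return q, a

sentences = [
    "welcome everyone to the stream",
    "how long does shipping take",
    "shipping usually takes three days",
    "next we look at colors",
]
q_idx, a_idx = find_question_answer("how long is shipping", sentences)
```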

The invention thus acquires the live broadcast recorded video; separates audio data from the live broadcast recorded video by using the audio segmentation technology; performs text recognition on the audio data by using the speech recognition technology to obtain an audio text; and searches the audio text for question and answer sentences matched with the question and answer text by using a text matching algorithm and a context semantic algorithm, and intercepts the live broadcast segment based on the question and answer sentences. In this way, the live broadcast segment of each question and answer text can be intercepted automatically from the live broadcast recorded video by using the audio segmentation technology, the speech recognition technology, the text matching algorithm and the context semantic algorithm, without manual interception; complete live broadcast segments can be intercepted accurately, and the accuracy and correctness of interception are improved.

In an embodiment, the step S64, namely the searching for question and answer sentences matching with the question and answer text from the audio text by using a text matching algorithm and a context semantic algorithm, and intercepting the live broadcast segment based on the question and answer sentences includes:

Searching the audio text, by using a text matching algorithm, for a text question sentence matched with the question text in the question and answer text.

Understandably, the text matching algorithm matches similarity by using the word vectors of a text and measures the degree of matching according to the similarity, the similarity being calculated by a cosine similarity calculation method. The text question sentence is the sentence in the audio text with the highest similarity to the question text in the question and answer text.

Performing context recognition on the text question sentence by using a context semantic algorithm through a semantic recognition model to obtain a following answer sentence matched with the text question sentence.

Understandably, the context semantic algorithm performs semantic prediction on the context of the input text question sentence within the audio text to predict the content that answers the text question sentence, so that the following answer sentence corresponding to the text question sentence can be identified. The following answer sentence is the sentence in the audio text that answers the text question sentence.

The semantic recognition model is a neural network model trained with a Bi-LSTM algorithm for recognizing answer sentences having the answer characteristics of a one-question-one-answer exchange. The Bi-LSTM algorithm, also called the bidirectional long short-term memory network algorithm, encodes the input jointly in the forward and reverse directions to perform word embedding vector conversion, and then performs context semantic prediction on the encoded word vectors to predict the text content that has answer characteristics and answers the text question sentence. The semantic recognition model can be trained as follows: historically collected samples, each containing a question (question label), an answer (answer label) and interference noise, are input; semantic prediction is performed on the question label of each input sample to predict an answer result; a loss value between the answer result and the answer label is calculated to obtain an answer loss value; when the answer loss value does not reach a convergence condition, the semantic recognition model is iteratively updated and semantic prediction is executed again; and when the answer loss value reaches the convergence condition, training is stopped and the trained semantic recognition model is obtained.

And determining the text question sentences and the following answer sentences as the question-answer sentences.

And intercepting the live broadcast segment corresponding to the question and answer sentence from the live broadcast recorded video.

Understandably, the positions corresponding to the starting frame of the text question sentence and the ending frame of the following answer sentence are located in the live broadcast recorded video, the video content between the two positions is intercepted, and this video content is recorded as the live broadcast segment corresponding to the question and answer sentences.
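As a small worked example of locating the two positions, the sketch below converts the starting frame of the text question sentence and the ending frame of the following answer sentence into time offsets in the recorded video. The frame numbers, the frame rate and the function name `clip_bounds` are assumptions; the actual cutting of the video is left to whatever video tool is used.

```python
def clip_bounds(question_start_frame, answer_end_frame, fps):
    """Return (start_seconds, end_seconds) of the live broadcast segment to intercept."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if answer_end_frame < question_start_frame:
        raise ValueError("the answer must end after the question starts")
    return question_start_frame / fps, answer_end_frame / fps

# Hypothetical values: question starts at frame 750, answer ends at frame 2250, 25 frames per second.
start_s, end_s = clip_bounds(750, 2250, fps=25.0)
# start_s == 30.0 and end_s == 90.0, so the segment is 60 seconds long.
```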

The invention thus searches the audio text, by using a text matching algorithm, for the text question sentence matched with the question text in the question and answer text; performs context recognition on the text question sentence by using a context semantic algorithm through a semantic recognition model to obtain the following answer sentence matched with the text question sentence; determines the text question sentence and the following answer sentence as the question and answer sentences; and intercepts the live broadcast segment corresponding to the question and answer sentences from the live broadcast recorded video. In this way, the live broadcast segment of each question and answer text can be intercepted accurately from the live broadcast recorded video by using the text matching algorithm and the context semantic algorithm, and the accuracy and correctness of interception are improved.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

In an embodiment, a question and answer text-based live broadcast segment acquisition device is provided, and the question and answer text-based live broadcast segment acquisition device corresponds to the question and answer text-based live broadcast segment acquisition method in the embodiment one to one. As shown in fig. 6, the device for acquiring a question-answer text-based live broadcast segment includes a receiving module 11, a filtering module 12, a clustering module 13, a constructing module 14, an acquiring module 15, and a pushing module 16.

The functional modules are explained in detail as follows:

the receiving module 11 is used for receiving user comment data; the user comment data comprises a user identifier, a live broadcast identifier and comment content;

the filtering module 12 is configured to obtain a historical comment clustering model corresponding to the live broadcast identifier, and filter the comment content through an invalid content detection model to obtain an effective text;

the clustering module 13 is configured to input the effective text into the history comment clustering model, and perform clustering processing on the effective text through the history comment clustering model to obtain a keyword clustering result of the effective text;

the building module 14 is configured to obtain a question-answer text matched with the keyword clustering result from a question-answer library, and build a semantic index vector space based on all matched question-answer texts;

an obtaining module 15, configured to obtain, by using a KNN algorithm, the question-answer text adjacent to the valid text from the semantic index vector space;

the pushing module 16 is configured to obtain the live broadcast segment corresponding to the question and answer text from a live broadcast segment library by using a text matching algorithm, and push the obtained live broadcast segment to the user identifier; wherein one of the question and answer texts corresponds to one live broadcast segment.

For specific limitations of the question and answer text-based live broadcast segment acquisition device, reference may be made to the above limitations of the question and answer text-based live broadcast segment acquisition method, which are not repeated here. All or part of the modules in the question and answer text-based live broadcast segment acquisition device can be implemented by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a client or a server, and its internal structure diagram may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a readable storage medium and an internal memory. The readable storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the readable storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a live broadcast segment acquisition method based on question and answer texts.

In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for obtaining the live clip based on the question and answer text in the above embodiments.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the method for acquiring a live clip based on a question and answer text in the above embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A live broadcast segment obtaining method based on question and answer texts is characterized by comprising the following steps:

2. The method for acquiring the live clip based on the question and answer text as claimed in claim 1, wherein before acquiring the live clip corresponding to the question and answer text adjacent to the valid text, the method comprises:

acquiring a live broadcast recorded video;

extracting audio data from the live broadcast recorded video by using an audio segmentation technology;

performing text recognition on the audio data by using a voice recognition technology to obtain an audio text;

3. The method for obtaining the question-answer text-based live clip according to claim 2, wherein the searching for the question-answer sentence matched with the question-answer text from the audio text by using a text matching algorithm and a context semantic algorithm, and the intercepting the live clip based on the question-answer sentence comprises:

searching a text question sentence matched with the question text in the question and answer text from the audio text by using a text matching algorithm;

performing context recognition on the text question sentence by using a context semantic algorithm through a semantic recognition model to obtain a context answer sentence matched with the text question sentence;

determining the text question sentence and the context answer sentence as the question-answer sentence;
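
By way of a non-limiting sketch, the search for the question-answer sentence and the interception of the clip can be illustrated as follows. The sketch assumes that the audio text is already split into sentences carrying start and end times, uses the character-level similarity of Python's difflib as a stand-in for the text matching algorithm, and simply takes the sentence that follows the matched question as the answer in place of the semantic recognition model. The Sentence type, the 0.6 threshold, and the sample transcript are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Sentence:
    text: str
    start: float  # seconds from the start of the recorded video
    end: float


def find_question_index(
    question_text: str,
    transcript: List[Sentence],
    threshold: float = 0.6,
) -> Optional[int]:
    """Index of the transcript sentence most similar to the question text,
    or None when no sentence reaches the similarity threshold."""
    best_index, best_score = None, threshold
    for i, sentence in enumerate(transcript):
        score = SequenceMatcher(None, question_text.lower(), sentence.text.lower()).ratio()
        if score >= best_score:
            best_index, best_score = i, score
    return best_index


def clip_interval(
    question_text: str,
    transcript: List[Sentence],
) -> Optional[Tuple[float, float]]:
    """Time span covering the matched question and the sentence after it.
    Treating the next sentence as the answer is a simplifying assumption."""
    q = find_question_index(question_text, transcript)
    if q is None or q + 1 >= len(transcript):
        return None
    return (transcript[q].start, transcript[q + 1].end)


# Hypothetical transcript of a recorded live broadcast.
transcript = [
    Sentence("Welcome everyone to tonight's stream.", 0.0, 4.0),
    Sentence("How long is the warranty on this phone?", 4.0, 8.0),
    Sentence("The warranty covers two years from the purchase date.", 8.0, 14.0),
]
interval = clip_interval("how long is the warranty", transcript)
```

The returned interval could then be passed to any video cutting tool to intercept the live broadcast segment.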

4. The method for obtaining the live broadcast segment based on question-answer text as claimed in claim 1, wherein before obtaining the question-answer text matching the keyword clustering result from the question-answer library, the method comprises:

acquiring a historical comment data set;

performing question identification on the historical comment data set through a question identification model to obtain a concerned question result comprising at least one concerned question;

after the concerned question result is pushed to a terminal corresponding to the live broadcast identification, if a password for starting recording of a live broadcast segment is detected, executing a recording action until a password for stopping recording is detected, and obtaining the live broadcast segment;

performing text recognition and question and answer text division on the live broadcast segments by using a voice recognition technology to obtain a question text and a reply text;

and identifying, from the concerned question result, the concerned questions matched with the question texts, and replacing the matched concerned questions with the question texts to obtain new question and answer texts.

5. The question-answer text-based live clip acquisition method as claimed in claim 1, wherein the filtering of the comment content by the invalid content detection model to obtain valid text comprises:

performing historical subtitle filtering on the comment content by using a text comparison algorithm to obtain subtitle filtering content;

performing link filtering on the subtitle filtering content to obtain link filtering content;

extracting invalid features of the link filtering contents through the invalid content detection model, and determining invalid contents according to the extracted invalid features;

and removing the invalid content from the link filtering content to obtain the valid text.
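
The three filtering stages above can be sketched, purely for illustration, as follows. The sketch assumes that historical subtitle filtering means discarding a comment that nearly duplicates a previously seen subtitle line, uses a regular expression for link filtering, and substitutes a simple repeated-character rule for the invalid content detection model. The function names, the 0.9 threshold, and the sample comments are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, List, Optional

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# Stand-in for the invalid content detection model: whole tokens made of one
# character repeated three or more times, such as "666666" or "hhhhh".
REPEAT_PATTERN = re.compile(r"\b(\w)\1{2,}\b")


def filter_subtitles(
    comment: str,
    historical_subtitles: Iterable[str],
    threshold: float = 0.9,
) -> Optional[str]:
    """Drop the comment when it nearly duplicates a historical subtitle line."""
    for subtitle in historical_subtitles:
        if SequenceMatcher(None, comment, subtitle).ratio() >= threshold:
            return None
    return comment


def filter_links(text: str) -> str:
    """Remove URLs from the text."""
    return URL_PATTERN.sub("", text).strip()


def detect_invalid_spans(text: str) -> List[str]:
    """Return the substrings judged invalid by the stand-in rule."""
    return [match.group(0) for match in REPEAT_PATTERN.finditer(text)]


def extract_valid_text(comment: str, historical_subtitles: Iterable[str]) -> Optional[str]:
    """Run subtitle filtering, link filtering, and invalid content removal in order."""
    kept = filter_subtitles(comment, historical_subtitles)
    if kept is None:
        return None
    text = filter_links(kept)
    for span in detect_invalid_spans(text):
        text = text.replace(span, " ")
    text = " ".join(text.split())  # collapse whitespace left by the removals
    return text or None


subtitles = ["Welcome to the live room!"]
valid = extract_valid_text(
    "666666 how long is the warranty? http://spam.example/x", subtitles
)
```

In practice the repeated-character rule would be replaced by a trained classifier; it is kept here only so that the sketch runs without external dependencies.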

6. The method for obtaining question-answering text-based live broadcast segments according to claim 1, wherein the step of inputting the effective text into the historical comment clustering model, and performing clustering processing on the effective text through the historical comment clustering model to obtain a keyword clustering result of the effective text comprises:

performing word segmentation and stop word removal processing on the effective text to obtain a plurality of unit words;

performing weight calculation on each unit word by using a TF-IDF algorithm based on a historical comment data set in the historical comment clustering model to obtain the weight of each unit word;

generating an effective matrix according to all the unit words and the weights of the unit words;

and performing K-means clustering processing on the effective matrix through the historical comment clustering model to obtain the keyword clustering result.
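
The steps above can be sketched as follows, under stated assumptions. The sketch uses English whitespace-style tokenization in place of a dedicated Chinese word segmenter, a smoothed inverse document frequency, a single-row effective matrix formed by the TF-IDF vector of the effective text, and a K-means routine that initializes its centroids from the first k historical comments so that the result is reproducible. The stop-word list, function names, and sample comments are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

STOP_WORDS = {"a", "an", "the", "is", "to", "for", "of", "how", "what", "do", "i"}


def tokenize(text: str) -> List[str]:
    """Word segmentation plus stop-word removal (English stand-in)."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def idf_table(corpus: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency over the historical comments."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens: List[str], idf: Dict[str, float], vocab: List[str]) -> List[float]:
    """TF-IDF weights of one text over the vocabulary; unseen words are ignored."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[term] / total * idf[term] for term in vocab]


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vector: List[float], centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda c: squared_distance(vector, centroids[c]))


def kmeans(vectors: List[List[float]], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain K-means; centroids start at the first k vectors (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        labels = [nearest_centroid(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_keywords(
    valid_text: str, history: List[str], k: int = 2, top_n: int = 2
) -> Tuple[int, List[str]]:
    """Assign the effective text to a cluster and return that cluster's top keywords."""
    corpus = [tokenize(comment) for comment in history]
    idf = idf_table(corpus)
    vocab = sorted(idf)
    centroids = kmeans([tfidf_vector(doc, idf, vocab) for doc in corpus], k)
    cluster = nearest_centroid(tfidf_vector(tokenize(valid_text), idf, vocab), centroids)
    ranked = sorted(zip(vocab, centroids[cluster]), key=lambda pair: pair[1], reverse=True)
    return cluster, [term for term, weight in ranked[:top_n] if weight > 0]


history = [
    "what is the refund policy",
    "shipping time to beijing",
    "refund for damaged item",
    "shipping cost overseas",
]
cluster, keywords = cluster_keywords("How do I get a refund?", history)
```

The keywords returned for the assigned cluster play the role of the keyword clustering result in this sketch.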

7. The method for obtaining question-answer text-based live segments according to claim 1, wherein the constructing semantic index vector space based on all matched question-answer texts comprises:

performing word embedding vector conversion on all the matched question and answer texts to obtain question and answer vectors corresponding to the matched question and answer texts;

constructing a vector index according to each question-answer vector;

and establishing a mapping relation with each matched question and answer text based on the vector index, and establishing the semantic index vector space.
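
A minimal sketch of the construction described above is given below. It assumes that a question and answer vector is the average of the word embedding vectors of its words, that the vector index is an ordered list of vectors, and that the mapping relation is a dictionary from index position to question and answer text. The two-dimensional embedding table, the SemanticIndex type, and the sample texts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical two-dimensional word embeddings, for illustration only.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "refund": [1.0, 0.0],
    "return": [0.9, 0.1],
    "shipping": [0.0, 1.0],
    "delivery": [0.1, 0.9],
}


def embed(text: str, embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of the known words; zero vector if none are known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


@dataclass
class SemanticIndex:
    vectors: List[List[float]] = field(default_factory=list)  # the vector index
    texts: Dict[int, str] = field(default_factory=dict)       # index position -> text

    def add(self, qa_text: str, vector: List[float]) -> int:
        """Store a vector and record which question and answer text it belongs to."""
        position = len(self.vectors)
        self.vectors.append(vector)
        self.texts[position] = qa_text
        return position


def build_semantic_index(
    matched_qa_texts: List[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> SemanticIndex:
    """Convert every matched text to a vector and register it in the index."""
    index = SemanticIndex()
    for qa_text in matched_qa_texts:
        index.add(qa_text, embed(qa_text, embeddings, dim))
    return index


index = build_semantic_index(
    ["refund return policy", "shipping delivery time"], TOY_EMBEDDINGS, dim=2
)
```

Because each index position maps back to exactly one question and answer text, a nearest-neighbour search over index.vectors can return the corresponding text directly.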

8. A live broadcast segment acquisition device based on question and answer texts is characterized by comprising:

9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for obtaining a question-and-answer text-based live clip according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the method for acquiring a question and answer text-based live clip according to any one of claims 1 to 7.