CN108345679B - Audio and video retrieval method, device and equipment and readable storage medium - Google Patents

Audio and video retrieval method, device and equipment and readable storage medium

Info

Publication number
CN108345679B
CN108345679B (granted publication of application CN201810159175.5A)
Authority
CN
China
Prior art keywords
target
sentence
document
text
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810159175.5A
Other languages
Chinese (zh)
Other versions
CN108345679A (en)
Inventor
侯佳礼
刘俊华
王建社
柳林
刘海波
杨帆
刘江
赵志伟
冯祥
胡国平
殷兵
张程风
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201810159175.5A
Publication of CN108345679A
Application granted
Publication of CN108345679B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/7867: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/686: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings

Abstract

The application provides an audio and video retrieval method, apparatus, device and readable storage medium. The method comprises: acquiring an input search term; determining, in a pre-constructed text document library, the target text documents containing the search term, wherein each text document in the library is obtained by transcribing a corresponding audio/video file; for each target text document, determining the text content related to the search term from that document, thereby obtaining the text content corresponding to each target text document; and determining the retrieval result according to the relevance between the text content corresponding to each target text document and the search term, and the audio/video file corresponding to each target text document. The method and apparatus remove the influence of content unrelated to the search term on the retrieval result, which greatly improves retrieval accuracy.

Description

Audio and video retrieval method, device and equipment and readable storage medium
Technical Field
The invention relates to the technical field of multimedia retrieval, in particular to an audio and video retrieval method, an audio and video retrieval device, audio and video retrieval equipment and a readable storage medium.
Background
With the development of modern computer and Internet technologies, multimedia data has become richer in variety and larger in scale, which makes multimedia retrieval technology increasingly important. Audio and video retrieval is an important branch of multimedia retrieval. The retrieval accuracy of existing audio and video retrieval methods is low, while users' requirements on retrieval accuracy keep rising, so an audio and video retrieval method with high accuracy is urgently needed to meet this requirement.
Disclosure of Invention
In view of this, the present invention provides an audio and video retrieval method, apparatus, device and readable storage medium to improve the accuracy of audio and video retrieval. The technical scheme is as follows:
an audio and video retrieval method comprises the following steps:
acquiring an input search term;
determining a target text document containing the search word in a pre-constructed text document library, wherein each text document in the text document library is obtained by transcribing a corresponding audio/video file;
for each target text document, determining text content related to the search word from the target text document, and obtaining text content corresponding to each target text document;
and determining a retrieval result according to the relevance between the text content corresponding to each target text document and the retrieval word and the audio/video file corresponding to each target text document.
Wherein the determining a retrieval result according to the relevance between the text content corresponding to each target text document and the search term and the audio/video file corresponding to each target text document comprises:
calculating the correlation degree between the text content corresponding to each target text document and the search word;
and sequencing the audio/video files corresponding to the target text documents according to the degree of correlation between the text content corresponding to each target text document and the search word, wherein the sequenced audio/video files are used as search results.
Wherein the determining the text content related to the search term from the target text document comprises:
determining sentences containing the search words from the target text documents to obtain target sentences;
sequentially expanding the target sentences to two sides according to a preset first expansion rule by taking each target sentence as a reference sentence to obtain a target local document corresponding to the target sentence;
and carrying out duplication removal and merging treatment on the target local documents corresponding to the target sentences to obtain text contents related to the search words after treatment.
Before the target local documents corresponding to each target sentence are subjected to de-duplication and merging processing, the method further includes:
and taking the target local document corresponding to the target sentence as a reference local document, and sequentially expanding the target local document to two sides according to a preset second expansion rule to obtain the target local document after secondary expansion, wherein the target local document is used as an object for subsequent duplicate removal and combination operation.
Wherein the expanding each target sentence, as a reference sentence, to two sides in sequence according to a preset first expansion rule to obtain the target local document corresponding to the target sentence includes:
and traversing and searching boundary sentences from two sides of the reference sentence by taking the target sentence as the reference sentence, wherein the two searched boundary sentences and each sentence contained between the two boundary sentences form a target local document corresponding to the target sentence.
Wherein, the traversing and searching boundary sentences from the two sides of the reference sentence by taking the target sentence as the reference sentence comprises:
respectively traversing towards two sides of the reference sentence by taking the target sentence as the reference sentence, and judging whether the similarity between the currently traversed sentence and the reference sentence is greater than or equal to a first preset value and whether the similarity between the next sentence to be traversed and the reference sentence is less than the first preset value aiming at the traversing process at any side;
and if the similarity between the currently traversed sentence and the reference sentence is greater than or equal to the first preset value, and the similarity between the next sentence to be traversed and the reference sentence is less than the first preset value, taking the currently traversed sentence as the boundary sentence, otherwise, continuing to traverse downwards until the boundary sentence is found.
The step of sequentially expanding the target local document corresponding to the target sentence to two sides according to a preset second expansion rule by using the target local document corresponding to the target sentence as a reference local document to obtain a target local document after secondary expansion includes:
and traversing and searching boundary sentences from two sides of the reference local document by taking the target local document corresponding to the target sentence as the reference local document, wherein the two searched boundary sentences and each sentence contained between the two searched boundary sentences form the target local document after the secondary expansion.
Wherein, the step of respectively traversing and searching boundary sentences to two sides of the reference local document by using the target local document corresponding to the target sentence as the reference local document comprises the following steps:
respectively traversing two sides of the reference local document by taking the target local document corresponding to the target sentence as the reference local document, and determining the correlation between the currently traversed sentence and the reference local document aiming at the traversing process of any one side;
if the relevance between the currently traversed sentence and the reference local document is smaller than a second preset value, judging whether the relevance between a continuous preset traversed sentence and the reference local document is smaller than the second preset value, wherein the continuous preset traversed sentence comprises the currently traversed sentence;
and if the correlation degree between the continuous preset traversed sentences and the reference local document is smaller than the second preset value, taking the traversed sentences adjacent to the continuous preset traversed sentences as boundary sentences, and if not, continuously traversing downwards until the boundary sentences are found.
Wherein the determining the relevance of the currently traversed sentence and the reference local document comprises:
generating a language model through the reference local document, and/or converting the reference local document into a sentence vector as a reference sentence vector;
calculating a probability score of the currently traversed sentence on the language model; and/or converting the currently traversed sentence into a sentence vector as a target sentence vector, and calculating the similarity score between the target sentence vector and the reference sentence vector;
and taking the probability score, or the similarity score, or a score obtained by summing the probability score and the similarity score as the relevance of the currently traversed sentence and the reference local document.
The sorting of the audio and video files corresponding to each target text document according to the degree of correlation between the text content corresponding to each target text document and the search term includes:
taking the relevance between the text content corresponding to each target text document and the search word as the relevance between the target text document and the search word;
and sequencing the audio/video files corresponding to the target text documents according to the sequence of the relevance of each target text document and the search word from large to small.
Optionally, after determining the target text document, the method further includes:
calculating the relevance of each target text document and the search word as the first relevance of the target text document and the search word;
then, the sorting the audio/video files corresponding to each target text document according to the degree of correlation between the text content corresponding to each target text document and the search term includes:
taking the correlation degree between the text content corresponding to each target text document and the search word as a second correlation degree between the target text document and the search word;
weighting and summing the first relevance and the second relevance of each target text document and the search term to obtain the relevance after weighted summation, wherein the relevance is used as the final relevance of the target text document and the search term;
and sequencing the audio/video files corresponding to the target text documents according to the final relevance of each target text document and the search word.
An audio and video retrieval device, comprising: an acquisition module, a related document determining module, a related text determining module and a retrieval result determining module;
the acquisition module is used for acquiring the input search terms;
the relevant document determining module is used for determining a target text document containing the search word in a pre-constructed text document library, wherein each text document in the text document library is obtained by transcribing a corresponding audio/video file;
the relevant text determining module is used for determining text contents relevant to the search terms from the target text documents for each target text document, and obtaining the text contents corresponding to each target text document;
and the retrieval result determining module is used for determining a retrieval result according to the correlation degree of the text content corresponding to each target text document and the retrieval word and the audio and video file corresponding to each target text document.
Wherein the relevant text determination module comprises: a sentence determining module, a sentence expanding module and a sentence processing module;
the sentence determining module is used for determining sentences containing the search words from the target text documents to obtain target sentences;
the sentence expansion module is used for sequentially expanding each target sentence serving as a reference sentence to two sides according to a preset first expansion rule to obtain a target local document corresponding to the target sentence;
and the sentence processing module is used for carrying out duplication removal and combination processing on the target local documents corresponding to the target sentences to obtain text contents related to the search words after the processing.
Wherein the relevant text determination module further comprises: a document extension module;
and the document expansion module is used for sequentially expanding the target local document corresponding to the target sentence to two sides according to a preset second expansion rule by taking the target local document as a reference local document to obtain the target local document after secondary expansion, and the target local document is used as an object for subsequent duplicate removal and combination operation.
An audio-video retrieval device comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program, and the program is specifically configured to:
acquiring an input search term;
determining a target text document containing the search word in a pre-constructed text document library, wherein each text document in the text document library is obtained by transcribing a corresponding audio/video file;
for each target text document, determining text content related to the search word from the target text document, and obtaining text content corresponding to each target text document;
and determining a retrieval result according to the relevance between the text content corresponding to each target text document and the retrieval word and the audio/video file corresponding to each target text document.
A readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the above-described audio-video retrieval method.
According to the technical scheme, after the search term is obtained, the target text documents containing the search term are determined from the pre-constructed text document library; the text content related to the search term is then determined from each target text document, giving the text content corresponding to each target text document; finally, the retrieval result is determined according to the relevance between the text content corresponding to each target text document and the search term, and the audio/video file corresponding to each target text document. Considering that a user only pays attention to the parts related to the search term, the audio and video retrieval method provided by the invention first obtains the text documents containing the search term, that is, the text documents that do not contain the search term are filtered out; it then determines, from each target text document containing the search term, the text content related to the search term, so that the text content unrelated to the search term is filtered out as well. In this way the influence of content unrelated to the search term on the retrieval result is removed, and the retrieval accuracy is greatly improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a schematic flow chart of an audio and video retrieval method provided by an embodiment of the present invention;
fig. 2 is a schematic flowchart of an implementation manner of determining text content related to a search term from a target text document in the audio/video search method provided in the embodiment of the present invention;
fig. 3 is a schematic flowchart of another implementation manner of determining text content related to a search term from a target text document based on a correlation between context contents in the target text document in the audio/video retrieval method provided in the embodiment of the present invention;
fig. 4 is a schematic structural diagram of an audio/video retrieval device according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an audio/video retrieval device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The audio and video retrieval method in the prior art performs retrieval based on full-text statistical information. Specifically, massive audio and video files are converted into text in advance to form a text document library to be retrieved. During retrieval, a search term input by a user is received, the full text of each text document in the library is searched to find the text documents containing the search term, the audio/video files corresponding to the found text documents are then ranked according to the relevance between the search term and those documents, and the ranked audio/video files are taken as the retrieval result.
In the process of implementing the invention, the inventors found the following. Audio/video files usually come from recordings of teleconferences, instant messaging tools, Internet streaming media and the like, in which the semantic topic changes many times as the text unfolds, for example a movie containing several story segments, a lecture in the academic field covering a wide range of topics, or a video recording of a conference. If retrieval is performed based on full-text statistical information as in the prior art, a small amount of more important information is easily drowned in long passages that are unimportant or belong to other topics, so that the related files are difficult to retrieve.
In view of this, an embodiment of the present invention provides an audio and video retrieval method for improving retrieval accuracy, please refer to fig. 1, which shows a schematic flow diagram of the audio and video retrieval method, where the method may include:
step S101: and acquiring the input search word.
The search terms are related words capable of summarizing the contents to be searched, and can be input by a user in a search interface.
Step S102: determine the target text documents containing the search term in a pre-constructed text document library.
Each text document in the text document library is obtained by transcribing a corresponding audio/video file; that is, each text document in the library corresponds to one audio/video file.
In this embodiment, an audio/video file library may be constructed in advance. The audio/video files in this library may come from, but are not limited to, recording software, instant messaging tools and Internet streaming media, for example videos published on a video website, chat voice messages from an instant messaging tool, or recordings of teleconferences. It should be noted that the audio/video files in the library all contain speech content.
After the audio/video file library is constructed, the audio content of each file in it can be converted into text through speech transcription, so that a text document corresponding to each audio/video file is obtained; the text document library is then built from these documents.
It should be noted that, because the audio/video retrieval method provided by this embodiment operates on the text documents while the final retrieval result consists of the audio/video files related to the search term, each text document in the library needs to be associated with its corresponding audio/video file, so that the file can be located from the document when the retrieval result is determined.
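As a minimal illustration of the library construction and the document-to-file association described above, the following Python sketch builds a text document library from a directory of audio/video files. The transcribe stub stands in for an actual speech transcription engine, which the patent does not specify; the directory layout and naming are likewise assumptions of this sketch.

```python
from pathlib import Path

def transcribe(av_path: str) -> str:
    """Stub for a speech transcription (ASR) engine; replace with a real one."""
    return ""  # a real engine would return the recognized text of the file

def build_text_document_library(av_dir: str) -> dict:
    """Build a text document library in which every text document stays
    associated with the audio/video file it was transcribed from."""
    library = {}
    for doc_id, av_path in enumerate(sorted(Path(av_dir).glob("*"))):
        library[doc_id] = {
            "av_file": str(av_path),           # association back to the source file
            "text": transcribe(str(av_path)),  # transcribed text document
        }
    return library
```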
Step S103: for each target text document, determine the text content related to the search term from that document, obtaining the text content corresponding to each target text document.
The retrieval schemes in the prior art do not filter out the parts unrelated to the search term, which leads to low retrieval accuracy. To improve accuracy, in this embodiment, after the target text documents are obtained, the text content related to the search term is determined from each target text document based on the correlation between the context contents in that document; in other words, the text content unrelated to the search term is filtered out, so that its influence on the retrieval result is removed.
The specific implementation of determining, based on the correlation between context contents, the text content related to the search term from a target text document is described in the following embodiments.
Step S104: determine the retrieval result according to the relevance between the text content corresponding to each target text document and the search term, and the audio/video file corresponding to each target text document.
Specifically, the implementation of this step may include the following.
First, the relevance between the text content corresponding to each target text document and the search term is calculated.
In this embodiment, this relevance may be calculated by, but is not limited to, information retrieval algorithms or models such as TF-IDF, BM25, AdaBoost or CNN.
Then, the audio/video files corresponding to the target text documents are ranked according to the relevance between the text content corresponding to each target text document and the search term, and the ranked audio/video files are taken as the retrieval result.
Specifically, the relevance between the text content corresponding to a target text document and the search term can be taken as the relevance between that target text document and the search term; the audio/video files corresponding to the target text documents are then ranked in descending order of this relevance, and the ranked files are taken as the retrieval result. After the retrieval result is determined, it is displayed; the higher an audio/video file is ranked, the more relevant it is to the search term.
In one possible implementation, the target text documents are first sorted in descending order of their relevance to the search term, and the corresponding audio/video files are then ordered according to this sorting. In another possible implementation, the audio/video files corresponding to the target text documents can be sorted directly in descending order of the relevance between each target text document and the search term. It can be understood that, because a target text document is transcribed from its corresponding audio/video file, the relevance between the target text document and the search term is also the relevance between the corresponding audio/video file and the search term, so the audio/video files can be sorted by this relevance.
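As one concrete way to realize the relevance calculation and ranking of step S104, the sketch below scores the search-term-related text content of each target text document against the search term with a small self-contained BM25 implementation (one of the algorithms named above) and sorts the associated audio/video files in descending order of that score. The whitespace tokenizer and the parameter values k1 and b are illustrative assumptions, not values fixed by the patent.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in docs against the query terms with BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / max(n, 1)
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores

def rank_av_files(search_term, related_texts, av_files):
    """related_texts[i] is the search-term-related content of document i and
    av_files[i] is its associated audio/video file; both lists are aligned."""
    query = search_term.split()                # illustrative whitespace tokenizer
    docs = [t.split() for t in related_texts]
    scores = bm25_scores(query, docs)
    order = sorted(range(len(av_files)), key=lambda i: scores[i], reverse=True)
    return [av_files[i] for i in order]        # ranked retrieval result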
According to the audio and video retrieval method provided by the embodiment of the invention, after the search term is obtained, the target text documents containing the search term are first determined from the pre-constructed text document library; the text content related to the search term is then determined from each target text document, giving the text content corresponding to each target text document; finally, the audio/video files corresponding to the target text documents are ranked according to the relevance between the text content corresponding to each target text document and the search term, which yields the retrieval result. Considering that a user only pays attention to the parts related to the search term, the method first obtains the text documents containing the search term, that is, the text documents that do not contain the search term are filtered out; it then determines, based on the correlation between context contents, the text content related to the search term within each of those documents, so that the text content unrelated to the search term is filtered out as well. In this way the influence of content unrelated to the search term on the retrieval result is removed, and the retrieval accuracy is greatly improved.
Optionally, after the target text documents are determined, the audio and video retrieval method provided in the above embodiment may further include: calculating the relevance between each target text document and the search term as the first relevance between that target text document and the search term.
In the implementation of step S104, the ranking of the audio/video files corresponding to the target text documents according to the relevance between the text content corresponding to each target text document and the search term may then include: taking the relevance between the text content corresponding to each target text document and the search term as the second relevance between that target text document and the search term; carrying out a weighted summation of the first relevance and the second relevance of each target text document, and taking the weighted sum as the final relevance between that target text document and the search term; and ranking the audio/video files corresponding to the target text documents according to the final relevance between each target text document and the search term.
Optionally, after the first relevance between each target text document and the search term is obtained, the audio/video files corresponding to the target text documents may first be ranked based on this first relevance, and the ranked files are taken as a basic retrieval result. After the second relevance of each target text document, or the weighted sum of its first and second relevance, is obtained, the audio/video files in the basic retrieval result are re-ranked according to that value, and the re-ranked audio/video files are taken as the final retrieval result.
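A minimal sketch of the weighted combination described above, in which the first relevance is computed on the whole target text document and the second relevance on its search-term-related content. The weight values are illustrative assumptions, since the patent does not fix them.

```python
def final_ranking(av_files, first_rel, second_rel, w1=0.4, w2=0.6):
    """Combine the whole-document relevance (first_rel) and the relevance of the
    search-term-related content (second_rel) by weighted sum, then rank the
    corresponding audio/video files in descending order of the final relevance.
    The weights w1 and w2 are placeholders; any scoring method (e.g. BM25)
    can supply the two relevance lists."""
    final = [w1 * f + w2 * s for f, s in zip(first_rel, second_rel)]
    order = sorted(range(len(av_files)), key=lambda i: final[i], reverse=True)
    return [av_files[i] for i in order]
```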
In another embodiment of the present invention, there are various implementations of determining the text content related to the search term from the target text document in step S103 of the foregoing embodiment. Referring to fig. 2, a flowchart of one possible implementation is shown, which may include:
step S201: and determining sentences containing the search words from the target text documents to obtain target sentences.
Specifically, a character string matching algorithm may be used to query each sentence in the target text document for the presence of a search term, and the sentence in the target text document in which the search term is present is taken as the target sentence.
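A minimal sketch of step S201, assuming the target text document has already been split into a list of sentences; the patent does not prescribe a particular sentence splitter or string matching algorithm, so plain substring matching stands in here.

```python
def find_target_sentences(sentences, search_term):
    """Return the indices of the sentences that contain the search term."""
    return [i for i, s in enumerate(sentences) if search_term in s]
```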
Step S202: taking each target sentence as a reference sentence, expand it to both sides according to a preset first expansion rule to obtain the target local document corresponding to that target sentence.
The preset first expansion rule is used to find sentences that are semantically similar to the target sentence and belong to the same topic.
In this embodiment, the implementation of step S202 may include: taking the target sentence as the reference sentence, traverse outward on both sides of the reference sentence to find a boundary sentence on each side; the two boundary sentences and all sentences contained between them form the target local document corresponding to the target sentence.
Further, the process of finding the boundary sentences on the two sides of the reference sentence may include: taking the target sentence as the reference sentence, traverse toward each side of the reference sentence; during the traversal on either side, judge whether the similarity between the currently traversed sentence and the reference sentence is greater than or equal to a first preset value and whether the similarity between the next sentence to be traversed and the reference sentence is less than the first preset value. If the similarity between the currently traversed sentence and the reference sentence is greater than or equal to the first preset value and the similarity between the next sentence to be traversed and the reference sentence is less than the first preset value, the currently traversed sentence is taken as the boundary sentence; otherwise, the traversal continues until the boundary sentence is found.
The following describes a specific implementation process of the step S202 by using a specific example:
the target text document is D, the total number of sentences contained in the target text document D is N (100), and assuming that a search word appears in the 50 th sentence from front to back, that is, the 50 th sentence is the target sentence, the 50 th sentence is S (50), the forward sentence of S (50) is S (49), S (48), S (47) … …, and the backward sentence of S (50) is S (51), S (52), and S (53) … …, the process of expanding the sentences to both sides with S (50) as a reference is as follows:
and (3) forward expanding the S (50), namely traversing the sentence forward by taking the S (50) as a reference: first, go through S (49), determine whether S (49) is a forward boundary sentence, i.e. calculate the similarity score between S (49) and S (50) and the similarity score between S (48) and S (50), if the similarity score between S (49) and S (50) is greater than or equal to the predetermined value T1, and the similarity score between S (48) and S (50) is greater than or equal to T1, then S (49) is similar to S (50), S (48) is similar to S (50), and the content belongs to the same topic, then it is determined that S (49) is not a forward boundary sentence, if the similarity score between S (49) and S (50) is greater than or equal to T1, and the similarity score between S (48) and S (50) is less than T1, then S (49) is similar to S (50), S (48) is similar to S (50), and S (48) belongs to a different topic, that is, S (49) is a forward boundary sentence, where it is assumed that S (49) is not a forward boundary sentence, the forward traversal is continued, where S (48) is traversed, and if the similarity score of S (48) and S (50) and the similarity score of S (47) and S (50) are both greater than T1, it is determined that S (48) is not a forward boundary sentence, the forward traversal is continued, where S (47) is traversed, and if the similarity score of S (47) and S (50) is greater than T1, and the similarity score of S (46) and S (50) is less than T, it is determined that S (47) is a forward boundary sentence, and at this time, the forward traversal is ended.
And (3) carrying out backward extension on the S (50), namely traversing the sentence backwards by taking the S (50) as a reference: firstly, S (51) is traversed, whether S (51) is a backward boundary sentence is judged, namely, a similarity score between S (51) and S (50) and a similarity score between S (52) and S (50) are calculated, if the similarity score between S (51) and S (50) is greater than or equal to T1 and the similarity score between S (52) and S (50) is less than T1, the similarity score indicates that S (51) is similar to S (50) in terms of meaning, S (52) is not similar to S (50) in terms of meaning, S (51) and S (52) belong to different subjects, the S (51) is determined to be the backward boundary sentence, if the similarity score between S (51) and S (50) and the similarity score between S (52) and S (50) are both greater than T1, the S (51) is determined not to be the backward boundary sentence, and the S (51) is continued to be the backward boundary sentence, and the S (52) is traversed, assuming that the similarity between S (52) and S (50) and the similarity score between S (53) and S (50) are both greater than T1, the backward traversal is continued until S (53) is reached if S (52) is determined not to be a backward boundary sentence, and the backward traversal is ended if S (53) and S (50) have a similarity score greater than T1 and S (54) and S (50) have a similarity score less than T1.
By the above process, it can be determined that S (47) is a forward boundary sentence, S (53) is a backward boundary sentence, and the document formed by S (47) -S (53) is the target local document corresponding to the target sentence S (50).
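The first expansion rule illustrated by the example above can be sketched as follows. The sentence similarity function is left abstract (any similarity measure between two sentences can be plugged in) and the threshold t1 corresponds to the first preset value T1; both are interface assumptions of this sketch rather than details fixed by the patent.

```python
def expand_once(sentences, target_idx, similarity, t1):
    """First expansion: starting from the target sentence, walk outward on each
    side and stop at the last sentence whose similarity to the target sentence
    is >= t1 while the next sentence's similarity falls below t1."""
    def boundary(step):
        idx = target_idx
        nxt = idx + step
        while 0 <= nxt < len(sentences) and similarity(sentences[nxt], sentences[target_idx]) >= t1:
            idx = nxt
            nxt += step
        return idx

    left = boundary(-1)   # forward boundary sentence (towards the beginning)
    right = boundary(+1)  # backward boundary sentence (towards the end)
    return left, right    # the target local document is sentences[left:right + 1]
```

Applied to the example (similarity to S(50) at or above T1 for S(47) through S(53), below T1 for S(46) and S(54)), this sketch returns the range S(47) to S(53).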
The target local document corresponds to a relatively unified and well-defined text topic related to the search term, such as a sub-chapter of a recorded academic report or a single story segment in a user-generated video.
Step S203: deduplicate and merge the target local documents corresponding to the target sentences to obtain the text content related to the search term.
It should be noted that a search term usually appears multiple times in a document, that is, there are multiple sentences containing the search term and therefore possibly multiple target sentences, each with its own target local document. In that case the target local documents corresponding to the target sentences need to be merged. However, several target local documents may partially or completely overlap; if they were simply concatenated, the frequency and distribution of the words related to the search term would differ from those in the original data, which would affect the final retrieval result. To avoid this, the target local documents corresponding to the target sentences are deduplicated before merging, so that each sentence is kept only once. For example, if sentences S(15) and S(16) are contained in two target local documents, the duplicated S(15) and S(16) are removed during merging so that only one S(15) and one S(16) are retained.
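One straightforward way to realize this deduplicate-and-merge step is to treat each target local document as a range of sentence indices and take the union of the ranges, so that every sentence is kept at most once and the word frequencies of the original document are preserved. The data representation below is an assumption of this sketch, not a form spelled out in the patent.

```python
def merge_local_documents(sentences, local_doc_ranges):
    """local_doc_ranges is a list of (left, right) sentence-index ranges, one per
    target sentence. Deduplicate by sentence index and emit the kept sentences
    in their original order, so each sentence appears at most once."""
    kept = set()
    for left, right in local_doc_ranges:
        kept.update(range(left, right + 1))
    return [sentences[i] for i in sorted(kept)]
```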
Because an audio/video file may contain discrete text-level noise such as asides and digressions, the target local document obtained by expanding a target sentence in the above manner may be incomplete, and consequently the text content related to the search term may be incomplete. For example, suppose the target text document contains 20 sentences S(1) to S(20), S(8) is the target sentence and S(13) is an aside; according to the above implementation the backward boundary sentence would be determined as S(12), whereas in reality S(14) to S(18) still belong to the same topic as S(8). To ensure the completeness of the text content related to the search term, an embodiment of the present invention provides another implementation of determining the text content related to the search term from the target text document in step S103, shown in the flowchart of fig. 3, whose implementation process may include:
step S301: and determining sentences containing the search words from the target text documents to obtain target sentences.
Step S302: and with each target sentence as a reference sentence, sequentially expanding the target sentences to two sides according to a preset first expansion rule to obtain target local documents corresponding to the target sentences.
It should be noted that, the specific implementation process of steps S301 to S302 can refer to steps S201 to S202 in the foregoing embodiment, which is not described herein again.
Step S303: taking the target local document corresponding to the target sentence as a reference local document, expand it to both sides according to a preset second expansion rule to obtain the secondarily expanded target local document corresponding to that target sentence, i.e., a semantically complete local document.
In this embodiment, the implementation of step S303 may include: taking the target local document corresponding to the target sentence as the reference local document, traverse outward on both sides of the reference local document to find a boundary sentence on each side; the two boundary sentences and all sentences contained between them form the secondarily expanded target local document.
Further, the process of finding the boundary sentences on the two sides of the reference local document may include: taking the target local document corresponding to the target sentence as the reference local document, traverse toward each side of the reference local document; during the traversal on either side, determine the relevance between the currently traversed sentence and the reference local document. If the relevance between the currently traversed sentence and the reference local document is less than a second preset value, judge whether the relevance between each of a preset number of consecutive traversed sentences (including the currently traversed sentence) and the reference local document is less than the second preset value. If so, the traversed sentence adjacent to these consecutive traversed sentences is taken as the boundary sentence; otherwise, the traversal continues until the boundary sentence is found.
The following describes a specific implementation process of the step S303 by using a specific example:
the target text document comprises 100 sentences, if a retrieval word appears in the 45 th sentence from the front to the back, namely the 45 th sentence is a target sentence, the 45 th sentence is recorded as S (45), the target local document D subjected to first expansion by the S (45) is S (28) to S (55), the forward sentence of the target local document D is S (27), S (26), S (25) … …, and the backward sentence is S (56), S (57) and S (58) … …, when the similarity between the continuous 3 sentences and D is less than a preset value T2, the expansion is finished, and the local document is subjected to secondary expansion by taking D as a reference:
and D is subjected to forward expansion, namely the local document is traversed forward by taking D as a reference: firstly, S (27) is traversed, the correlation between S (27) and D is determined, the correlation between S (27) and D is assumed to be smaller than a preset value T2, because the correlation between only one traversed sentence and D is smaller than T2 currently, the forward traversal is continued, S (26) is traversed, the correlation between S (26) and D is larger than T2, the forward traversal is continued, S (25) is traversed, the correlation between S (25) and D is larger than T2, the forward traversal is continued, S (24) is traversed, the correlation between S (24) and D is smaller than T2, the forward traversal is continued, S (23) is traversed, the correlation between S (23) and D is smaller than T2, the forward traversal is continued, S (22) is traversed, the correlation between S (22) and D is smaller than T2, and 3 continuous traversals and D are smaller than T2, the forward traversal ends with S (25) as the forward boundary sentence.
And D is subjected to backward extension, namely the local document is traversed backwards by taking D as a reference: firstly, S (56) is traversed, the relevance between S (56) and D is determined, the relevance between S (56) and D is assumed to be smaller than a preset value T2, because the relevance between only one traversed sentence and D is smaller than T2 currently, backward traversal is continued, S (57) is traversed, the relevance between S (57) and D is larger than T2, backward traversal is continued, S (58) is traversed, the relevance between S (58) and D is smaller than T2, backward traversal is continued, S (59) is traversed, the relevance between S (59) and D is smaller than T2, backward traversal is continued, S (60) is traversed, the relevance between S (60) and D is smaller than T2, the relevance between 3 continuous traversed sentences and D is smaller than T2, backward traversal is ended, and S (57) is used as a backward boundary sentence.
By the process, the step S (25) is determined to be a forward boundary sentence, the step S (57) is determined to be a backward boundary sentence, and the document formed by the step S (25) to the step S (57) is the target local document after the secondary expansion.
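The secondary expansion illustrated above can be sketched as follows. The relevance function stands in for whichever measure is used (language-model probability score, sentence-vector similarity, or their sum, as described below), t2 corresponds to the second preset value T2, and the consecutive count of 3 follows the example; these are illustrative interface assumptions of the sketch.

```python
def expand_twice(sentences, left, right, relevance, t2, patience=3):
    """Second expansion: starting from the first-expansion range [left, right],
    walk outward on each side; a side stops only after `patience` consecutive
    sentences whose relevance to the reference local document is below t2, and
    the boundary sentence is the last sentence before that run."""
    reference = " ".join(sentences[left:right + 1])  # reference local document as one text

    def boundary(start, step):
        idx, low_run = start, 0
        probe = start + step
        while 0 <= probe < len(sentences) and low_run < patience:
            if relevance(sentences[probe], reference) < t2:
                low_run += 1
            else:
                low_run = 0
                idx = probe  # last sentence still judged relevant to the reference
            probe += step
        return idx

    return boundary(left, -1), boundary(right, +1)
```

With the example values (relevance below T2 for S(27), S(24), S(23) and S(22), above T2 for S(26) and S(25)), the forward side stops after the run S(24), S(23), S(22) and returns S(25) as the forward boundary sentence.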
In this embodiment, there are various ways to determine the relevance between the currently traversed sentence and the reference local document. In one possible implementation, the process may include: generating a language model from the reference local document, calculating the probability score of the currently traversed sentence on the language model, and taking the calculated probability score as the relevance between the currently traversed sentence and the reference local document.
In another possible implementation, the process may include: converting the reference local document into a sentence vector as a reference sentence vector; converting the currently traversed sentence into a sentence vector as a target sentence vector; calculating the similarity score between the target sentence vector and the reference sentence vector, and taking the calculated similarity score as the relevance between the currently traversed sentence and the reference local document.
In yet another possible implementation, the process may include: generating a language model from the reference local document, and converting the reference local document into a sentence vector as a reference sentence vector; calculating the probability score of the currently traversed sentence on the language model; converting the currently traversed sentence into a sentence vector as a target sentence vector and calculating the similarity score between the target sentence vector and the reference sentence vector; and summing the probability score and the similarity score, and taking the summed score as the relevance between the currently traversed sentence and the reference local document. This implementation combines the advantages of the two preceding ways of determining the relevance and is the preferred one.
In a possible implementation, the language model can be generated from the reference local document with an n-gram algorithm, the reference local document and the currently traversed sentence can each be converted into a sentence vector with a sentence embedding algorithm, and the similarity score between the target sentence vector and the reference sentence vector can be obtained with the cosine distance.
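The combined relevance measure can be sketched in a simplified form as follows: a unigram model with add-alpha smoothing stands in for the n-gram language model, and a bag-of-words vector with cosine similarity stands in for the sentence embedding, so the tokenization, the model order and the vector construction are all simplifying assumptions rather than the patent's prescription.

```python
import math
from collections import Counter

def tokens(text):
    return text.lower().split()  # illustrative tokenizer

def lm_score(sentence, reference_doc, alpha=1.0):
    """Log-probability of the sentence under a unigram model estimated from the
    reference local document, with add-alpha smoothing (a stand-in for a real
    n-gram language model)."""
    counts = Counter(tokens(reference_doc))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words
    return sum(math.log((counts[w] + alpha) / (total + alpha * vocab))
               for w in tokens(sentence))

def cosine_score(sentence, reference_doc):
    """Cosine similarity between bag-of-words vectors of the sentence and the
    reference local document (a stand-in for learned sentence embeddings)."""
    a, b = Counter(tokens(sentence)), Counter(tokens(reference_doc))
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def relevance(sentence, reference_doc):
    """Sum of the two scores, as in the combined implementation described above."""
    return lm_score(sentence, reference_doc) + cosine_score(sentence, reference_doc)
```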
Step S304: deduplicate and merge the secondarily expanded target local documents corresponding to the target sentences to obtain the text content related to the search term.
It should be noted that a search term usually appears multiple times in a target text document, that is, there are multiple sentences containing the search term and therefore possibly multiple target sentences; correspondingly, there are multiple target local documents and multiple secondarily expanded target local documents, so all the secondarily expanded target local documents need to be merged.
However, in some cases several secondarily expanded target local documents partially or completely overlap. For example, suppose the search term appears for the first time in the i1-th sentence of a target text document and for the second time in the i2-th sentence, with the sentences i1+1, i1+2, ..., i2-1 lying close together in between; the last sentences of the secondarily expanded target local document corresponding to the i1-th sentence then overlap with the first sentences of the one corresponding to the i2-th sentence, and the two secondarily expanded target local documents may even be identical. If all the secondarily expanded target local documents were simply concatenated, the frequency and distribution of the words related to the search term would differ from those in the original data, which would affect the final retrieval result.
To avoid this, the secondarily expanded target local documents corresponding to the target sentences are deduplicated before merging, so that each sentence of the target text document appears at most once in the final text; during deduplication, only one copy of each sentence is retained. For example, suppose a target text document has two target sentences, the secondarily expanded target local document of one consists of sentences 2 to 10 of the target text document and that of the other consists of sentences 9 to 15; the text content related to the search term obtained after deduplication and merging then consists of sentences 2 to 15.
The present embodiment is described below by way of a specific example. A target text document corresponds to an audio/video file, and the word "innovation" appears in 56 of its sentences. If the search term input by the user is "innovation", the sentences in which "innovation" appears are first expanded for the first time to obtain 56 target local documents; these target local documents are then expanded a second time to obtain 56 secondarily expanded target local documents; finally, the 56 secondarily expanded target local documents are deduplicated and merged to obtain the text content related to the search term "innovation". Table 1 below shows the process of obtaining the text content related to the term "innovation".
Table 1: Process of determining the text content related to the term "innovation"
[Table 1 is provided as an image in the original publication and is not reproduced here.]
As can be seen from Table 1, "innovation" appears for the first time in the 3rd sentence; with the 3rd sentence as the reference sentence, the target local document after the first expansion consists of sentences 2 to 4, and the target local document after the secondary expansion consists of sentences 2 to 10. "Innovation" appears for the second time in the 11th sentence; with the 11th sentence as the reference sentence, the target local document after the first expansion consists of sentences 10 to 15, and the target local document after the secondary expansion consists of sentences 9 to 15. Because sentences 2 to 10 and sentences 9 to 15 overlap, i.e., both contain the 9th and 10th sentences, the duplicated 9th and 10th sentences are removed during merging so that each is retained only once. The other secondarily expanded target local documents are handled similarly.
With the methods provided by the above two embodiments, the text content related to the search term can be determined from each target text document, giving the text content corresponding to each target text document; the audio/video files corresponding to the target text documents are then ranked based on the relevance between the text content corresponding to each target text document and the search term, yielding the retrieval result. Because the audio/video files are ranked according to the relevance between the search term and only the text content related to it, the interference of unrelated context on the retrieval is eliminated and the retrieval accuracy is greatly improved.
Corresponding to the above audio and video retrieval method, an embodiment of the present invention further provides an audio and video retrieval device, please refer to fig. 4, which shows a schematic structural diagram of the audio and video retrieval device, and may include: an acquisition module 401, a related document determination module 402, a related text determination module 403, and a retrieval result determination module 404.
The obtaining module 401 is configured to obtain an input search term.
A relevant document determining module 402, configured to determine a target text document containing the search term in a pre-constructed text document library.
And each text document in the text document library is obtained by transcribing a corresponding audio/video file.
The related text determining module 403 is configured to determine, for each target text document, the text content related to the search term from that document, obtaining the text content corresponding to each target text document.
And the retrieval result determining module 404 is configured to determine a retrieval result according to the relevance between the text content corresponding to each target text document and the retrieval word, and the audio/video file corresponding to each target text document.
Considering that a user only pays attention to the parts related to the search term, the audio/video retrieval device provided by the embodiment of the invention first obtains the target text documents containing the search term, that is, the text documents that do not contain the search term are filtered out; it then determines, based on the correlation between context contents, the text content related to the search term within each target text document, so that the text content unrelated to the search term is filtered out and its influence on the retrieval result is removed; finally, the retrieval result is determined based on the relevance between the search-term-related text content of each target text document and the search term. The retrieval result determined by this process has higher accuracy.
In the audio/video retrieval apparatus provided in the foregoing embodiment, the retrieval result determining module 404 may include: a first relevancy calculation module and a sorting module.
And the first relevancy calculation module is used for calculating the relevancy between the text content corresponding to each target text document and the search word.
And the sorting module is used for sorting the audio and video files corresponding to the target text documents according to the relevance between the text content corresponding to each target text document and the search word, the sorted audio and video files being used as the search result.
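The embodiments do not prescribe a particular relevance measure for the first relevancy calculation module. As one plausible, deliberately simple choice, the relevance between the text content and the search word could be approximated by the normalized frequency of the search word, as in the following sketch; the function and variable names are assumptions.

```python
def term_relevance(text_content: str, search_term: str) -> float:
    """Normalized frequency of the search term in the related text content.

    A simple stand-in score; the document does not fix the measure used by
    the first relevancy calculation module.
    """
    tokens = text_content.lower().split()
    if not tokens:
        return 0.0
    return tokens.count(search_term.lower()) / len(tokens)

# The sorting module can then use this score as the sort key, in descending order:
# sorted(contents, key=lambda av_id: term_relevance(contents[av_id], term), reverse=True)
```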
In the audio/video retrieval apparatus provided in the foregoing embodiment, the relevant text determining module 403 may include: a sentence determining module, a sentence expansion module and a sentence processing module.
And the sentence determining module is used for determining the sentences containing the search words from the target text documents to obtain the target sentences.
And the sentence expansion module is used for sequentially expanding the target sentences to two sides according to a preset first expansion rule by taking each target sentence as a reference sentence to obtain the target local documents corresponding to the target sentences.
And the sentence processing module is used for carrying out duplication removal and merging processing on the target local documents corresponding to the target sentences to obtain text contents related to the search words after the processing.
Further, the sentence expansion module is specifically configured to take the target sentence as the reference sentence and to traverse towards both sides of the reference sentence to search for boundary sentences, the two boundary sentences found and the sentences included between them constituting the target local document corresponding to the target sentence.
Further, the sentence expansion module comprises: a traversal submodule, a judging submodule and a boundary determining submodule.
And the traversal submodule is used for respectively traversing towards two sides of the reference sentence by taking the target sentence as the reference sentence.
And the judging submodule is used for judging, for the traversal process on either side, whether the similarity between the sentence currently traversed by the traversal submodule and the reference sentence is greater than or equal to a first preset value, and whether the similarity between the next sentence to be traversed and the reference sentence is less than the first preset value.
And the boundary determining submodule is used for taking the currently traversed sentence as the boundary sentence when the similarity between the currently traversed sentence and the reference sentence is greater than or equal to the first preset value and the similarity between the next sentence to be traversed and the reference sentence is less than the first preset value.
And the traversal submodule is further used for continuing to traverse when the boundary determining submodule determines that the currently traversed sentence is not the boundary sentence, until the boundary determining submodule determines the boundary sentence.
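A minimal sketch of this first expansion rule is given below, assuming that the sentences of the target text document are held in a list and that a similarity function is supplied from outside; the handling of document edges and of an immediately dissimilar neighbour is likewise an assumption of the sketch, which simply stops at the last qualifying sentence.

```python
from typing import Callable, List, Tuple

def expand_first(sentences: List[str], target_idx: int,
                 similarity: Callable[[str, str], float],
                 threshold: float) -> Tuple[int, int]:
    """First expansion: search a boundary sentence on each side of the target sentence.

    A traversed sentence becomes the boundary when its similarity to the reference
    (target) sentence is >= threshold while the next sentence's similarity falls
    below the threshold.
    """
    reference = sentences[target_idx]

    def find_boundary(step: int) -> int:
        boundary = target_idx
        i = target_idx + step
        while 0 <= i < len(sentences) and similarity(sentences[i], reference) >= threshold:
            boundary = i      # still similar enough: keep extending
            i += step
        return boundary

    # The target local document is sentences[left : right + 1].
    return find_boundary(-1), find_boundary(+1)
```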
Preferably, the relevant text determination module 403 may further include: a document expansion module.
And the document expansion module is used for sequentially expanding the target local document corresponding to the target sentence to two sides according to a preset second expansion rule by taking the target local document as a reference local document to obtain the target local document after secondary expansion, and the target local document is used as an object for subsequent de-duplication and merging operation.
Further, the document expansion module is specifically configured to take the target local document corresponding to the target sentence as the reference local document and to traverse towards both sides of the reference local document to search for boundary sentences, the two boundary sentences found and the sentences included between them constituting the target local document after the secondary expansion.
Still further, the document expansion module includes: a traversal submodule, a relevancy determination submodule, a judging submodule and a boundary determining submodule.
And the traversing submodule is used for respectively traversing towards two sides of the reference local document by taking the target local document corresponding to the target sentence as the reference local document.
And the relevancy determination submodule is used for determining the relevancy between the sentence traversed by the traversal submodule currently and the reference local document aiming at the traversal process of any side.
And the judging submodule is used for judging, when the relevancy between the currently traversed sentence and the reference local document is smaller than a second preset value, whether the relevancies between a consecutive preset number of traversed sentences and the reference local document are all smaller than the second preset value, wherein the consecutive preset number of traversed sentences include the currently traversed sentence.
And the boundary determining submodule is used for taking the traversed sentence adjacent to the consecutive preset number of traversed sentences as the boundary sentence when the relevancies between the consecutive preset number of traversed sentences and the reference local document are all smaller than the second preset value.
And the traversal submodule is further used for continuing to traverse when the relevancies between the consecutive preset number of traversed sentences and the reference local document are not all smaller than the second preset value, until the boundary determining submodule determines the boundary sentence.
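A corresponding sketch of the second expansion rule follows. The relevance function is supplied from outside (the method elsewhere proposes a language-model probability score and/or a sentence-vector similarity for it), run_length stands for the preset number of consecutive low-relevance sentences, and the names and the edge handling are assumptions of this sketch.

```python
from typing import Callable, List, Tuple

def expand_second(sentences: List[str], local_start: int, local_end: int,
                  relevance: Callable[[str, str], float],
                  threshold: float, run_length: int) -> Tuple[int, int]:
    """Second expansion: grow a target local document until the surrounding text
    stops being relevant to it.

    Traversal on one side stops once run_length consecutive traversed sentences
    all score below threshold against the reference local document; the boundary
    is the traversed sentence adjacent to that run.
    """
    reference = " ".join(sentences[local_start:local_end + 1])

    def find_boundary(start: int, step: int) -> int:
        boundary = start          # default: no expansion on this side
        low_run = 0
        i = start + step
        while 0 <= i < len(sentences):
            if relevance(sentences[i], reference) < threshold:
                low_run += 1
                if low_run == run_length:
                    break         # boundary stays at the sentence adjacent to the run
            else:
                low_run = 0
                boundary = i      # still relevant: extend the local document to here
            i += step
        return boundary

    return find_boundary(local_start, -1), find_boundary(local_end, +1)
```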
In the audio/video retrieval device provided in the foregoing embodiment, the sorting module is specifically configured to use a correlation between the text content corresponding to each target text document and the search word as a correlation between the target text document and the search word, and sort the audio/video files corresponding to each target text document according to a descending order of the correlation between each target text document and the search word.
Optionally, in the audio/video retrieval apparatus provided in the foregoing embodiment, the retrieval result determining module 404 may further include: a second correlation calculation module and a weighted summation module.
And the second correlation degree calculation module is used for calculating the correlation degree of each target text document and the search word as the first correlation degree of the target text document and the search word.
And the weighted summation module is used for taking the correlation degree between the text content corresponding to each target text document and the search word as the second correlation degree between the target text document and the search word, and carrying out weighted summation on the first correlation degree and the second correlation degree between each target text document and the search word to obtain the correlation degree after weighted summation, which is taken as the final correlation degree between the target text document and the search word.
And the sorting module is specifically used for sorting the audio/video files corresponding to the target text documents according to the final relevance between each target text document and the search word.
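For illustration, the weighted summation can be written as follows. The weight values are placeholders of this sketch, since the embodiments state only that the first relevance and the second relevance are weighted and summed, not which weights are used.

```python
def final_relevance(first_relevance: float, second_relevance: float,
                    w1: float = 0.3, w2: float = 0.7) -> float:
    """Weighted sum of document-level relevance and related-content relevance.

    w1 and w2 are illustrative placeholder weights.
    """
    return w1 * first_relevance + w2 * second_relevance

# The sorting module can then rank the audio/video files by this final relevance:
# sorted(targets, key=lambda av: final_relevance(first[av], second[av]), reverse=True)
```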
An embodiment of the present invention further provides an audio/video retrieval device, please refer to fig. 5, which shows a schematic structural diagram of the audio/video retrieval device, and the audio/video retrieval device may include a memory 501 and a processor 502.
A memory 501 for storing programs;
a processor 502 configured to execute the program, the program specifically configured to:
acquiring an input search term;
determining a target text document containing the search word in a pre-constructed text document library, wherein each text document in the text document library is obtained by transcribing a corresponding audio/video file;
for each target text document, determining text content related to the search word from the target text document, and obtaining text content corresponding to each target text document;
and determining a retrieval result according to the relevance between the text content corresponding to each target text document and the retrieval word and the audio/video file corresponding to each target text document.
The audio and video retrieval device further comprises: a bus, a communication interface 503, an input device 504, and an output device 505.
The processor 502, the memory 501, the communication interface 503, the input device 504, and the output device 505 are connected to each other by a bus. Wherein:
a bus may include a path that transfers information between components of a computer system.
The processor 502 may be a general-purpose processor, such as a general-purpose Central Processing Unit (CPU), a Network Processor (NP) or a microprocessor, or may be an Application-Specific Integrated Circuit (ASIC) or one or more integrated circuits for controlling the execution of the program of the present invention. It may also be a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The processor 502 may include a main processor and may also include a baseband chip, modem, and the like.
The memory 501 stores the program for executing the technical solution of the present invention, and may also store an operating system and other key services. In particular, the program may include program code, and the program code includes computer operating instructions. More specifically, the memory 501 may include a read-only memory (ROM), other types of static storage devices that can store static information and instructions, a random access memory (RAM), other types of dynamic storage devices that can store information and instructions, disk storage, flash memory, and so on.
Input devices 504 may include devices that receive data and information input by a user, such as a keyboard, mouse, camera, scanner, light pen, voice input device, touch screen, pedometer, or gravity sensor, among others.
Output device 505 may include a means for allowing information to be output to a user, such as a display screen, printer, speakers, etc.
The communication interface 503 may include any apparatus that uses a transceiver or the like to communicate with other devices or communication networks, such as an Ethernet, a Radio Access Network (RAN) or a Wireless Local Area Network (WLAN).
The processor 502 executes the program stored in the memory 501 and invokes the other devices, so as to implement the steps of the audio/video retrieval method provided by the embodiment of the present invention.
The embodiment of the invention also provides a readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the audio and video retrieval method are realized.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. An audio and video retrieval method is characterized by comprising the following steps:
acquiring an input search term;
determining a target text document containing the search word in a pre-constructed text document library, wherein each text document in the text document library is obtained by transcribing a corresponding audio/video file;
for each target text document, determining text content related to the search word from the target text document, and obtaining text content corresponding to each target text document;
determining a retrieval result according to the relevance between the text content corresponding to each target text document and the retrieval word and the audio/video file corresponding to each target text document;
wherein determining text content related to the search term from the target text document comprises:
determining sentences containing the search words from the target text documents to obtain target sentences;
sequentially expanding the target sentences to two sides according to a preset first expansion rule by taking each target sentence as a reference sentence to obtain a target local document corresponding to the target sentence;
and carrying out duplication removal and merging treatment on the target local documents corresponding to the target sentences to obtain text contents related to the search words after treatment.
2. The audio-video retrieval method of claim 1, wherein the determining of the retrieval result through the correlation between the text content corresponding to each target text document and the retrieval word and the audio-video file corresponding to each target text document comprises:
calculating the correlation degree between the text content corresponding to each target text document and the search word;
and sequencing the audio/video files corresponding to the target text documents according to the degree of correlation between the text content corresponding to each target text document and the search word, wherein the sequenced audio/video files are used as the search result.
3. The audio-video retrieval method according to claim 1, wherein before the de-duplication and merging processing is performed on the target local document corresponding to each target sentence, the method further comprises:
and taking the target local document corresponding to the target sentence as a reference local document, and sequentially expanding the target local document to two sides according to a preset second expansion rule to obtain the target local document after secondary expansion, wherein the target local document is used as an object for subsequent duplicate removal and combination operation.
4. The audio/video retrieval method according to claim 1, wherein the step of sequentially expanding each target sentence serving as a reference sentence to two sides according to a preset first expansion rule to obtain a target local document corresponding to the target sentence comprises:
and traversing and searching boundary sentences from two sides of the reference sentence by taking the target sentence as the reference sentence, wherein the two searched boundary sentences and each sentence contained between the two boundary sentences form a target local document corresponding to the target sentence.
5. The audio-video retrieval method according to claim 4, wherein the step of traversing and searching boundary sentences to two sides of the reference sentence by using the target sentence as the reference sentence comprises:
respectively traversing towards two sides of the reference sentence by taking the target sentence as the reference sentence, and judging whether the similarity between the currently traversed sentence and the reference sentence is greater than or equal to a first preset value and whether the similarity between the next sentence to be traversed and the reference sentence is less than the first preset value aiming at the traversing process at any side;
and if the similarity between the currently traversed sentence and the reference sentence is greater than or equal to the first preset value, and the similarity between the next sentence to be traversed and the reference sentence is less than the first preset value, taking the currently traversed sentence as the boundary sentence, otherwise, continuing to traverse downwards until the boundary sentence is found.
6. The audio/video retrieval method according to claim 3, wherein the step of sequentially expanding the target local document corresponding to the target sentence to both sides according to a preset second expansion rule by using the target local document as a reference local document to obtain a target local document after secondary expansion comprises:
and traversing and searching boundary sentences from two sides of the reference local document by taking the target local document corresponding to the target sentence as the reference local document, wherein the two searched boundary sentences and each sentence contained between the two searched boundary sentences form the target local document after the secondary expansion.
7. The audio-video retrieval method according to claim 6, wherein the step of respectively traversing and searching boundary sentences to both sides of a reference local document by using a target local document corresponding to the target sentence as the reference local document comprises:
respectively traversing two sides of the reference local document by taking the target local document corresponding to the target sentence as the reference local document, and determining the correlation between the currently traversed sentence and the reference local document aiming at the traversing process of any one side;
if the relevance between the currently traversed sentence and the reference local document is smaller than a second preset value, judging whether the relevances between a consecutive preset number of traversed sentences and the reference local document are all smaller than the second preset value, wherein the consecutive preset number of traversed sentences comprise the currently traversed sentence;
and if the relevances between the consecutive preset number of traversed sentences and the reference local document are all smaller than the second preset value, taking the traversed sentence adjacent to the consecutive preset number of traversed sentences as the boundary sentence, and if not, continuing to traverse until the boundary sentence is found.
8. The audio-video retrieval method of claim 7, wherein the determining the relevance of the currently traversed sentence to the reference local document comprises:
generating a language model through the reference local document, and/or converting the reference local document into a sentence vector as a reference sentence vector;
calculating a probability score of the currently traversed sentence on the language model; and/or converting the currently traversed sentence into a sentence vector as a target sentence vector, and calculating the similarity score between the target sentence vector and the reference sentence vector;
and taking the probability score, or the similarity score, or a score obtained by summing the probability score and the similarity score as the relevance of the currently traversed sentence and the reference local document.
9. The audio-video retrieval method of claim 2, wherein the sorting of the audio-video files corresponding to each target text document according to the degree of correlation between the text content corresponding to each target text document and the retrieval word comprises:
taking the relevance between the text content corresponding to each target text document and the search word as the relevance between the target text document and the search word;
and sequencing the audio/video files corresponding to the target text documents according to the sequence of the relevance of each target text document and the search word from large to small.
10. The audio-video retrieval method of claim 2, wherein after determining the target text document, the method further comprises:
calculating the relevance of each target text document and the search word as the first relevance of the target text document and the search word;
the sorting of the audio and video files corresponding to each target text document according to the degree of correlation between the text content corresponding to each target text document and the search word includes:
taking the correlation degree between the text content corresponding to each target text document and the search word as a second correlation degree between the target text document and the search word;
weighting and summing the first relevance and the second relevance of each target text document and the search term to obtain the relevance after weighted summation, wherein the relevance is used as the final relevance of the target text document and the search term;
and sequencing the audio/video files corresponding to the target text documents according to the final relevance of each target text document and the search word.
11. An audio-video retrieval device, comprising: the system comprises an acquisition module, a related document determining module, a related text determining module and a retrieval result determining module;
the acquisition module is used for acquiring the input search terms;
the relevant document determining module is used for determining a target text document containing the search word in a pre-constructed text document library, wherein each text document in the text document library is obtained by transcribing a corresponding audio/video file;
the relevant text determining module is used for determining text contents relevant to the search terms from the target text documents for each target text document, and obtaining the text contents corresponding to each target text document;
the retrieval result determining module is used for determining a retrieval result according to the correlation degree of the text content corresponding to each target text document and the retrieval word and the audio and video file corresponding to each target text document;
wherein the relevant text determination module comprises: a sentence determining module, a sentence expansion module and a sentence processing module;
the sentence determining module is used for determining sentences containing the search words from the target text documents to obtain target sentences;
the sentence expansion module is used for sequentially expanding each target sentence serving as a reference sentence to two sides according to a preset first expansion rule to obtain a target local document corresponding to the target sentence;
and the sentence processing module is used for carrying out duplication removal and combination processing on the target local documents corresponding to the target sentences to obtain text contents related to the search words after the processing.
12. The audio-video retrieval device of claim 11, wherein said relevant text determination module further comprises: a document expansion module;
and the document expansion module is used for sequentially expanding the target local document corresponding to the target sentence to two sides according to a preset second expansion rule by taking the target local document as a reference local document to obtain the target local document after secondary expansion, and the target local document is used as an object for subsequent duplicate removal and combination operation.
13. An audio-video retrieval device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program, and the program is specifically configured to:
acquiring an input search term;
determining a target text document containing the search word in a pre-constructed text document library, wherein each text document in the text document library is obtained by transcribing a corresponding audio/video file;
for each target text document, determining text content related to the search word from the target text document, and obtaining text content corresponding to each target text document;
determining a retrieval result according to the relevance between the text content corresponding to each target text document and the retrieval word and the audio/video file corresponding to each target text document;
wherein determining text content related to the search term from the target text document comprises:
determining sentences containing the search words from the target text documents to obtain target sentences;
sequentially expanding the target sentences to two sides according to a preset first expansion rule by taking each target sentence as a reference sentence to obtain a target local document corresponding to the target sentence;
and carrying out duplication removal and merging treatment on the target local documents corresponding to the target sentences to obtain texts related to the search words after treatment.
14. A readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the audio and video retrieval method according to any one of claims 1 to 10.
CN201810159175.5A 2018-02-26 2018-02-26 Audio and video retrieval method, device and equipment and readable storage medium Active CN108345679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810159175.5A CN108345679B (en) 2018-02-26 2018-02-26 Audio and video retrieval method, device and equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810159175.5A CN108345679B (en) 2018-02-26 2018-02-26 Audio and video retrieval method, device and equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN108345679A CN108345679A (en) 2018-07-31
CN108345679B true CN108345679B (en) 2021-03-23

Family

ID=62959528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810159175.5A Active CN108345679B (en) 2018-02-26 2018-02-26 Audio and video retrieval method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN108345679B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857957B (en) * 2019-01-29 2021-06-15 掌阅科技股份有限公司 Method for establishing label library, electronic equipment and computer storage medium
CN111651635B (en) * 2020-05-28 2023-04-28 拾音智能科技有限公司 Video retrieval method based on natural language description
CN113326387B (en) * 2021-05-31 2022-12-13 引智科技(深圳)有限公司 Intelligent conference information retrieval method


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101364222B (en) * 2008-09-02 2010-07-28 浙江大学 Two-stage audio search method
CN102081634B (en) * 2009-11-27 2015-07-08 株式会社理光 Speech retrieval device and method
CN103678412B (en) * 2012-09-21 2016-12-21 北京大学 A kind of method and device of file retrieval
CN105045828B (en) * 2015-06-26 2019-04-02 徐信 A kind of pinpoint searching system of audio-video voice messaging and method
US10380177B2 (en) * 2015-12-02 2019-08-13 International Business Machines Corporation Expansion of a question and answer database

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855263A (en) * 2011-06-30 2013-01-02 富士通株式会社 Method and device for aligning sentences in bilingual corpus
CN105930452A (en) * 2016-04-21 2016-09-07 北京紫平方信息技术股份有限公司 Smart answering method capable of identifying natural language

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video Summarization Technology and Its Application in Digital Libraries; Yang Xue et al.; Journal of Medical Informatics; 2011-01-20; Vol. 32, No. 1; pp. 72-74, 93 *

Also Published As

Publication number Publication date
CN108345679A (en) 2018-07-31

Similar Documents

Publication Publication Date Title
KR102315732B1 (en) Speech recognition method, device, apparatus, and storage medium
US10515133B1 (en) Systems and methods for automatically suggesting metadata for media content
US11580181B1 (en) Query modification based on non-textual resource context
US11354510B2 (en) System and method for semantic analysis of song lyrics in a media content environment
CN110321537B (en) Method and device for generating file
US8126897B2 (en) Unified inverted index for video passage retrieval
US10360260B2 (en) System and method for semantic analysis of song lyrics in a media content environment
CN110991187A (en) Entity linking method, device, electronic equipment and medium
CN107145509B (en) Information searching method and equipment thereof
CN108345679B (en) Audio and video retrieval method, device and equipment and readable storage medium
CN110569496A (en) Entity linking method, device and storage medium
JP2020525856A (en) Voice search/recognition method and device
US20230086735A1 (en) Systems and methods for retrieving videos using natural language description
CN111090771A (en) Song searching method and device and computer storage medium
US20190082236A1 (en) Determining Representative Content to be Used in Representing a Video
US11361759B2 (en) Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media
US9454568B2 (en) Method, apparatus and computer storage medium for acquiring hot content
US9811592B1 (en) Query modification based on textual resource context
CN114090766A (en) Video text screening method and device and electronic equipment
JP7395377B2 (en) Content search methods, devices, equipment, and storage media
CN111859079B (en) Information searching method, device, computer equipment and storage medium
CN114003772A (en) Video searching method and device, electronic equipment and storage medium
CN111522903A (en) Deep hash retrieval method, equipment and medium
JP6632564B2 (en) Illegal content search device, illegal content search method, and program
JP6625087B2 (en) Illegal content search device and illegal content search method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant