Method for video speech recognition and retrieval
Technical field
The present invention relates to the field of video production, and in particular to a method for video speech recognition and retrieval.
Background art
Cloud and search technologies are now widely used across many industries, but video search is still at an exploratory stage. Because video involves large volumes of data, and because image content and video segments are hard to express as search queries, video search has not yet reached a fine level of detail: the video search in widespread use today relies on filenames and manually added tags as keywords. Speech recognition has likewise been applied in many fields, but so far only as standalone recognition, mostly of short speech clips, and it has not been studied or exploited in depth. In addition, current video tools can cut out and play an intermediate segment or capture a screenshot at a given moment, but these capabilities have not yet been applied to search.
In view of the above problems, a method for video speech recognition and retrieval is provided.
Summary of the invention
The invention provides a method for video speech recognition and retrieval that overcomes the difficulties of the prior art. It enables both broad and targeted searches of video, and the same technique can also be used to locate things quickly in public safety applications and when looking for personal belongings.
The present invention adopts the following technical solution:
The invention provides a method for video speech recognition and retrieval, comprising the following steps:
(1) converting the audio of every video into text by speech recognition;
(2) storing each text separately, or attaching it to the corresponding video;
(3) selecting the several words that occur most frequently in the text as the word tags of the video, and appending the word tags to the filename of the video;
(4) retrieving videos by searching the word tags of all videos.
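Step (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the method: it assumes step (1) has already produced an English transcript, tokenizes it with a simple regular expression (Chinese text would need a word segmenter instead), filters a small hypothetical stop-word list, and appends tags using a hypothetical `stem__tag1_tag2.ext` filename convention. The function names are likewise illustrative.

```python
import re
from collections import Counter
from pathlib import PurePath

# Hypothetical stop-word list; a real system would use a fuller, language-specific list.
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "to", "of", "in", "on",
              "here", "it", "this", "that"}


def select_word_tags(transcript: str, k: int = 3) -> list[str]:
    """Step (3): pick the k most frequent non-stop-words in a transcript."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # most_common sorts by count; equal counts keep first-occurrence order.
    return [word for word, _ in counts.most_common(k)]


def tagged_filename(filename: str, tags: list[str]) -> str:
    """Append tags after the file stem (hypothetical 'stem__tag1_tag2.ext' convention)."""
    path = PurePath(filename)
    return f"{path.stem}__{'_'.join(tags)}{path.suffix}"
```

For the transcript "The shirt is in the wardrobe. The coat is in the wardrobe. The spare shirt is in the drawer." and `k=3`, `select_word_tags` returns `["shirt", "wardrobe", "coat"]`, so `room.mp4` becomes `room__shirt_wardrobe_coat.mp4`. The stop-word filter matters: without it, "the", "is" and "in" would crowd out the content words.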
Preferably, the text in step (2) is saved as a Word file.
Preferably, the text in step (2) is saved as a TXT file.
Preferably, the number of word tags selected in step (3) is set to 3.
Preferably, the number of word tags selected in step (3) is set to 5.
Preferably, the number of word tags selected in step (3) is set to 10.
By adopting the above technique, the present invention, compared with the prior art, enables both broad and targeted searches of video, and the same technique can also be used to locate things quickly in public safety applications and when looking for personal belongings.
The present invention is further described below with reference to the drawings and embodiments.
Description of drawings
Fig. 1 is a flowchart of the method for video speech recognition and retrieval of the present invention;
Fig. 2 is a flowchart of the method for video speech recognition and retrieval of Embodiment 1;
Fig. 3 is a flowchart of the method for video speech recognition and retrieval of Embodiment 2;
Fig. 4 is a flowchart of the method for video speech recognition and retrieval of Embodiment 3.
Embodiment
Three specific embodiments of the present invention are described below with reference to Figs. 1 to 4.
As shown in Fig. 1, the method for video speech recognition and retrieval of the present invention comprises the following steps:
(1) converting the audio of every video into text by speech recognition;
(2) storing each text separately, or attaching it to the corresponding video;
(3) selecting the several words that occur most frequently in the text as the word tags of the video, and appending the word tags to the filename of the video;
(4) retrieving videos by searching the word tags of all videos.
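Step (4) then amounts to reading the tags back out of each filename and matching the query against them. The sketch below assumes a hypothetical `stem__tag1_tag2.ext` naming convention and treats a multi-word query as requiring every query word to be one of the file's tags (a set-subset test); both choices, and the function names, are illustrative rather than taken from the method.

```python
from pathlib import PurePath

# Hypothetical convention: '<stem>__<tag1>_<tag2>...<ext>'
TAG_SEPARATOR = "__"


def tags_from_filename(filename: str) -> set[str]:
    """Recover the word tags that step (3) appended to a filename."""
    stem = PurePath(filename).stem
    if TAG_SEPARATOR not in stem:
        return set()  # the file was never tagged
    _, tag_part = stem.rsplit(TAG_SEPARATOR, 1)
    return set(tag_part.split("_"))


def search_by_tags(filenames: list[str], query: str) -> list[str]:
    """Step (4): return every file whose tags include all words of the query."""
    wanted = set(query.lower().split())
    return [name for name in filenames if wanted <= tags_from_filename(name)]
```

Because the tags live in the filename itself, this search needs no separate index; the trade-off is that only the few selected tags, not the full transcript, are searchable.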
The text in step (2) is saved as a Word file or as a TXT file.
Preferably, the number of word tags selected in step (3) is set to 3, 5, or 10.
The present invention is applied in practice as follows:
Embodiment 1
As shown in Fig. 2, in public safety applications, the audio is extracted from recorded camera footage and processed with speech recognition. The results may be stored in the cloud; alternatively, only the text is kept in the cloud, while the actual audio and video files, which are large, are kept in a separate high-capacity store. For these two cases, a search can return either the text alone, or the text together with the matching video segment, its time slice, and a screenshot.
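One way to realize the split storage of Embodiment 1 is to keep only lightweight transcript records in the cloud index, each pointing into the separate bulk store, and to let the caller choose between the two kinds of result. In this sketch the record layout, the `media_uri` scheme, the case-insensitive substring match, and taking the screenshot at the midpoint of the time slice are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    """One recognized utterance, kept in the cloud text index."""
    text: str
    start_s: float  # segment start within the video, in seconds
    end_s: float    # segment end, in seconds
    media_uri: str  # hypothetical pointer into the separate bulk audio/video store


def retrieve(index: list[TranscriptSegment], keyword: str, with_media: bool) -> list[dict]:
    """Return text-only hits, or hits that also carry the video segment and screenshot time."""
    hits = []
    for seg in index:
        if keyword.lower() not in seg.text.lower():
            continue
        if not with_media:
            hits.append({"text": seg.text})
        else:
            hits.append({
                "text": seg.text,
                "media_uri": seg.media_uri,
                "time_slice": (seg.start_s, seg.end_s),
                # Assumption: the screenshot is taken at the midpoint of the time slice.
                "screenshot_at_s": (seg.start_s + seg.end_s) / 2,
            })
    return hits
```

Text-only retrieval touches nothing but the cloud index; the bulk store is read only when the caller asks for the segment and screenshot.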
Embodiment 2
As shown in Fig. 3, in personal use, video files can be searched in the same way as online video media. A special application is recording where items are put away: when storing items, the user records a video of the storage location while speaking each item's name, and later enters that name in a search to find where the item is stored. This addresses the difficulty of finding items whose location has been forgotten, or items put away by someone else. For example, while tidying a room the user says: "The summer clothes go here, Dad's shirts go here, Mom's coat goes here, my younger brother's pencils go here, and my elder sister's cosmetics all go here." When the user later searches for "shirt", several shirt results are retrieved; the user identifies the time slice of the target shirt from the screenshots, or goes directly to the storage location. This household application can greatly reduce conflicts caused by items that cannot be found or by family members remembering the same thing differently, and is especially convenient for elderly people with poor memory.
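The item-location use of Embodiment 2 needs every mention of an item name, not just the first, because the same name (for example "shirt") may be spoken for several items. Assuming the speech recognizer returns word-level timestamps (modeled here by a hypothetical `TimedWord` type), a minimal sketch lists each mention with its time and a small window of neighboring words, which often carries the owner and helps pick the right clip:

```python
from typing import NamedTuple


class TimedWord(NamedTuple):
    word: str
    time_s: float  # when the word was spoken, in seconds from the start of the video


def find_item_mentions(words: list[TimedWord], item: str, context: int = 1) -> list[tuple[float, str]]:
    """List every time an item name was spoken, with `context` words on each side.

    Several hits are expected (e.g. several shirts); the user picks one from the screenshots.
    """
    item = item.lower()
    hits = []
    for i, tw in enumerate(words):
        if tw.word.lower().strip(".,!?") == item:
            window = words[max(0, i - context): i + context + 1]
            hits.append((tw.time_s, " ".join(w.word for w in window)))
    return hits
```

With the spoken sequence "Dad's shirt goes here, Mom's coat goes here, the spare shirt goes here" timed at one word per second, searching for `shirt` returns two hits, at 1 s ("Dad's shirt goes") and 10 s ("spare shirt goes"); the screenshots at those times then decide which shirt is meant.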
Embodiment 3
As shown in Fig. 4, for searching online video media, the cloud analyzes and transcribes the audio of each video and marks the resulting text along the timeline in the manner of subtitles. When searching, the user only needs to type the corresponding text, or speak the content to be searched (which is likewise converted to text by speech recognition), and the system lists the matching subtitle text together with the video segment and a screenshot at the corresponding point on the timeline. For example, a user who remembers only some lines of dialogue from a film can use this technique to search for the video by those lines.
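Embodiment 3 can be sketched as subtitle-style timestamps plus tolerant matching of a partly remembered line. This is an illustration under assumptions, not the method's actual matcher: it formats times in the SubRip (SRT) `HH:MM:SS,mmm` style and scores each subtitle line with Python's standard `difflib.SequenceMatcher`, measuring what fraction of the query's characters are matched in order within the line, so that a short fragment can still fully match a longer line. The 0.8 cutoff and the function names are hypothetical choices.

```python
import difflib


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT-style subtitle timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def coverage(query: str, line: str) -> float:
    """Fraction of the query's characters matched, in order, inside the line."""
    if not query:
        return 0.0
    matcher = difflib.SequenceMatcher(None, query, line, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(query)


def search_lines(subtitles: list[tuple[float, str]], remembered: str,
                 cutoff: float = 0.8) -> list[tuple[str, str]]:
    """Return (timestamp, line) pairs for subtitle lines resembling a remembered fragment."""
    query = remembered.lower()
    hits = []
    for start_s, line in subtitles:
        score = coverage(query, line.lower())
        if score >= cutoff:
            hits.append((score, srt_timestamp(start_s), line))
    hits.sort(key=lambda hit: hit[0], reverse=True)  # best match first
    return [(stamp, line) for _, stamp, line in hits]
```

For example, with subtitles `[(3723.5, "Meet me at the old bridge when the bells ring."), (10.0, "I'll think about it later.")]`, the query `"The old bridge when"` returns only `("01:02:03,500", "Meet me at the old bridge when the bells ring.")`. Scoring against the query length rather than using `ratio()` keeps a short remembered fragment from being penalized for the rest of the line.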
In summary, by adopting the above technique, the present invention enables both broad and targeted searches of video, and the same technique can also be used to locate things quickly in public safety applications and when looking for personal belongings.
The above embodiments are intended only to illustrate the technical idea and features of the present invention, so that those skilled in the art can understand and implement it; they do not limit the scope of the claims of the present invention. All equivalent changes or modifications made in accordance with the spirit disclosed herein still fall within the scope of the claims of the present invention.