CN102074235A - Method of video speech recognition and search - Google Patents

Publication number
CN102074235A
Authority
CN
China
Prior art keywords
video
speech recognition
search
text
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010600817
Other languages
Chinese (zh)
Other versions
CN102074235B (en)
Inventor
刘伟奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaqin Technology Co Ltd
Original Assignee
Huaqin Telecom Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaqin Telecom Technology Co Ltd
Priority to CN 201010600817
Publication of CN102074235A
Application granted
Publication of CN102074235B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method of video speech recognition and search, comprising the following steps: 1) converting the audio of every video into text by speech recognition; 2) storing each text separately or attaching it to its video; 3) selecting the several words that occur most frequently in the text as the word tags of the video, the word tags being appended after the filename of the video; and 4) searching the word tags of all videos. The method allows videos to be searched both broadly and specifically, and it can also be used for quick positioning in public security and in searching for personal belongings.

Description

Method of video speech recognition and retrieval
Technical field
The present invention relates to the field of video production, and in particular to a method of video speech recognition and retrieval.
Background art
Cloud and search technologies are now widely applied across industries, but video search technology is still at an exploratory stage. Because video data volumes are large and video content is hard to express as a query, searching by image content or by video segment has not yet reached a fine level of detail. The video search in wide use today relies on filenames and manually added labels as search keywords. Speech recognition technology has likewise been widely applied in many fields, but so far only as standalone speech recognition, mostly on short speech segments, without deeper study or use. In addition, a video can currently be played from an intermediate segment, or a screenshot can be captured at a given moment, but these capabilities have not yet been applied to search.
In view of this, a method of video speech recognition and retrieval is provided to address the above problems.
Summary of the invention
The invention provides a method of video speech recognition and retrieval that overcomes the difficulties of the prior art. It allows videos to be searched both broadly and specifically, and it can also be used for quick positioning in public security and in searching for personal belongings.
The present invention adopts the following technical solution:
The invention provides a method of video speech recognition and retrieval, comprising the following steps:
(1) converting the audio portion of every video into text by speech recognition;
(2) storing each text separately or attaching it to its video;
(3) selecting the several words that occur most frequently in the text as the word tags of the video, the word tags being appended after the filename of the video;
(4) searching the word tags of all videos.
Preferably, the text in step (2) is saved in Word file format.
Preferably, the text in step (2) is saved in TXT file format.
Preferably, the number of words in the word tags in step (3) is set to 3.
Preferably, the number of words in the word tags in step (3) is set to 5.
Preferably, the number of words in the word tags in step (3) is set to 10.
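Step (3) can be illustrated with a minimal Python sketch. It assumes the transcript is already available as a string and that words can be split on non-word characters (Chinese text would need a word segmenter first). The function names `top_words` and `tagged_filename`, the underscore separator, and the choice to place the tags between the file stem and its extension are illustrative assumptions, not details given in the patent.

```python
import re
from collections import Counter
from pathlib import Path


def top_words(transcript: str, n: int = 3) -> list[str]:
    """Return the n most frequent words in the transcript (case-insensitive)."""
    words = re.findall(r"\w+", transcript.lower())
    return [word for word, _count in Counter(words).most_common(n)]


def tagged_filename(video_name: str, tags: list[str], sep: str = "_") -> str:
    """Append the tags after the file stem, keeping the extension (assumed layout)."""
    path = Path(video_name)
    return f"{path.stem}{sep}{sep.join(tags)}{path.suffix}"


if __name__ == "__main__":
    text = "shirt shirt shirt coat coat pencil"
    tags = top_words(text, n=2)
    print(tagged_filename("room.mp4", tags))
```

The parameter `n` corresponds to the preferred tag counts of 3, 5, or 10. A practical system would probably also filter out very common function words before counting, which the patent does not discuss.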
By adopting the above technique, the present invention, compared with the prior art, allows videos to be searched both broadly and specifically, and it can also be used for quick positioning in public security and in searching for personal belongings.
The present invention is further described below with reference to the drawings and embodiments.
Description of the drawings
Fig. 1 is a flowchart of the method of video speech recognition and retrieval of the present invention;
Fig. 2 is a flowchart of the method of video speech recognition and retrieval of Embodiment 1;
Fig. 3 is a flowchart of the method of video speech recognition and retrieval of Embodiment 2;
Fig. 4 is a flowchart of the method of video speech recognition and retrieval of Embodiment 3.
Embodiments
Three specific embodiments of the present invention are described below with reference to Figs. 1 to 4.
As shown in Fig. 1, the method of video speech recognition and retrieval of the present invention comprises the following steps:
(1) converting the audio portion of every video into text by speech recognition;
(2) storing each text separately or attaching it to its video;
(3) selecting the several words that occur most frequently in the text as the word tags of the video, the word tags being appended after the filename of the video;
(4) searching the word tags of all videos.
The text in step (2) is saved in Word file format or in TXT file format.
Preferably, the number of words in the word tags in step (3) is set to 3, 5, or 10.
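Step (4), searching the word tags of all videos, could look roughly like the Python sketch below. It assumes a hypothetical naming layout in which tags follow the base name and are separated by underscores (for example `room_shirt_coat.mp4`), and that the base name itself contains no underscore. The function name `search_tags` and the separator are illustrative assumptions, not part of the patent.

```python
from pathlib import Path


def search_tags(filenames: list[str], query: str, sep: str = "_") -> list[str]:
    """Return the filenames whose appended tags contain the query word."""
    wanted = query.lower()
    hits = []
    for name in filenames:
        # Everything after the first separator is treated as a tag (assumed layout).
        tags = Path(name).stem.split(sep)[1:]
        if wanted in (tag.lower() for tag in tags):
            hits.append(name)
    return hits


if __name__ == "__main__":
    library = ["room_shirt_coat.mp4", "trip_beach_sun.mp4"]
    print(search_tags(library, "shirt"))
```

Because the tags live in the filename, this kind of search needs no separate index, although a larger collection would likely benefit from one.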
The present invention is used in practice as follows:
Embodiment 1
As shown in Fig. 2, in public security, the audio is extracted from recorded camera video and processed with speech recognition. The result is stored in the cloud, or only the text is kept in the cloud while the actual audio and video files are stored in some other convenient high-capacity storage. Depending on which of the two cases applies, retrieval can be text-only, or can return the text together with screenshots, video segments, and the corresponding time slices.
Embodiment 2
As shown in Fig. 3, in personal use, video files can be searched in the same way as Internet video media. One particular application is to record a video while tidying up, saying the name of each item as it is put away; entering the item name in a later search then finds where the item is stored. This avoids the difficulty of finding things that have been forgotten, or that are being sought by someone other than the person who put them away. For example, while cleaning a room the user records a video and says: "Summer clothes go here, Dad's shirt goes here, Mom's coat goes here, my younger brother's pencils go here, and my older sister's cosmetics all go here." When "shirt" is later entered as the query, several shirt results are retrieved, and the user identifies the time slice of the target shirt from the screenshots, or simply goes straight to the location. This household application can greatly reduce conflicts caused by items that cannot be found, or by family members remembering the same thing differently, and it is especially convenient for elderly people with poor memory.
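The item-location scenario can be sketched in Python as a search over transcript segments that carry time slices. The `Segment` record, the `find_item` function, and the sample timestamps are hypothetical; the patent does not specify how time slices are stored, so this is only one plausible arrangement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the beginning of the video
    end: float
    text: str     # recognized speech for this time slice


def find_item(segments: list[Segment], item: str) -> list[Segment]:
    """Return every segment whose recognized text mentions the item name."""
    needle = item.lower()
    return [seg for seg in segments if needle in seg.text.lower()]


if __name__ == "__main__":
    # Sample data modeled on the spoken sentences in Embodiment 2.
    recording = [
        Segment(0.0, 4.0, "Summer clothes go here"),
        Segment(4.0, 8.0, "Dad's shirt goes here"),
        Segment(8.0, 12.0, "Mom's coat goes here"),
    ]
    for seg in find_item(recording, "shirt"):
        print(f"{seg.start:.1f}s to {seg.end:.1f}s: {seg.text}")
```

Each returned time slice could then be used to grab a screenshot or jump to that point in the video, which is how the user would confirm the storage location.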
Embodiment 3
As shown in Fig. 4, for searching Internet video media, the cloud analyzes and converts the audio of each video and marks the result along a timeline, in a manner similar to subtitles. To search, the user only needs to type the text or speak the content to be found (speech is likewise converted to text by speech recognition), and the corresponding subtitle text, screenshots of the video segment, and the corresponding timeline positions are listed. For example, when a user remembers only part of the dialogue of a film, this technique can be used to search for the video by that partial dialogue.
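A subtitle-style timeline search of this kind might be sketched in Python as follows. The sketch assumes each recognized line is stored as a `(start_seconds, end_seconds, text)` tuple and prints matches with SubRip-style timecodes. The function names `srt_time` and `search_lines`, the tuple layout, and the sample dialogue are illustrative assumptions rather than anything prescribed by the patent.

```python
def srt_time(seconds: float) -> str:
    """Format a time offset as an SRT-style timecode, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def search_lines(cues: list[tuple[float, float, str]], fragment: str) -> list[str]:
    """List every cue whose text contains the remembered fragment of dialogue."""
    needle = fragment.casefold()
    return [
        f"{srt_time(start)} --> {srt_time(end)}  {text}"
        for start, end, text in cues
        if needle in text.casefold()
    ]


if __name__ == "__main__":
    # Hypothetical recognized dialogue with timeline positions.
    cues = [
        (0.0, 2.5, "We will meet at the station"),
        (2.5, 5.0, "The train leaves at noon"),
    ]
    for line in search_lines(cues, "train leaves"):
        print(line)
```

Plain substring matching is the simplest choice here. Since recognized speech often differs slightly from what a user remembers, a fuzzy match may work better in practice.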
In summary, by adopting the above technique, the present invention allows videos to be searched both broadly and specifically, and it can also be used for quick positioning in public security and in searching for personal belongings.
The embodiments described above only illustrate the technical idea and features of the present invention. Their purpose is to enable those skilled in the art to understand the content of the invention and implement it accordingly. The claims of the present invention are not limited to these embodiments; all equivalent changes or modifications made in the spirit of the disclosure still fall within the scope of the claims.

Claims (6)

1. the method for video speech recognition and retrieval is characterized in that: may further comprise the steps:
(1) the sound part with all videos changes text into by speech recognition;
(2) text is stored separately respectively or is attached in its video;
(3) choose in the text and the word tag of the highest plurality of words of flat rate to occur, after described word tag is added on the filename of video as this video;
(4) retrieve the word tag of all videos.
2. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the text in the described step (2) is preserved with the word file form.
3. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the text in the described step (2) is preserved with the TXT document form.
4. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the individual character number of the word tag in the described step (3) is defined as 3.
5. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the individual character number of the word tag in the described step (3) is defined as 5.
6. the method for video speech recognition as claimed in claim 1 and retrieval, it is characterized in that: the individual character number of the word tag in the described step (3) is defined as 10.
CN 201010600817 2010-12-20 2010-12-20 Method of video speech recognition and search Active CN102074235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010600817 CN102074235B (en) 2010-12-20 2010-12-20 Method of video speech recognition and search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010600817 CN102074235B (en) 2010-12-20 2010-12-20 Method of video speech recognition and search

Publications (2)

Publication Number Publication Date
CN102074235A true CN102074235A (en) 2011-05-25
CN102074235B CN102074235B (en) 2013-04-03

Family

ID=44032753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010600817 Active CN102074235B (en) 2010-12-20 2010-12-20 Method of video speech recognition and search

Country Status (1)

Country Link
CN (1) CN102074235B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001043215A (en) * 1999-08-02 2001-02-16 Sony Corp Device and method for processing document and recording medium
CN1977264A (en) * 2004-06-28 2007-06-06 松下电器产业株式会社 Video/audio stream processing device and video/audio stream processing method
CN101281534A (en) * 2008-05-28 2008-10-08 叶睿智 Method for searching multimedia resource based on audio content retrieval
CN101382937A (en) * 2008-07-01 2009-03-11 深圳先进技术研究院 Multimedia resource processing method based on speech recognition and on-line teaching system thereof
CN101539929A (en) * 2009-04-17 2009-09-23 无锡天脉聚源传媒科技有限公司 Method for indexing TV news by utilizing computer system

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186557A (en) * 2011-12-28 2013-07-03 宇龙计算机通信科技(深圳)有限公司 Method and device for automatically naming sound record or video files
CN103631780A (en) * 2012-08-21 2014-03-12 鸿富锦精密工业(深圳)有限公司 Multimedia recording system and method
CN103631780B (en) * 2012-08-21 2016-11-23 重庆文润科技有限公司 Multimedia recording systems and method
WO2014059863A1 (en) * 2012-10-18 2014-04-24 腾讯科技(深圳)有限公司 Subtitle querying method, electronic device and storage medium
CN103778131A (en) * 2012-10-18 2014-05-07 腾讯科技(深圳)有限公司 Caption query method and device, video player and caption query server
CN103778131B (en) * 2012-10-18 2017-02-22 腾讯科技(深圳)有限公司 Caption query method and device, video player and caption query server
US9456175B2 (en) 2012-10-18 2016-09-27 Tencent Technology (Shenzhen) Company Limited Caption searching method, electronic device, and storage medium
CN103186663B (en) * 2012-12-28 2016-07-06 中联竞成(北京)科技有限公司 A kind of network public-opinion monitoring method based on video and system
CN103186663A (en) * 2012-12-28 2013-07-03 中联竞成(北京)科技有限公司 Video-based online public opinion monitoring method and system
CN104375997A (en) * 2013-08-13 2015-02-25 腾讯科技(深圳)有限公司 Method and device for adding note information to instant messaging audio information
CN104023176A (en) * 2014-06-03 2014-09-03 华为技术有限公司 Method and device of processing audio frequency and image information as well as terminal equipment
CN104090955A (en) * 2014-07-07 2014-10-08 科大讯飞股份有限公司 Automatic audio/video label labeling method and system
CN104469544A (en) * 2014-11-07 2015-03-25 重庆晋才富熙科技有限公司 Video marking method based on voice technology
CN105898204A (en) * 2014-12-25 2016-08-24 支录奎 Intelligent video recorder enabling video structuralization
CN104994404A (en) * 2015-07-06 2015-10-21 无锡天脉聚源传媒科技有限公司 Method and device for obtaining keywords for video
CN105138670A (en) * 2015-09-06 2015-12-09 天翼爱音乐文化科技有限公司 Audio file label generation method and system
CN105138670B (en) * 2015-09-06 2018-12-14 天翼爱音乐文化科技有限公司 Audio file label generating method and system
CN106649807A (en) * 2016-12-29 2017-05-10 维沃移动通信有限公司 Audio file processing method and mobile terminal
CN107422858A (en) * 2017-07-23 2017-12-01 肇庆高新区长光智能技术开发有限公司 Assisted learning method, device and terminal
CN107391679A (en) * 2017-07-23 2017-11-24 肇庆高新区长光智能技术开发有限公司 Aid in methods of review, device and equipment
CN109547847A (en) * 2018-11-22 2019-03-29 广州酷狗计算机科技有限公司 Add the method, apparatus and computer readable storage medium of video information
CN109547847B (en) * 2018-11-22 2021-10-22 广州酷狗计算机科技有限公司 Method and device for adding video information and computer readable storage medium
CN109523990A (en) * 2019-01-21 2019-03-26 未来电视有限公司 Speech detection method and device
CN109523990B (en) * 2019-01-21 2021-11-05 未来电视有限公司 Voice detection method and device
CN109977233A (en) * 2019-03-15 2019-07-05 北京金山数字娱乐科技有限公司 A kind of idiom knowledge map construction method and device
CN112784062A (en) * 2019-03-15 2021-05-11 北京金山数字娱乐科技有限公司 Idiom knowledge graph construction method and device
CN112784062B (en) * 2019-03-15 2024-06-04 北京金山数字娱乐科技有限公司 Idiom knowledge graph construction method and device
CN110059207A (en) * 2019-04-04 2019-07-26 Oppo广东移动通信有限公司 Processing method, device, storage medium and the electronic equipment of image information

Also Published As

Publication number Publication date
CN102074235B (en) 2013-04-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 201203 Shanghai City, Pudong New Area Zhangjiang hi tech Park Keyuan Road No. 399 Building No. 1

Patentee after: HUAQIN TELECOM TECHNOLOGY Co.,Ltd.

Address before: 201203 Shanghai City, Pudong New Area Zhangjiang hi tech Park Keyuan Road No. 399 Building No. 1

Patentee before: SHANGHAI HUAQIN TELECOM TECHNOLOGY Co.,Ltd.

CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: Building 1, No. 399 Keyuan Road, Zhangjiang hi tech park, Pudong New Area, Shanghai, 201203

Patentee after: Huaqin Technology Co.,Ltd.

Address before: Building 1, No. 399 Keyuan Road, Zhangjiang hi tech park, Pudong New Area, Shanghai, 201203

Patentee before: Huaqin Technology Co.,Ltd.

CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: Building 1, No. 399 Keyuan Road, Zhangjiang hi tech park, Pudong New Area, Shanghai, 201203

Patentee after: Huaqin Technology Co.,Ltd.

Address before: 201203 Shanghai City, Pudong New Area Zhangjiang hi tech Park Keyuan Road No. 399 Building No. 1

Patentee before: HUAQIN TELECOM TECHNOLOGY Co.,Ltd.