CN101038742B - Apparatus and method for assistant voice remote control using image feature - Google Patents

Apparatus and method for assistant voice remote control using image feature

Publication number
CN101038742B
Authority
CN
China
Prior art keywords
image
voice
library
speech
instruction
Prior art date
Application number
CN 200610058563
Other languages
Chinese (zh)
Other versions
CN101038742A (en)
Inventor
洪进福
Original Assignee
鸿富锦精密工业(深圳)有限公司
鸿海精密工业股份有限公司
Priority date
Filing date
Publication date
Application filed by 鸿富锦精密工业(深圳)有限公司, 鸿海精密工业股份有限公司
Priority to CN 200610058563
Publication of CN101038742A
Application granted
Publication of CN101038742B

Abstract

An apparatus and method for assisting voice remote control using image features, suitable for remote-controlled equipment that can capture both images and voice. The apparatus contains a voice feature library, an image feature library, and a command library for voice remote control. A voice remote control operation comprises: receiving a voice input through a sound pickup device, analyzing its voice features, and searching the voice feature library to collect, from the corresponding command library, a command set made up of the commands that approximate those features; capturing a real-time image with an image capturing element and searching the image feature library with that image; using the search result from the image feature library to determine which command in the command set matches the operator's actual situation; and executing that command. Checking voice commands against image features in this way increases the accuracy of voice control and reduces operating errors.

Description

Apparatus and method for assisting voice remote control using image features

Technical Field

[0001] The present invention relates to an apparatus and method for assisting voice remote control using image features, and more particularly to using image features to check the plausibility of a voice command and thereby increase the accuracy of voice command recognition.

Background

[0002] In the past, audio-visual equipment such as digital cameras, stereos, and recorder/players could be operated not only with the buttons on the device but also with a remote controller. The operator only had to press keys on the remote controller and never needed to touch the equipment. The prerequisite, however, is that the operator has the remote controller in hand; once it is lost or out of reach, the convenience of remote control is gone.

[0003] Newer voice remote control technology lets an operator control a device without holding any remote controller. A sound pickup device (such as a microphone) receives the operator's voice, the voice features are analyzed, a corresponding operating command is looked up in a command library, and that command is executed. Speech recognition has been under development for many years, and related patents continue to be filed at home and abroad.

[0004] Take U.S. Patent Application Publication No. US2005/0071169A1 as an example. Its inventors considered that different operators speak at different speeds, so their remedy is to automatically add a delay between the end of reception and the start of execution in order to confirm that the voice command has been fully issued. That publication introduces the notion of a time axis, but its processing still revolves around audio information alone.

[0005] Take also U.S. Patent Application Publication No. US2005/0105575A1. The problem it addresses is that a single voice command may cause several devices in the same room to respond at once, leading to unpredictable errors and confusion. Its remedy is to equip every remotely controlled device in the room with a camera and a microphone, but the camera is there only to detect whether the operator is addressing that particular device, so as to avoid such confusion. Because the camera serves only to decide whether a voice command should be accepted, and not to improve speech recognition accuracy, that invention differs from the present one.

[0006] In addition, the compact video microscope disclosed in U.S. Patent No. 6,452,625B1 also contains a microphone and an image capturing device, but its image capturing device mainly provides video recording and its microphone mainly provides plain audio recording or voice control. The patent does not discuss how image information could assist voice control, nor how the video microscope is operated by voice.

[0007] U.S. Patent No. 6,289,140B1 discloses a voice control technique applicable to image capturing devices, providing a method for recognizing voice commands and the hardware architecture needed to carry it out. The later U.S. Patent No. 6,762,692B1 proposes displaying a voice command tree on a screen to help the user speak predefined vocabulary to operate the device. Neither patent, however, contemplates using image information to assist the recognition of voice control commands.

[0008] The patent documents above, like speech recognition systems in general, simply collect speech, analyze its features, and look up the corresponding command in a command library. But recognition conditions vary with the operator's accent, speaking speed, and the surrounding environment; the matching conditions and influencing factors differ from person to person and place to place, and are quite complex. Raising the recognition rate of voice commands is therefore a major research and development challenge and a focus of competition among companies.

Summary of the Invention

[0009] The object of the present invention is to add an image-feature check to the voice command recognition process and thereby improve the accuracy of voice control.

[0010] To achieve this object, the present invention provides an apparatus and method for assisting voice remote control using image features. The apparatus contains a voice feature library, an image feature library, and a command library for voice remote control. A voice remote control operation comprises the following steps: (a) inputting a voice through a sound pickup device and comparing its features against a voice feature library, thereby picking out, from the command library corresponding to that voice feature library, all commands that can correspond to the voice features, and gathering them into a command set; (b) capturing a real-time image through an image capturing element and searching an image feature library with that image; (c) using the search result from the image feature library to select from the command set the command that fits the actual situation at the time of the user's operation; and (d) executing that command.

[0011] The apparatus of the present invention is suitable for remote-controlled equipment with image and voice capture functions, such as digital cameras, digital recorder/players, operating-room camcorders, and ordinary camera phones.

[0012] By checking voice commands against image features, the method of the present invention increases the accuracy of voice control and reduces operating errors.

Brief Description of the Drawings

[0013] FIG. 1 is a block diagram of an embodiment of the apparatus of the present invention for assisting voice remote control using image features.

[0014] FIG. 2 is a block diagram of an embodiment of the method of the present invention for assisting voice remote control using image features.

[0015] FIG. 3 is a block diagram of a variant embodiment of the method of the present invention for assisting voice remote control using image features.

[0016] Description of main reference numerals:

[0017] a, b, c, c1, c2, d, d1: steps

[0018] 10: image capturing device; 11: lens module

[0019] 12: image sensing module; 13: image processing device

[0020] 14: display screen; 15: data storage module

[0021] 16: memory; 17: processor unit

[0022] 18: transmission interface; 19: key

[0023] 20: microphone; 21: voice recognition device

[0024] 21A: voice feature library; 21B: image feature library

[0025] 21C: command library

Detailed Description

[0026] To make the above objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings:

[0027] Referring to FIG. 1, an image capturing device 10 embodying the present invention comprises a lens module 11, an image sensing module 12, an image processing device 13, a display screen 14, a data storage module 15, a memory 16, a processor unit 17, a transmission interface 18, a key 19, a microphone 20, and a voice recognition device 21. The microphone 20 receives sound. The lens module 11 takes in an optical image, which the image sensing module 12 converts into a digital image; the image processing device 13 then processes it for display on the display screen 14 and for storage in the data storage module 15 and the memory 16. In operation, commands are input to the processor unit 17 through the key 19 and the voice recognition device 21 to shoot, record video, browse, add or delete files, or transfer data. The transmission interface 18 may be an ordinary radio-frequency transmitter module, or a Bluetooth communication module, USB port, 1394 port, or fiber-optic communication port that establishes a connection with a computer, mobile phone, or other audio-visual equipment. Apart from the voice recognition device 21, the image capturing device 10 uses known components, which are not described further here.

[0028] The voice recognition device 21 includes a voice feature library 21A, an image feature library 21B, and a command library 21C. The commands in the command library 21C correspond to entries in both the voice feature library 21A and the image feature library 21B. The features of a voice input through the microphone 20 are compared against the voice feature library 21A to find the commands in the command library 21C that can correspond to those features. Because people differ in speaking speed, pitch, volume, and accent, accepting only the single best match is very likely to cause misrecognition. The matching condition can therefore be relaxed so that all commands close to the voice features are picked out together to form a command set.
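The relaxed matching described in this paragraph can be sketched as a similarity threshold: every library entry that is close enough to the input is kept, not only the best one. The patent does not say how voice features are represented or compared, so the feature vectors, the cosine similarity measure, the threshold value, and the names `VOICE_FEATURES` and `candidate_words` are all assumptions made for illustration.

```python
import math

# Hypothetical voice feature library (21A): each spoken word is paired with a
# toy feature vector. Real systems would use far richer acoustic features.
VOICE_FEATURES = {
    "up": [0.9, 0.1, 0.2],
    "on": [0.8, 0.2, 0.3],
    "down": [0.1, 0.9, 0.4],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_words(features, threshold=0.95):
    """Return every word whose stored features are close to the input.

    A lower threshold relaxes the match and yields a larger candidate set.
    """
    return sorted(
        word
        for word, reference in VOICE_FEATURES.items()
        if cosine_similarity(features, reference) >= threshold
    )


# An utterance that sits between "up" and "on" keeps both as candidates.
print(candidate_words([0.85, 0.15, 0.25]))
```

With a very strict threshold the same input may match nothing at all, which is the misrecognition risk the paragraph describes; the image check in the following paragraphs is what narrows the relaxed set back down.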

[0029] After the lens module 11 takes in an optical image and the image sensing module 12 and image processing device 13 produce a real-time image that reflects the shooting environment as it is, the voice recognition device 21 captures that real-time image and compares it against the image feature library 21B. The comparison result is used to check or filter the command set picked from the command library 21C and to determine the command that fits the actual situation at the time of the user's operation, which the processor unit 17 then executes.

[0030] The image feature library 21B stores a plurality of image features corresponding to the commands in the command library 21C. Image features may include, but are not limited to, measured brightness levels and the shapes of object contours. For example, the voice recognition device 21 can use the brightness of the shooting environment reflected in the image to decide whether a voice command to change the ISO value means raising or lowering it, or can use contour analysis to locate a human figure and adjust the position in the frame where focus should be locked. The correspondence between the image feature library 21B and the command library 21C can be pre-recorded during assembly to define the different command sets that correspond to different image features. The contents of the image feature library 21B and their correspondence to the command library 21C can also be changed by the operator after purchase, who may edit, add, or delete entries to suit a profession or a particular purpose.
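As a rough illustration of the brightness feature mentioned above, the sketch below classifies a grayscale frame by its mean pixel value. The patent does not specify how brightness is measured, so the 0 to 255 pixel scale, the two cut-off values, and the function names are assumptions chosen only to make the idea concrete.

```python
def mean_brightness(gray_image):
    """Average pixel value of a grayscale image given as rows of 0-255 ints."""
    pixels = [value for row in gray_image for value in row]
    if not pixels:
        raise ValueError("image has no pixels")
    return sum(pixels) / len(pixels)


def lighting_condition(gray_image, dark_below=60, bright_above=190):
    """Label a frame as 'too_dark', 'too_bright', or 'normal'.

    The two thresholds are illustrative defaults, not values from the patent.
    """
    level = mean_brightness(gray_image)
    if level < dark_below:
        return "too_dark"
    if level > bright_above:
        return "too_bright"
    return "normal"


dim_frame = [[10, 20], [30, 40]]
print(lighting_condition(dim_frame))
```

A label such as `too_dark` is the kind of comparison result that could tell the device an ambiguous ISO command means "raise" and not "lower".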

[0031] The commands in the command library 21C can be pre-recorded during assembly. For example, the library may contain a first command set of browsing commands: "save", "delete", "zoom in", "zoom out", "left", "right", "up", "down", "send", "send all", and other voice remote control commands.

[0032] The command library 21C of this embodiment may also include a second command set for moving the focus point while shooting, for example: "on face", which locks the focus point on the face of a human figure in the frame; "left", which moves the focus point left; "up", which moves the focus point up; "down", which moves the focus point down; and other such voice remote control commands.

[0033] In addition, the command library 21C of this embodiment may include a third command set needed when lighting is poor, for example: "up", which raises the ISO value when the light is too dim; "down", which lowers the ISO value when the light is too bright; "on", which turns on the flash; and, once the flash is on, "up" to increase the flash brightness and "down" to reduce it.
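One way to picture the three command sets is as a nested mapping from set name to spoken word to action, which makes it easy to see that a single word such as "up" belongs to several sets. The set names, the action identifiers, and the helper `commands_for` are hypothetical; only the spoken words come from the text, and the flash-brightness meanings of "up" and "down" are left out to keep the sketch small.

```python
# Hypothetical layout of the command library (21C): set name -> word -> action.
COMMAND_LIBRARY = {
    "browse": {
        "save": "save_image",
        "delete": "delete_image",
        "zoom in": "zoom_in",
        "zoom out": "zoom_out",
        "left": "scroll_view_left",
        "right": "scroll_view_right",
        "up": "scroll_view_up",
        "down": "scroll_view_down",
        "send": "send_image",
        "send all": "send_all_images",
    },
    "focus": {
        "on face": "focus_on_face",
        "left": "move_focus_left",
        "up": "move_focus_up",
        "down": "move_focus_down",
    },
    "exposure": {
        "up": "raise_iso",
        "down": "lower_iso",
        "on": "flash_on",
    },
}


def commands_for(word):
    """Collect every action the spoken word could mean, keyed by command set."""
    return {
        set_name: actions[word]
        for set_name, actions in COMMAND_LIBRARY.items()
        if word in actions
    }


print(commands_for("up"))
```

A word that appears in only one set, such as "on", needs no image check, while "up" yields three candidates that must be told apart some other way.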

[0034] Referring to FIGS. 1 and 2, an embodiment of the method of the present invention for assisting voice remote control using image features comprises the following steps:

[0035] (a) inputting a voice through a sound pickup device and comparing its features against a voice feature library, thereby picking out, from the command library corresponding to that voice feature library, all commands that can correspond to the voice features, and gathering them into a command set;

[0036] (b) capturing a real-time image through an image capturing element and comparing that real-time image against an image feature library;

[0037] (c) using the search result from the image feature library to select from the command set the command that fits the actual situation at the time of the user's operation; and

[0038] (d) executing the command that fits the actual situation at the time of the user's operation.

[0039] Assisting voice remote control with image features in this way increases the accuracy of speech recognition and effectively reduces operating errors.
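The four steps (a) to (d) can be read as a small pipeline. The sketch below passes each stage in as a function so that the control flow is visible without committing to any particular recognizer or camera interface. The function `run_voice_command`, its parameter names, and the convention that the image comparison returns the name of a command set are assumptions for illustration, not details from the patent.

```python
def run_voice_command(voice_input, match_voice, capture_frame, match_image, execute):
    """Run steps (a)-(d) once and return the executed command, or None.

    match_voice(voice_input) -> dict mapping command-set name to command
    capture_frame()          -> the current real-time image
    match_image(frame)       -> name of the command set the scene points to
    execute(command)         -> performs the command
    """
    candidates = match_voice(voice_input)   # (a) build the command set
    frame = capture_frame()                 # (b) capture a real-time image
    scene = match_image(frame)              # (b) compare it with the image library
    command = candidates.get(scene)         # (c) keep the command that fits the scene
    if command is None:
        return None                         # nothing in the set fits; do nothing
    execute(command)                        # (d) execute it
    return command


executed = []
result = run_voice_command(
    "up",
    match_voice=lambda voice: {"browse": "scroll_view_up", "focus": "move_focus_up"},
    capture_frame=lambda: "frame-0",
    match_image=lambda frame: "focus",
    execute=executed.append,
)
print(result, executed)
```

Returning `None` when no candidate fits is a design choice of this sketch; the patent text does not say what the device does in that case.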

[0040] The sound pickup device in step (a) can be implemented by the microphone 20 in FIG. 1. After a voice is input, matching voice features are found in the voice feature library 21A, and from them all commands in the command library 21C that can correspond to those features. For example, when the voice is "up", the same spoken command could mean "scroll toward the top of the picture" in the first command set, "move the focus point up" in the second command set, or "raise the ISO value" in the third command set. This step picks out the relevant commands and gathers them into a command set.

[0041] In steps (b) and (c), the image capturing element can be implemented by the image processing device 13 in FIG. 1. The image processing device 13 produces a real-time image, and the voice recognition device 21 compares it against the image features in the image feature library 21B to filter the command set and pick out the command that fits the actual situation at the time of the user's operation. For example, if the comparison against the image feature library 21B indicates that no new image is currently being input, the user is presumed to be browsing, so the voice command "up" should be "scroll toward the top of the picture" from the first command set. If the comparison indicates that new frames are being input but the light is insufficient, the voice command is presumed to be "raise the ISO value" from the third command set. If the comparison indicates that the light is normal and new frames are being input, the voice command is presumed to be "move the focus point up" from the second command set.
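The three cases in this paragraph amount to a small decision rule. A minimal sketch follows, assuming the image comparison has already been reduced to two booleans (whether new frames are arriving and whether the light is sufficient). The function names, the set names "browse", "focus", and "exposure", and the action identifiers are hypothetical labels for the first, second, and third command sets.

```python
def choose_command_set(new_frames_arriving, light_sufficient):
    """Map the scene to a command set, following the three cases in the text."""
    if not new_frames_arriving:
        return "browse"      # no new image input: the user is presumed to be browsing
    if not light_sufficient:
        return "exposure"    # shooting in poor light: exposure commands apply
    return "focus"           # shooting in normal light: focus commands apply


def resolve(candidates, new_frames_arriving, light_sufficient):
    """Pick the candidate from the matching set, or None if that set has none."""
    chosen_set = choose_command_set(new_frames_arriving, light_sufficient)
    return candidates.get(chosen_set)


# Candidate meanings of the spoken word "up", keyed by command set.
UP_CANDIDATES = {
    "browse": "scroll_view_up",
    "focus": "move_focus_up",
    "exposure": "raise_iso",
}

print(resolve(UP_CANDIDATES, new_frames_arriving=True, light_sufficient=False))
```

The text only covers insufficient light for the third command set; how an overly bright scene should be handled for "up" is not stated, so this sketch treats it the same as normal light.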

[0042] Those skilled in the art can readily infer other feasible variations from the above description and embodiment and adapt them to the preferences or needs of different consumer groups. For example, offering too many voice commands is an unacceptable drawback for forgetful users, so the designer must try to keep the number of voice commands as small as possible. Reducing the number of voice commands, however, inevitably leads to situations for which no decision rule can be defined in advance. A variant embodiment, shown in FIG. 3, can therefore be adopted, with the following steps:

[0043] (a) inputting a voice through a sound pickup device and comparing its features against a voice feature library, thereby picking out, from the command library corresponding to that voice feature library, all commands that can correspond to the voice features, and gathering them into a command set;

[0044] (b) capturing a real-time image through an image capturing element and comparing its image features against an image feature library;

[0045] (c1) using the search result from the image feature library to select from the command set a plurality of commands that fit the actual situation at the time of the user's operation;

[0046] (c2) showing the plurality of commands on a display so that the operator can select one of them; and

[0047] (d1) executing the command selected by the operator.
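Steps (c1), (c2), and (d1) of this variant can be sketched as filter, display, and pick. The helper names, the numbered text menu, and selection by menu number are assumptions for illustration; in the patent the operator makes the selection by voice after reading the options on the screen.

```python
def shortlist_commands(candidates, fits_scene):
    """Step (c1): keep every candidate the scene check does not rule out."""
    return [command for command in candidates if fits_scene(command)]


def render_menu(options):
    """Step (c2): build the text that would be shown on the display."""
    return "\n".join(f"{number}. {command}" for number, command in enumerate(options, start=1))


def pick(options, choice_number):
    """Step (d1): return the option the operator selected (1-based)."""
    if not 1 <= choice_number <= len(options):
        raise ValueError("choice is not on the menu")
    return options[choice_number - 1]


candidates = ["scroll_view_up", "move_focus_up", "raise_iso"]
# Hypothetical scene check: new frames are arriving, so browsing is ruled out.
options = shortlist_commands(candidates, lambda command: command != "scroll_view_up")
print(render_menu(options))
print(pick(options, 2))
```

Keeping the shortlist small is the point of the variant: the operator reads a short menu and does not have to remember the full vocabulary.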

[0048] Although the embodiment of FIG. 3 still requires the user to speak the command they want to select, step (c1) has already used image features to filter out the commands that fit the actual situation at the time of the user's operation, and these are shown on the screen (for example, on the display screen 14 in FIG. 1). For a forgetful operator, being able to look at the screen and read out the appropriate command is a great relief.

[0049] This voice remote control method, assisted by the analysis of image information, helps increase the accuracy of voice remote control and effectively reduces operating errors.

[0050] Compared with known methods, the advantages of the present invention include:

[0051] 1. In the method of the present invention, image features obtained in real time are compared against the image feature library, so that the voice commands already picked out can be filtered or checked according to the shooting situation, yielding the voice command that fits the actual situation at the time of the user's operation. This helps improve the accuracy of voice remote control.

[0052] 2. Because a digital camera already has image capture and processing functions, implementing the method of the present invention requires no additional hardware components. In other words, applying the invention to a digital camera only requires adding the voice feature library, the image feature library, the corresponding command library, and firmware to the camera's existing storage module in order to improve the accuracy of voice remote control.

[0053] The above is a detailed description of preferred embodiments of the present invention. These embodiments are not intended to limit the scope of protection of the invention; any equivalent implementation or modification that does not depart from the technical spirit of the invention shall fall within the scope of protection of this case.

Claims (9)

1. An apparatus for assisting voice remote control using image features, in which a sound pickup device and a voice recognition device are installed in an image capturing device, characterized in that: the image capturing device comprises a lens module, an image sensing module, an image processing device, and a processor unit, the image sensing module converting light taken in by the lens module into an image and the image processing device providing a real-time image; the sound pickup device receives external voice commands; and the voice recognition device contains: a command library storing a plurality of commands for operating the image capturing device; a voice feature library storing a plurality of voice features corresponding to the commands in the command library, used to pick out the commands that match the voice features to form a command set; and an image feature library storing a plurality of image features corresponding to the commands in the command library; wherein the voice recognition device compares the real-time image against the image features in the image feature library to produce a comparison result, and the processor unit then selects one command from the command set according to the comparison result and executes it.
2. The apparatus for assisting voice remote control using image features according to claim 1, characterized in that the sound pickup device is a microphone.
3. The apparatus for assisting voice remote control using image features according to claim 1, characterized in that the image capturing device is a digital camera, a digital camcorder, or a camera phone.
4. The apparatus for assisting voice remote control using image features according to claim 1, characterized in that the command library includes a command for moving the focus point while shooting.
5. The apparatus for assisting voice remote control using image features according to claim 1, characterized in that the image feature library includes features for determining whether brightness is sufficient while shooting, and the command library includes at least one command corresponding to the image.
6. The apparatus for assisting voice remote control using image features according to claim 1, characterized in that the command library includes "up", which raises the ISO value when the light is too dim, and "down", which lowers the ISO value when the light is too bright.
7. The apparatus for assisting voice remote control using image features according to claim 1, characterized in that the command library includes the voice remote control commands "on" for turning on the flash and, after the flash is on, "up" for increasing the flash brightness and "down" for reducing it.
8. A method for assisting voice remote control using image features, in which a voice recognition device is installed in an image capturing device and a sound pickup device receives voice commands issued by an operator for remote control, the voice recognition device containing a voice feature library, an image feature library, and a command library, characterized in that the method comprises the following steps: (a) inputting a voice through the sound pickup device and comparing its features against the voice feature library, thereby picking out from the command library all commands that can correspond to the voice features, and gathering them into a command set; (b) capturing a real-time image through the image capturing device and comparing its image features against the image feature library; (c) using the comparison result from the image feature library to select from the command set a command that fits the actual situation at the time of the user's operation; and (d) executing the command that fits the actual situation at the time of the user's operation.
9. A method for assisting voice remote control using image features, in which a voice recognition device is installed in an image capturing device and a sound pickup device receives voice commands issued by an operator for remote control, the voice recognition device containing a voice feature library, an image feature library, and a command library, characterized in that the method comprises the following steps: (a) inputting a voice through the sound pickup device and comparing its features against the voice feature library, thereby picking out from the command library all commands that can correspond to the voice features, and gathering them into a command set; (b) capturing a real-time image through the image capturing device and comparing its image features against the image feature library; (c1) using the search result from the image feature library to select from the command set a plurality of commands that fit the actual situation at the time of the user's operation; (c2) showing the plurality of commands on a display so that the operator can select one of them; and (d1) executing the command selected by the operator.
CN 200610058563 2006-03-16 2006-03-16 Apparatus and method for assistant voice remote control using image feature CN101038742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610058563 CN101038742B (en) 2006-03-16 2006-03-16 Apparatus and method for assistant voice remote control using image feature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200610058563 CN101038742B (en) 2006-03-16 2006-03-16 Apparatus and method for assistant voice remote control using image feature

Publications (2)

Publication Number Publication Date
CN101038742A CN101038742A (en) 2007-09-19
CN101038742B CN101038742B (en) 2011-06-22

Family

ID=38889604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610058563 CN101038742B (en) 2006-03-16 2006-03-16 Apparatus and method for assistant voice remote control using image feature

Country Status (1)

Country Link
CN (1) CN101038742B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101345668A (en) * 2008-08-22 2009-01-14 中兴通讯股份有限公司 Control method and apparatus for monitoring equipment
CN103136927A (en) * 2011-11-24 2013-06-05 亚旭电子科技(江苏)有限公司 Control object identification module of universal remote controller and identification method therefor
CN103135447A (en) * 2011-11-24 2013-06-05 亚旭电子科技(江苏)有限公司 Remote control switching device
CN103248633B (en) * 2012-02-01 2017-05-24 深圳中兴力维技术有限公司 Method and system for controlling PTZ
CN104516772B (en) * 2013-09-27 2018-12-14 联想(北京)有限公司 A data processing method and an electronic device
CN103780843A (en) * 2014-03-03 2014-05-07 联想(北京)有限公司 Image processing method and electronic device
CN106331890A (en) * 2015-06-24 2017-01-11 中兴通讯股份有限公司 Processing method and device for video communication image
CN106653023A (en) * 2016-12-30 2017-05-10 深圳天珑无线科技有限公司 Method and system for triggering image acquisition by virtue of voice signal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5737707A (en) 1996-01-11 1998-04-07 At&T Corp. Pager-controlled wireless radiotelephone
US6167251A (en) 1998-10-02 2000-12-26 Telespree Communications Keyless portable cellular phone system having remote voice recognition
CN1345029A (en) 2000-09-19 2002-04-17 汤姆森许可贸易公司 Voice control method and device for consumer electronic equipment
US6498970B2 (en) 2001-04-17 2002-12-24 Koninklijke Phillips Electronics N.V. Automatic access to an automobile via biometrics
US20050144009A1 (en) 2001-12-03 2005-06-30 Rodriguez Arturo A. Systems and methods for TV navigation with compressed voice-activated commands

Also Published As

Publication number Publication date
CN101038742A (en) 2007-09-19

Similar Documents

Publication Publication Date Title
CN101101205B (en) Inspection apparatus for inspecting articles
JP5552767B2 (en) Display processing apparatus, display processing method, and display processing program
JP5268595B2 (en) Image processing apparatus, image display method and image display program
US9659561B2 (en) Recording support electronic device and method
US7831598B2 (en) Data recording and reproducing apparatus and method of generating metadata
CN1841187B (en) Image sensing device and control method thereof
US20150269236A1 (en) Systems and methods for adding descriptive metadata to digital content
JP4752897B2 (en) Image processing apparatus, image display method and image display program
US20130181900A1 (en) Non-contact selection device
RU2421775C2 (en) Information processing device and method, and computer programme
JP2010063104A (en) Digital camera for importing and organizing image before and after shutter signal
CN103945132B (en) The image acquisition method of electronic equipment and electronic equipment
US8314854B2 (en) Apparatus and method for image recognition of facial areas in photographic images from a digital camera
CN103052960A (en) Rapid auto-focus using classifier chains, mems and/or multiple object focusing
US20070022372A1 (en) Multimodal note taking, annotation, and gaming
JP2012008772A (en) Gesture recognition apparatus, gesture recognition method, and program
JPH07184160A (en) Device for processing picture data and audio data
CN108270903A (en) Method and apparatus for controlling lock/unlock state of terminal through voice recognition
JP2013502637A (en) Tagging system metadata, image search method, tagging method gesture applied device and its
WO2011123334A1 (en) Searching digital image collections using face recognition
US20030189642A1 (en) User-designated image file identification for a digital camera
CN101535996A (en) Method and apparatus for identifying an object captured by a digital image
US20150088515A1 (en) Primary speaker identification from audio and video data
JP2009059257A (en) Information processing apparatus and information processing method, and computer program
CN101364265A (en) Method for auto configuring equipment parameter of electronic appliance and ccd camera

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C41 Transfer of patent application or patent right or utility model
ASS Succession or assignment of patent right

Owner name: HONGFUJIN PRECISION INDUSTRY (SHENZHEN) CO., LTD.

Free format text: FORMER OWNER: PULIER SCI-TECH CO., LTD.

Effective date: 20090515

C14 Grant of patent or utility model
EXPY Termination of patent right or utility model