CN101742110A - Video camera set by speech recognition system

Info

Publication number: CN101742110A
Application number: CN200810152901A
Authority: CN
Inventors: 李妮; 郑龙周
Original assignee: Tianjin Samsung Electronics Co Ltd
Current assignee: Tianjin Samsung Electronics Co Ltd
Priority date: 2008-11-10
Filing date: 2008-11-10
Publication date: 2010-06-16

Abstract

The invention relates to a video camera set by a speech recognition system, comprising a CPU (central processing unit) circuit, a coding and decoding circuit and a speech module. The CPU circuit is connected with the electronic circuit of the video camera; the speech module is connected with both the CPU circuit and the coding and decoding circuit. The CPU circuit stores a control program with the following steps: after the video camera is powered on and started, the system enters a first selection, namely the speech control mode, and activates the speech control module; the video camera then enters the speech control start state; keyword recognition is then carried out, the keywords operable by speech being video recording, stop, video playback and shutdown; after a user says 'video recording', the system judges the information, sends the result to the CPU, and the command is carried out. By using the intelligent speech system, the video camera is easier to operate, snapshots are captured faster and more effectively, and the loss of the best picture caused by adjusting menus is avoided.

Description

Video camera set by a speech recognition system

Technical field

The present invention relates to a video camera, and in particular to a video camera set by a speech recognition system.

Background technology

At present, man-machine communication is carried out mainly by manual operation, which limits the flexibility with which people can interact with computer systems and electromechanical systems. To make human-machine dialogue with digital appliance systems more flexible, and to meet the needs of special groups such as the elderly and the disabled, better means of information exchange are needed. Language is the main and most basic way in which humans communicate, and with the development of digital signal processing software and hardware, speech processing technology has matured and is now close to practical use.

Summary of the invention

In view of the deficiencies of the prior art, the invention provides a video camera set by a speech recognition system, whose operation can be controlled by speech input.

The technical scheme adopted by the present invention to achieve the above purpose is: a video camera set by a speech recognition system, comprising a CPU circuit and a coding and decoding circuit connected with the video camera electronic circuit, characterized in that it also comprises a voice module, the voice module being connected with the CPU circuit and with the coding and decoding circuit respectively, and the CPU circuit storing a control program whose control steps are:

1. After the power is switched on and the camera starts, the system enters the first selection, namely whether to enter the voice control mode; the system presents this selection automatically at startup. If NO is selected, the system enters manual mode, in which the camera functions like an ordinary video camera;

2. If YES is selected, the CPU passes this information to the voice control module and activates it, and the camera enters the voice control start state;

3. Keyword recognition is then carried out. The keywords that can be operated by voice are: video recording, stop, video playback, shutdown. After the user says "video recording", the system judges this information and sends the result to the CPU, which executes it. Each time the user says a keyword the program makes one judgment, until "shutdown" is executed.
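The three control steps can be sketched as a small state machine. This is an illustrative sketch only, not the patented firmware: the names `Mode`, `KEYWORDS` and `run_camera`, the command strings, and the English renderings of the four keywords are all assumptions made for the example, and speech recognition itself is replaced by already-recognized text.

```python
from enum import Enum, auto


class Mode(Enum):
    MANUAL = auto()  # step 1, NO selected: behaves like an ordinary camera
    VOICE = auto()   # step 2, YES selected: voice control start state
    OFF = auto()     # reached once "shutdown" has been executed


# Hypothetical keyword-to-command table. The four keywords come from the text;
# the command names are illustrative only.
KEYWORDS = {
    "video recording": "START_RECORDING",
    "stop": "STOP",
    "video playback": "PLAY",
    "shutdown": "POWER_OFF",
}


def run_camera(voice_mode_selected, utterances):
    """Return (final mode, commands handed to the CPU in order)."""
    if not voice_mode_selected:
        # Step 1: NO selected, so the voice module is never activated.
        return Mode.MANUAL, []
    executed = []
    for spoken in utterances:
        # Step 3: one judgment per utterance.
        command = KEYWORDS.get(spoken.strip().lower())
        if command is None:
            continue  # not a keyword, nothing is sent to the CPU
        executed.append(command)
        if command == "POWER_OFF":
            return Mode.OFF, executed  # loop ends once shutdown is executed
    return Mode.VOICE, executed
```

For example, `run_camera(True, ["video recording", "stop", "shutdown"])` hands three commands to the CPU and ends in `Mode.OFF`, while `run_camera(False, [...])` stays in `Mode.MANUAL` and ignores all speech.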

The beneficial effects of the invention are: voice technology can be used to operate the video camera, making operation more convenient; using the intelligent voice system improves the effect and speed of snapshots and remedies the loss of the best picture caused by adjusting menus. As voice control devices are perfected, new products will emerge in large numbers.

Description of drawings

Fig. 1 is a circuit connection block diagram of the present invention.

Fig. 2 is a control flow chart of the present invention.

Embodiment

As shown in Figs. 1 and 2, the video camera set by a speech recognition system comprises a CPU circuit and a coding and decoding circuit connected with the video camera electronic circuit, and also comprises a voice module; the voice module is connected with the CPU circuit and with the coding and decoding circuit respectively, and the CPU circuit stores a control program whose control steps are:

1. After the power is switched on and the camera starts, the system enters the first selection, namely whether to enter the voice control mode; this option is also available in the menu, and the system presents this selection automatically at startup. If NO is selected, the system enters manual mode, in which the camera functions like an ordinary video camera;

The present invention comprises two parts, hardware design and software design. In hardware, a speech processing chip (RSC-364) is added to the video camera electronic circuit; in software, code that controls the speech processing is added.

The voice module (RSC-364) is a CMOS device built around an 8-bit MCU, which also integrates components such as ROM, RAM, A/D, D/A, a front-end amplifier and a power amplifier. The RSC-364 offers accurate recognition, fast response time, low cost and multiple functions; with only a few external components it can form a speech recognition system. Its computing capability is 4 MIPS (million instructions per second); to improve computing capability, the chip also includes a 24-bit × 24-bit multiplier.

The RSC-364 uses a pre-trained artificial neural network for speaker-independent speech recognition, that is, it can recognize simple utterances such as "Yes", "No" and "Ok" without training; its Data Book claims a recognition rate above 97%.

The RSC-364 also provides speech synthesis at 5 to 15 kb/s; its speech synthesis was specially designed by Sensory, and its sound quality is better than that of ordinary synthesis. It also has an improved ADPCM (adaptive differential pulse-code modulation) speech encoder and decoder function.

The RSC-364 design includes microphone signal amplification, data conversion, recognition and synthesis functions, and on-chip ROM storage (only the RSC-364 chip has this), together with a single-chip CPU core; the RSC-364 therefore delivers 4 MIPS of integer performance at 14.32 MHz. This lets the consumer obtain maximum performance at minimum cost. The RSC-364 instruction set is very similar to that of the 8051 family of microprocessors. Its processor avoids the limitations of dedicated memory locations by having fully symmetric source and destination operands for all instructions.

Most DVCs already carry a microphone, so the voice control device can be connected directly to the microphone in the DVC. Power is supplied by the DVC mainboard.

The workflow of the multi-engine recognizer is:

(1) The input speech is preprocessed, including segmentation of the speech signal and noise removal. The segmentation uses an algorithm based on energy-window calculation, which makes the endpoints of the speech signal more accurate.

(2) Based on the physical length and other physical features of the input speech, the system predicts whether the input is an isolated word or continuous speech. If the speech signal is short, recognition engines 1 and 2 are used; if the signal is long, recognition engines 2 and 3 are used; if it cannot be determined whether the input is isolated or continuous speech, all three recognition engines are used simultaneously.

(3) The recognition results obtained from the different recognition engines are sent as candidate keywords (multiple candidates if the results differ) to the confirmation module for confirmation.
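The workflow above can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm of the patent or of the RSC-364: the function names, the frame length, the energy threshold, the duration thresholds, and the choice to treat a middle band of durations as the "cannot be determined" case are all hypothetical. The engines themselves are not modeled; `collect_candidates` only shows how differing results become multiple candidates for the confirmation module.

```python
def frame_energies(samples, frame_len):
    """Sum of squared amplitudes for each non-overlapping frame (energy window)."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(samples, frame_len=4, threshold=1.0):
    """Step (1): return (start, end) sample indices of the speech segment.

    A frame counts as speech when its energy exceeds the (assumed) threshold.
    Returns None when no frame is loud enough.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def select_engines(duration_s, short_max_s=0.6, long_min_s=1.2):
    """Step (2): choose engines from utterance length (thresholds are assumed)."""
    if duration_s <= short_max_s:
        return [1, 2]       # short signal: treated as an isolated word
    if duration_s >= long_min_s:
        return [2, 3]       # long signal: treated as continuous speech
    return [1, 2, 3]        # undetermined: run all three engines


def collect_candidates(results):
    """Step (3): distinct keywords, in engine order, for the confirmation module.

    `results` maps an engine number to the keyword that engine recognized.
    """
    candidates = []
    for engine in sorted(results):
        word = results[engine]
        if word not in candidates:
            candidates.append(word)
    return candidates
```

When the engines agree, `collect_candidates` yields a single candidate; when they disagree, every distinct result is kept so that the confirmation module can decide between them.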

Because the multi-engine recognizer starts two or three recognition engines at the same time, the response time of the system is inevitably affected. For this reason, parameter sharing is adopted during speech modeling, which reduces computational complexity and improves response time. Note also that for isolated speech, recognition engines 1 and 2 are fast, so the requirement for real-time response is fully met; for continuous speech, recognition time is spent mainly in recognition engine 3, which is unavoidable, and the additional time introduced by the system is very small, so the response speed of the system is essentially not reduced.

Building the recognizer on multiple recognition engines means that, whether the input is continuous speech or isolated speech, a suitable recognition engine can be used, so the recognition rate of the system is greatly improved while the user remains free to speak naturally. In particular, when the system fails to recognize a user's continuous speech input, the user can lower the requirement and give the command as isolated speech. On the one hand, this allows the household appliance to be controlled correctly and to operate normally; on the other hand, through adaptation, the models of the different recognition engines are characterized more accurately, which gradually improves the system recognition rate and thus the continuous speech recognition rate as well. In addition, the connected-word recognition engine is used in all cases, mainly because the speech of disabled users often carries common burst noise and modal particles; by modeling these separately, the influence of noise and modal particles at the beginning and end of the speech signal can be removed.

Claims

1. A video camera set by a speech recognition system, comprising a CPU circuit and a coding and decoding circuit connected with the video camera electronic circuit, characterized in that it also comprises a voice module, the voice module being connected with the CPU circuit and with the coding and decoding circuit respectively, and the CPU circuit storing a control program whose steps are:

(1) after the power is switched on and the camera starts, the system enters the first selection, namely whether to enter the voice control mode; the system presents this selection automatically at startup; if NO is selected, the system enters manual mode, in which the camera functions like an ordinary video camera;

(2) if YES is selected, the CPU passes this information to the voice control module and activates it, and the camera enters the voice control start state;

(3) the voice control module carries out keyword recognition, the keywords that can be operated by voice being: video recording, stop, video playback, shutdown; after the user says "video recording", the system judges this information and sends the result to the CPU, which executes it; each time the user says a keyword the program makes one judgment, until "shutdown" is executed.