KR20110124568A - Robot system having voice and image recognition function, and recognition method thereof - Google Patents

Robot system having voice and image recognition function, and recognition method thereof

Info

Publication number
KR20110124568A
Authority
KR
South Korea
Prior art keywords
image
recognition
voice
robot
phoneme
Prior art date
Application number
KR1020100044027A
Other languages
Korean (ko)
Other versions
KR101171047B1 (en)
Inventor
이상엽
오덕신
Original Assignee
삼육대학교산학협력단 (Sahmyook University Industry-Academic Cooperation Foundation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 삼육대학교산학협력단 (Sahmyook University Industry-Academic Cooperation Foundation)
Priority to KR20100044027A
Publication of KR20110124568A
Application granted
Publication of KR101171047B1

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00: Controls for manipulators
    • B25J13/08: Controls for manipulators by means of sensing devices, e.g. viewing or touching devices
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00: Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02: Sensing devices
    • B25J19/026: Acoustical sensing devices
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00: Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02: Sensing devices
    • B25J19/04: Viewing devices
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Manipulator (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)

Abstract

The present invention relates to a robot system, and more particularly to a robot system having a voice and image recognition function and a recognition method thereof.
The robot system of the present invention includes an input/output device that acquires surrounding sounds, extracts optimized phonemes, matches the extracted phonemes with a plurality of pre-stored words to recognize the user's voice, and captures the user's image, extracts a predetermined pattern from the captured image, and compares it with previously stored image data to recognize the image; a drive control device that controls the driving of the robot's driver according to the recognition result received from the input/output device; a management operation device that sets the input/output device as the basic input/output device, supports the drive control device, and efficiently provides and manages the system environment and required information; and a hardware control device that controls the movement of the driver according to the control scheme of the drive control device.
Therefore, the present invention can reduce the unit cost of the robot by incorporating a robot-specific management operation device capable of voice and image recognition into the system, and, by converting the wavelength of the acquired sound into frequencies for each band using a fast Fourier transform function in the voice recognition method, noise can be removed easily and accurate phoneme analysis can be performed.

Description

Robot system having a voice and image recognition function and a recognition method thereof {ROBOT SYSTEM HAVING VOICE AND IMAGE RECOGNITION FUNCTION, AND RECOGNITION METHOD THEREOF}

The present invention relates to a robot system, and more particularly, to a robot system having a voice and image recognition function and a recognition method thereof.

Recently, with the development of network and communication technologies, various robots using them have been developed and are gradually coming into use. Most of these robots have been industrial robots, such as manipulators and transport robots, intended to automate and unman production operations in factories.

More recently, practical robots that support everyday life as human partners, that is, that support human activities in residential environments and various other scenes of daily life, have been under development. Unlike industrial robots, such practical robots have the ability to learn on their own how to adapt to different people and different environments within the human living environment. In particular, an autonomous mobile robot whose outward shape is close to that of a human can perform motions close to human motions and can perform various motions oriented more toward entertainment.

Some mobile robots are equipped with a small camera corresponding to the eyes, a sound-collecting microphone corresponding to the ears, and the like. Such a mobile robot can recognize its surrounding environment by performing image processing on the acquired image, or recognize language from the input ambient sound.

However, although a conventional robot's voice recognition can recognize the user's voice well in a quiet space, it does not perform properly in an open space because of the many sources of noise. Accordingly, input is currently often handled by a remote controller or a touch sensor instead of speech recognition.

In addition, such mobile robots have not yet made a significant contribution to society. The biggest reason is that no robot-specific operating system has been developed, so a robot usually uses a general PC operating system.

However, the input devices of a general PC operating system are a mouse and a keyboard, which differ from the image recognition and voice recognition inputs used by a robot. Developing an operating system dedicated to the robot therefore increases cost, with the result that the price of the robot is high.

The problem to be solved by the present invention is to provide a robot system capable of voice and image recognition, and a recognition method thereof, in which a voice is acquired, optimized phonemes are extracted from the voice and matched with the phonemes of pre-stored words, and noise is removed during the speech recognition process so that accurate speech recognition can be achieved.

Another problem to be solved by the present invention is to provide a robot system capable of voice and image recognition, and a recognition method thereof, that can reduce the unit cost of the robot by building a robot-dedicated operating system capable of voice and image recognition into the system.

The robot system having a voice and image recognition function according to the present invention includes an input/output device that acquires surrounding sounds, extracts optimized phonemes, matches the extracted phonemes with a plurality of pre-stored words to recognize the user's voice, and captures the user's image, extracts a predetermined pattern from the captured image, and compares it with pre-stored image data to recognize the image; a drive control device that controls the driving of the robot's driver according to the recognition result received from the input/output device; a management operation device that sets the input/output device as the basic input/output device, supports the drive control device, and efficiently provides and manages the system environment and required information; and a hardware control device that controls the movement of the driver according to the control scheme of the drive control device.

In this case, the input/output device may include a voice recognition storage unit in which correspondences between words and their phonemes are stored as a dictionary for voice recognition; an image recognition storage unit in which data are stored as a dictionary for image recognition of the user's image; a sound acquisition unit for acquiring surrounding sounds; an image pickup unit for capturing an image of the user; a voice recognition unit that extracts optimized phonemes from the sound acquired by the sound acquisition unit and recognizes the user's voice by matching the phonemes with the phonemes of the plurality of words stored in the voice recognition storage unit; and an image recognition unit that extracts a predetermined pattern from the image captured by the image pickup unit and recognizes the image by comparing it with the image data stored in the image recognition storage unit.

Here, the voice recognition unit separates the wavelength of the sound acquired by the sound acquisition unit into frequencies for each band using a fast Fourier transform function, extracts optimized phonemes from the separated frequency domain, and recognizes the user's voice by matching the extracted phonemes with the phonemes of pre-stored words using a consonant-based maximum flow matching method.

The image recognition unit may recognize a face using any one of a knowledge-based determination method, a template combination determination method, a shape vector determination method, and a maximum flow matching method using a sector template.

According to an embodiment of the present invention, a voice recognition method of a robot system having a voice and image recognition function includes: (a) acquiring surrounding sounds and dividing the wavelength of the acquired sound into band-specific frequencies; (b) detecting a valid region by converting the band-specific frequencies to the Z plane; and (c) recognizing the user's voice by matching a phoneme extracted from the valid region with the phoneme of a pre-stored word.

In this case, in step (a), the wavelength of the sound may be separated into band-specific frequencies using a fast Fourier transform function, and in step (c), the user's voice may be recognized by matching the phoneme extracted by a consonant-based maximum flow matching method with the phoneme of a pre-stored word.

The image recognition method of the robot system having a voice and image recognition function according to the present invention includes the steps of: (a) photographing a subject and determining whether face recognition or shape recognition is to be performed on the captured image; (b) if it is determined that face recognition is to be performed, recognizing the face by comparing previously stored data with the captured image; and (c) if it is determined that shape recognition is to be performed, converting the captured image into digital data and recognizing the shape by representing the converted data as figures.

At this time, in step (b), the face is recognized by one of a knowledge-based determination method, a template combination determination method, a shape vector determination method, and a maximum flow matching method using a sector template.

The present invention can reduce the unit cost of the robot by embedding a robot-dedicated management operation device capable of voice and image recognition in the system.

In addition, according to the present invention, because the wavelength of the acquired sound is converted into frequencies for each band using a fast Fourier transform function in the speech recognition method, noise is easy to remove and accurate phoneme analysis can be performed.

In addition, the present invention can recognize speech faster than the conventional Hidden Markov Model method by matching the phonemes extracted in the speech recognition method with the phonemes of pre-stored words using a consonant-based maximum flow matching method.

In addition, because the present invention recognizes a face using a partial image template method in the image recognition method, images of various forms can be recognized quickly, and face recognition at a medium distance is also possible.

FIG. 1 is a view showing the configuration of a robot control system according to an embodiment of the present invention.
FIG. 2 is a view showing a voice recognition method of the robot control system according to an embodiment of the present invention.
FIG. 3 is a view showing the wavelength of sound in frequency blocks.
FIG. 4 is a diagram illustrating an effective region detected by separating the wavelength of sound by frequency band and then converting to the Z plane.
FIG. 5 is a diagram illustrating the phoneme distribution of the 'ga' sound in the effective region.
FIG. 6 is a diagram illustrating the phoneme distributions of the 'ka' sound and the 'ga' sound in the effective region.
FIG. 7 is a diagram illustrating phoneme matching by a consonant-based maximum flow matching method.
FIG. 8 is a view showing an image recognition method of the robot system according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating cells of an image quantized by a knowledge-based face recognition method.
FIG. 10 is a diagram illustrating a method of recognizing a face using a color histogram as a knowledge-based face recognition method.
FIG. 11 is a diagram illustrating a template image list of a face.
FIG. 12 is a diagram illustrating a face recognition method using a shape vector.
FIG. 13 is a diagram illustrating a maximum flow matching method using a sector template.
FIG. 14 is a diagram illustrating sector division for an occluded face and a rotated face.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, the embodiments of the present invention may be modified in various forms, and the scope of the present invention should not be construed as being limited to the embodiments described below. The embodiments of the present invention are provided to explain the present invention more easily to those skilled in the art.

FIG. 1 is a view showing the configuration of a robot control system according to an embodiment of the present invention.

Referring to FIG. 1, the robot control system according to an exemplary embodiment of the present invention includes an input / output device 110, a drive control device 120, a management operation device 130, and a hardware control device 140.

The input/output device 110 includes a sound acquisition unit 111 for acquiring surrounding sounds; a voice recognition storage unit 112 in which correspondences between words and their phonemes are stored as a dictionary for voice recognition; an image pickup unit 113 for capturing an image of the user; an image recognition storage unit 114 in which data are stored as a dictionary for image recognition of the user's image; a voice recognition unit 115 that removes noise from the sound acquired by the sound acquisition unit 111, extracts optimized phonemes, and recognizes the user's voice by matching the phonemes with the plurality of words stored in the voice recognition storage unit 112; and an image recognition unit 116 that extracts a predetermined pattern from the image captured by the image pickup unit 113 and recognizes the image by comparing it with the image data stored in the image recognition storage unit 114. At this time, the voice recognition unit 115 converts the wavelength of the sound acquired by the sound acquisition unit 111 into frequencies for each band using a frequency transform function, and extracts optimized phonemes based on either the Hidden Markov Model method or a consonant-based Max Flow Matching method. Here, the frequency transform function is preferably a Fast Fourier Transform function. The reason for using the fast Fourier transform function is that it makes it possible to know what values are present in the low-frequency band and what values are present in the high-frequency band, which is useful for removing noise and enables accurate phoneme analysis. In addition, the image recognition unit 116 recognizes a face by any one of a knowledge-based determination method, a partial image template combining determination method, a shape vector determination method, and a maximum flow matching method using a sector template. These methods are described in detail later.

The drive control device 120 may include a logger 121 that stores the state of the input/output device 110; a behavior expression unit 122 that determines a behavior expression according to the recognition result of the input/output device 110; a mapper unit 123 that provides mapping information to the input/output device 110; an environment setting unit (config) 124 that loads and references the configuration information; a driver control unit 125 that controls the driver according to the decision of the behavior expression unit 122; and a communication unit 126 that supports the communication environment between the input/output device 110 and the hardware control device 140.

The management operation device 130 serves as the base of the input/output device 110 and the drive control device 120, and includes an action map 131 that provides mapping information such as action information, action expressions, and control commands according to recognition results; I/O log data 132 that records important information generated during the input/output process; environment configuration data (config data) 133 that holds the system configuration information; and resource files 134 consisting of action files, executable action scripts, and sound files for expressing sound effects or voice information, managed on a file basis. By incorporating the management operation device 130 into the robot system in this manner, the unit cost of the robot can be reduced.
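As a rough illustration only (the patent gives no code; all class and field names below are hypothetical), the resources 131 to 134 of the management operation device could be modeled as follows:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical sketch of the management operation device's resources (131-134).
# Names and fields are illustrative assumptions, not taken from the patent.

@dataclass
class ActionMap:                      # 131: recognition result -> action/control command
    entries: Dict[str, str] = field(default_factory=dict)

    def lookup(self, recognition_result: str) -> str:
        return self.entries.get(recognition_result, "idle")

@dataclass
class ResourceFiles:                  # 134: action files, action scripts, sound files
    action_files: List[str] = field(default_factory=list)
    action_scripts: List[str] = field(default_factory=list)
    sound_files: List[str] = field(default_factory=list)

@dataclass
class ManagementOperationDevice:      # 130
    action_map: ActionMap = field(default_factory=ActionMap)
    io_log: List[str] = field(default_factory=list)          # 132: I/O log data
    config: Dict[str, str] = field(default_factory=dict)     # 133: environment configuration
    resources: ResourceFiles = field(default_factory=ResourceFiles)

# Example: map the recognized word 'hello' to a nodding action, as in the description.
mgmt = ManagementOperationDevice()
mgmt.action_map.entries["hello"] = "nod"
print(mgmt.action_map.lookup("hello"))   # -> "nod"
```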

The hardware control device 140 controls the movement of the driver according to the control method of the drive control device 120.

The phoneme recognition method of the robot control system according to the exemplary embodiment of the present invention configured as described above will be described.

2 is a view showing a voice recognition method of the robot control system according to an embodiment of the present invention.

Referring to FIG. 2, in the voice recognition method of the robot control system according to an exemplary embodiment of the present invention, when surrounding sounds are acquired by the sound acquisition unit 111 (S210), the wavelength of the acquired sound is separated into band-specific frequencies (S220). In this case, the acquired sound wavelength is separated into frequencies for each band using a fast Fourier transform function. The regions separated by frequency band are then converted to the Z plane in order to extract the optimized phonemes (S230). Thereafter, the effective region is detected and optimized phonemes are extracted from that region (S240). The extracted phonemes are matched with the phonemes of pre-stored words using a consonant-based maximum flow matching method (S250) to recognize the user's voice (S260).
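As a minimal, purely illustrative sketch (the helper functions and the dummy dictionary below are assumptions, not part of the patent), steps S210 to S260 could be strung together as follows:

```python
import numpy as np

# Hypothetical sketch of the S210-S260 pipeline. Every helper below is an
# assumption made for illustration; the patent defines the steps, not this code.

def acquire_sound(seconds=1.0, rate=16000):              # S210: stand-in for a microphone
    t = np.linspace(0, seconds, int(rate * seconds), endpoint=False)
    return np.sin(2 * np.pi * 440 * t), rate

def split_into_bands(signal, n_bands=8):                  # S220: FFT, grouped by band
    spectrum = np.fft.rfft(signal)
    return np.array_split(spectrum, n_bands)

def to_z_plane(bands):                                     # S230: real/imaginary parts as points
    return [np.column_stack((b.real, b.imag)) for b in bands]

def extract_phonemes(z_points):                            # S240: placeholder valid-region test
    return ["h", "e", "l", "l", "o"]                        # dummy output

def match_words(phonemes, dictionary):                     # S250: naive consonant overlap score
    def score(word):
        return len(set(phonemes) & set(dictionary[word]))
    return max(dictionary, key=score)

signal, rate = acquire_sound()
bands = split_into_bands(signal)
phonemes = extract_phonemes(to_z_plane(bands))
word = match_words(phonemes, {"hello": list("hel"), "bye": list("by")})   # S260
print("recognized:", word)
```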

The voice recognition method of the robot control system according to an embodiment of the present invention is described in more detail below with reference to FIGS. 3 to 7.

FIG. 3 is a diagram showing the wavelength of sound as frequency blocks, FIG. 4 is a diagram showing the effective region detected by separating the wavelength of sound by frequency band and then converting to the Z plane, FIG. 5 is a diagram illustrating the phoneme distribution of the 'ga' sound in the effective region, FIG. 6 is a diagram illustrating the phoneme distributions of the 'ka' sound and the 'ga' sound in the effective region, and FIG. 7 is a diagram illustrating phoneme matching using a consonant-based maximum flow matching method.

Referring to FIG. 3, the wavelength of the acquired sound is separated into frequencies for each band using a frequency transform function. In this case, the frequency transform function used is the fast Fourier transform (FFT) function, as described above. The fast Fourier transform function is given by the following equation:

[Equation pat00001: fast Fourier transform formula, shown as an image in the original publication]
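The equation itself appears only as an image in the source. For reference, and only as an assumption about what that image shows, the discrete Fourier transform that an FFT computes is usually written as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1.$$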

Using the fast Fourier transform function, the wavelength of the sound is divided into a high-frequency region and a low-frequency region, and it is possible to know what values are present in each. In addition, because the fast Fourier transform produces complex numbers, the values of the real part and the imaginary part are obtained separately, and accurate phoneme analysis is possible only by analyzing the combination of these two values. Moreover, the noise in the acquired sound can be removed using the fast Fourier transform: one block is selected, the average frequency values of each band are compared between the previous block and the selected block, and if a band's average value corresponds to noise, it is determined to be noise and the corresponding frequency is removed.
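A minimal numpy sketch of this band-averaged noise test, under the assumption that "noise" means a band whose average magnitude barely changes between consecutive blocks (the patent does not spell out the exact criterion):

```python
import numpy as np

def suppress_stationary_bands(signal, block=1024, n_bands=8, tol=0.1):
    """Zero out frequency bands whose average magnitude is nearly unchanged
    from the previous block, treating them as stationary background noise.
    The 'nearly unchanged' criterion (tol) is an assumption for illustration."""
    cleaned = []
    prev_avg = None
    for start in range(0, len(signal) - block + 1, block):
        spectrum = np.fft.rfft(signal[start:start + block])
        bands = np.array_split(np.arange(spectrum.size), n_bands)
        avg = np.array([np.abs(spectrum[idx]).mean() for idx in bands])
        if prev_avg is not None:
            for b, idx in enumerate(bands):
                # Band energy that persists almost identically across blocks
                # is treated as noise and removed.
                if abs(avg[b] - prev_avg[b]) <= tol * (prev_avg[b] + 1e-12):
                    spectrum[idx] = 0
        prev_avg = avg
        cleaned.append(np.fft.irfft(spectrum, n=block))
    return np.concatenate(cleaned) if cleaned else signal.copy()

# Example: a 440 Hz tone plus white noise.
t = np.arange(0, 1.0, 1 / 16000)
noisy = np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(t.size)
denoised = suppress_stationary_bands(noisy)
```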

In this way, the real part and the imaginary part obtained by separating the frequency bands with the fast Fourier transform are converted to the Z plane. When converted to the Z plane, as shown in FIG. 4, the effective region appears in the form of a unit circle. The following equation performs the conversion to the Z plane:

[Equation pat00002: Z-plane conversion formula, shown as an image in the original publication]
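This equation is also shown only as an image in the source. Assuming it follows the standard one-sided Z-transform, the conversion would be

$$X(z) = \sum_{n=0}^{\infty} x(n)\, z^{-n}, \qquad z = r\, e^{j\omega},$$

so that evaluating on the unit circle ($r = 1$) relates the Z plane directly to the frequency content obtained from the FFT.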

In the voice frequency, regions of specific frequencies take valid values for vowels and consonants. The effective region detected using the equation for converting to the Z plane is represented as elliptical regions in the Z plane. Taking the 'ga' sound as an example of the phoneme distribution in the effective region, as shown in FIG. 5, the vowel 'ㅏ' is distributed over the long elliptical region 510 of the first quadrant of the frequency plane and the small elliptical region 520 of the second quadrant, and the consonant 'ㄱ' is distributed over the small elliptical regions 530, 540, and 550 of the second, third, and fourth quadrants. However, even if the frequency values fall within the above regions, the phoneme may not be extracted exactly as 'ga'; a sound similar to 'ga', such as 'ka', may be extracted instead. This is because a value lying in the region is a necessary condition, not a sufficient condition.
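A toy illustration of the "value lies in an elliptical region" test follows; the ellipse parameters below are invented placeholders, not the regions 510 to 550 of FIG. 5:

```python
import numpy as np

def in_ellipse(point, center, semi_axes, angle_deg=0.0):
    """Return True if a (re, im) point lies inside a rotated ellipse.
    Used here as a stand-in for the patent's effective-region test."""
    px, py = point[0] - center[0], point[1] - center[1]
    theta = np.deg2rad(angle_deg)
    # Rotate the point into the ellipse's own axes.
    x = px * np.cos(theta) + py * np.sin(theta)
    y = -px * np.sin(theta) + py * np.cos(theta)
    a, b = semi_axes
    return (x / a) ** 2 + (y / b) ** 2 <= 1.0

# Placeholder regions: these numbers are illustrative only.
VOWEL_A_REGIONS = [((0.6, 0.3), (0.35, 0.1), 20.0), ((-0.4, 0.5), (0.1, 0.05), 0.0)]

def looks_like_vowel_a(z_point):
    return any(in_ellipse(z_point, c, ax, ang) for c, ax, ang in VOWEL_A_REGIONS)

print(looks_like_vowel_a((0.62, 0.31)))   # True for the made-up region above
```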

Referring to FIG. 6, the vowel 'ㅏ' is distributed over the long elliptical region 610 of the first quadrant of the frequency plane and the small elliptical region 620 of the second quadrant, the consonant 'ㄱ' is distributed over the elliptical regions 630, 640, and 650 of the second, third, and fourth quadrants, and the consonant 'ㅋ' is distributed over the elliptical regions 660, 670, and 680 of the second, third, and fourth quadrants. The voice region of 'ka' overlaps part of the voice region of 'ga', and in certain areas includes it entirely. Therefore, because the voice regions of 'ga' and 'ka' overlap, it is difficult to analyze the desired phoneme without using an efficient matching method.

One option is the Hidden Markov Model, which is commonly used to analyze the desired phoneme. The Hidden Markov Model recognizes a phoneme in a chained way: it first checks the 'ㄱ' sound, then uses additional information to check 'ㅋ', and then checks the vowels. However, such a chained method is very complicated to connect, its execution complexity is very high, and phoneme recognition takes a long time. Accordingly, the robot control system of the present invention analyzes phonemes using a consonant-based maximum flow matching method.

Referring to FIG. 7, the consonant-based maximum flow matching method extracts consonants by analyzing the consonant frequencies in one block of input data after conversion to the Z plane. The extracted consonants 710 and the phoneme data 720 of pre-stored words are then matched by weighted bipartite maximum flow matching to infer the maximum flow value. When the expected flow is not obtained from consonant matching alone, vowels are additionally extracted and matched. As a result of the matching, the robot recognizes, for example, the word 'hello'. The robot then performs an action matching the recognized word 'hello' (nodding) under the control of the drive control device 120.
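The patent does not give an implementation. As a sketch, a weighted bipartite matching between the extracted consonants and a word's stored consonants can be computed with an assignment solver; here scipy's Hungarian-method routine stands in for the max-flow formulation, and the similarity scores, threshold, and consonant lists are invented for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative similarity scores between extracted consonants and the stored
# consonants of one dictionary word. Real scores would come from the Z-plane
# frequency analysis; these numbers are made up.
extracted = ["ㅇ", "ㄴ", "ㅎ", "ㅅ"]        # consonants pulled from the input block
word_consonants = ["ㅇ", "ㄴ", "ㅎ", "ㅅ"]   # consonants of one stored word (illustrative)
similarity = np.array([
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.7, 0.2],
    [0.1, 0.0, 0.2, 0.9],
])

# Maximum-weight bipartite matching; equivalent in effect to a max-flow matching
# on a bipartite graph with unit capacities.
rows, cols = linear_sum_assignment(similarity, maximize=True)
score = similarity[rows, cols].sum()
print(list(zip([extracted[r] for r in rows], [word_consonants[c] for c in cols])), score)

# The word whose matching score exceeds the expected threshold (here, say, 3.0)
# would be taken as the recognized word; otherwise vowels are matched as well.
print("recognized" if score > 3.0 else "need vowel matching")
```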

Next, an image recognition method of the robot system according to an exemplary embodiment will be described.

8 is a view showing an image recognition method of the robot system according to an embodiment of the present invention.

Referring to FIG. 8, in the image recognition method of the robot system according to an exemplary embodiment of the present invention, after the user's image is captured by the imaging unit 113 (S810), it is determined whether face recognition or shape recognition is to be performed (S820).

If it is determined that face recognition is to be performed, the pre-stored data is compared with the captured image data (S830). The comparison method is any one of a knowledge-based face recognition method, a template face recognition method, a face recognition method using a shape vector, and a maximum flow matching method using a sector template. The detailed process of recognizing a face with each of these four methods is described later.

When the captured image has been compared with the pre-stored data by any one of the above methods, the robot recognizes the user's image (S840).

On the other hand, if it is determined that shape recognition is to be performed, the captured image data is converted into digital data (S850).

After that, the converted data is represented as figures (S860), the shape of the image is recognized from the expressed figures (S870), and the frequency values and color histogram information within the recognized shape are transferred (S880), so that the robot recognizes the user's image (S840).

First, face recognition is performed in one of four ways: a knowledge-based determination method, a partial image template combining determination method, a shape vector determination method, and a maximum flow matching method in which the face region is divided into sectors.

Among these four methods, the knowledge-based determination method is examined first. The knowledge-based determination method recognizes a face using basic knowledge about the shape of a face, for example prior knowledge such as what color human skin generally is.

FIG. 9 is a diagram illustrating a cell of an image quantized by a knowledge-based face recognition method, and FIG. 10 is a diagram illustrating a face recognition method using a color histogram as a knowledge-based face recognition method.

Referring to FIG. 9, after the captured image is quantized for face recognition, the cells of the quantized image are divided into levels: (a) is the original image, (b) is quantization level 2, (c) is quantization level 3, and (d) is quantization level 4. Although four levels are shown in FIG. 9, the number of levels may be larger or smaller. Quantization level 4 (d) is used to obtain the overall color distribution and the approximate shape coordinates of the image and to locate the face. Then, as the level is reduced, the face is recognized while examining a progressively more detailed image. At this time, the size of the face must guarantee a valid area; that is, face recognition assumes that a region of at least 200 × 200 pixels within a 320 × 240 image is a face.
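A small numpy sketch of this coarse quantization step; the level definition here (reducing each 8-bit channel to 2^level values) and the skin-color range are assumptions, since the patent does not define them numerically:

```python
import numpy as np

def quantize(image, level):
    """Reduce each 8-bit channel to 2**level values (an assumed definition of
    'quantization level'); coarser levels expose only the broad color layout."""
    bins = 2 ** level
    step = 256 // bins
    return (image // step) * step + step // 2

def coarse_face_candidate(image, skin_low=(80, 40, 20), skin_high=(255, 200, 170)):
    """At a coarse level, mark cells whose quantized RGB falls in a rough
    skin-color range (the range is illustrative) and return their bounding box."""
    q = quantize(image, level=2)
    mask = np.all((q >= skin_low) & (q <= skin_high), axis=-1)
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()   # approximate face region

# Example with a synthetic 240x320 image containing a skin-colored block.
img = np.zeros((240, 320, 3), dtype=np.uint8)
img[40:200, 80:280] = (200, 150, 120)               # stand-in 'face'
print(coarse_face_candidate(img))
```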

The knowledge-based face recognition method can recognize an image in a more advanced manner, that is, by dynamically setting the size of a face using a color frequency change rate of the image.

Referring to FIG. 10, when the colors are viewed at quantization level 4, the area of the face can be detected accurately, and based on this information the face can be recognized from the color change rate values of the original image.

11 is a diagram illustrating a template image list of a face.

The partial image template determination method determines the position of a face by making the basic forms of an image into several templates and then matching and reading the corresponding templates. As illustrated in FIG. 11, the partial image template determination method divides the face to be recognized into partial regions. The templates are divided into low-frequency region templates, high-frequency region templates, rotation templates, partial image templates, and the like.

When a face is detected in the image, the low-frequency region templates are matched first. Because the low-frequency region templates contain much of the same information, this template comparison is very fast. A valid candidate image is found in this way, then the high-frequency region templates are compared step by step, and finally the partial image templates are compared to recognize a face candidate. Note that the template-based face recognition method cannot read a rotated face unless a template of the rotated face is available.
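A sketch of this coarse-to-fine idea using OpenCV's template matcher (cv2.matchTemplate); the downscaling factor, threshold, and file paths are assumptions made for illustration:

```python
import cv2
import numpy as np

def coarse_to_fine_match(image_gray, template_gray, coarse_scale=0.25, threshold=0.7):
    """First match a low-resolution (low-frequency) version of the template to
    find candidates cheaply, then confirm at full resolution. The scale and
    threshold values are illustrative assumptions."""
    small_img = cv2.resize(image_gray, None, fx=coarse_scale, fy=coarse_scale)
    small_tpl = cv2.resize(template_gray, None, fx=coarse_scale, fy=coarse_scale)

    coarse = cv2.matchTemplate(small_img, small_tpl, cv2.TM_CCOEFF_NORMED)
    _, coarse_score, _, coarse_loc = cv2.minMaxLoc(coarse)
    if coarse_score < threshold:
        return None                       # no valid candidate at the coarse level

    # Refine around the coarse hit at full resolution.
    x, y = int(coarse_loc[0] / coarse_scale), int(coarse_loc[1] / coarse_scale)
    h, w = template_gray.shape
    pad = 16
    y0, x0 = max(0, y - pad), max(0, x - pad)
    roi = image_gray[y0:y0 + h + 2 * pad, x0:x0 + w + 2 * pad]
    if roi.shape[0] < h or roi.shape[1] < w:
        return None
    fine = cv2.matchTemplate(roi, template_gray, cv2.TM_CCOEFF_NORMED)
    _, fine_score, _, fine_loc = cv2.minMaxLoc(fine)
    return (x0 + fine_loc[0], y0 + fine_loc[1], fine_score) if fine_score >= threshold else None

# Usage sketch (paths are placeholders):
# img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
# tpl = cv2.imread("face_template.png", cv2.IMREAD_GRAYSCALE)
# print(coarse_to_fine_match(img, tpl))
```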

Such a partial image template determination method has an advantage in that image recognition is quick and can recognize various shapes, and face recognition at a medium distance is also possible.

12 is a diagram illustrating a face recognition method using a shape vector.

Referring to FIG. 12, the face recognition method using a shape vector recognizes a face from the directionality of the face shape. Before this method is used, the face must be found and the positions of the eyes, nose, and mouth must be known; it is therefore typically applied after face detection with the template-based face recognition method described above. The face recognition method using the shape vector can detect changes in the user's facial expression and can even recognize the expression itself.
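A toy sketch of the shape-vector idea: the vectors between landmark positions (eyes, nose, mouth) are normalized and compared by cosine similarity. The landmark coordinates and the similarity threshold are invented for illustration:

```python
import numpy as np

def shape_vectors(landmarks):
    """Build direction vectors between facial landmarks (eyes, nose, mouth)
    and normalize them so the comparison is scale-independent."""
    pts = np.asarray(landmarks, dtype=float)
    vecs = np.diff(pts, axis=0)                    # eye->eye, eye->nose, nose->mouth
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-12)

def similarity(landmarks_a, landmarks_b):
    """Mean cosine similarity between corresponding shape vectors."""
    va, vb = shape_vectors(landmarks_a), shape_vectors(landmarks_b)
    return float(np.mean(np.sum(va * vb, axis=1)))

# Illustrative landmark sets: left eye, right eye, nose, mouth (x, y).
stored_face   = [(30, 40), (70, 40), (50, 60), (50, 80)]
captured_face = [(32, 42), (72, 41), (51, 62), (50, 83)]
print(similarity(stored_face, captured_face) > 0.95)   # True for this made-up data
```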

FIG. 13 is a diagram illustrating a maximum flow matching method using a sector template, and FIG. 14 is a diagram illustrating sector division between an obscured face and a rotated face.

Referring to FIG. 13, in the maximum flow matching method using a sector template, the image captured by the imaging unit 113 is divided into partial images as shown in (a), shape information such as color histograms and monochrome values of the partial images is extracted as shown in (b), and the extracted information is compared with the information in the image recognition storage unit 114 as shown in (c) to recognize the face.

In the maximum flow matching method using a sector template, face recognition can be performed even when the face is occluded as shown in FIG. 14(a) or rotated as shown in FIG. 14(b); here, a circle indicates a recognized sector. This is because the image is divided into sectors and maximum flow matching is used, so the matching result is the same even when the face is rotated.
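A sketch of sector-wise histogram matching: the image is cut into angular sectors around its center, per-sector grayscale histograms are compared, and a maximum-weight assignment between sectors makes the score insensitive to which sector lines up with which, so rotation and a few occluded sectors degrade the score only partially. The sector count, histogram size, and use of an assignment solver in place of an explicit max-flow routine are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def sector_histograms(gray, n_sectors=8, bins=16):
    """Split a grayscale image into angular sectors about its center and
    return one normalized intensity histogram per sector."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    angles = np.arctan2(ys - h / 2, xs - w / 2)             # -pi..pi
    sector_idx = ((angles + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    hists = []
    for s in range(n_sectors):
        vals = gray[sector_idx == s]
        hist, _ = np.histogram(vals, bins=bins, range=(0, 256))
        hists.append(hist / (hist.sum() + 1e-12))
    return np.array(hists)

def sector_match_score(gray_a, gray_b, n_sectors=8):
    """Maximum-weight assignment between the two images' sector histograms;
    histogram intersection is used as the per-pair similarity."""
    ha, hb = sector_histograms(gray_a, n_sectors), sector_histograms(gray_b, n_sectors)
    sim = np.array([[np.minimum(a, b).sum() for b in hb] for a in ha])
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return sim[rows, cols].mean()

# Example: an image compared with a rotated copy of itself scores near 1.0.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, (64, 64)).astype(np.uint8)
rotated = np.rot90(face)
print(round(sector_match_score(face, rotated), 3))
```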

Finally, in the shape recognition method, the robot must first convert the image into digital data in order to recognize the shape. Shape labeling is used to convert the image into digital data, and the digital data includes shape data, pattern data, and color data.

The labeled shapes are then expressed as a series of figures (lines, rectangles, circles, ellipses, and the like). When the image is expressed as a series of figures, it is very easy for the robot to recognize the shape. After the robot recognizes the shape, the frequency pattern values and color histogram information within the shape are transmitted to the application level, and the robot recognizes the user.
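A sketch of this labeling step with OpenCV: connected regions are found, each contour is approximated by a polygon to decide which figure it resembles, and a color histogram of the region is computed for the application level. The vertex-count rules for naming the figures are simple assumptions, not the patent's rule:

```python
import cv2
import numpy as np

def label_shapes(image_bgr):
    """Approximate each bright connected region by a polygon and report a
    rough figure name plus a per-region color histogram."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    # OpenCV 4.x return signature (contours, hierarchy).
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    results = []
    for contour in contours:
        approx = cv2.approxPolyDP(contour, 0.02 * cv2.arcLength(contour, True), True)
        n = len(approx)
        # Crude figure naming by vertex count (an illustrative assumption).
        name = {3: "triangle", 4: "rectangle"}.get(n, "circle/ellipse" if n > 6 else "polygon")
        mask = np.zeros(gray.shape, dtype=np.uint8)
        cv2.drawContours(mask, [contour], -1, 255, thickness=-1)
        hist = cv2.calcHist([image_bgr], [0, 1, 2], mask, [8, 8, 8], [0, 256] * 3)
        results.append((name, cv2.contourArea(contour), hist.flatten()))
    return results

# Example: a white rectangle and a white circle on a black canvas.
canvas = np.zeros((200, 300, 3), dtype=np.uint8)
cv2.rectangle(canvas, (20, 20), (120, 100), (255, 255, 255), -1)
cv2.circle(canvas, (220, 120), 50, (255, 255, 255), -1)
for name, area, _ in label_shapes(canvas):
    print(name, int(area))
```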

The robot system according to an exemplary embodiment of the present invention acquires a voice, extracts optimized phonemes from the voice, and matches them with the phonemes of pre-stored words, thereby enabling accurate voice recognition.

In addition, the robot system according to an embodiment of the present invention can reduce the unit cost of the robot by embedding the robot dedicated management operation apparatus in the system.

In addition, because the robot system according to an embodiment of the present invention converts the wavelength of the acquired sound into frequencies for each band using a fast Fourier transform function in its speech recognition method, noise is easy to remove and accurate phoneme analysis can be performed.

In addition, the robot system according to an embodiment of the present invention matches the phonemes extracted in its speech recognition method with the phonemes of pre-stored words using a consonant-based maximum flow matching method, and can therefore recognize speech faster than the conventional Hidden Markov Model method.

In addition, because the robot system according to an embodiment of the present invention recognizes the face using the partial image template method in its image recognition method, images of various forms can be recognized quickly, and face recognition at a medium distance is also possible.

110: input/output device 120: drive control device
130: management operation device 140: hardware control device
111: sound acquisition unit 112: voice recognition storage unit
113: image pickup unit 114: image recognition storage unit
115: voice recognition unit 116: image recognition unit

Claims (9)

A robot system having a voice and image recognition function, comprising:
an input/output device that acquires surrounding sounds, extracts optimized phonemes, matches the extracted phonemes with a plurality of pre-stored words to recognize the user's voice, captures the user's image, and recognizes the image by extracting a predetermined pattern from the captured image and comparing it with pre-stored image data;
a drive control device for controlling the driving of the driver of the robot according to the recognition result input from the input/output device;
a management operation device configured to set the input/output device as the basic input/output device, to support the drive control device, and to efficiently provide and manage the system environment and required information; and
a hardware control device for controlling the movement of the driver in accordance with the control scheme of the drive control device.
The robot system having a voice and image recognition function of claim 1, wherein the input/output device comprises:
a voice recognition storage unit in which correspondences between words and their phonemes are stored as a dictionary for voice recognition;
an image recognition storage unit in which data are stored as a dictionary for image recognition of the user's image;
a sound acquisition unit for acquiring surrounding sounds;
an imaging unit for capturing an image of the user;
a voice recognition unit that extracts optimized phonemes from the sound acquired by the sound acquisition unit and recognizes the user's voice by matching the phonemes with the phonemes of the plurality of words stored in the voice recognition storage unit; and
an image recognition unit that recognizes an image by extracting a predetermined pattern from the image captured by the imaging unit and comparing it with the image data stored in the image recognition storage unit.
The robot system having a voice and image recognition function of claim 2, wherein the voice recognition unit separates the wavelength of the sound acquired by the sound acquisition unit into frequencies for each band using a fast Fourier transform function, extracts optimized phonemes from the separated frequency domain, and recognizes the user's voice by matching the extracted phonemes with the phonemes of pre-stored words using a consonant-based maximum flow matching method.
The robot system having a voice and image recognition function of claim 2, wherein the image recognition unit recognizes a face by any one of a knowledge-based determination method, a template combination determination method, a shape vector determination method, and a maximum flow matching method using a sector template.
A voice recognition method of a robot system having a voice and image recognition function, comprising:
(a) acquiring surrounding sounds and dividing the wavelength of the acquired sound into band-specific frequencies;
(b) detecting a valid region by converting the band-specific frequencies to a Z plane; and
(c) recognizing the user's voice by matching a phoneme extracted from the valid region with the phoneme of a pre-stored word.
The voice recognition method of the robot system of claim 5, wherein in step (a) the wavelength of the sound is separated into band-specific frequencies using a fast Fourier transform function.
The voice recognition method of the robot system of claim 5, wherein in step (c) the user's voice is recognized by matching a phoneme extracted by a consonant-based maximum flow matching method with the phoneme of a pre-stored word.
An image recognition method of a robot system having a voice and image recognition function, comprising:
(a) photographing a subject and determining whether face recognition or shape recognition is to be performed on the captured image;
(b) if it is determined that face recognition is to be performed, recognizing the face by comparing previously stored data with the captured image; and
(c) if it is determined that shape recognition is to be performed, converting the photographed image into digital data and recognizing the shape by representing the converted data as figures.
The image recognition method of the robot system of claim 8, wherein in step (b) the face is recognized by any one of a knowledge-based determination method, a template combination determination method, a shape vector determination method, and a maximum flow matching method using a sector template.
KR20100044027A 2010-05-11 2010-05-11 Robot system having voice and image recognition function, and recognition method thereof KR101171047B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR20100044027A KR101171047B1 (en) 2010-05-11 2010-05-11 Robot system having voice and image recognition function, and recognition method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR20100044027A KR101171047B1 (en) 2010-05-11 2010-05-11 Robot system having voice and image recognition function, and recognition method thereof

Publications (2)

Publication Number Publication Date
KR20110124568A (en) 2011-11-17
KR101171047B1 KR101171047B1 (en) 2012-08-03

Family

ID=45394298

Family Applications (1)

Application Number Title Priority Date Filing Date
KR20100044027A KR101171047B1 (en) 2010-05-11 2010-05-11 Robot system having voice and image recognition function, and recognition method thereof

Country Status (1)

Country Link
KR (1) KR101171047B1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102447647B1 (en) * 2022-05-20 2022-09-27 주식회사 패스트레인 Method for thumbnail instance exposure adaptive to estimated user type, and device implementing thereof

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108942925A (en) * 2018-06-25 2018-12-07 珠海格力智能装备有限公司 Robot control method and device
WO2020251074A1 (en) * 2019-06-12 2020-12-17 엘지전자 주식회사 Artificial intelligence robot for providing voice recognition function and operation method thereof
US11810575B2 (en) 2019-06-12 2023-11-07 Lg Electronics Inc. Artificial intelligence robot for providing voice recognition function and method of operating the same

Also Published As

Publication number Publication date
KR101171047B1 (en) 2012-08-03


Legal Events

A201: Request for examination
E902: Notification of reason for refusal
E701: Decision to grant or registration of patent right
GRNT: Written decision to grant
FPAY: Annual fee payment (payment date: 20150522; year of fee payment: 4)
FPAY: Annual fee payment (payment date: 20160428; year of fee payment: 5)
FPAY: Annual fee payment (payment date: 20170518; year of fee payment: 6)
FPAY: Annual fee payment (payment date: 20180723; year of fee payment: 7)
FPAY: Annual fee payment (payment date: 20190716; year of fee payment: 8)