WO2018190668A1 - Speech intention expression system using physical characteristics of head and neck articulator - Google Patents

Speech intention expression system using physical characteristics of head and neck articulator Download PDF

Info

Publication number
WO2018190668A1
Authority
WO
WIPO (PCT)
Prior art keywords
sensor
unit
speech
data
neck
Prior art date
Application number
PCT/KR2018/004325
Other languages
French (fr)
Korean (ko)
Inventor
이우기
심봉섭
권헌도
김덕환
신진호
Original Assignee
인하대학교 산학협력단
Priority date
Filing date
Publication date
Priority claimed from KR1020170126469A (published as KR20180115602A)
Application filed by 인하대학교 산학협력단
Priority to US16/605,361 (published as US20200126557A1)
Publication of WO2018190668A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/04: Segmentation; Word boundary detection
    • G10L 15/24: Speech recognition using non-acoustical features
    • G10L 15/25: Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids

Definitions

  • The present invention recognizes the physical characteristics of the head and neck articulators, including the oral cavity, through articulation sensors that measure the changes accompanying speech across the head and neck, and infers the speaker's intention to speak from those measurements.
  • The present invention relates to a system that provides this inferred speech intention to the speaker or to others in visual, auditory, or tactile form, and that transfers it to the head and neck of an image object or of a robot.
  • Sounds produced by the articulatory organs are called speech, or verbal sounds, when they convey linguistic information, and vocalizations when they do not.
  • The main body systems involved in their production are the nervous system and the respiratory system.
  • Both the central and the peripheral nervous system are involved.
  • The cranial nerve nuclei are located in the brain stem, and the cerebellum, which finely coordinates the muscle control required for movement, plays a dominant role in speech-motor function.
  • The cranial nerves involved in speech production include the fifth cranial nerve (jaw movement), the seventh cranial nerve (lip movement), the tenth cranial nerve (pharynx and larynx), the eleventh cranial nerve (pharyngeal movement), and the twelfth cranial nerve (tongue movement).
  • Among the peripheral nerves, the superior laryngeal nerve and the recurrent laryngeal nerve, both branching from the vagus nerve, are directly involved in laryngeal movement.
  • Speech is produced by the lower respiratory tract, the larynx, and the vocal tract.
  • The vocal folds are the sound source: the flow of exhaled air from the lungs sets them into vibration, and during phonation the control of exhalation provides an efficient supply of acoustic energy.
  • When the vocal folds are properly tensed and adducted, exhalation makes them vibrate, and the glottis opens and closes at regular intervals, regulating the expiratory stream that passes through it.
  • The articulation process refers to the formation of phonemes, the units of speech sounds, after the source sound has been amplified and shaped through resonance.
  • The tongue is the most important articulator, but phonemes in fact involve not only the tongue but also various other structures of the mouth and face.
  • These articulators include movable structures such as the tongue, lips, soft palate, and jaw, and immovable structures such as the teeth and hard palate; they block or constrict the airflow to form consonants and vowels.
  • The tongue can be divided, from front to back, into the apex (tip), the blade, the dorsum, the body, and the root.
  • The tip of the tongue is the part used when we stick the tongue out or articulate a syllable-initial /ㄹ/ (as in 'la-la-la'); the blade is mainly used to articulate phonemes made at the front of the mouth, such as alveolar sounds; and the dorsum is the part commonly used to articulate back sounds such as velar sounds.
  • The lips, the second articulator, form the opening of the mouth and play an important role in facial expression and in the articulation of the head and neck.
  • Vowels are distinguished by the shape of the lips as well as by the movement of the tongue.
  • Bilabial sounds can be pronounced only when the lips are closed.
  • The shape of the lips is modified by the surrounding muscles.
  • The orbicularis oris muscle, which encircles the lips, plays an important role in producing bilabial consonants and rounded vowels such as /u/ by closing or protruding the lips.
  • The quadratus labii superior and quadratus labii inferior muscles open the lips.
  • The risorius muscle pulls the corners of the lips, as in smiling, or spreads the lips to produce sounds such as /i/.
  • The jaw is divided into the immovable upper jaw (maxilla) and the lower jaw (mandible), which moves up and down and from side to side.
  • The jaws are the strongest and largest of the facial bones and are driven by four pairs of muscles.
  • Movement of the lower jaw changes the size of the oral opening, which is important not only for chewing but also for vowel production.
  • The alveolar ridge (gum) is the area where alveolar consonants such as /ㄷ/ and /ㅅ/ are articulated.
  • The hard palate is the hard, relatively flat region behind the alveolar ridge where the sounds of the /ㅈ/ series are articulated.
  • The soft palate (velum) is classified as a movable articulator because its muscles contract to achieve velopharyngeal closure and thereby produce oral sounds.
  • Consonants are sounds produced with some obstruction in the oral cavity, or more precisely somewhere along the oral passage, as the airflow that has passed through the vocal folds travels through the vocal tract; sounds produced without such obstruction are vowels.
  • Consonants are examined according to how and where they are articulated: in a consonant chart, each column represents the place of articulation and each row the manner of articulation.
  • Complete-closure sounds (stops) are those in which the airflow in the oral cavity is blocked completely and then released.
  • Constriction sounds are those made by narrowing part of the vocal tract and forcing the airflow through the narrow passage.
  • Complete-closure sounds can in turn be divided into nasal and non-nasal (oral) sounds.
  • Nasal stops, produced with the velum lowered so that the airflow resonates in the nasal cavity, belong to the former.
  • The latter are oral stops, in which the velum is raised against the pharyngeal wall so that the airflow is blocked from reaching the nasal cavity.
  • Oral stops can be further divided into plosives (stops proper), trills, and flaps (or taps).
  • Constriction sounds are divided into fricatives and approximants, the latter produced with a constriction too wide to cause audible friction.
  • When the air passage is formed along the side of the tongue, the sound is called a lateral.
  • When classified by place of articulation, bilabials are sounds in which both lips participate in the articulation; the Korean consonants /ㅂ, ㅃ, ㅍ, ㅁ/ and the like belong to this class.
  • The bilabials of modern standard Korean are sounds that close both lips, but the two lips can also be narrowed so that the airflow is fricated between them (bilabial fricatives) or set into vibration (bilabial trills).
  • Labiodentals are sounds articulated with the lower lip and the upper teeth; they do not exist in Korean, but English [f, v] belong to this class.
  • Dentals are sounds in which the airflow is constricted or stopped at the back of the upper teeth; they are sometimes called interdentals because the friction occurs between the teeth.
  • Alveolars are produced by constricting or stopping the airflow near the upper gums; Korean consonants such as /ㄷ, ㄸ, ㅌ, ㄴ, ㄹ/ belong to this class.
  • Korean /ㅅ, ㅆ/ are produced by constricting the airflow in the alveolar region, much like English /s, z/.
  • Palatoalveolars, also known as postalveolars, are produced with the tip or blade of the tongue touching the area just behind the alveolar ridge; they do not occur in Korean but do in English and French.
  • Alveolopalatals are also called prepalatals because they are articulated at the front of the hard palate, near the alveolar ridge.
  • Retroflexes differ from other coronal sounds in that the underside of the tongue tip, rather than its upper surface, touches or approaches the palate.
  • Palatals are sounds in which the tongue body touches or approaches the hard palate.
  • Velars are sounds in which the tongue body touches or approaches the soft palate; the Korean stops /ㄱ, ㄲ, ㅋ/ and the nasal /ㅇ/ belong to this class.
  • Uvulars are sounds in which the tongue body touches or approaches the uvula, at the rear tip of the soft palate.
  • Pharyngeals are sounds articulated in the pharyngeal cavity.
  • Glottals are sounds in which the vocal folds themselves serve as the articulator; in Korean, only the voiceless glottal fricative /ㅎ/ exists as a phoneme.
  • Vowel articulation is described by three principal variables: the height of the tongue, its front-back position, and the shape of the lips.
  • The openness of a vowel is determined by the height of the tongue.
  • A vowel produced with the mouth relatively closed is called a close (high) vowel, and one produced with the mouth wide open is called an open (low) vowel.
  • Vowels between high and low are called mid vowels, which can be subdivided into close-mid (half-close) vowels and, with the mouth slightly more open, open-mid (half-open) vowels.
  • The second variable, the front-back position of the tongue, is in fact determined by where the vocal tract is narrowest, that is, by which part of the tongue is closest to the palate.
  • When the narrowest point lies toward the front of the tongue the vowel is a front vowel, when it lies toward the back it is a back vowel, and in between it is a central vowel.
  • Vowels produced with rounded lips are rounded vowels, and those produced with spread or neutral lips are unrounded vowels.
  • Speech impairment refers to the inability to produce voice whose pitch, loudness, quality, and fluency are appropriate to the speaker's gender, age, physique, social environment, and geographical background. It may be congenital or acquired, and it can be treated to some extent by augmenting or reducing the vocal folds, which form part of the larynx; however, such treatment is imperfect and its effect uncertain.
  • Laryngeal functions include swallowing, coughing, airway closure, breathing, and phonation, and there are various evaluation methods (e.g., case history, speech pattern, acoustic, and aerodynamic tests). Such assessments can be used to determine whether a speech impairment is present.
  • One conventional aid is a vibration generator (artificial larynx) capable of producing vibration artificially.
  • One type of vibration generator works on the principle of a loudspeaker: a speaker consists of a magnet and a coil, and when the direction of the current flowing through the coil is reversed, the magnetic poles it presents are reversed. Attraction and repulsion therefore alternate with the direction of the current, causing the coil to reciprocate, and this reciprocating motion vibrates the air and generates sound.
  • Another type uses the piezoelectric effect: a piezoelectric crystal element driven by a low-frequency signal voltage deforms, causing a diaphragm to vibrate and generate sound. A vibration generator built on either principle can therefore substitute for the function of the vocal folds.
  • However, such a device merely vibrates in place of the vocal folds and the resulting sound appears externally, so it is not easy to convey the speaker's actual intention.
  • Moreover, the vibration generator must be held by hand against the throat at the level of the vocal folds at all times.
  • For the speech disorders described above, therapeutic approaches such as surgery on the larynx or vocal folds may be sought, but such surgical methods or treatments cannot always provide a complete solution.
  • Conventional electropalatography (EPG) systems include the Rion EPG, widely commercialized in Japan in 1973 under the Rion name based on the work of Fujimura and Tatsumi, the Reading EPG of the University of Reading used by companies such as Articulate Instruments Ltd. (WinEPG), and the palatometer of Fletcher.
  • Such applications also include the Kay Palatometer, the palatometer developed by the UCLA Phonetics Lab for research purposes, and CompleteSpeech (LogoMetrix) developed by Schmidt.
  • These conventional techniques, however, measure speech only against the passive articulators; they face a clear limit in reproducing utterances according to the actual manner of articulation, which requires the tongue itself, the active articulator, or the interaction between the tongue and the other articulators.
  • Speech and facial synchronization means copying the speech, articulation, and facial expressions that most strongly determine the identity of a person or object onto characters, robots, various electronic products, autonomous vehicles, and the like.
  • Such application is a key means of establishing and extending an individual's identity.
  • Conventional techniques have generally been limited to simple lip libraries and produce low-quality animation; overseas animation producers such as Pixar and Disney spend a great deal of time and money creating realistic character animation through lip sync.
  • An object of the present invention is to grasp the user's manner of articulation during utterance through sensors on the head and neck, including the oral cavity, and to provide a device and method that supplement the utterance so that it can be expressed as good-quality speech in auditory, visual, or tactile form.
  • Another object of the present invention is to realize appropriate, good-quality utterance when normal speech function is absent and correction or treatment is impossible.
  • Still another object of the present invention is to grasp the user's manner of articulation according to the intention to speak through head and neck sensors, including the oral cavity, and to map it onto the head and neck of an image object, including animation, so that the object's utterance and facial expression are rendered more naturally and more like a human's.
  • Still another object of the present invention is to grasp the user's manner of articulation according to the intention to speak through head and neck sensors, including the oral cavity, and to map it onto actuators in the head and neck of a robot, including a humanoid, so that the robot's utterance and facial expression are rendered more naturally and more like a human's.
  • To this end, the system includes a sensor unit that measures the physical characteristics of the articulators while adjacent to one surface of the speaker's head and neck;
  • a data analysis unit that identifies the speaker's speech features based on the position of the sensor unit and the physical characteristics of the articulators;
  • a data conversion unit that converts the sensor positions and speech features into language data; and a data expression unit that expresses the language data externally. The sensor unit includes an oral tongue sensor corresponding to the tongue inside the oral cavity, and the overall pipeline is sketched below.
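The following is a minimal, illustrative sketch of how the four units described above could be wired together in software. All class and field names (SensorFrame, DataAnalysisUnit, and so on) are hypothetical; the patent does not specify an implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SensorFrame:
    """One time step of raw measurements from the sensor unit (hypothetical layout)."""
    sensor_position: str          # e.g. "tongue_blade", "cheek_left"
    accel: tuple                  # (x, y, z) acceleration of the tongue sensor
    gyro: tuple                   # (x, y, z) angular velocity
    piezo_mv: float               # piezoelectric output, millivolts
    emg_uv: float                 # facial EMG potential difference, microvolts

class DataAnalysisUnit:
    """Maps raw physical measurements to speech features (consonant/vowel labels)."""
    def analyze(self, frames: List[SensorFrame]) -> List[str]:
        # Placeholder rule: piezo spikes mark closures, otherwise treat the frame as vowel-like.
        return ["closure" if f.piezo_mv > 50.0 else "vowel" for f in frames]

class DataConversionUnit:
    """Converts speech features into language data (here, a plain string)."""
    def convert(self, features: List[str]) -> str:
        return " ".join(features)

class DataExpressionUnit:
    """Expresses language data externally (visually, audibly, or tactilely)."""
    def express(self, language_data: str) -> None:
        print(f"[expressed]: {language_data}")

# Wiring the pipeline: sensor frames -> analysis -> conversion -> expression.
frames = [SensorFrame("tongue_blade", (0.1, 0.0, 9.8), (0.0, 0.2, 0.0), 75.0, 12.0)]
DataExpressionUnit().express(DataConversionUnit().convert(DataAnalysisUnit().analyze(frames)))
```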
  • The oral tongue sensor is fixed to one surface of the tongue, wraps around its surface, or is inserted into it, and measures the tongue along the x-, y-, and z-axes during utterance.
  • From these measurements, at least one physical characteristic of the tongue itself can be identified among its height, front-back position, degree of flexion, extension, rotation, tension, contraction, relaxation, and vibration.
  • Based on the change per unit time of the acceleration and rotation angle about the x-, y-, and z-axes of the tongue during utterance, the sensor can likewise identify the physical characteristics of the articulators, including the tongue, as illustrated in the sketch below.
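As an illustration, the per-unit-time changes mentioned above could be derived from raw accelerometer and gyroscope samples roughly as follows. The sampling rate, feature names, and the mapping to tongue characteristics are assumptions made for the sketch, not values from the patent.

```python
import numpy as np

FS = 100.0  # assumed sampling rate of the tongue sensor, Hz

def tongue_motion_features(accel: np.ndarray, gyro: np.ndarray) -> dict:
    """Summarize tongue movement from IMU samples.

    accel: (N, 3) accelerations along x, y, z in m/s^2
    gyro:  (N, 3) angular velocities around x, y, z in deg/s
    """
    dt = 1.0 / FS
    # Change of acceleration per unit time (jerk) along each axis.
    jerk = np.diff(accel, axis=0) / dt
    # Rotation angle accumulated over time around each axis.
    rotation_angle = np.cumsum(gyro * dt, axis=0)
    return {
        "mean_abs_jerk_xyz": np.mean(np.abs(jerk), axis=0),      # proxy for flexion/extension speed
        "rotation_range_xyz": np.ptp(rotation_angle, axis=0),    # proxy for tongue rotation
        "vibration_rms": float(np.sqrt(np.mean((accel - accel.mean(0)) ** 2))),  # proxy for vibration
    }

# Example with one second of synthetic samples.
rng = np.random.default_rng(0)
features = tongue_motion_features(rng.normal(0, 0.5, (100, 3)), rng.normal(0, 5.0, (100, 3)))
print(features)
```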
  • Alternatively, the oral tongue sensor, fixed to one surface of the tongue or wrapped around it, may use a piezoelectric element that generates an electrical signal from the polarization caused by changes in its crystal structure under the physical forces of tongue contraction and relaxation during speech.
  • From the degree of bending so measured, at least one physical characteristic of the tongue can be identified among its height, front-back position, flexion, extension, rotation, tension, contraction, relaxation, and vibration.
  • The sensor unit may also include a triboelectric generator element that identifies at least one physical characteristic, such as the degree of closure, release, friction, resonance, or approximation, from the triboelectric charge produced when the tongue approaches or contacts other articulators inside and outside the head and neck; a sketch of how both signal types could be interpreted follows.
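The sketch below shows one simple way such piezoelectric and triboelectric voltages could be turned into discrete articulation events. The voltage thresholds and event names are illustrative assumptions; a real device would calibrate them per speaker.

```python
from typing import List, Tuple

# Assumed thresholds (millivolts).
PIEZO_BEND_MV = 40.0      # sustained bending of the tongue body
TRIBO_CONTACT_MV = 120.0  # tongue contacting another articulator (e.g. palate)
TRIBO_APPROACH_MV = 30.0  # tongue approaching without full contact

def classify_events(samples: List[Tuple[float, float]]) -> List[str]:
    """samples: list of (piezo_mv, tribo_mv) pairs, one per time step."""
    events = []
    for piezo_mv, tribo_mv in samples:
        if tribo_mv >= TRIBO_CONTACT_MV:
            events.append("closure")          # full contact -> stop-like closure
        elif tribo_mv >= TRIBO_APPROACH_MV:
            events.append("approximation")    # near contact -> fricative/approximant-like
        elif piezo_mv >= PIEZO_BEND_MV:
            events.append("bending")          # tongue flexion without contact
        else:
            events.append("rest")
    return events

print(classify_events([(10.0, 5.0), (45.0, 10.0), (50.0, 140.0), (20.0, 60.0)]))
# -> ['rest', 'bending', 'closure', 'approximation']
```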
  • Through the physical characteristics of the tongue and the other articulators measured by the sensor unit, the data analysis unit can identify at least one speech feature among the consonants and vowels uttered by the speaker, lexical (word-level) stress, and sentence stress.
  • In identifying speech features from the physical characteristics measured by the sensor unit, the data analysis unit can measure at least one of the correctness of the speaker's pronunciation, its similarity or proximity to a reference, and the intention to speak, against a standard speech feature matrix whose entries are binary or real numbers; a sketch of such a comparison follows.
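To make the comparison against a binary standard speech feature matrix concrete, the sketch below scores a measured feature vector against a few reference rows. The feature names, the tiny matrix, and the proximity measure (fraction of matching entries) are assumptions used only for illustration.

```python
import numpy as np

# Hypothetical standard speech feature matrix: rows are phonemes, columns are
# binary articulatory features (1 = feature present, 0 = absent).
FEATURES = ["bilabial", "alveolar", "velar", "stop", "fricative", "voiced", "nasal"]
STANDARD = {
    "p": np.array([1, 0, 0, 1, 0, 0, 0]),
    "b": np.array([1, 0, 0, 1, 0, 1, 0]),
    "s": np.array([0, 1, 0, 0, 1, 0, 0]),
    "m": np.array([1, 0, 0, 1, 0, 1, 1]),
}

def proximity(measured: np.ndarray) -> dict:
    """Similarity/proximity of a measured binary feature vector to each standard phoneme."""
    return {ph: float((measured == row).mean()) for ph, row in STANDARD.items()}

measured = np.array([1, 0, 0, 1, 0, 1, 0])   # what the sensors suggest the speaker produced
scores = proximity(measured)
best = max(scores, key=scores.get)
print(scores, "->", best)   # the highest proximity identifies the intended phoneme, here "b"
```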
  • In identifying the physical characteristics of the articulators measured by the sensor unit as speech features, the data analysis unit recognizes those physical characteristics as a pattern for each consonant and vowel unit, extracts the features of each pattern, classifies the extracted features by similarity, recombines the features of the classified patterns, and interprets the result as the speech features of the utterance.
  • From the physical characteristics of the articulators measured by the sensor unit, the data analysis unit can also measure speech variations, that is, the secondary articulations caused by assimilation, dissimilation, elision, attachment, stress, and reduction of consonants and vowels, including at least one of aspiration, syllabic consonants, flapping, tensification, labialization, velarization, dentalization, palatalization, nasalization, stress shift, and lengthening.
  • The oral tongue sensor may include a circuit unit for sensor operation, a capsule unit surrounding the circuit unit, and an adhesive unit attached to one surface of the tongue.
  • The oral tongue sensor may also operate adjacent to the tongue as a film carrying a thin-film circuit.
  • The sensor unit may include a face sensor comprising at least one reference sensor that establishes a reference potential for measuring the nerve signals of the head and neck muscles, together with at least one anode sensor and at least one cathode sensor that measure those nerve signals.
  • In acquiring the position of the sensor unit from the face sensor, the data analysis unit can determine the position of each facial sensor from the potential difference between the anode and cathode sensors relative to the reference sensor.
  • In acquiring the speaker's speech features from the face sensor, the data analysis unit likewise uses the potential difference between the anode and cathode sensors relative to the reference sensor to identify the speech features arising from the physical behavior of the articulators in the speaker's head and neck; this differential measurement is sketched below.
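A minimal sketch of the differential (anode minus cathode, each taken relative to a reference electrode) surface-EMG measurement described above. The channel layout, amplitudes, and activity threshold are assumptions made for the example.

```python
import numpy as np

def differential_emg(anode_uv: np.ndarray, cathode_uv: np.ndarray,
                     reference_uv: np.ndarray) -> np.ndarray:
    """Differential EMG: both electrodes are measured against the reference,
    so subtracting them rejects interference common to the whole face."""
    return (anode_uv - reference_uv) - (cathode_uv - reference_uv)

def muscle_active(signal_uv: np.ndarray, threshold_uv: float = 20.0) -> bool:
    """Declare the underlying articulator muscle active if the RMS amplitude
    exceeds an assumed threshold (in microvolts)."""
    rms = float(np.sqrt(np.mean(signal_uv ** 2)))
    return rms > threshold_uv

# Synthetic data: common-mode interference plus a burst on the anode channel.
t = np.arange(200)
reference = 5.0 * np.sin(0.3 * t)                # common-mode interference
anode = reference + 40.0 * (t > 100)             # assumed muscle burst in the second half
cathode = reference.copy()
signal = differential_emg(anode, cathode, reference)
print(muscle_active(signal))   # True: the burst survives common-mode rejection
```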
  • The sensor unit may include a vocal cord sensor that, adjacent to the vocal cords in the speaker's head and neck, detects the EMG or tremor of the vocal folds and thereby identifies at least one piece of utterance history information: the start of an utterance, a pause, or its end.
  • The sensor unit may include a dental sensor that, adjacent to one surface of the teeth, detects the location of signals generated by changes in capacitance caused by contact between the tongue or lower lip and the teeth.
  • The data analysis unit may acquire the speaker's voice during utterance through a voice acquisition sensor adjacent to one surface of the speaker's head and neck.
  • The sensor unit may include an imaging sensor that images the speaker's head and neck to capture at least one of the change information of the head and neck articulators, the change information of the speaker's facial expression, and the non-verbal expressions of the head, neck, chest, and upper and lower limbs that move with the speaker's intention to speak.
  • The speech intention expression system may further include a power supply unit that powers at least one of the oral tongue sensor, face sensor, voice acquisition sensor, vocal cord sensor, dental sensor, and imaging sensor.
  • The speech intention expression system may further include a wired or wireless communication unit for interworking when the data analysis unit and the database unit are located and operated externally.
  • The data analysis unit may be linked with a database unit containing at least one language data index corresponding to the positions of the sensor unit, the speaker's speech features, and the speaker's voice.
  • The database unit constructs the at least one language data index, namely a phoneme unit index, a syllable unit index, a word unit index, a phrase unit index, a sentence unit index, a continuous speech index, and a pronunciation pitch index, from at least one of the duration of the utterance, its frequency, its amplitude, the EMG of the head and neck muscles during the utterance, the positional change of the head and neck muscles, and the positional change caused by bending and rotation of the tongue; one possible organization is sketched below.
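A minimal sketch of how the language data index could be organized, assuming a simple in-memory layout; the patent does not prescribe a storage format, and the field names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PhonemeEntry:
    """One entry of the phoneme-level index (hypothetical fields)."""
    symbol: str                    # e.g. "b", "i", "f"
    duration_ms: float             # typical utterance duration
    f0_hz: float                   # typical frequency
    amplitude_db: float            # typical amplitude
    emg_profile: List[float]       # representative head/neck EMG samples

@dataclass
class LanguageDataIndex:
    """Database unit: indexes at increasing linguistic granularity."""
    phonemes: Dict[str, PhonemeEntry] = field(default_factory=dict)
    syllables: Dict[str, List[str]] = field(default_factory=dict)   # syllable -> phoneme symbols
    words: Dict[str, List[str]] = field(default_factory=dict)       # word -> syllables
    phrases: Dict[str, List[str]] = field(default_factory=dict)
    sentences: Dict[str, List[str]] = field(default_factory=dict)
    pitch: Dict[str, str] = field(default_factory=dict)             # unit -> "high"/"low" contour

index = LanguageDataIndex()
index.phonemes["b"] = PhonemeEntry("b", 85.0, 120.0, 62.0, [0.1, 0.4, 0.2])
index.syllables["beef"] = ["b", "i", "f"]
index.words["beef"] = ["beef"]
print(index.syllables["beef"])
```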
  • In conjunction with the language data index of the database unit, the data expression unit may express the speaker's speech features as at least one of phoneme units, word units, phrase units (citation forms), and sentence units of continuous speech.
  • The speech expression produced by the data expression unit may be visualized as at least one of letters, pictures, special symbols, and numbers, or rendered audibly as sound, and provided to the speaker and the listener.
  • The speech expression produced by the data expression unit may also be provided to the speaker and the listener through at least one tactile method among vibrating, nudging, tapping, pressing, and releasing.
  • The data conversion unit converts the positions of the sensor unit and the head and neck facial-expression change information into first basis data, and converts the speech features, the change information of the articulators, and the head and neck facial-expression change information into second basis data.
  • From these, the system generates the object head and neck data required for at least one object, namely the head and neck of an image object or the head and neck of a robot object.
  • In expressing the head and neck data processed by the data analysis unit on the head and neck of an image object or a robot object, the speech intention expression system sets static basic coordinates based on the first basis data of the data conversion unit and dynamic variable coordinates based on the second basis data, thereby generating a matching position.
  • The robot head and neck data are transmitted by the data matching unit to actuators located on one surface of the head and neck of the robot object, and the actuators realize the movement of the robot's head and neck, including at least one of articulation, utterance, and facial expression, according to those data; this mapping is sketched below.
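The sketch below illustrates one way the static basic coordinates and the dynamic variable coordinates could be combined into actuator targets for a robot head and neck. The actuator names, coordinate conventions, and clamping range are assumptions, not part of the patent.

```python
from typing import Dict, Tuple

# Static basic coordinates: resting position of each head/neck actuator (assumed names).
STATIC_COORDS: Dict[str, Tuple[float, float]] = {
    "jaw": (0.0, 0.0),
    "upper_lip": (0.0, 1.0),
    "lower_lip": (0.0, -1.0),
    "left_cheek": (-2.0, 0.0),
}

def clamp(value: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, value))

def actuator_commands(dynamic_coords: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Combine static basic coordinates with dynamic variable coordinates
    (offsets derived from the speaker's speech features) into target positions."""
    commands = {}
    for name, (x0, y0) in STATIC_COORDS.items():
        dx, dy = dynamic_coords.get(name, (0.0, 0.0))
        commands[name] = (x0 + clamp(dx), y0 + clamp(dy))
    return commands

# Dynamic offsets for a bilabial closure followed by slight jaw opening (illustrative values).
print(actuator_commands({"upper_lip": (0.0, -0.4), "lower_lip": (0.0, 0.4), "jaw": (0.0, 0.2)}))
```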
  • The speech intention expression system based on the physical characteristics of the head and neck articulators according to the present invention thus grasps the intention to speak from the speaker's use of the head and neck articulators, centered on the tongue, and expresses it in auditory, visual, and tactile form, that is, as speech.
  • To this end, the articulators inside and outside the head and neck, including the tongue, are used to grasp the intention to speak.
  • The movements to be identified include the independent physical characteristics of the tongue itself and one or more characteristics of the closure, release, friction, resonance, and approximation that arise from its interaction with the passive articulators or with one or more of the other active articulators, namely the lips, glottis, vocal folds, pharynx, and epiglottis.
  • Various sensors capable of measuring azimuth, elevation, rotation angle, pressure, friction, distance, temperature, sound, and the like are used to capture these characteristics.
  • Existing artificial larynxes have the disadvantages that the vibration is sounded from outside, one hand is always occupied, the resulting articulation is unnatural, and the quality of the speech is very low.
  • Other conventional approaches have relied on the palate, a passive articulator.
  • Articulatory phonetics, which attempts to measure the speaker's speech using an artificial palate, has been regarded as the mainstream until now, but in measuring speech it could determine only the discrete presence or absence of the contact produced by the articulation of a specific consonant or vowel.
  • This articulatory account, however, does not mean that human speech itself is discrete.
  • It is contradicted by acoustic phonetics, which holds that each phoneme, and vowels in particular, forms a continuum that cannot be segmented into discrete units of pronunciation.
  • Human speech is therefore not adequately described discretely as 'uttered' or 'not uttered'; it is better described proportionally, or in graded steps of similarity.
  • Acoustic phonetics quantifies the physical properties of the speech produced by the speaker and captures the similarity or proximity of utterances, opening the possibility of measuring speech by the proportional, graded similarity of pronunciation that conventional articulatory phonetics could not realize.
  • The present invention starts from articulatory phonetics, but it also grasps and realizes the speaker's speech intention more accurately by scaling the degree of articulation, as pursued by acoustic phonetics, which is a highly innovative advantage.
  • Because the present invention scales the articulation produced by the speaker's articulators and presents the speech intention intuitively in auditory, visual, and tactile form, it is expected to greatly improve the quality of communication and the convenience of daily life.
  • The system can therefore be applied to silent speech interfaces (silent conversation) and to speech-to-text conversion.
  • For example, a speaker speaks and a hearing-impaired listener perceives the utterance as visual material, removing the difficulty of communication.
  • It can also be used in public transportation, public facilities, military installations and operations, and underwater activities, where communication is affected by noise.
  • Furthermore, the present invention transmits the speaker's articulation information to the actuators that realize the movement of a robot's head and neck and matches it to them, thereby reproducing head and neck movements, including articulation, utterance, and facial expression, that resemble those of the human speaker.
  • Humanoid robots can thereby overcome the 'uncanny valley' described by Masahiro Mori, the chronic cognitive dissonance such robots cause in humans.
  • Human-friendly articulation by humanoids and other robots thus becomes possible, robots and androids can take over human roles, and human-robot dialogue is achieved, which is effective in preventing mental and psychological problems such as isolation and depression among the growing elderly population of an aging society.
  • FIG. 1 is a view showing a sensor unit of a speech intention representation system according to a first embodiment of the present invention.
  • FIG. 2 is a view showing the position of the sensor unit of the speech intention representation system according to the first embodiment of the present invention.
  • FIG. 3 is a diagram illustrating a speech intent representation system according to a first embodiment of the present invention
  • FIG. 4 is a view showing the positional name of the oral cavity used in the speech intent expression system according to the first embodiment of the present invention.
  • FIG. 5 is a view showing the action of oral tongue for vowel speech utilized in the speech intent expression system according to the first embodiment of the present invention.
  • FIGS. 6 to 10 are views each illustrating various oral tongue sensors of the speech intention expression system according to the first embodiment of the present invention.
  • FIGS. 11 and 12 are a cross-sectional view and a perspective view, respectively, showing the attachment of the oral tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
  • FIG. 13 is a view showing the circuit unit of the oral tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
  • FIG. 14 is a view illustrating various utilization states of the oral tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
  • FIG. 15 is a diagram illustrating a speech intent representation system according to a second embodiment of the present invention.
  • FIG. 16 is a diagram illustrating a principle in which a data interpreter of a speech intent expression system according to a second embodiment of the present invention grasps speech characteristics.
  • FIG. 17 is a view showing the principle by which the data analysis unit of the speech intent representation system according to the second embodiment of the present invention identifies the physical characteristics of the articulators measured by the sensor unit as speech features.
  • FIG. 18 is a diagram illustrating a standard speech feature matrix for a vowel utilized by a data interpreter of a speech intent representation system according to a second embodiment of the present invention.
  • FIG. 19 is a diagram showing a standard speech feature matrix relating to consonants utilized by a data interpreter of a speech intent representation system according to a second embodiment of the present invention.
  • FIG. 20 is a diagram illustrating the algorithm process utilized by the data interpreter of the speech intent representation system according to the second embodiment of the present invention to identify the physical characteristics of the articulators as speech features.
  • FIG. 21 is a detailed diagram illustrating the algorithm process utilized by the data interpreter of the speech intent representation system according to the second embodiment of the present invention to identify the physical characteristics of the articulators as speech features.
  • FIG. 22 illustrates in detail the principle of the algorithm process utilized by the data interpreter of the speech intent representation system according to the second embodiment of the present invention to identify the physical characteristics of the articulators as speech features.
  • FIG. 23 is a diagram illustrating the algorithm process by which the oral tongue sensor of the utterance intention expression system according to the second embodiment of the present invention identifies a specific vowel uttered by the speaker as an utterance feature.
  • FIG. 24 is a diagram illustrating a case in which the data analysis unit of the speech intent representation system according to the second embodiment of the present invention utilizes an Alveolar Stop.
  • FIG. 25 is a diagram illustrating a case in which a data analysis unit of a speech intention representation system according to a second embodiment of the present invention utilizes a bilabial stop;
  • FIG. 26 is a diagram illustrating an experimental result using a voiced bilabial stop of a data interpreter of a speech intention expression system according to a second embodiment of the present invention.
  • FIGS. 27 and 28 are diagrams illustrating a case in which the data interpreter of the speech intent representation system according to the second embodiment of the present invention utilizes a voiced labiodental fricative;
  • FIG. 29 is a diagram illustrating interworking between a data interpreter and a database of a speech intent representation system according to a second embodiment of the present invention.
  • FIG. 30 is a diagram illustrating a case in which a data interpreter of a speech intention expression system according to a second embodiment of the present invention recognizes a specific word.
  • FIG. 31 is a diagram showing a database unit of a speech intent representation system according to a second embodiment of the present invention.
  • FIG. 32 is a diagram showing a speech intention expression system according to a third embodiment of the present invention.
  • FIGS. 33 and 34 are diagrams each showing an actual form of a database unit of the speech intent representation system according to the third embodiment of the present invention.
  • FIG. 35 is a diagram showing a speech intention expression system according to a fourth embodiment of the present invention.
  • FIG. 36 is a view showing interlocking of a sensor unit, a data analysis unit, a data expression unit, and a database unit in a speech intention representation system according to a fourth embodiment of the present invention
  • FIG. 42 is a diagram illustrating a case in which a data expression unit of a speech intention expression system according to a fourth embodiment of the present invention expresses language data visually and acoustically;
  • FIG. 43 is a diagram illustrating a case in which a data expression unit of a speech intention expression system according to a fourth embodiment of the present invention visually expresses language data
  • FIG. 44 is a diagram illustrating a case in which a data expression unit of a speech intention expression system in accordance with a fourth embodiment of the present invention visually expresses language data
  • FIG. 45 is a diagram illustrating a case in which a data expression unit of a speech intention expression system in accordance with a fourth embodiment of the present invention expresses language data in a continuous speech unit;
  • FIG. 46 is a diagram illustrating a confusion matrix utilized by a speech intent representation system according to a fourth embodiment of the present invention.
  • FIG. 47 is a diagram showing, as a percentage, the Confusion Matrix utilized by the speech intent representation system according to the fourth embodiment of the present invention.
  • FIG. 48 is a diagram illustrating a case in which a speech intention expression system according to a fourth embodiment of the present invention assists a speaker in language correction and guidance through a screen;
  • FIG. 49 is a diagram illustrating a case where a speech intention expression system according to a fourth embodiment of the present invention captures an image of the head and neck articulators.
  • FIG. 50 is a diagram illustrating a case in which a speech intent representation system according to a fourth embodiment of the present invention combines mutual information through a standard speech feature matrix;
  • FIG. 51 is a diagram illustrating a speech intention presentation system according to a fifth embodiment of the present invention.
  • FIG. 52 illustrates a case in which a speech intention expression system according to a fifth embodiment of the present invention matches object head and neck data to head and neck portions of an image object based on static basic coordinates.
  • FIG. 53 is a view illustrating static basis coordinates based on a position of a face sensor utilized by a speech intention representation system according to a fifth embodiment of the present invention.
  • FIG. 54 illustrates a case in which the speech intent representation system according to the fifth embodiment of the present invention matches object head and neck data to the head and neck of an image object based on dynamic basic coordinates.
  • FIG. 55 is a view showing dynamic basis coordinates based on a voltage difference of a face sensor utilized by a speech intention expression system according to a fifth embodiment of the present invention.
  • FIG. 56 illustrates a case in which the speech intent representation system according to the fifth embodiment of the present invention matches the head and neck data of the robot object to the actuator of the head and neck of the robot object based on the static basic coordinates.
  • FIG. 57 is a view showing static basic coordinates based on a voltage difference of a face sensor utilized by a speech intention expression system according to a fifth embodiment of the present invention.
  • FIG. 58 is a diagram illustrating a case in which the speech intent representation system according to the fifth embodiment of the present invention matches the head and neck data with an actuator of the head and neck of a robot object based on dynamic variable coordinates.
  • FIG. 59 is a view illustrating dynamic variable coordinates based on a voltage difference of a face sensor utilized by a speech intention expression system according to a fifth embodiment of the present invention.
  • FIGS. 60 and 61 are views showing the operation of the actuator of the head and neck portion of the robot object of the speech intent representation system according to the fifth embodiment of the present invention.
  • FIG. 62 is a view showing an actuator of the head and neck part of the robot object of the speech intent expression system according to the fifth embodiment of the present invention.
  • FIG. 1 is a diagram illustrating a sensor unit of a speech intent representation system according to a first embodiment of the present invention
  • FIG. 2 is a diagram illustrating a position of a sensor unit of a speech intention representation system according to a first embodiment of the present invention
  • FIG. 3 is a diagram illustrating a speech intention expression system according to a first embodiment of the present invention.
  • The sensor unit 100 comprises the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150, located on the head and neck.
  • These head and neck sensors provide the position 210 at which each sensor is located, the speech feature 220 according to the utterance of the speaker 10, the speaker's voice 230, the utterance history information 240, and the speech variation 250.
  • The data analysis unit 200 acquires these data, and the data conversion unit 300 processes them as language data 310.
  • FIG. 4 is a view showing the positional names of the oral cavity used in the speech intention expression system according to the first embodiment of the present invention, and FIG. 5 is a view showing the action of the tongue in vowel speech as utilized in the same system.
  • The oral tongue sensor 110 is fixed to one surface of the tongue 12, wraps around its surface, or is inserted into it, and identifies at least one independent physical characteristic of the tongue itself among its height, front-back position, degree of flexion, extension, rotation, tension, contraction, relaxation, and vibration.
  • FIGS. 6 to 10 are views each illustrating various oral tongue sensors of the speech intention expression system according to the first embodiment of the present invention.
  • The oral tongue sensor 110 measures the acceleration and the rotation angle per unit time along the x-, y-, and z-axes, and from these the speech feature 220 arising from the physical characteristics of the other articulators, including the tongue 12, is identified.
  • The oral tongue sensor 110 also identifies the degree of bending of the tongue 12 through a piezoelectric element 112, which generates a polarization signal from changes in its crystal structure 111 under the physical forces of contraction and relaxation of the tongue 12 during speech, and thereby identifies the speech feature 220 arising from the physical characteristics of the articulators, including the tongue 12.
  • The oral tongue sensor 110 further identifies the speaker's speech feature 220 using a triboelectric charging element 113, which exploits the triboelectric generation produced when the tongue 12 approaches or contacts other articulators inside and outside the head and neck.
  • An integrated oral tongue sensor 110 identifies the speech feature 220 arising from the physical characteristics of the articulators by combining the acceleration and angular velocity along the x-, y-, and z-axes, the electrical signal from the piezoelectric element, and the triboelectric signal from contact.
  • FIGS. 11 and 12 are a cross-sectional view and a perspective view, respectively, showing the attachment of the oral tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
  • The oral tongue sensor 110 may be configured as a composite thin-film circuit and implemented as a single film.
  • The oral tongue sensor 110 consists of a circuit unit 114 for operating the sensor unit 100, a capsule unit 115 surrounding the circuit unit 114, and an adhesive unit 116 that attaches the oral tongue sensor 110 to one surface of the tongue 12.
  • As shown in FIGS. 6 to 9, the oral tongue sensor 110 can, according to the characteristics of each sensor, identify one or more physical characteristics such as the degree of closure, release, friction, resonance, or approximation produced when the tongue approaches or contacts other articulators inside and outside the head and neck.
  • FIG. 13 is a diagram illustrating the circuit unit of the oral tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
  • The circuit unit 114 of the oral tongue sensor 110 includes a communication chip, a sensing circuit, and an MCU.
  • FIG. 14 is a diagram illustrating various utilization states of the oral tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
  • The oral tongue sensor 110 can grasp the state of the tongue 12 during the speaker's utterance of various consonants and vowels and identify the corresponding speech feature 220.
  • For example, the oral tongue sensor 110 can identify the speech feature 220 for bilabial, alveolar, and palatal sounds.
  • FIG. 15 is a diagram illustrating a speech intention expression system according to a second embodiment of the present invention.
  • The sensor unit 100 arranged near the head and neck articulators, including the dental sensor 150, identifies the position 210 at which each sensor is located, the speech feature 220 according to the utterance, and the speaker's voice 230 according to the utterance.
  • It also grasps the utterance history information 240, which includes the start of the utterance, a pause in the utterance, and the end of the utterance.
  • Here the speech feature 220 means one or more basic physical speech characteristics such as stops, fricatives, affricates, nasals, glides, sibilants, and voiceless or voiced sounds.
  • The speaker's voice 230 is the auditory speech feature that accompanies the speech feature.
  • The utterance history information 240 is obtained through the vocal cord sensor 140 from the EMG or tremor of the vocal folds.
  • Through the sensor unit 100 near the head and neck articulators, comprising the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150, the data analysis unit 200 also identifies the speech variation 250 that arises from the speaker's gender, race, age, and native language.
  • The speech variation 250 covers the secondary articulations caused by assimilation, dissimilation, elision, attachment, stress, and reduction of consonants and vowels.
  • The data conversion unit 300 recognizes and processes as language data 310 the position 210 of the sensor unit measured by the head and neck articulator sensors 110, 120, 130, 140, and 150, the speech feature 220 according to the utterance, the speaker's voice 230 according to the utterance, the utterance history information 240, and the speech variation 250.
  • The data analysis unit 200 is linked with the database unit 350.
  • The database unit 350 holds a language data index 360 comprising a phoneme unit index 361, a syllable unit index 362, a word unit index 363, a phrase unit index 364, a sentence unit index 365, a continuous speech index 366, and a pronunciation pitch index 367. Through the language data index 360, the data analysis unit 200 can process the various speech-related information acquired by the sensor unit 100 as language data.
  • FIG. 16 is a diagram illustrating the principle by which the data interpreter of the speech intent representation system according to the second embodiment of the present invention grasps speech features.
  • FIG. 17 is a diagram illustrating the principle by which the data interpreter of the speech intent representation system according to the second embodiment of the present invention identifies the physical characteristics of the articulators measured by the sensor unit as speech features.
  • FIG. 18 is a diagram illustrating the standard speech feature matrix for vowels utilized by the data interpreter of the speech intent representation system according to the second embodiment of the present invention.
  • FIG. 19 is a diagram illustrating the standard speech feature matrix for consonants utilized by the data interpreter of the speech intent representation system according to the second embodiment of the present invention.
  • The data analysis unit 200 first acquires the physical characteristics of the articulators measured by the sensor unit 100, including the oral tongue sensor 110.
  • The oral tongue sensor 110 senses the physical characteristics of the articulators and produces a matrix value from the sensed physical characteristics.
  • The data analysis unit 200 identifies the speech feature 220 of the consonant or vowel whose entry in the standard speech feature matrix 205 corresponds to that matrix value.
  • One or more entries of the standard speech feature matrix 205 may be a phonetic symbol of the consonant or vowel, a binary number, or a real number.
  • FIG. 20 is a diagram illustrating the algorithm process utilized by the data interpreter of the speech intent representation system according to the second embodiment of the present invention to identify the physical characteristics of the articulators as speech features.
  • In identifying the physical characteristics of the articulators measured by the sensor unit 100, the algorithm used by the data analysis unit 200 consists of acquiring the physical characteristics of the articulators, identifying the pattern of each consonant and vowel unit contained in the acquired characteristics, extracting distinctive features from each pattern, classifying the extracted features by similarity, and recombining the features of the classified patterns, through which the specific speech feature is finally identified; a sketch of such a pipeline follows.
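As a concrete illustration of the acquire / pattern / extract / classify / recombine steps, the sketch below runs a nearest-centroid classifier over windows of sensor samples. The windowing scheme, the three features, and the tiny reference table are assumptions made for the example; the patent's algorithm could equally be realized with models such as ANN, CNN, RNN, RBM, or HMM.

```python
import numpy as np

# Step 1-2: acquire sensor samples and cut them into per-phoneme windows (patterns).
def windows(samples: np.ndarray, size: int = 20) -> list:
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

# Step 3: extract a small feature vector from each pattern.
def extract(pattern: np.ndarray) -> np.ndarray:
    return np.array([pattern.mean(), pattern.std(), np.abs(np.diff(pattern)).mean()])

# Step 4: classify by similarity to reference centroids (illustrative values).
CENTROIDS = {"stop": np.array([0.1, 0.8, 0.5]), "vowel": np.array([0.6, 0.2, 0.1])}

def classify(feature: np.ndarray) -> str:
    return min(CENTROIDS, key=lambda k: np.linalg.norm(feature - CENTROIDS[k]))

# Step 5: recombine the per-window labels into a speech-feature sequence.
def recombine(labels: list) -> list:
    return [lab for i, lab in enumerate(labels) if i == 0 or lab != labels[i - 1]]

rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(0.6, 0.05, 40), rng.normal(0.1, 0.8, 40)])
labels = [classify(extract(p)) for p in windows(samples)]
print(recombine(labels))   # ['vowel', 'stop'] for this synthetic signal
```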
  • FIG. 21 is a detailed diagram of the algorithm process utilized by the data interpreter of the speech intent representation system according to the second embodiment of the present invention to identify the physical characteristics of the articulators as speech features, and FIG. 22 illustrates the principle of that algorithm process in detail.
  • FIG. 23 is a diagram illustrating the algorithm process by which the oral tongue sensor of the speech intent expression system according to the second embodiment of the present invention identifies a specific vowel uttered by the speaker as a speech feature.
  • In the step of identifying the pattern of each consonant and vowel unit, the sensor grasps the physical characteristics of the articulators, and the pattern of each unit is determined with respect to the x-, y-, and z-axes.
  • For this pattern recognition, models such as an ANN (Artificial Neural Network), CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), RBM (Restricted Boltzmann Machine), or HMM (Hidden Markov Model) may be used.
  • From the measured change in the vector quantities and in the angles during the speaker's utterance, the utterance is recognized as the vowel [i], which has the corresponding tongue height and tongue frontness.
  • The oral tongue sensor 110 likewise identifies the change in the electrical signal due to piezoelectricity and the triboelectric signal caused by the proximity or friction between the sensor and the articulators inside and outside the oral cavity, and recognizes the utterance as the vowel [i] with the corresponding tongue height and frontness.
  • That is, the height of the tongue and the backness of the vowel are measured to identify the vowel.
  • In this way, the oral tongue sensor 110 measures vowels such as [i] and [u] produced by the speaker's utterance as the speech feature 220; a sketch of such a height/backness classification follows.
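A minimal sketch of classifying a vowel from estimated tongue height and backness, the two dimensions discussed above. The reference coordinates of each vowel are rough textbook positions chosen for the example, not values from the patent.

```python
# Rough (height, backness) reference positions on a 0..1 scale:
# height 1.0 = high/close tongue, backness 1.0 = fully back tongue.
VOWEL_SPACE = {
    "i": (1.0, 0.0),   # high front
    "u": (1.0, 1.0),   # high back
    "a": (0.0, 0.5),   # low central
    "e": (0.6, 0.1),   # mid front
    "o": (0.6, 0.9),   # mid back
}

def classify_vowel(height: float, backness: float) -> str:
    """Return the reference vowel nearest to the measured tongue position."""
    def dist(ref):
        h, b = ref
        return (h - height) ** 2 + (b - backness) ** 2
    return min(VOWEL_SPACE, key=lambda v: dist(VOWEL_SPACE[v]))

# Tongue raised and fronted (as estimated from the tongue sensor) -> [i].
print(classify_vowel(height=0.95, backness=0.1))   # 'i'
# Tongue raised and retracted -> [u].
print(classify_vowel(height=0.9, backness=0.85))   # 'u'
```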
  • The vowel speech feature 220 so obtained corresponds to the phoneme unit index 361 of the database unit 350.
  • FIG. 24 is a diagram illustrating a case in which the data analysis unit of the speech intention expression system according to the second embodiment of the present invention recognizes an Alveolar Stop.
  • The oral tongue sensor 110 measures the specific consonant uttered by the speaker as the speech feature 220.
  • That consonant speech feature 220 corresponds to the phoneme unit index 361 of the database unit 350, and the data interpreter 200 recognizes the Alveolar Stop as language data 310.
  • FIG. 25 is a diagram illustrating a case in which the data interpreter of the speech intent representation system according to the second embodiment uses a Bilabial Stop.
  • The oral tongue sensor 110 and the face sensor 120 measure the specific consonant uttered by the speaker as the speech feature 220.
  • That consonant speech feature 220 corresponds to the phoneme unit index 361 of the database unit 350, and the data interpreter 200 recognizes the Bilabial Stop as language data 310.
  • FIG. 26 is a diagram illustrating an experiment result using a voiced bilabial stop of a data interpreter of a speech intent expression system according to a second embodiment of the present invention.
  • The oral tongue sensor 110 and the face sensor 120 measure the specific consonant uttered by the speaker as the speech feature 220.
  • That consonant speech feature 220 corresponds to the phoneme unit index 361 of the database unit 350; in the experiment, the data interpreter 200 recognized the voiced bilabial stop as language data 310, and the voiceless bilabial stop was identified as /per/.
  • FIGS. 27 and 28 are diagrams illustrating a case in which the data interpreter of the speech intent representation system according to the second embodiment of the present invention utilizes Voiced Labiodental Fricative.
  • As shown in FIGS. 27 and 28, the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150 measure the specific consonant uttered by the speaker as the speech feature 220.
  • That consonant speech feature 220 corresponds to the phoneme unit index 361 of the database unit 350, and the data interpreter 200 recognizes the Voiced Labiodental Fricative as language data 310.
  • FIG. 29 is a diagram illustrating interworking between a data interpreter and a database of a speech intention expression system according to a second embodiment of the present invention.
  • the imaging sensor 160 captures an image of the speaker wearing the oral tongue sensor 110, the face sensor 120, and the voice acquisition sensor 130, and the corresponding data are recognized and processed as the language data 310.
  • the face sensor located on one surface of the head and neck establishes its own position through the potential difference of the anode sensor 122 and the cathode sensor 123 relative to the reference sensor 121, and this is captured by the imaging sensor 160 as an image.
  • the resulting change information 161 of the head and neck articulatory organs, the head and neck facial expression change information 162, and the non-verbal expression information 163 are transmitted to the data converter 300 as language data 310.
  • FIG. 30 is a diagram illustrating a case in which a data interpreter of a speech intention expression system according to a second embodiment of the present invention recognizes a specific word.
  • the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150 measure the specific consonants and vowels uttered by the speaker, and the data analysis unit 200 recognizes these consonants and vowels as speech features 220.
  • the speech features 220 of the respective phonemes, [b], [i], and [f], correspond to phoneme unit indexes 361 of the database unit 350, and the data interpreter determines this to be the word /beef/, pronounced [bif].
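A schematic sketch of the lookup implied here, in which a recognized phoneme sequence is matched against a purely illustrative fragment of the word unit index to yield /beef/ [bif]; the dictionary structure and key format are assumptions, not the patented data format:

```python
# Hypothetical fragment of the word unit index: phoneme sequence -> word entry.
word_unit_index = {
    ("b", "i", "f"): {"word": "beef", "ipa": "[bif]"},
}

recognized_phonemes = ("b", "i", "f")      # as produced by the data interpreter
entry = word_unit_index.get(recognized_phonemes)
if entry:
    print(f"/{entry['word']}/ pronounced {entry['ipa']}")
```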
  • FIG. 31 is a diagram showing a database unit of a speech intent representation system according to a second embodiment of the present invention.
  • the language data index 360 of the database unit 350 includes a phoneme unit index 361, a syllable unit index 362, a word unit index 363, a phrase unit index 364, a sentence unit index 365, a continuous speech index 366, and a pitch (high-low) index 367 of the pronunciation.
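One plausible way to organize the language data index 360 in code, purely as an illustration of the index levels named above; all keys and sample entries are invented for the sketch:

```python
# Illustrative skeleton of the language data index (360); entries are examples only.
language_data_index = {
    "phoneme_unit_index_361":      {"[k]": "voiceless velar stop", "[i]": "high front vowel"},
    "syllable_unit_index_362":     {"ki": ["[k]", "[i]"]},
    "word_unit_index_363":         {"king": "[ki-]"},
    "phrase_unit_index_364":       {},
    "sentence_unit_index_365":     {},
    "continuous_speech_index_366": {},
    "pitch_index_367":             {},   # high/low (intonation) of the pronunciation
}

print(sorted(language_data_index.keys()))
```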
  • FIG. 32 is a diagram illustrating a speech intention expression system according to a third embodiment of the present invention.
  • the components of the system may communicate with one another through the communication unit 400.
  • the communication unit 400 may be implemented in wired or wireless form; in the wireless case, various methods such as Bluetooth, Wi-Fi, 3G, 4G, and NFC may be used.
  • FIGS. 33 and 34 are diagrams each showing an actual form of the database unit of the speech intention expression system according to the third embodiment of the present invention.
  • the database unit 350 linked to the data analysis unit 200 holds the language data index, and the speech feature 220 according to the actual utterance, the speaker's voice 230, the detailed speech information 240, and the speech variation 250 are identified as the language data 310.
  • FIG. 33 shows the actual data of the database unit 350 in which the sensor unit 100 has measured, and the data interpreter 200 has reflected, various speech features including the high front tense vowel and the high back tense vowel of FIG. 23, the alveolar sounds of FIG. 24, and the voiceless labiodental fricative of FIG. 27.
  • FIG. 34 shows the actual data of the database unit 350 in which the sensor unit 100 has measured, and the data interpreter 200 has reflected, various speech features including the high front lax vowel of FIG. 23, the alveolar sounds of FIG. 24, and the bilabial stop sounds of FIG. 25.
  • FIG. 35 is a diagram illustrating a speech intention expression system according to a fourth embodiment of the present invention, and FIG. 36 is a diagram showing the interworking of the sensor unit, the data interpreter, the data expression unit, and the database unit of the speech intention expression system according to the fourth embodiment of the present invention.
  • the speech intention expression system includes a sensor unit 100, a data analysis unit 200, a data conversion unit 300, a database unit 350, and a data expression unit 500, which operate organically in cooperation.
  • the sensor unit 100 is located on the actual articulatory organs, measures the physical characteristics of the articulatory organs according to the speaker's utterance, and transmits them to the data analysis unit 200, which interprets them as language data. The interpreted language data are transmitted to the data expression unit 500, and the database unit 350 can be seen to work in conjunction with both the interpretation and the expression of the language data.
  • FIGS. 37 to 41 are diagrams showing the means by which the data expression unit of the speech intention expression system according to the fourth embodiment of the present invention expresses language data.
  • the physical characteristics of the speaker's head and neck articulators obtained by the sensor unit 100 are identified, through the data analysis unit 200, as the sensor unit position 210, the speech feature 220, the speaker's voice 230, the detailed speech information 240, and the speech variation 250.
  • the imaging sensor 160 captures an appearance change of the speaker's head and neck articulation organ, and the data interpreter 200 recognizes the change information 161 and the head and neck facial expression change information 162 of the speaker's head and neck articulation organ.
  • FIG. 37 illustrates that the data expression unit 500 acoustically expresses the language data 310
  • FIG. 38 illustrates that the data expression unit 500 visually expresses the language data 310.
  • the physical characteristics of the speaker's articulatory organs measured by the data analysis unit 200 are compared with the language data index 360 of the database unit 350, and a broad transcription of the actual standard pronunciation is provided together with a numerical measure of at least one of the accuracy of the pronunciation and stress, the similarity (proximity) to the standard, and the speech intention.
  • the data expression unit 500 expresses the language data 310 visually and aurally, and the physical characteristics of the speaker's articulatory organs measured by the data interpreter 200 are compared with the language data index 360 to provide a narrow transcription of the actual standard pronunciation along with a numerical measure of one or more of pronunciation and stress accuracy, similarity to the standard, and speech intention.
  • the data expression unit 500 visually expresses the language data 310, and the physical characteristics of the speaker's articulatory organs measured by the data analysis unit 200 are compared with the language data index 360 of the database unit 350; a numerical measure of one or more of pronunciation and stress accuracy, similarity to the standard, and speech intention is provided, and the corresponding language data 310 is provided as a word according to the word unit index 363, together with a corresponding image.
  • the data expression unit 500 expresses the language data 310 visually and aurally, and the physical characteristics of the speaker's articulatory organs measured by the data interpreter 200 are compared with the language data index 360; a numerical measure of one or more of pronunciation and stress accuracy, similarity to the standard, and speech intention is provided together with a narrow transcription of the actual standard pronunciation, and a speech correction image is provided alongside so that the pronunciation can be corrected and reinforced.
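A hedged sketch of how a single accuracy/similarity percentage such as the one shown to the speaker might be computed, using cosine similarity between a measured articulatory feature vector and a standard vector drawn from the index; the disclosure does not specify the metric, and the vectors below are invented for illustration:

```python
import numpy as np

def similarity_percent(measured, standard):
    """Cosine similarity between two articulatory feature vectors, as a percentage."""
    m = np.asarray(measured, dtype=float)
    s = np.asarray(standard, dtype=float)
    cos = float(np.dot(m, s) / (np.linalg.norm(m) * np.linalg.norm(s)))
    return round(100 * max(cos, 0.0), 1)

standard_features = [0.9, 0.8, 0.1, 0.7]   # illustrative standard-pronunciation vector
measured_features = [0.5, 0.2, 0.4, 0.6]   # illustrative measured vector
print(similarity_percent(measured_features, standard_features), "%")
```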
  • FIG. 42 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data visually and acoustically.
  • the data expression unit 500 visualizes the language data 310 as text and provides it aurally as sound; the physical characteristics of the speaker's articulatory organs measured by the data interpreter 200 are compared with the index, and a narrow transcription of the actual standard pronunciation related to the speaker's language data 310 is provided, together with letters and sounds measuring one or more of pronunciation and stress accuracy, similarity to the standard, and speech intention, to help the speaker correct and reinforce the language data 310.
  • FIG. 43 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data visually.
  • the data expression unit 500 visualizes and provides the language data 310 as one or more of a text, a picture, and an image.
  • the data analysis unit 200 compares the measured physical characteristics of the speaker's articulatory organs with the language data index 360, using one or more of the phoneme unit index 361, the syllable unit index 362, the word unit index 363, the phrase unit index 364, and the sentence unit index 365 of the database unit 350.
  • the language data 310 is then provided by the data expression unit 500, together with a narrow and broad transcription of the actual standard pronunciation related to the speaker's language data 310 and with letters and sounds measuring one or more of the accuracy, similarity, and speech intention, to help the speaker correct and reinforce the language data 310.
  • FIG. 44 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data visually.
  • the database unit 350 compares the physical characteristics of the speaker's articulatory organs measured by the data analysis unit 200 with the language data index 360, using one or more of the phoneme unit index 361, the syllable unit index 362, the word unit index 363, the phrase unit index 364, the sentence unit index 365, and the continuous speech index 366.
  • the language data 310 is then provided by the data expression unit 500, together with a narrow and broad transcription of the actual standard pronunciation related to the speaker's language data 310 and with letters and sounds of continuous speech units measuring one or more of the accuracy, similarity, and speech intention, to help the speaker correct and reinforce the language data 310.
  • FIG. 45 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data in units of continuous speech.
  • when the data expression unit 500 visualizes the language data 310 as text and renders it aurally as sound, the physical characteristics of the speaker's articulatory organs measured by the data analysis unit 200 are compared with the language data index 360, using one or more of the phoneme unit index 361, the syllable unit index 362, the word unit index 363, the phrase unit index 364, the sentence unit index 365, the continuous speech index 366, and the pitch index 367 of the database unit 350.
  • the language data 310 is then provided by the data expression unit 500, together with a narrow and broad transcription of the actual standard pronunciation related to the speaker's language data 310, as letters and sounds measuring one or more of the accuracy, similarity, and speech intention.
  • FIG. 46 is a diagram illustrating a confusion matrix utilized by the speech intention expression system according to the fourth embodiment of the present invention, and FIG. 47 is a diagram showing that confusion matrix expressed in percentages.
  • in identifying the language data 310, the data interpreter 200 uses an algorithm that extracts one or more features from the time-domain variance, the frequency-domain cepstral coefficients, and the linear predictive coding (LPC) coefficients.
  • the time-domain variance is calculated as variance = (1/n) · Σ_{i=1..n} (x_i − x̄)², where n is the number of data points in the population, x̄ is the mean of the population of collected articulator physical-characteristic data, and x_i is each collected articulator physical-characteristic data point.
  • the cepstral coefficients are calculated by the following Equation 2, which formulates the strength of each frequency: C(t) = F⁻¹[ log |X(f)| ], where F⁻¹ denotes the inverse Fourier transform and X(f) denotes the frequency spectrum of the signal.
  • an ANN was used to classify the data by grouping the collected articulator physical-property data according to similarity and generating prediction data.
  • the speaker can thereby grasp the accuracy, similarity to the standard, and intention of his or her utterance in comparison with the standard utterance; based on this, the speaker receives feedback on the utterance and continuously re-utters it.
  • through this repetitive articulation data input, a large amount of physical articulation data is gathered and the accuracy of the ANN increases.
  • the physical properties of the articulatory organs used as input data were selected for 10 consonants, classified into 5 articulation positions: bilabial, alveolar, palatal, velar, and glottal.
  • the 10 consonants corresponding to the five articulation positions were pronounced 100 times each in order (1,000 utterances in total) and, separately, 50 times each (500 utterances in total).
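An illustrative sketch, under assumed data shapes, of training an ANN on repeatedly collected articulatory data to predict one of the five articulation positions and of producing a confusion matrix like the one in FIGS. 46 and 47; the feature dimensionality, the 800/200 split, and all random data are assumptions, not the experimental protocol of the disclosure:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

POSITIONS = ["Bilabial", "Alveolar", "Palatal", "Velar", "Glottal"]

# Hypothetical feature matrix: 1000 recorded utterances x 20 articulatory features.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = rng.choice(POSITIONS, size=1000)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
clf.fit(X[:800], y[:800])            # train on most of the collected data
pred = clf.predict(X[800:])          # evaluate on the remainder
print(confusion_matrix(y[800:], pred, labels=POSITIONS))
```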
  • FIG. 48 is a diagram illustrating a case in which a speech intention expression system according to a fourth embodiment of the present invention assists a speaker in language correction and guidance through a screen.
  • the data interpreter 200 recognizes, through comparison with the standard speech feature matrix 205, that the speaker does not utter [ŋ] properly. The data expression unit 300 then presented the accuracy and similarity of the speaker's utterance, which was only 46%, and helps the speaker pronounce [kiŋ] correctly through the screen.
  • the data expression unit 300 provides Speech Guidance (Image) to intuitively show which articulation organs the speaker should manipulate.
  • the Speech Guidance (image) presented by the data expression unit 300 performs utterance correction and guidance based on the sensor units attached to or adjacent to the articulators involved in uttering [kiŋ]. For example, for [k] the body and root of the tongue are raised toward the velum (soft palate) to form a closure that is then released as a plosive, producing /k/.
  • the oral tongue sensor 110 accordingly detects the tongue's height and frontness. In addition, when [i] is uttered, both corners of the lips are pulled toward the cheeks, which the face sensor 120 detects. For [ŋ], the back of the tongue (tongue body and root) must be lifted toward the velum and the airflow released through the nose; the oral tongue sensor 110 therefore also captures the height and the front-back position of the tongue.
  • because the sound is nasal, the muscles around the nose vibrate, and this can be detected by attaching the face sensor 120 around the nose.
  • FIG. 49 is a diagram illustrating a case in which the speech intention expression system according to the fourth embodiment of the present invention captures an image of the head and neck articulatory organs.
  • the imaging sensor 160 captures the appearance change of the speaker's head and neck articulatory organs during the utterance, and from this the data interpreter 200 identifies the change information 161 of the speaker's head and neck articulatory organs and the head and neck facial expression change information 162.
  • the speaker's speech characteristics 210 identified through the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the dental sensor 150 of the sensor unit 100 are also considered together by the data analysis unit 200.
  • FIG. 50 is a diagram illustrating a case in which a speech intent representation system according to a fourth embodiment combines mutual information through a standard speech feature matrix.
  • the oral tongue sensor 110, the face sensor 120, the voice acquisition sensor 130, and the vocal cord sensor 140 of the sensor unit 100 capture the speaker's speech characteristics 210, while the imaging sensor 160 captures the change information 161 of the head and neck articulatory organs and the head and neck facial expression change information 162.
  • the data interpreter 200 combines the speech information corresponding to the change information 161 of the head and neck articulatory organs and the head and neck facial expression change information 162 on the basis of the standard speech feature matrix 205.
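A rough sketch of the kind of fusion described here: a sensor-derived and an image-derived feature vector are combined and scored against rows of a standard speech feature matrix. All numbers, the 0.7/0.3 weighting, and the phoneme rows are assumptions for illustration, not values from the disclosure:

```python
import numpy as np

# Illustrative standard speech feature matrix (205): one row per phoneme.
PHONEMES = ["k", "i", "ng"]
standard_matrix = np.array([
    [0.9, 0.1, 0.2, 0.8],   # k
    [0.2, 0.9, 0.8, 0.1],   # i
    [0.8, 0.2, 0.1, 0.9],   # ng
])

sensor_features  = np.array([0.85, 0.15, 0.25, 0.75])  # from the sensor unit 100
imaging_features = np.array([0.80, 0.20, 0.30, 0.70])  # from the imaging sensor 160

fused = 0.7 * sensor_features + 0.3 * imaging_features  # assumed weighting
scores = standard_matrix @ fused / (
    np.linalg.norm(standard_matrix, axis=1) * np.linalg.norm(fused))
print(PHONEMES[int(np.argmax(scores))])                 # best-matching phoneme
```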
  • FIG. 51 is a diagram illustrating a speech intention expression system according to a fifth embodiment of the present invention.
  • the data converter 300 generates first basis data 211 of the object head and neck data 320.
  • based on the first basis data 211, the data matching unit 600 generates and matches static base coordinates 611 from the coordinates 610 in order to match the object head and neck data to one or more objects 20, namely the head and neck 21 of the image object and the head and neck 22 of the robot object.
  • the data converter 300 also generates second basis data 221 of the object head and neck data 320.
  • based on the second basis data 221, the data matching unit 600 generates and matches dynamic variable coordinates 621 in order to implement the dynamic movements of the head and neck that change as one or more objects 20, the head and neck 21 of the image object and the head and neck 22 of the robot object, utter speech.
  • FIG. 52 is a diagram illustrating a case in which the speech intention expression system according to the fifth embodiment of the present invention matches the object head and neck data to the head and neck of an image object based on static base coordinates, and FIG. 53 is a diagram illustrating static base coordinates based on the position of the face sensor utilized by the speech intention expression system according to the fifth embodiment.
  • the static base coordinates 611 are generated using the first basis data 211, which is the position of the face sensor 120 detected by means of the potential difference.
  • the reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the face sensor 120, attached while the speaker is not speaking, each have a reference position of (0, 0); these positions are the static base coordinates 611.
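A small sketch of this static registration step: in the non-speaking state every face sensor electrode is assigned the (0, 0) reference position, which is then bound to a landmark of the image object's head and neck. The landmark names and the binding scheme are invented for illustration:

```python
# Static base coordinates (611): every electrode starts at the (0, 0) reference.
static_base_coords = {
    "reference_sensor_121": (0, 0),
    "anode_sensor_122":     (0, 0),
    "cathode_sensor_123":   (0, 0),
}

# Hypothetical binding of each electrode to a head-and-neck landmark of the avatar.
avatar_landmarks = {
    "reference_sensor_121": "jaw_center",
    "anode_sensor_122":     "left_cheek",
    "cathode_sensor_123":   "right_cheek",
}

matched = {avatar_landmarks[name]: coord for name, coord in static_base_coords.items()}
print(matched)
```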
  • FIG. 54 illustrates a case in which the speech intention expression system according to the fifth embodiment of the present invention matches the object head and neck data to the head and neck of an image object based on dynamic variable coordinates, and FIG. 55 is a diagram illustrating dynamic variable coordinates based on the potential difference of the face sensor utilized by the speech intention expression system according to the fifth embodiment.
  • in order to match the object head and neck data 320 to the head and neck 21 of the image object, the data matching unit 600 generates the dynamic variable coordinates 621 using the second basis data 221, that is, the potential difference of the face sensor 120 attached to the speaker's head and neck and caused by the action of the head and neck muscles during the speaker's speech.
  • the face sensor 120 measures the electromyogram (EMG) of the head and neck moving in accordance with the speaker's utterance to determine the physical characteristics of the head and neck articulation.
  • the reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the face sensor 120, attached while the speaker is speaking, detect the EMG of the head and neck muscles that changes with the utterance and take variable positions such as (0, -1), (-1, -1), and (1, -1), respectively; these positions become the dynamic variable coordinates 621.
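Continuing the same illustration, a sketch of turning per-electrode EMG potential differences into dynamic variable coordinates as displacements from the static base coordinates; the displacement values and the scaling factor are assumptions:

```python
# EMG-derived displacements (arbitrary units) measured while the speaker talks.
emg_delta = {
    "reference_sensor_121": (0.0, -1.0),
    "anode_sensor_122":     (-1.0, -1.0),
    "cathode_sensor_123":   (1.0, -1.0),
}

SCALE = 1.0  # assumed mapping from potential difference to coordinate units

def dynamic_variable_coords(static_coords, deltas, scale=SCALE):
    """Offset each static base coordinate by the scaled EMG-derived displacement."""
    return {
        name: (x + scale * dx, y + scale * dy)
        for (name, (x, y)), (dx, dy) in zip(static_coords.items(), deltas.values())
    }

static_base_coords = {name: (0, 0) for name in emg_delta}
print(dynamic_variable_coords(static_base_coords, emg_delta))
```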
  • FIG. 56 illustrates a case in which the speech intention expression system according to the fifth embodiment of the present invention matches the object head and neck data to the actuators of the head and neck of a robot object based on static base coordinates, and FIG. 57 is a diagram illustrating static base coordinates based on the position of the face sensor, detected through the potential difference, utilized by the speech intention expression system according to the fifth embodiment.
  • the static base coordinates 611 are generated using the first basis data 211, which is the position of the face sensor 120 detected by means of the potential difference.
  • the reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the face sensor 120, attached while the speaker is not speaking, are each given the same reference position of (0, 0) on the actuator 30 of the head and neck 22 of the robot object; these positions are the static base coordinates 611.
  • FIG. 58 illustrates a case in which the speech intention expression system according to the fifth embodiment of the present invention matches the object head and neck data to the actuators of the head and neck of the robot object based on dynamic variable coordinates, and FIG. 59 is a diagram illustrating dynamic variable coordinates based on the potential difference of the face sensor utilized by the speech intention expression system according to the fifth embodiment.
  • in order to match the object head and neck data 320 to the actuator 30 of the head and neck 22 of the robot object, the data matching unit 600 generates the dynamic variable coordinates 621 using the second basis data 221, that is, the potential difference of the face sensor 120 attached to the speaker's head and neck and caused by the action of the head and neck muscles during the utterance.
  • the face sensor 120 measures the EMG of the head and neck moving in accordance with the speaker's utterance to determine the physical characteristics of the head and neck articulation.
  • the reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the face sensor 120, attached while the speaker is speaking, capture the EMG of the head and neck muscles that changes with the utterance, and the actuator 30 of the head and neck 22 is accordingly given variable positions such as (0, -1), (-1, -1), and (1, -1) to move to; these positions become the dynamic variable coordinates 621.
  • FIGS. 60 and 61 are views showing the operation of the actuators of the head and neck of the robot object of the speech intention expression system according to the fifth embodiment of the present invention, and FIG. 62 is a view showing the actuators of the head and neck of the robot object of the speech intention expression system according to the fifth embodiment.
  • at least one actuator 30 of the head and neck 22 of the robot object may be driven by the data matching unit 600 in accordance with the object head and neck data 320 obtained from the data interpreter 200 and the data converter 300.
  • the actuator 30 constitutes the artificial musculoskeletal structure of the head and neck 22 of the robot object; it may be driven by a motor such as a DC motor, a stepper motor, or a servo motor, or operated in a pneumatic or hydraulic manner so as to protrude and retract. Through this, the actuator 30 can implement various dynamic movements of one or more of articulation, speech, and facial expression of the head and neck 22 of the robot object.
  • the actuator 30, driven by such a motor or operated pneumatically or hydraulically, is characterized in that it can contract or relax under tension.
  • the actuator 30 may be located at the head and neck 22 of the robot object.
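As a purely illustrative sketch of driving the head and neck actuators from the matched coordinates, here expressed as hobby-style servo angles; the actuator names, the neutral angle, the gain, and the 0..180 degree range are assumptions and not part of the disclosure:

```python
# Dynamic variable coordinates (621) matched to actuators of the robot head/neck (22).
actuator_targets = {
    "lip_corner_left":  (-1.0, -1.0),
    "lip_corner_right": (1.0, -1.0),
    "jaw":              (0.0, -1.0),
}

def coord_to_servo_angle(coord, neutral=90.0, gain=30.0):
    """Map a 2-D coordinate to a servo angle; only the y component is used here."""
    _, y = coord
    angle = neutral + gain * y
    return max(0.0, min(180.0, angle))   # clamp to a typical servo range

for name, coord in actuator_targets.items():
    print(f"{name}: set servo to {coord_to_servo_angle(coord):.1f} degrees")
```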
  • the sensor unit 100 may include the following.
  • Pressure sensor: MEMS sensor, piezoelectric method, piezoresistive method, capacitive method, pressure-sensitive rubber method, force sensing resistor (FSR) method, inner particle deformation method, buckling measurement method.
  • FSR Force sensing resistor
  • Friction sensor: micro hair array method, friction temperature measurement method.
  • Electrostatic sensor: electrostatic consumption, electrostatic generation.
  • Electric resistance sensor: DC resistance measurement method, AC resistance measurement method, MEMS, lateral electrode array method, layered electrode method, field effect transistor (FET) method (Organic-FET, Metal-oxide-semiconductor-FET, Piezoelectric-oxide-semiconductor-FET, etc.).
  • FET Field Effect Transistor
  • Tunnel effect tactile sensor: quantum tunnel composites, electron tunneling, electroluminescent light.
  • Thermal resistance sensor: thermal conductivity measurement method, thermoelectric method.
  • Optical sensor: light intensity measurement, refractive index measurement.
  • Magnetism-based sensor: Hall-effect measurement method, magnetic flux measurement method.
  • Ultrasonic-based sensor: acoustic resonance frequency method, surface noise method, ultrasonic emission measurement method.
  • Soft material sensor: pressure, stress, or strain measurement methods using materials such as rubber, powder, porous materials, sponges, hydrogels, aerogels, carbon fibers, nanocarbon materials, carbon nanotubes, graphene, graphite, composites, nanocomposites, metal-polymer composites, ceramic-polymer composites, and conductive polymers; stimuli-responsive method.
  • Piezoelectric sensor: ceramic materials such as quartz and lead zirconate titanate (PZT), polymer materials such as PVDF, PVDF copolymers, and PVDF-TrFE, and nanomaterials such as cellulose and ZnO nanowires.
  • PZT lead zirconate titanate
  • polymer materials such as PVDF, PVDF copolymers, and PVDF-TrFE
  • nanomaterials such as cellulose and ZnO nanowires.

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Toys (AREA)

Abstract

The present invention comprises: a sensor part for measuring the physical characteristics of an articulator while being adjacent to one surface of a speaker's head and neck; a data analysis part for identifying speech characteristics of the speaker on the basis of the position of the sensor part and the physical characteristics of the articulator; a data conversion part for converting the position of the sensor part and the speech characteristics into language data; and a data expression part for externally expressing the language data, wherein the sensor part comprises a mouth and tongue sensor corresponding to the mouth and tongue.

Description

두경부 조음기관의 물리 특성을 이용한 발화 의도 표현 시스템 Speech Intent Representation System Using Physical Characteristics of Head and Neck Articulator
본 발명은 조음 센서를 통해 구강설을 포함한 두경부의 조음기관의 물리 특성을 인지하여 두경부 전반의 발화에 따른 변화를 측정하고 이를 통해 발화 의도를 파악하여, 시각, 청각, 촉각화를 통해 발화 의도를 화자 본인 내지 외부로 제공하고, 발화 의도를 영상 내지 로봇의 두경부에 전이하여 표현하는 시스템에 관한 것이다.The present invention is to recognize the physical characteristics of the articulation organs of the head and neck, including the oral cavity through the articulation sensor to measure the change according to the overall ignition of the head and neck and grasp the intention to speak through this, visual, auditory, tactile The present invention relates to a system for providing to the speaker himself or the outside and transferring the intention of uttering the image to the head and neck of the robot.
조음기관에서 생성되는 문자가 언어학적 정보전달인 의사소통을 위한 경우에는 발화 혹은 언어음으로 불리며 비언어학적인 경우에는 발성으로 불린다. Characters produced in articulatory organs are called speech or verbal sounds for the communication of linguistic information and vocalization in non-verbal cases.
문자의 생성에 관여하는 인체의 주요한 기관은 신경계통과 호흡기계통이다.The main organs of the human body involved in the production of the letters are the nervous system and the respiratory system.
신경계통은 중추신경계와 말초신경계가 관여하는데 중추신경 중 뇌간에는 언어의 생성에 필요한 두개골 혹은 뇌신경 세포핵이 위치하며 소뇌는 동작에 대한 근육의 제어를 정밀하게 조율하는 기능이 있으며, 대뇌의 반구는 언어기능에 지배적인 역할을 한다. 언어음 생성을 위해 관여하는 두개골 신경에는 턱의 움직임을 관여하는 제 5 뇌신경, 입술운동에 관여하는 제 7 뇌신경, 인두 및 후두의 운동에 관여하는 제 10 뇌신경, 인두의 운동에 관여하는 제 11 뇌신경, 그리고 혀의 운동에 관여하는 제 12 신경 등이 있다. 말초신경 중에는 특히 미주신경에서 분지되는 상후두신경과 반회후두신경이 후두운동에 직접 관여하게 된다.The nervous system is involved in the central nervous system and the peripheral nervous system. In the central nervous system, the cranial or cranial cell nuclei are located in the brain stem, and the cerebellum has the function of precisely controlling the muscle control for movement. Play a dominant role in function The cranial nerve involved in speech production includes the fifth cranial nerve involved in jaw movement, the seventh cranial nerve involved in lip movement, the tenth cranial nerve involved in pharynx and larynx, and the eleventh cranial nerve involved in pharyngeal movement. And the 12th nerve involved in the movement of the tongue. Among the peripheral nerves, especially the laryngeal nerves and the recurrent laryngeal nerves branched from the vagus nerve are directly involved in laryngeal movement.
또한 언어음은 하부 호흡기계, 후두와 성도가 상호 밀접하게 작용하여 생성된다. 성대는 문자의 근원으로, 폐로부터 송출되는 호기의 흐름이 성대를 진동시키고 발성 시 호기조절은 소리 에너지를 적절히 능률적으로 공급한다. 성대가 적당히 긴장하여 폐쇄되면 호기에 의해 성대가 진동하고 성문을 일정한 주기로 개폐시켜 성문을 통과하는 호기류를 단속하는데 이 호기의 단속류가 문자의 음원이다.Speech is also produced by the lower respiratory tract, the larynx, and the vocal tract. The vocal cords are the source of letters, and the flow of exhalation from the lungs causes the vocal cords to vibrate, and during vocalization, the aerobic control provides an efficient and efficient supply of sound energy. When the vocal cords are properly tensioned and closed, the vocal cords vibrate due to exhalation, and the gates are opened and closed at regular intervals to control the expiratory streams that pass through the gates.
사람이 의사소통을 목적으로 말을 사용하기 위해서는 여러 가지 생리적인 과정을 거쳐야 한다. 조음과정은 발성된 소리가 공명과정을 거쳐 증폭 및 보완된 후, 말소리의 단위인 음소를 형성해 가는 과정을 의미한다. In order for a person to use a horse for communication, it must go through several physiological processes. The articulation process refers to a process of forming phonemes, which are units of speech sounds, after the sound is amplified and supplemented through the resonance process.
조음기관으로는 혀가 가장 중요하게 생각하지만, 실제로 음소를 만드는 데는 혀뿐 아니라 구강 및 안면의 여러 가지 구조들이 관여한다. 이러한 조음기관에는 혀, 입술, 여린입천장(연구개, soft palate), 턱 등과 같이 움직일 수 있는 구조와 치아나 굳은입천장(경구개, hard palate)과 같이 움직일 수 없는 구조들이 포함된다. 이러한 조음기관들이 공기의 흐름을 막거나 제약하여 자음과 모음을 형성하게 되는 것이다.The tongue is the most important articulator, but in fact, the phoneme involves not only the tongue but also various structures of the mouth and face. These articulators include movable structures such as the tongue, lips, soft palate, jaws, and immovable structures such as teeth or hard palates. These articulators block or restrict the flow of air to form consonants and vowels.
첫 번째 조음기관으로서의 혀는 그 부위들이 뚜렷한 경계선을 나타내지 않기 때문에 구별하는 것이 쉽지는 않으나 기능적인 측면에서 혀의 외부구조를 구별하는 것은 정상적인 조음뿐 아니라 병리적인 조음을 이해하는데 도움이 된다. 혀는 앞에서부터 혀끝(apex, tip), 혀날(blade), 혀등(dorsum), 혀몸통(body), 그리고 혀뿌리(root)로 나눌 수 있다. 혀끝을 우리가 혀를 뾰족하게 내밀거나 음절의 첫소리로 오는 /ㄹ/(예: "라라라")를 조음할 때 사용되는 부위이고, 혀날은 잇몸소리(치조음 alveolar sounds)와 같은 입의 앞쪽에서 만드는 음소들을 조음할 때 주로 사용되며, 혀등은 여린입천장소리(연구개음 velar sounds)와 같은 뒷소리 음소들을 조음할 때 주로 사용되는 혀의 부분이다. It is not easy to distinguish the tongue as the first articulator because its parts do not show distinct boundaries, but in terms of function, it is helpful to understand pathological articulation as well as normal articulation. The tongue can be divided from the front into the apex (tip), the blade (blade), the dorsum, the body of the tongue, and the root of the tongue (root). The tip of the tongue is the part used when we point out the tongue or articulate / d / (such as "Larara"), which is the first sound of the syllable, and the tongue is made from the front of the mouth, such as alveolar sounds. It is mainly used to articulate phonemes, and the tongue is the part of the tongue that is commonly used to articulate back sounds such as velar sounds.
두 번째 조음기관으로서의 입술은 입의 입구를 이루는 부분으로 두경부 표정이나 조음에 중요한 기능을 한다. 특히 여러 가지 모음들은 혀의 움직임뿐만 아니라 입술의 모양에 의하여 음소가 구별되며, 두입술자음(양순자음 bilabial sound)들은 입술이 닫혀져야만 발음될 수 있다. 입술의 모양은 주변의 근육들에 의하여 변형된다. 예를 들어, 입술 주변을 둘러싸고 있는 입둘레근(구륜근 orbicularis oris muscle)은 입술을 다물거나 오므라들게 하여 두입술자음이나 /우/와 같은 원순모음들 발음하는 데 중요한 역할을 하며, 윗입술올림근(quadratus labii superior muscle)과 아랫입술내림근(quadrates labii inferior muscle)은 입술을 열게 한다. 또한, 입꼬리당김근(소근 risorius muscle)은 입술의 모서리를 잡아당겨 미소를 짓거나 입술을 수축시켜서 발음해야 하는 /이/와 같은 소리를 낼 때 중요한 역할을 한다. The lips, which are the second articulators, form the mouth of the mouth and play an important role in facial expression and articulation of the head and neck. In particular, the vowels are distinguished by phonemes as well as the movement of the tongue. The bilabial sounds can be pronounced only when the lips are closed. The shape of the lips is modified by the surrounding muscles. For example, the circumference of the mouth around the lips (orbicularis oris muscle) plays an important role in pronounced lip vowels such as head lip consonants and / right / by closing or pinching the lips. Quadratus labii superior muscle and quadrates labii inferior muscle open the lips. In addition, the risorius muscle plays an important role in pulling the corners of the lips and smiling or contracting the lips to produce sounds like /.
세 번째 조음기관으로서의 턱과 치아 중, 턱은 움직이지 않는 위턱(상악 maxilla)과 상하 및 좌우 운동을 하는 아래턱(하악 mandible)으로 구분된다. 이들 턱은 얼굴 뼈 중에서 가장 튼튼하고 큰 뼈로서 4쌍의 근육들에 의해서 움직인다. 아래턱의 움직임은 입안의 크기를 변화시키기 때문에 씹기뿐 아니라 모음산출에 있어서도 중요하다. Among the jaw and teeth as the third articulator, the jaw is divided into the immobilized upper jaw (maxilla) and the lower jaw (mandible) which moves up and down and left and right. These jaws are the strongest and largest of the facial bones and are driven by four pairs of muscles. The movement of the lower jaw changes the size of the mouth, which is important not only for chewing but also for vowel production.
네 번째 조음기관으로서의 잇몸 및 굳은입천장 중, 잇몸은 /ㄷ/나 /ㅅ/계열의 말소리들이 조음되는 부위이며, 굳은 입천장은 잇몸 뒤의 단단하고 다소 편편한 부분으로 /ㅈ/계열의 소리들이 조음되는 부위이다. Among the gums and the hard palate as the fourth articulator, the gum is the area where the / c / or / s / speech sounds are articulated, and the hard palate is the hard and rather flat part of the gums where the sound of the / ㅈ / series is articulated Site.
마지막 조음기관으로서의 여린입천장은 움직이는 조음기관으로 분류되는데, 이는 여린입천장의 근육들이 수축함으로써 연인두폐쇄를 이루고 그에 따라 입소리들(oral sounds)을 조음하기 때문이다. As the last articulation organ, the Yeongrin Palate is classified as a moving articulator because the muscles of the Yeongrin Palate contract and form a closed lover's head and thus oral sounds.
<조음과정>Articulation
소리들 중에는 성대를 거친 기류가 성도를 통과하는 동안 구강에서, 더 정확히 말하면 구강 통로의 중앙부에서 어떠한 방해(장애)를 받으면서 생성되는 것과, 이와는 달리 아무런 방해를 받지 않고 생성되는 것이 있다. 보통 전자를 자음(consonant) 후자를 모음(vowel)이라고 한다. Among the sounds are those which are produced with some obstruction in the oral cavity, or more precisely in the middle of the oral passage, while the air passage through the vocal cords passes through the saints. The former is usually called the consonant vowel.
1) 자음의 조음1) consonant articulation
자음은 발성되는 방법과 위치에 따라 살펴보아야 하는데 국제문자기호표상에서 각 칸은 조음위치를, 각 줄은 조음방법을 각각 나타내고 있다. Consonants should be examined according to how and where they are spoken. Each column represents the articulation position and each line represents the articulation method.
우선 조음방법에 따라 분류해 본다면, 기류가 중앙부에서 어떤 종류의 방해를 받아서 조음되는가에 따라서 다막음 소리와 덜막음 소리로 크게 나누어 볼 수 있다. 다막음 소리는 구강에서 기류를 완전히 막았다가 터트리면서 내는 소리이고, 덜막음 소리는 성도의 한 부분을 좁혀서 그 좁아진 통로로 기류를 통과시켜 내는 소리이다. First of all, according to the articulation method, depending on what kind of disturbances are caused by the air flow in the middle, it can be divided into the sound of clogging and the sound of silence. Clogging sound is the sound that completely blocks and blows the airflow in the oral cavity. Clogging sound is the sound of narrowing a part of the saint and passing the airflow through the narrow passage.
다막음 소리는 다시 비강의 공명을 동반하고 나는 소리와 동반하지 않고 나는 소리로 나눌 수 있다. 성도의 일부를 완전히 막음과 동시에 연구개를 내려 비강 통로를 열고 비강의 공명을 동반하면서 내는 비강 다막음 소리(비강 폐쇄음, nasal stop)들이 전자에 속하며, 연구개를 올려 인두벽에 대고 비강 통로를 차단하여, 기류가 비강으로 통하는 것을 막은 상태로 내는 구강 다막음 소리(구강 폐쇄음, oral stop)들이 후자에 속한다. 구강 다막음 소리는 폐쇄의 길이와 방법에 따라서 폐쇄음(막음소리, stop) 혹은 파열음(터짐소리, plosive), 전동음(떨소리, trill), 탄설음(혹을 설탄음, flap/tap)으로 생각해 볼 수 있다. The sound of clogging can again be divided into nasal resonances and non-acoustic sounds. At the same time, the nasal stops (nasal stops), which are accompanied by the nasal passages and the nasal passages, are included in the former, and the nasal stops are raised against the pharyngeal wall. The latter are the oral blockage sounds (oral stops) that block and prevent airflow from reaching the nasal passages. Depending on the length and method of occlusion, oral clog sounds are considered to be closed (stop) or ruptured (plosive), electric (trill), and snowballs (or flap / tap). can see.
그리고 덜막음 소리는 마찰음(갈이소리, fricative)과 접근음(approximant)으로 나누는데, 기류의 통로가 혀의 측면에 만들어지는 경우 이를 통틀어 설측음(lateral)이라고 한다. The mute sound is divided into a fricative (approach) and an approach sound (approximant). When a passage of air flow is formed on the side of the tongue, it is called lateral sound.
또한 다막음과 덜막음의 조음방법을 복합적으로 사용하는 파찰음(터짐갈이, affricate)이 있으며, 마지막으로 알파벳으로는 "r"이나 "l"로 표현되나 국어의 경우 /ㄹ/로 표현되는 유음(liquid)과 국어에는 없지만 조음기관을 진동시켜서 소리를 말하는 전동음이 있다. In addition, there is a wave sound (affricate) that uses a combination of multi-block and unblocked articulation methods. Finally, it is expressed as "r" or "l" in the alphabet, but it is expressed as / ㄹ / in Korean. (liquid) and not in Korean, but there is an electric tone that vibrates the articulation organs.
조음위치에 따라 분류해보면, 양순음(bilabial)이란, 두 입술이 그 조음에 관계하는 소리를 지칭하는 것으로, 한국어의 /ㅂ, ㅃ, ㅍ, ㅁ/등이 이에 속한다. 현대 한국어(표준어)에 존재하는 양순음들은 모두 두 입술을 막아서 내는 소리들이지만, 두 입술의 간격을 좁혀서 그 사이로 기류를 마찰시켜 낼 수도 있으며(양순 마찰음) 두 입술을 떨어서 낼 수도 있다(양순 전동음). When classified according to the position of articulation, bilabial refers to the sound of two lips related to the articulation, and Korean / ㅂ, ㅃ, ッ, ㅁ / and the like belong to this. The Yangpyeon in modern Korean (standard) is a sound that blocks both lips, but it can also narrow the gap between the two lips, rubbing the airflow between them (sheep friction), and dropping both lips (sheep rolling) .
순치음(labiodentals)이란 아랫입술과 윗니가 조음에 관계하는 소리를 지칭하는 것으로 한국어에는 존재하지 않는다. 한국어에는 순치음이 없지만, 영어에 있는[f, v]가 바로 이 순치음(순치 마찰음)에 속한다. Labiodentals refer to sounds related to articulation of the lower lip and upper teeth, and do not exist in Korean. There is no pure sound in Korean, but [f, v] in English belongs to this pure sound.
치음(dental)은 기류의 협착이나 폐쇄가 윗니의 뒷부분에서 일어나는 소리를 말하는데, 이 사이에서 마찰이 이루어지기도 해서 치간음(interdental)이라고도 한다. Dental (dental) refers to the sound that the airflow narrows or closes at the back of the upper teeth, and is sometimes called interdental because of friction between them.
치경음(alveolar)은 윗잇몸 부근에서 기류의 협착이나 폐쇄가 일어나면서 나는 소리로 한국어의 /ㄷ, ㄸ, ㅌ, ㄴ, ㅆ, ㅅ/등이 이에 속한다. 한국어의 /ㅅ, ㅆ/는 치경 부분에서 기류의 협착이 이루어져 나는 소리로 영어의 /s, z/와 기류의 협착이 이루어지는 장소가 거의 비슷하다. Alveolar is a sound produced by the constriction or closure of air currents near the upper gums, which belong to the Korean ㄷ, ㄸ, ㅌ, ,, ㅆ, ㅅ /. In Korean, / ㅅ, 는 / are the sounds of airflow constriction in the alveolar area, and / s, z / in English and the airflow constriction are almost similar.
경구개치경음(palatoalveolar)은 후치경음(postalveolar)이라고도 불리는데, 혀끝이나 혓날이 후치경부에 닿아서 나는 소리로 국어에는 존재하지 않지만, 영어나 불어에는 존재한다. Palatoalveolar, also known as postalveolar, is the sound of the tip of the tongue or forearm touching the posterior neck, not in Korean, but in English or French.
치경경구개음(alveolopalatal)은 전경구개음(prepalatal)이라고도 불리는데, 이 소리가 경구개의 앞쪽 즉 치경과 가까운 쪽에서 조음되기 때문이다. 국어의 세 파찰음 /ㅈ, ㅊ, ㅉ/가 이에 속한다. Alveolopalatal is also called prepalatal because it is articulated in front of the palatal or near the alveolar. The three patters of the Korean language, / ㅈ, ㅊ and ㅉ /, belong to this.
권설음(retroflex)은 혀끝이나 혀의 위 표면이 입천장에 닿거나 접근하여서 조음되는 여타의 설음들과는 달리 혀의 아래 표면이 입천장에 닿거나 접근하여서 조음된다는 점에서 뚜렷한 차이가 있다. Retroflex differs from other tongues where the tip of the tongue or the upper surface of the tongue is articulated by touching or approaching the palate, and the lower surface of the tongue is articulated by touching or approaching the palate.
경구개음(palatal)은 혓몸이 경구개부에 닿거나 접근하여 조음되는 소리를 말한다. A palatal sound refers to the sound that the body touches or approaches an oral palatal articulation.
연구개음(velar)은 혓몸이 연구개부에 닿거나 접근하여 조음되는 소리를 말한다. 국어의 폐쇄음/ㄱ, ㅋ, ㄲ/와 비음 /ㅇ/이 이에 속한다. Velar is the sound that the body touches or approaches the research site. Korean closed sound / ㄱ, ㅋ, ㄲ / and nasal / ㅇ / belong to this.
구개수음(uvular)은 혓몸이 연구개의 끝부분인 구개수에 닿거나 접근하여 조음되는 소리를 말한다.The palatal masturbate (uvular) refers to the sound that the body touches or approaches the palate, the tip of the study palate.
인두음(pharyngeal)은 그 조음이 인두강에서 이루어지는 음을 지칭한다.  Pharyngeal refers to the sound that the articulation is made in the pharyngeal cavity.
마지막으로 성문음(glottal)은 성대가 조음기관으로 사용되어 조음되는 소리를 지칭하며 우리말에는 음소로서 성문 무성 마찰음 /ㅎ/만이 존재한다.Lastly, the glottal refers to the sound that the vocal cords are used as an articulation organ, and there is only a voiceless vocal friction / ㅎ / as a phoneme in Korean.
2) 모음의 조음2) articulation of vowels
모음의 조음은 혀의 고저와 전후 위치, 그리고 입술의 모양 등 세가지가 가장 중요한 변수로 작용한다. Vowel articulation is the three most important variables, such as the height of the tongue, the position of the front and back, and the shape of the lips.
첫 번째 변수로, 혀의 고저에 의하여 모음의 개구도, 즉 입을 벌린 정도가 결정되는데, 입을 적게 벌리고 내는 소리를 폐모음(close vowel), 혹은 고모음(high vowel)이라고 하며, 입을 크게 버리고 내는 소리를 개모음(open vowel), 혹은 저모음(low vowel)이라고 한다. 그리고 고모음과 저모음의 사이에서 나는 소리를 중모음(mid vewel)이라고 하는데, 이중모음은 다시 입을 벌린 정도가 더 작은 중고모음(close-mid vowel), 혹은 반폐모음(half-close vewel)과 입을 벌린 정도가 더 큰 중저모음(open-mid vewel), 혹은 반개모음(half-open vewel)으로 세분할 수 있다. In the first variable, the opening of the vowel, or the degree of opening of the vowel, is determined by the height of the tongue.The sound of opening the mouth less is called a close vowel, or a high vowel, and the sound of throwing away the mouth greatly. Is called open vowel, or low vowel. The sound between the high vowels and the low vowels is called the mid vowel, which is the close-mid vowel, or half-close vewel, with the mouth open again. It can be subdivided into larger open-mid vewels or half-open vewels.
두 번째 변수인 혀의 전후 위치란 사실 혀의 어느 부분이 가장 좁혀졌는가, 다시 말해서 혀의 어느 부분이 입천장과 가장 가까운가를 기준으로 앞뒤를 따지는 것이다. 그 좁아진 부분이 혀의 앞쪽에 있는 모음을 전설모음(front vowel), 뒤쪽에 있는 모음을 후설모음(back vowel)이라고 하며, 그 중간쯤에 있는 모음을 중설모음(central vowel)이라고 한다. The second variable, the front and rear position of the tongue, is in fact determined based on which part of the tongue is the narrowest, that is, which part of the tongue is closest to the palate. The narrow part of the tongue is called the front vowel, the back vowel is called the back vowel, and the middle vowel is called the central vowel.
마지막으로 모음의 조음에서 중요한 변수가 되는 것은 입술의 모양이다. 조음 시 입술이 동그랗게 모아져 앞으로 튀어나오는 모음을 원순모음(rounded vowel)이라고 하고, 그렇지 않은 모음을 평순모음(unrounded vowel)이라고 한다.Finally, the most important variable in vowel articulation is the shape of the lips. When the articulation is rounded vowels are rounded vowels, rounded vowels are called rounded vowels.
발화 장애란 음도, 강도, 음질, 유동성이 성별, 연령, 체구, 사회적 환경, 지리적 위치에 적합하지 않은 것을 이야기 한다. 이는 선천적으로 혹은 후천적으로 만들어 질 수 있으며, 수술을 통해 후두의 일부분인 성대를 늘이거나 줄여 어느 정도 치료하는 것이 가능하다. 하지만 완벽한 치료는 되지 않으며, 그 효과 또한 정확하다고 할 수 없다. Speech impairment refers to the inability of soundness, intensity, sound quality, and fluidity to be appropriate for gender, age, size, social environment, and geographical location. It can be made either innately or acquiredly and can be treated to some extent by increasing or decreasing the vocal cords that are part of the larynx. However, the treatment is not perfect, and the effect is not accurate.
이러한 후두의 기능으로는 삼킴, 기침, 폐색, 호흡, 발성 등의 기능을 가지고 있으며, 이를 위한 다양한 평가 방식(ex. 발화 내역 검사, 발화패턴, 음향학적 검사, 공기역학적 검사...)이 있다. 이러한 평가를 통해 발화 장애의 여부를 어느 정도 판단할 수 있다.The larynx functions include swallowing, coughing, occlusion, breathing, and vocalization, and there are various evaluation methods (e.g., firing history test, speech pattern, acoustic test, aerodynamic test ...). . This assessment can be used to determine whether or not there is a speech impairment.
발화 장애의 유형도 다양하며 크게 기능적 발화장애와 기질적 발화장애로 나뉘게 된다. 이러한 유형의 대부분은 후두의 일부분인 성대에 이상이 생기는 경우가 많으며, 이러한 성대가 외부의 환경적 요인으로 인해 부어오름, 찢어짐, 이상 물질의 발생 등에 의해 장애가 오는 경우가 많다.Different types of speech disorders are classified into functional speech disorders and organic speech disorders. Most of these types have abnormalities in the vocal cords, which are part of the larynx, and these vocal cords are often disturbed due to swelling, tearing, and abnormal substances caused by external environmental factors.
이러한 성대의 기능을 대신하기 위해 인위적으로 진동을 발생시킬 수 있는 진동발생기를 이용할 수 있다. 진동발생기의 방법은 스피커의 원리를 사용할 수 있는데 스피커의 구조를 보면, 자석과 코일로 이루어져 있으며, 이러한 코일에 전류를 흘려주는 상태에서 전류의 방향을 반대로 하면 자석의 극이 반대로 바뀌게 된다. 따라서 자석과 코일의 전류의 방향에 따라 인력과 척력이 작용하게 되고, 이는 코일의 왕복운동을 발생시킨다. 이러한 코일의 왕복운동이 공기를 진동하여 진동을 발생시킨다.In order to replace the function of the vocal cords, a vibration generator capable of artificially generating vibrations may be used. The method of the vibration generator can use the principle of the speaker. Looking at the structure of the speaker, it consists of a magnet and a coil, and when the current is reversed in the state of flowing a current to the coil, the pole of the magnet is reversed. Therefore, attraction and repulsive force act according to the direction of the current of the magnet and the coil, which causes the coil to reciprocate. The reciprocating motion of the coil vibrates the air to generate vibration.
다른 방식으로 압전 현상을 이용한 방식이 있는데 압전 결정 유닛이 저주파 신호 전압을 받아서 일그러짐을 발생하고, 그에 의해서 진동판이 진동하여 음향을 발행하도록 만들 수 있다. 따라서 이러한 원리들을 이용한 진동발생기를 이용하여 성대의 기능을 수행하도록 할 수 있다. Another method using the piezoelectric phenomenon is that the piezoelectric crystal unit receives a low frequency signal voltage and causes distortion, thereby causing the diaphragm to vibrate to generate sound. Therefore, the vibration generator using these principles can be performed to perform the function of the vocal cords.
하지만 이러한 방법의 경우 외부의 위치하여 단순히 성대를 진동시켜 주는 기능에 불과하기 때문에 나타나는 음이 매우 부정확할 뿐 아니라 화자의 말하기 의도를 파악하는 것이 쉽지 않다. 또한 진동 발생기를 가지고 성대에 위치하여 항상 소지해야 되며 말할 때는 한 손을 이용하기 때문에 일상생활에 어려움을 준다. 전술한 발화 장애와 이러한 발화 이상에 대해서는 후두나 성대의 일부를 수술하는 등의 치료적 방법을 모색할 수 있으나, 이러한 수술 방법이나 치료가 불가능한 경우가 있어서 완전한 해결책이 되지 못하고 있다.However, in this case, the sound that appears outside is simply a function of vibrating the vocal cords, and it is not easy to identify the speaker's intention. In addition, the vibration generator should be located at the vocal cords and always have a hand. The above-mentioned utterance disorder and the above-described utterance abnormalities can be sought for therapeutic methods such as surgery on the larynx or vocal cords, but such surgical methods or treatments are sometimes impossible to provide a complete solution.
특히 관련 업계에 있어서는 유럽 및 홍콩을 구심점으로 WinEPG, Articulate Instruments Ltd 등의 회사에서 사용 중인 University of Reading, 일본의 Fujimura, Tatsumi가 1973년에 개발하여 Rion 이라는 회사 이름으로 널리 상용화 시킨 The Rion EPG, Flecher이 출원하여 UCLA Phonetics Lab이 연구목적으로 개발하여 사용하는 Kay Palatometer, Schmidt가 개발하여 Complete Speech(Logomertix) 등이 있다.The Rion EPG, Flecher, which was widely commercialized under the name of Rion in 1973 by the University of Reading, Fujimura, Japan, and Tatsumi, used by companies such as WinEPG and Articulate Instruments Ltd. This application includes Kay Palatometer, developed by UCLA Phonetics Lab for research purposes, and Complete Speech (Logomertix), developed by Schmidt.
그러나 종래의 기술들은 수동적 조음기관을 기반으로 발화하는 것에 한계가 있으며, 능동적 조음기관 자체인 구강설을 이용하거나, 구강설과 다른 조음기관과의 연계성에 의한 실제 조음 방식에 따른 발화를 구현하는 데 명확한 한계가 있었다.However, the conventional techniques have a limitation in igniting based on passive articulation organs, and in order to implement utterances according to the actual articulation method by using oral tongue, which is the active articulation organ itself, or linkage between oral tongue and other articulation organs. There was a definite limit.
기존에 상태 변화나 움직임을 파악하기 위한 다양한 센서가 개발되어 있으며, 센서를 바탕으로 압력, 온도, 거리, 마찰 등의 변화를 파악하는 것이 가능하다.Various sensors have been developed to detect changes in state or movement. Based on the sensors, it is possible to detect changes in pressure, temperature, distance, and friction.
더불어, 발화 및 표정 동기화(Lip Sync)는, 대상 내지 객체의 아이덴티티를 결정하는 가장 중요한 요소인 말하는 음성 및 조음을 포함하는 발화, 표정 등을 캐릭터, 로봇, 다양한 전자제품, 자율주행 운송수단 등에 복제 적용하여 개인의 아이덴티티를 결정하고 확장하는 핵심 수단이다. 특히, 고품질의 Lip Sync 에니메이션을 만들기 위해서 전문 에니메이션 팀이 작업하므로, 높은 비용 및 많은 시간을 필요로 하고 대량의 작업에 어려움이 존재한다. 종래의 일반적인 기술은 단순한 입술 모양 라이브러리를 사용하여 저급한 에니메이션을 생성하는 수준에 국한되었다. Pixar나 Disney와 같은 해외의 에니메이션 콘텐츠 제작사들은 Lip Sync를 통한 실감나는 캐릭터 에니메이션을 생성하는데 많은 비용과 시간을 투입하는 실정이다.In addition, speech and facial synchronization (Lip Sync) is to copy the speech, facial expressions, etc., including speech and articulation, which are the most important factors that determine the identity of the object or object, to characters, robots, various electronic products, autonomous vehicles, etc. Application is a key means of determining and extending an individual's identity. In particular, since professional animation teams work to create high-quality Lip Sync animations, high cost and time are required, and a large amount of work is difficult. Conventional general techniques have been limited to the use of simple lip library to create low quality animations. Overseas animation content producers such as Pixar and Disney spend a lot of time and money creating realistic character animations through Lip Sync.
본 발명은 전술한 문제점을 해결하기 위하여 제안된 것으로, 본 발명의 목적은, 사용자의 발화 의도에 따른 사용자의 조음 방식을 구강설을 포함한 두경부의 센서를 통해 파악하고, 이를 청각, 시각, 촉각의 형태로 나타내어 양호한 품질의 음성 형성, 즉 발성이 표출될 수 있는 발화 보완용 기기 및 그 방법을 제공하는 것이다. The present invention has been proposed in order to solve the above-mentioned problems, an object of the present invention is to grasp the user's articulation method according to the user's utterance through the sensor of the head and neck including the oral cavity, it is the hearing, visual, tactile It is to provide a device and a method for complementing the speech that can be expressed in the form of a good quality voice, that is, speech can be expressed.
본 발명의 다른 목적은, 발화에 있어서 정상적인 기능을 수행하지 못하고 교정이나 치료가 불가능한 경우에 양질의 적절한 발화를 구현하는 것이다. Another object of the present invention is to implement an appropriate utterance of good quality when the normal function in the utterance and correction or treatment is impossible.
본 발명의 또 다른 목적은, 발화를 위한 조음 의도에 따라 사용자가 원하는 정도의 정확한 발화가 외부로 표출될 수 있는 내/외부에 위치한 발화 보완용 기기 및 그 제어 방법을 제공하는 것이다. It is still another object of the present invention to provide a device for supplementing a utterance located inside / outside so that an accurate utterance of a desired degree can be expressed to the outside according to the articulation intention for utterance.
본 발명의 또 다른 목적은, 사용자의 발화 의도에 따른 사용자의 조음 방식을 구강설을 포함한 두경부의 센서를 통해 파악하고, 이를 애니메이션을 포함하는 영상 객체의 두경부에 맵핑시켜 해당 영상 객체의 발화 및 표정을 보다 인간과 흡사하고 자연스럽게 구현하는 방법을 제공하고자 하는 것이다. Still another object of the present invention is to grasp the user's articulation method according to the user's utterance intention through the sensor of the head and neck including the oral cavity, and to map it to the head and neck of the image object including the animation utterance and expression of the image object It is intended to provide a way to implement more similar to humans and naturally.
본 발명의 또 다른 목적은, 사용자의 발화 의도에 따른 사용자의 조음 방식을 구강설을 포함한 두경부의 센서를 통해 파악하고, 이를 휴머노이드를 포함하는 로봇의 두경부의 액츄에이터에 맵핑시켜 해당 로봇의 두경부의 발화 및 표정을 보다 인간과 흡사하고 자연스럽게 구현하는 방법을 제공하는 것이다. Still another object of the present invention is to grasp the user's articulation method according to the user's utterance intention through the sensor of the head and neck including the oral cavity, and to map it to the actuator of the head and neck of the robot including the humanoid ignition of the head and neck of the robot And it is to provide a method of embodying the expression more similar to the human naturally.
본 발명의 다른 목적 및 이점은 후술하는 발명의 상세한 설명 및 첨부하는 도면을 통해서 더욱 분명해질 것이다.Other objects and advantages of the present invention will become more apparent from the following detailed description of the invention and the accompanying drawings.
상기 목적을 달성하기 위하여, 본 발명은, 화자의 두경부의 일면에 인접하여 조음기관의 물리특성을 측정하는 센서부; 상기 센서부의 위치와 상기 조음기관의 물리특성을 기반으로 화자의 발화 특징을 파악하는 데이터해석부; 상기 센서부의 위치와 상기 발화특징을 언어데이터로 변환하는 데이터변환부; 상기 언어데이터를 외부로 표현하는 데이터표현부를 포함하고, 상기 센서부는, 구강설에 대응되는 구강설 센서를 포함하는 발화 의도 표현 시스템을 제공한다.In order to achieve the above object, the present invention, the sensor unit for measuring the physical characteristics of the articulation engine adjacent to one surface of the head and neck portion of the speaker; A data analysis unit which grasps a utterance characteristic of a speaker based on the position of the sensor unit and the physical characteristics of the articulator; A data converting unit converting the position of the sensor unit and the speech feature into language data; It includes a data expression unit for expressing the language data to the outside, the sensor unit provides a speech intent expression system including a mouth verbal sensor corresponding to the oral cavity.
그리고, 상기 구강설 센서는, 상기 구강설의 일측면에 고착되거나, 상기 구강설의 표면을 감싸거나, 상기 구강설 내부에 삽입되고, 발화에 따른 상기 구강설의 x축, y축, z축 방향 기반의 시간에 따른 벡터량의 변화량을 파악하여, 상기 구강설의 저고도, 전후설성, 굴곡도, 신전도, 회전도, 긴장도, 수축도, 이완도, 진동도 중 적어도 하나의 물리특성을 파악할 수 있다.The oral tongue sensor is fixed to one side of the oral tongue, wraps the surface of the oral tongue, or is inserted into the oral tongue, and the x-axis, the y-axis, and the z-axis of the oral tongue according to the ignition. By grasping the amount of change in the amount of vector over time based on direction, the physical characteristics of at least one of low altitude, anteroposterior, flexion, extension, rotational, tension, contraction, relaxation, and vibration of the oral tongue can be identified. have.
또한, 상기 구강설 센서는, 상기 구강설의 일측면에 고착되거나, 상기 구강설의 표면을 감싸거나, 상기 구강설 내부에 삽입되고, 발화에 따른 상기 구강설의 x축, y축, z축 방향 기반의 단위 시간 당 회전하는 각도의 변화량을 파악하여, 상기 구강설을 포함한 상기 조음기관의 물리 특성을 파악할 수 있다.In addition, the oral tongue sensor is fixed to one side of the oral tongue, or wrapped around the surface of the oral tongue, or inserted into the oral tongue, x-axis, y-axis, z-axis of the oral tongue according to the ignition By grasping the change amount of the rotation angle per unit time based on the direction, it is possible to grasp the physical characteristics of the articulation organ including the oral cavity.
Further, the oral-tongue sensor may be affixed to one side of the oral tongue or wrap around its surface, and may determine the degree of bending of the oral tongue through a piezoelectric element that generates an electric signal corresponding to the polarization caused by a change in crystal structure under the physical force produced by contraction and relaxation of the oral tongue during speech, thereby identifying at least one physical characteristic among the height, frontness/backness, flexion, extension, rotation, tension, contraction, relaxation, and vibration of the oral tongue.
The sensor unit may also include a triboelectric element (triboelectric generator) that identifies at least one physical characteristic among the degree of plosion, frication, resonance, and approximation, corresponding to the approach and contact of the oral tongue with other articulatory organs inside and outside the head and neck.
The data interpretation unit may identify at least one speech feature among the consonants and vowels, lexical stress, and tonic (sentence-level) stress uttered by the speaker through the physical characteristics of the oral tongue and the other articulatory organs measured by the sensor unit.
In identifying the speech features resulting from the physical characteristics of the articulatory organs measured by the sensor unit, the data interpretation unit may measure at least one speech feature among the correctness of the speaker's pronunciation and stress, its degree of similarity or proximity, and the speech intention, based on a standard speech feature matrix composed of numerical values ranging from binary numbers to real numbers.
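As an illustration only, the sketch below shows how a measured articulatory feature vector might be scored against rows of such a standard feature matrix to yield a correctness decision and a similarity value. The feature columns, reference values, and threshold are invented placeholders, not the matrix of the disclosed embodiments.

```python
# Illustrative sketch only: compares a measured feature vector against rows of a
# hypothetical standard speech feature matrix to score correctness and similarity.
import math

# columns: (tongue height, tongue frontness, lip rounding), values in [0, 1]
STANDARD_FEATURE_MATRIX = {
    "i": (1.0, 1.0, 0.0),
    "u": (1.0, 0.0, 1.0),
    "a": (0.0, 0.5, 0.0),
}

def similarity(measured, reference):
    """1 / (1 + Euclidean distance): 1.0 means a perfect match."""
    dist = math.sqrt(sum((m - r) ** 2 for m, r in zip(measured, reference)))
    return 1.0 / (1.0 + dist)

def score_pronunciation(measured, target_phoneme, threshold=0.8):
    """Return (is_correct, similarity score) for the intended phoneme."""
    s = similarity(measured, STANDARD_FEATURE_MATRIX[target_phoneme])
    return s >= threshold, s

if __name__ == "__main__":
    measured = (0.9, 0.95, 0.05)               # hypothetical sensor-derived features
    print(score_pronunciation(measured, "i"))  # close to [i]  -> (True, ~0.89)
    print(score_pronunciation(measured, "u"))  # far from [u]  -> (False, lower)
```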
In interpreting the physical characteristics of the articulatory organs measured by the sensor unit as speech features, the data interpretation unit may identify the speech features through the steps of recognizing the physical characteristics of the articulatory organs as patterns in units of individual consonants and vowels, extracting features of the patterns of the consonant and vowel units and classifying the extracted features according to similarity, recombining the classified pattern features, and interpreting the physical characteristics of the articulatory organs as speech features.
The data interpretation unit may also measure, from the physical characteristics of the articulatory organs measured by the sensor unit, speech variations that are secondary articulation phenomena, including at least one of assimilation, dissimilation, elision, attachment, and stress of consonants and vowels, and aspiration, syllabic consonant formation, flapping, tensification, labialization, velarization, dentalization, palatalization, nasalization, stress shift, and lengthening caused by reduction.
The oral-tongue sensor may include a circuit unit for sensor operation, a capsule unit surrounding the circuit unit, and an adhesive unit attached to one surface of the oral tongue.
The oral-tongue sensor may also operate adjacent to the oral tongue in the form of a film having a thin-film circuit.
The sensor unit may include a facial sensor composed of at least one reference sensor that generates a reference potential for measuring nerve signals of the head and neck muscles, and at least one positive-electrode (anode) sensor and at least one negative-electrode (cathode) sensor that measure the nerve signals of the head and neck muscles.
In acquiring the position of the sensor unit based on the facial sensor, the data interpretation unit may determine the position of the facial sensor by measuring the potential differences of the at least one anode sensor and the at least one cathode sensor with respect to the reference sensor.
In acquiring the speaker's speech features based on the facial sensor, the data interpretation unit may identify the speech features resulting from the physical characteristics of the articulatory organs occurring in the speaker's head and neck by measuring the potential differences of the at least one anode sensor and the at least one cathode sensor with respect to the reference sensor.
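As a minimal sketch, and assuming hypothetical electrode names and voltage values that are not taken from the disclosed embodiments, the following fragment shows how potentials of anode and cathode electrodes could be referenced to a shared reference electrode and reduced to a crude muscle-activity measure.

```python
# Illustrative sketch only: differential EMG-like signals of anode/cathode facial
# sensors against a shared reference electrode. All names and voltages are assumed.
def differential_signals(reference_mv, anodes_mv, cathodes_mv):
    """Potential of each electrode relative to the reference electrode (millivolts)."""
    return {
        "anode": [v - reference_mv for v in anodes_mv],
        "cathode": [v - reference_mv for v in cathodes_mv],
    }

def muscle_activity_index(diffs):
    """A crude activity measure: mean absolute anode-cathode difference."""
    deltas = [abs(a - c) for a, c in zip(diffs["anode"], diffs["cathode"])]
    return sum(deltas) / len(deltas) if deltas else 0.0

if __name__ == "__main__":
    diffs = differential_signals(reference_mv=0.2,
                                 anodes_mv=[1.4, 1.1],
                                 cathodes_mv=[-0.9, -0.7])
    print(diffs)
    print(muscle_activity_index(diffs))  # larger values ~ stronger muscle activation
```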
The sensor unit may include a vocal-cord sensor that is placed adjacent to the vocal cords of the speaker's head and neck, detects the electromyographic activity or vibration of the vocal cords, and thereby identifies at least one item of speech history information among the start, pause, and end of the speaker's utterance.
The sensor unit may also include a tooth sensor that is adjacent to one surface of a tooth and identifies the position at which a signal is generated according to the change in electrical capacitance caused by contact of the oral tongue or the lower lip.
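Purely for illustration, and with electrode labels and capacitance values that are invented, the sketch below shows one way a capacitance-change reading could be reduced to a contact location: the electrode whose capacitance rises most above its resting baseline is reported as the point of tongue or lip contact.

```python
# Illustrative sketch only: locates the tooth electrode with the largest capacitance
# increase over a resting baseline as a stand-in for the contact position.
def contact_position(baseline_pf, current_pf, min_delta_pf=0.5):
    """Return the electrode with the largest capacitance increase, or None."""
    best_label, best_delta = None, 0.0
    for label in baseline_pf:
        delta = current_pf[label] - baseline_pf[label]
        if delta > best_delta:
            best_label, best_delta = label, delta
    return best_label if best_delta >= min_delta_pf else None

if __name__ == "__main__":
    baseline = {"upper_incisor": 2.0, "alveolar_ridge": 2.1, "canine": 1.9}
    current  = {"upper_incisor": 2.1, "alveolar_ridge": 4.0, "canine": 2.0}
    print(contact_position(baseline, current))   # -> "alveolar_ridge"
```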
The data interpretation unit may also acquire the speaker's voice produced during speech through a voice acquisition sensor adjacent to one surface of the speaker's head and neck.
The sensor unit may include an imaging sensor that images the speaker's head and neck in order to capture at least one of change information of the articulatory organs of the speaker's head and neck, change information of the speaker's head and neck facial expression, and non-verbal expressions of the head and neck, thorax, upper limbs, and lower limbs that move according to the speaker's speech intention.
The speech intention expression system may further include a power supply unit that supplies power to at least one of the oral-tongue sensor, facial sensor, voice acquisition sensor, vocal-cord sensor, tooth sensor, and imaging sensor of the sensor unit.
The speech intention expression system may further include a wired or wireless communication unit capable of interworking communication when the data interpretation unit and the database unit are located and operated externally.
The data interpretation unit may be linked with a database unit including at least one language data index corresponding to the position of the sensor unit, the speaker's speech features, and the speaker's voice.
The database unit may construct at least one language data index among a phoneme-level index of consonants and vowels, a syllable-level index, a word-level index, a phrase-level index, a sentence-level index, a continuous-speech index, and a pitch index of pronunciation, based on at least one item of information among the duration of speech, the frequency of speech, the amplitude of speech, the electromyography of the head and neck muscles during speech, positional changes of the head and neck muscles during speech, and positional changes due to bending and rotation of the oral tongue.
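As a toy illustration of such a multi-level index (the levels, keys, and entries below are invented placeholders rather than the disclosed database layout), a measured feature key can be looked up at a chosen linguistic level:

```python
# Illustrative sketch only: a toy in-memory language data index keyed at several
# linguistic levels. Contents and string keys are invented placeholders.
LANGUAGE_DATA_INDEX = {
    "phoneme":  {("high", "front"): "[i]", ("high", "back"): "[u]"},
    "syllable": {("[b]", "[i]"): "bi"},
    "word":     {("bi", "d"): "bead"},
}

def lookup(level, key):
    """Return the indexed language data for a (level, key) pair, or None."""
    return LANGUAGE_DATA_INDEX.get(level, {}).get(key)

if __name__ == "__main__":
    print(lookup("phoneme", ("high", "front")))   # -> "[i]"
    print(lookup("word", ("bi", "d")))            # -> "bead"
```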
In conjunction with the language data index of the database unit, the data expression unit may present the speaker's speech features as at least one speech expression among phoneme units of consonants and vowels, at least one word unit, at least one phrase unit (citation forms), at least one sentence unit, and consecutive speech.
The speech expression presented by the data expression unit may be visualized as at least one of letters, pictures, special symbols, and numbers, or rendered audibly in the form of sound, and provided to the speaker and the listener.
The speech expression presented by the data expression unit may also be provided to the speaker and the listener through at least one tactile method among vibration, snoozing, tapping, pressing, and relaxing.
The data conversion unit may convert the position of the sensor unit and the head and neck facial expression change information into first basis data, and convert the speech features, the change information of the articulatory organs, and the head and neck facial expression change information into second basis data, thereby generating object head-and-neck data required for at least one of the head and neck of an image object and the head and neck of a robot object.
The speech intention expression system may further include a data matching unit that, in expressing the object head-and-neck data processed by the data interpretation unit on the head and neck of the image object or the head and neck of the robot object, sets static base coordinates based on the first basis data of the data conversion unit and sets dynamic variable coordinates based on the second basis data to generate matching positions.
The object head-and-neck data are transmitted by the data matching unit to actuators located on one surface of the head and neck of the robot object, and the actuators may implement movements of the head and neck of the robot object, including at least one of articulation, speech, and facial expression, according to the object head-and-neck data.
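As a minimal sketch of this matching idea, assuming hypothetical actuator names, coordinates, and gains that are not part of the disclosed embodiments, static base coordinates (resting placement) can be combined with speech-driven dynamic offsets to produce per-actuator target positions:

```python
# Illustrative sketch only: combines static base coordinates with dynamic variable
# offsets to form actuator targets for a hypothetical robot head and neck.
STATIC_BASE_COORDS = {            # actuator id -> resting (x, y) position on the face
    "jaw": (0.0, -3.0),
    "left_lip_corner": (-2.0, -1.0),
    "right_lip_corner": (2.0, -1.0),
}

def actuator_commands(dynamic_offsets, gain=1.0):
    """Add speech-driven offsets to the static base coordinates per actuator."""
    commands = {}
    for name, (bx, by) in STATIC_BASE_COORDS.items():
        dx, dy = dynamic_offsets.get(name, (0.0, 0.0))
        commands[name] = (bx + gain * dx, by + gain * dy)
    return commands

if __name__ == "__main__":
    # hypothetical offsets for an open-jaw, spread-lip posture
    offsets = {"jaw": (0.0, -0.8), "left_lip_corner": (-0.5, 0.2),
               "right_lip_corner": (0.5, 0.2)}
    print(actuator_commands(offsets))
```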
The speech intention expression system based on the physical characteristics of the head and neck articulatory organs according to the present invention has the effect of identifying the speech intention from the speaker's use of the head and neck articulators, centered on the oral tongue, and presenting it in auditory, visual, and tactile forms, so that speech of good quality, i.e., vocalization, can be expressed.
In the present invention, the articulatory organs inside and outside the head and neck, including the oral tongue, are used to identify the intention to speak. For such movements, one or more characteristics must be identified among the degree of closure, plosion, frication, resonance, and approximation produced by the independent physical characteristics of the oral tongue or by its interaction with one or more articulators among the passive articulators and the active articulators composed of one or more of the lips, glottis, vocal cords, pharynx, and epiglottis. To identify these characteristics, various sensors capable of measuring azimuth, elevation, rotation angle, pressure, friction, distance, temperature, sound, and the like are used.
The previously proposed artificial larynx merely produces sound through externally applied vibration, with the disadvantages that it occupies one hand in unnatural movement and that the quality of speech is very low; the artificial palate has the disadvantage of depending on the hard palate, a passive articulator.
In addition, articulatory phonetics, which seeks to measure the speaker's speech using an artificial palate, has so far been accepted as the mainstream, but in measuring speech it could only determine the discrete presence or absence of an utterance according to the articulation of a specific consonant or vowel. This position of articulatory phonetics has, however, been called into question by acoustic phonetics, which argues that human speech does not have discrete characteristics: for each phoneme, and especially for vowels, speech is a continuous system in which the individual vowels can neither exist in segmented form nor be pronounced in segments. In other words, human speech cannot be divided discretely into "articulated" and "not articulated", but has proportional, ratio-based, graded characteristics according to degree of similarity.
Acoustic phonetics therefore scales the physical properties of the speech sounds themselves produced by the speaker and determines the similarity or proximity of utterances, opening the possibility of measuring speech according to proportional, ratio-based, graded degrees of similarity of pronunciation, which conventional articulatory phonetics could not realize.
Considering these related technical trends and the related academic background, the present invention, while grounded in articulatory phonetics, has the groundbreaking advantage of being able to identify and implement the speech intention more accurately through the scaling of articulation that acoustic phonetics pursues.
Specifically, in the present invention, the degree of articulation produced by the action of the speaker's articulatory organs is scaled and the speech intention is presented intuitively in auditory, visual, and tactile forms, so the quality of communication and the convenience of daily life are expected to be greatly improved.
Furthermore, when the speech intention according to the speaker's utterance is expressed as text, it can be applied as speech-to-text, making silent speech possible. In this way, when communicating with a hearing-impaired person, the speaker speaks and the hearing-impaired listener perceives the result as visual material, eliminating communication difficulties. It can also be used in communication contexts affected by noise, such as public transportation, public facilities, military facilities and operations, and underwater activities.
In addition, by imaging the external appearance of the speaker's head and neck articulators, which changes with speech, the relationship between speech and the external changes of the articulatory organs can be identified and utilized in linguistics, in augmentative and alternative communication, and in implementing humanoid faces.
In particular, the animation and film production industries have so far had difficulty achieving agreement between the speech and the facial expressions of image objects, including animated characters. The most problematic areas are precisely the operation of the articulatory organs and speech. Because the physical characteristics of the complex human articulators are not properly reflected, even giant companies such as Walt Disney and Pixar remain at a level of development where characters merely flap their mouths, showing a low degree of agreement between dialogue, speech, and expression. To address this, production teams pay high costs to attach feature points over the whole body of voice actors or stand-in actors. However, this method does not solve the fundamental problem of the speech and expression of the image object, and is limited in that its main purpose is to express macroscopic movements of the whole body. The present invention, by contrast, measures the physical characteristics of the articulatory organs of a real human speaker and maps them onto the head and neck of the image object, so that the speech and expression of the image object can be implemented similarly to those of a real human speaker.
In particular, the present invention transmits the speaker's articulation information to the actuators that implement the movement of the head and neck of a robot object and matches it to them, thereby reproducing head and neck movements of the robot, including articulation, speech, and facial expression, similar to those of a human speaker. This has the effect of overcoming the "uncanny valley", the chronic cognitive dissonance that humanoid robots induce in humans, as argued by Mori Masahiro of Japan. Furthermore, as human-friendly articulation of humanoids and other general robots becomes possible, robots and androids can substitute for human roles, and human-robot conversation can be achieved, which is effective in preventing mental and psychological disorders such as social isolation and depression among the elderly, an issue arising from the growth of the elderly population due to aging.
FIG. 1 is a view showing the sensor unit of the speech intention expression system according to the first embodiment of the present invention.
FIG. 2 is a view showing the positions of the sensor unit of the speech intention expression system according to the first embodiment of the present invention.
FIG. 3 is a view showing the speech intention expression system according to the first embodiment of the present invention.
FIG. 4 is a view showing the positional names of the oral tongue used in the speech intention expression system according to the first embodiment of the present invention.
FIG. 5 is a view showing the action of the oral tongue for vowel utterance used in the speech intention expression system according to the first embodiment of the present invention.
FIGS. 6 to 10 are views each showing various oral-tongue sensors of the speech intention expression system according to the first embodiment of the present invention.
FIGS. 11 and 12 are a cross-sectional view and a perspective view, respectively, showing the attached state of the oral-tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
FIG. 13 is a view showing the circuit unit of the oral-tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
FIG. 14 is a view showing various states of use of the oral-tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
FIG. 15 is a view showing the speech intention expression system according to the second embodiment of the present invention.
FIG. 16 is a view showing the principle by which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention identifies speech features.
FIG. 17 is a view showing the principle by which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention interprets the measured physical characteristics of the articulatory organs as speech features.
FIG. 18 is a view showing the standard speech feature matrix for vowels used by the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention.
FIG. 19 is a view showing the standard speech feature matrix for consonants used by the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention.
FIG. 20 is a view showing the algorithm process used by the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention to interpret the physical characteristics of the articulatory organs as speech features.
FIG. 21 is a view showing in detail the algorithm process used by the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention to interpret the physical characteristics of the articulatory organs as speech features.
FIG. 22 is a view showing in detail the principle of the algorithm process used by the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention to interpret the physical characteristics of the articulatory organs as speech features.
FIG. 23 is a view showing the algorithm process by which the oral-tongue sensor of the speech intention expression system according to the second embodiment of the present invention identifies a specific uttered vowel as a speech feature.
FIG. 24 is a view showing a case in which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention handles an alveolar stop.
FIG. 25 is a view showing a case in which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention handles a bilabial stop.
FIG. 26 is a view showing experimental results in which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention handled a voiced bilabial stop.
FIGS. 27 and 28 are views each showing a case in which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention handles a voiced labiodental fricative.
FIG. 29 is a view showing the interworking of the data interpretation unit and the database of the speech intention expression system according to the second embodiment of the present invention.
FIG. 30 is a view showing a case in which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention identifies a specific word.
FIG. 31 is a view showing the database unit of the speech intention expression system according to the second embodiment of the present invention.
FIG. 32 is a view showing the speech intention expression system according to the third embodiment of the present invention.
FIGS. 33 and 34 are views each showing an actual form of the database unit of the speech intention expression system according to the third embodiment of the present invention.
FIG. 35 is a view showing the speech intention expression system according to the fourth embodiment of the present invention.
FIG. 36 is a view showing the interworking of the sensor unit, data interpretation unit, data expression unit, and database unit of the speech intention expression system according to the fourth embodiment of the present invention.
FIGS. 37 to 41 are views each showing means by which the data expression unit of the speech intention expression system according to the fourth embodiment of the present invention expresses language data.
FIG. 42 is a view showing a case in which the data expression unit of the speech intention expression system according to the fourth embodiment of the present invention expresses language data visually and audibly.
FIG. 43 is a view showing a case in which the data expression unit of the speech intention expression system according to the fourth embodiment of the present invention expresses language data visually.
FIG. 44 is a view showing a case in which the data expression unit of the speech intention expression system according to the fourth embodiment of the present invention expresses language data visually.
FIG. 45 is a view showing a case in which the data expression unit of the speech intention expression system according to the fourth embodiment of the present invention expresses language data in consecutive speech units.
FIG. 46 is a view showing the confusion matrix used by the speech intention expression system according to the fourth embodiment of the present invention.
FIG. 47 is a view showing, as percentages, the confusion matrix used by the speech intention expression system according to the fourth embodiment of the present invention.
FIG. 48 is a view showing a case in which the speech intention expression system according to the fourth embodiment of the present invention helps a speaker with language correction and instruction through a screen.
FIG. 49 is a view showing a case in which the speech intention expression system according to the fourth embodiment of the present invention images and identifies the external appearance of the head and neck articulators.
FIG. 50 is a view showing a case in which the speech intention expression system according to the fourth embodiment of the present invention combines mutual information through the standard speech feature matrix.
FIG. 51 is a view showing the speech intention expression system according to the fifth embodiment of the present invention.
FIG. 52 is a view showing a case in which the speech intention expression system according to the fifth embodiment of the present invention matches object head-and-neck data to the head and neck of an image object based on static base coordinates.
FIG. 53 is a view showing static base coordinates based on the positions of the facial sensors used by the speech intention expression system according to the fifth embodiment of the present invention.
FIG. 54 is a view showing a case in which the speech intention expression system according to the fifth embodiment of the present invention matches object head-and-neck data to the head and neck of an image object based on dynamic base coordinates.
FIG. 55 is a view showing dynamic base coordinates based on the voltage differences of the facial sensors used by the speech intention expression system according to the fifth embodiment of the present invention.
FIG. 56 is a view showing a case in which the speech intention expression system according to the fifth embodiment of the present invention matches object head-and-neck data to the actuators of the head and neck of a robot object based on static base coordinates.
FIG. 57 is a view showing static base coordinates based on the voltage differences of the facial sensors used by the speech intention expression system according to the fifth embodiment of the present invention.
FIG. 58 is a view showing a case in which the speech intention expression system according to the fifth embodiment of the present invention matches object head-and-neck data to the actuators of the head and neck of a robot object based on dynamic variable coordinates.
FIG. 59 is a view showing dynamic variable coordinates based on the voltage differences of the facial sensors used by the speech intention expression system according to the fifth embodiment of the present invention.
FIGS. 60 and 61 are views each showing the operation of the actuators of the head and neck of the robot object of the speech intention expression system according to the fifth embodiment of the present invention.
FIG. 62 is a view showing the actuators of the head and neck of the robot object of the speech intention expression system according to the fifth embodiment of the present invention.
Hereinafter, a speech intention expression system using the physical characteristics of the head and neck articulatory organs for expressing speech intention according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
The following embodiments of the present invention are merely intended to embody the present invention and do not restrict or limit its scope. What a person skilled in the art to which the present invention pertains can easily infer from the detailed description and the embodiments of the present invention is construed as falling within the scope of the present invention.
The details for carrying out the present invention will be described with reference to FIGS. 1 to 62.
FIG. 1 is a view showing the sensor unit of the speech intention expression system according to the first embodiment of the present invention, FIG. 2 is a view showing the positions of the sensor unit of the speech intention expression system according to the first embodiment of the present invention, and FIG. 3 is a view showing the speech intention expression system according to the first embodiment of the present invention.
As shown in FIGS. 1, 2, and 3, in the speech intention expression system according to the first embodiment of the present invention, the sensor unit 100 is composed of an oral-tongue sensor 110, a facial sensor 120, a voice acquisition sensor 130, a vocal-cord sensor 140, and a tooth sensor 150 located in the head and neck.
More specifically, the oral-tongue sensor 110, facial sensor 120, voice acquisition sensor 130, vocal-cord sensor 140, and tooth sensor 150 located in the head and neck provide data on the position 210 of the sensor unit where each sensor is located, the speech features 220 according to the utterance of the speaker 10, the speaker's voice 230, the speech history information 240, and the speech variation 250.
The data interpretation unit 200 acquires these data, and the data conversion unit 300 processes them into language data 310.
FIG. 4 is a view showing the positional names of the oral tongue used in the speech intention expression system according to the first embodiment of the present invention, and FIG. 5 is a view showing the action of the oral tongue for vowel utterance used in the speech intention expression system according to the first embodiment of the present invention.
As shown in FIGS. 4 and 5, the oral-tongue sensor 110 is affixed to one side of the oral tongue 12, wraps around its surface, or is inserted into it, and identifies the independent physical characteristics of the oral tongue itself, including one or more of height, frontness/backness, flexion, extension, rotation, tension, contraction, relaxation, and vibration.
FIGS. 6 to 10 are views each showing various oral-tongue sensors of the speech intention expression system according to the first embodiment of the present invention.
As shown in FIGS. 6 and 7, in identifying the independent physical characteristics of the oral tongue 12 itself, the oral-tongue sensor 110 identifies the speech features 220 resulting from the physical characteristics of the articulatory organs, including the oral tongue 12, by measuring at least one of the acceleration in the x-, y-, and z-axis directions and the change in rotation angle per unit time (angular velocity).
As shown in FIG. 8, the oral-tongue sensor 110 can identify the speech features 220 resulting from the physical characteristics of the articulatory organs, including the oral tongue 12, by determining the degree of bending of the oral tongue 12 through a piezoelectric element 112 that generates an electric signal due to the polarization arising from a change in the crystal structure 111 under the physical force produced by contraction or relaxation of the oral tongue 12 during speech.
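As a purely illustrative aside, and assuming an invented linear calibration that a real piezoelectric element would not necessarily follow, the sketch below shows how such an output voltage could be turned into an approximate bend angle and a coarse flexion label:

```python
# Illustrative sketch only: converts a piezoelectric output voltage into an
# approximate bending angle under an assumed linear calibration. Constants are invented.
def bend_angle_deg(voltage_v, volts_per_degree=0.015, offset_v=0.0):
    """Approximate bend angle assuming output voltage grows linearly with bending."""
    return (voltage_v - offset_v) / volts_per_degree

def classify_flexion(angle_deg, flex_threshold=10.0):
    """Label the tongue posture from the estimated bend angle."""
    if angle_deg > flex_threshold:
        return "flexed"
    if angle_deg < -flex_threshold:
        return "extended"
    return "neutral"

if __name__ == "__main__":
    for v in (0.30, -0.25, 0.05):          # hypothetical sensor readings
        a = bend_angle_deg(v)
        print(f"{v:+.2f} V -> {a:+.1f} deg -> {classify_flexion(a)}")
```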
As shown in FIG. 9, the oral-tongue sensor 110 uses a triboelectric element 113 to identify the associated physical characteristics based on triboelectricity (triboelectric generator) generated by the approach and contact of the oral tongue 12 with other articulatory organs inside and outside the head and neck, thereby identifying the speaker's speech features 220.
As shown in FIG. 10, the integrated oral-tongue sensor 110 identifies the speech features 220 resulting from the physical characteristics of the articulatory organs, including the oral tongue 12, by using the acceleration and angular velocity in the x-, y-, and z-axis directions, the electric signal from the piezoelectric element, and the triboelectricity generated by contact.
FIGS. 11 and 12 are a cross-sectional view and a perspective view, respectively, showing the attached state of the oral-tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
As shown in FIGS. 11 and 12, the oral-tongue sensor 110 may be composed of a composite thin-film circuit and implemented as a single film. In this case, the oral-tongue sensor 110 is composed of a circuit unit 114 for operating the sensor unit 100, a capsule unit 115 surrounding the circuit unit 114, and an adhesive unit 116 that affixes the oral-tongue sensor 110 to one surface of the oral tongue 12.
As shown in FIGS. 6 to 9, depending on the characteristics of each sensor, the oral-tongue sensor 110 can identify one or more physical characteristics among the degree of plosion, frication, resonance, and approximation arising from proximity to or contact with other articulatory organs inside and outside the head and neck.
FIG. 13 is a view showing the circuit unit of the oral-tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
As shown in FIG. 13, the circuit unit 114 of the oral-tongue sensor 110 is composed of a communication chip, a sensing circuit, and an MCU.
FIG. 14 is a view showing various states of use of the oral-tongue sensor of the speech intention expression system according to the first embodiment of the present invention.
As shown in FIG. 14, the oral-tongue sensor 110 can identify the state of the oral tongue 12 according to the speaker's utterance of various consonants and vowels, and thus identify the speech features 220 of those consonant and vowel utterances.
For example, the oral-tongue sensor 110 can identify the speech features 220 of bilabial sounds, alveolar sounds, and palatal sounds.
FIG. 15 is a view showing the speech intention expression system according to the second embodiment of the present invention.
As shown in FIG. 15, in the speech intention expression system according to the second embodiment of the present invention, the sensor unit 100 near the head and neck articulators, composed of the oral-tongue sensor 110, facial sensor 120, voice acquisition sensor 130, vocal-cord sensor 140, and tooth sensor 150, identifies the position 210 of the sensor unit where each sensor is located on the head and neck articulators, the speech features 220 according to the utterance, the speaker's voice 230 according to the utterance, and the speech history information 240 including the start, pause, and end of the utterance.
Here, the speech features 220 mean one or more of the basic physical speech characteristics that occur when a human speaks, among plosivization, fricativization, affrication, nasalization, liquid formation, glide formation, sibilation, voicing and devoicing, and glottalization. The speaker's voice 230 is the auditory speech feature that accompanies the speech features. The speech history information 240 is obtained through the vocal-cord sensor 140, which derives it from the electromyographic activity or vibration of the vocal cords.
The data interpretation unit 200 identifies the speech variation 250 that occurs according to the speaker's gender, race, age, and native language from the physical characteristics of the speaker's articulatory organs measured by the sensor unit 100 near the head and neck articulators, composed of the oral-tongue sensor 110, facial sensor 120, voice acquisition sensor 130, vocal-cord sensor 140, and tooth sensor 150.
Here, the speech variation 250 includes one or more secondary articulation phenomena among assimilation, dissimilation, elision, attachment, and stress of consonants and vowels, and aspiration, syllabic consonant formation, flapping, tensification, labialization, velarization, dentalization, palatalization, nasalization, stress shift, and lengthening caused by reduction.
The data conversion unit 300 recognizes and processes, as language data 310, the position 210 of the sensor unit measured by the head and neck articulator sensors 110, 120, 130, 140, and 150, the speech features 220 according to the utterance, the speaker's voice 230 according to the utterance, the speech history information 240, and the speech variation 250.
At this time, when the data conversion unit 300 recognizes and processes the language data 310, the data interpretation unit 200 is linked with the database unit 350.
The database unit 350 has a language data index 360 including a phoneme-level index 361 of consonants and vowels, a syllable-level index 362, a word-level index 363, a phrase-level index 364, a sentence-level index 365, a continuous-speech index 366, and a pitch index 367 of pronunciation. Through this language data index 360, the data interpretation unit 200 can process the various speech-related information acquired by the sensor unit 100 into language data.
FIG. 16 is a view showing the principle by which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention identifies speech features, FIG. 17 is a view showing the principle by which the data interpretation unit interprets the measured physical characteristics of the articulatory organs as speech features, FIG. 18 is a view showing the standard speech feature matrix for vowels used by the data interpretation unit, and FIG. 19 is a view showing the standard speech feature matrix for consonants used by the data interpretation unit.
As shown in FIGS. 16, 17, 18, and 19, the data interpretation unit 200 first acquires the physical characteristics of the articulatory organs measured by the sensor unit 100 including the oral-tongue sensor 110. When the physical characteristics of the articulatory organs are acquired by the oral-tongue sensor 110, the oral-tongue sensor 110 senses them and produces a matrix of values for the sensed physical characteristics.
The data interpretation unit 200 then identifies, in the standard speech feature matrix 205 of consonants and vowels, the speech features 220 of the consonants and vowels corresponding to the matrix values of these physical characteristics. The values inside the standard speech feature matrix 205 may exist as one or more of consonant and vowel phonetic symbols, binary numbers, and real numbers.
FIG. 20 is a view showing the algorithm process used by the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention to interpret the physical characteristics of the articulatory organs as speech features.
As shown in FIG. 20, the algorithm process used by the data interpretation unit 200 for interpreting the physical characteristics of the articulatory organs measured by the sensor unit 100 consists of acquiring the physical characteristics of the articulatory organs, identifying the patterns of the individual consonant and vowel units contained in the acquired physical characteristics, extracting distinctive features from each consonant and vowel pattern, classifying the extracted features, and recombining the features of the classified patterns, through which the physical characteristics are finally interpreted as specific speech features.
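Purely as an illustration of that sequence of steps (not the disclosed implementation), the skeleton below strings together toy stand-ins for segmentation, feature extraction, classification, and recombination; the windowing, feature set, and thresholds are all assumptions made for the example.

```python
# Illustrative sketch only: a skeleton of the pattern -> feature -> classify ->
# recombine pipeline, with trivially simple stand-in logic.
from typing import List, Dict

def segment_patterns(samples: List[float], window: int = 4) -> List[List[float]]:
    """Cut the raw sensor stream into fixed-size candidate phoneme-unit patterns."""
    return [samples[i:i + window] for i in range(0, len(samples), window)]

def extract_features(pattern: List[float]) -> Dict[str, float]:
    """Very small feature set: mean level and range of the pattern."""
    return {"mean": sum(pattern) / len(pattern),
            "range": max(pattern) - min(pattern)}

def classify(features: Dict[str, float]) -> str:
    """Stand-in classifier: thresholds instead of a trained model."""
    return "vowel-like" if features["range"] < 0.5 else "consonant-like"

def recombine(labels: List[str]) -> str:
    """Join per-pattern labels into one interpreted speech-feature sequence."""
    return " + ".join(labels)

if __name__ == "__main__":
    stream = [0.1, 0.2, 0.15, 0.1, 0.9, 0.1, 0.8, 0.2]   # hypothetical sensor values
    labels = [classify(extract_features(p)) for p in segment_patterns(stream)]
    print(recombine(labels))    # e.g. "vowel-like + consonant-like"
```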
FIG. 21 is a view showing in detail the algorithm process used by the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention to interpret the physical characteristics of the articulatory organs as speech features, FIG. 22 is a view showing in detail the principle of that algorithm process, and FIG. 23 is a view showing the algorithm process by which the oral-tongue sensor of the speech intention expression system according to the second embodiment identifies a specific uttered vowel as a speech feature.
As shown in FIGS. 21, 22, and 23, in the speech feature identification algorithm performed by the data interpretation unit 200, the step of identifying the pattern of each consonant and vowel unit identifies that pattern based on the x-, y-, and z-axes when the part of the sensor unit 100 that measured the physical characteristics of the articulatory organs is the sensor on the oral tongue 12.
At this time, the algorithm may be based on one or more of K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), and Hidden Markov Model (HMM).
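As a minimal sketch of just one of the algorithm families named above, a hand-rolled k-nearest-neighbor classifier over invented (tongue height, tongue frontness) training points is shown below; the training data, labels, and value of k are assumptions for illustration, not the trained models of the actual system.

```python
# Illustrative sketch only: a tiny k-nearest-neighbor classifier over invented
# (tongue height, tongue frontness) training points.
import math
from collections import Counter

TRAINING = [                      # (height, frontness) -> phoneme label
    ((0.95, 0.90), "i"), ((0.90, 0.85), "i"),
    ((0.92, 0.10), "u"), ((0.88, 0.15), "u"),
    ((0.10, 0.80), "ae"), ((0.15, 0.75), "ae"),
]

def knn_predict(point, k=3):
    """Majority label among the k training points nearest to `point`."""
    dists = sorted((math.dist(point, feat), label) for feat, label in TRAINING)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    print(knn_predict((0.93, 0.88)))   # near the high front cluster -> "i"
    print(knn_predict((0.12, 0.78)))   # near the low front cluster  -> "ae"
```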
For example, in FIGS. 22 and 23, when the oral-tongue sensor 110 operates as a sensor that measures the change in vector quantity or the change in angle, it measures the speaker's utterance to determine the change in vector quantity and the change in angle, and thereby recognizes the vowel [i], which has tongue height and tongue frontness.
Also, when the oral-tongue sensor 110 is a sensor driven on the principle of piezoelectric or triboelectric signals, it recognizes the vowel [i], which has tongue height and frontness, by detecting the change in the electric signal due to piezoelectricity and the triboelectric signal generated when the oral-tongue sensor 110 approaches or rubs against the articulatory organs inside and outside the oral cavity.
In the case of the vowel [u], based on the same principles, high tongue height and backness are measured and the corresponding vowel is identified. In the case of the low front vowel [ ], likewise, low tongue height and frontness are measured and the corresponding vowel is identified.
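Purely as an illustration of this mapping from coarse height and frontness/backness categories to the vowels discussed above, the following sketch uses invented thresholds; the disclosed system derives these categories from the sensor measurements rather than from fixed cut-offs.

```python
# Illustrative sketch only: a rule-based mapping from coarse tongue height and
# frontness categories to rough vowel labels. Thresholds are invented.
def vowel_from_tongue(height, frontness):
    """height, frontness in [0, 1]; returns a rough vowel label."""
    high = height >= 0.6
    front = frontness >= 0.5
    if high and front:
        return "i"          # high front vowel
    if high and not front:
        return "u"          # high back vowel
    if not high and front:
        return "low front vowel"
    return "low back vowel"

if __name__ == "__main__":
    print(vowel_from_tongue(0.9, 0.9))   # -> "i"
    print(vowel_from_tongue(0.9, 0.1))   # -> "u"
    print(vowel_from_tongue(0.2, 0.8))   # -> "low front vowel"
```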
도 23에서, 구강설 센서(110)는 화자의 발화에 따라 발생한 [i], [u], []과 같은 모음을 발화 특징(220)으로 측정한다. 이러한 모음의 발화 특징(220)은 데이터베이스부(350)의 자모음의 음소 단위 색인(361)에 대응한다. In FIG. 23, the oral cavity sensor 110 measures vowels such as [i], [u], and [] generated by the utterance of the speaker as the utterance feature 220. This vowel speech feature 220 corresponds to the phoneme unit index 361 of the consonant of the database 350.
FIG. 24 is a diagram illustrating a case in which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention handles an alveolar stop.
As shown in FIG. 24, the oral tongue sensor 110 measures a specific consonant uttered by the speaker as a speech feature 220. This consonant speech feature 220 corresponds to the phoneme unit index 361 for consonants and vowels in the database unit 350, and the data interpretation unit 200 identifies it as the language data 310, an alveolar stop.
FIG. 25 is a diagram illustrating a case in which the data interpretation unit of the speech intention expression system according to the second embodiment handles a bilabial stop.
As shown in FIG. 25, the oral tongue sensor 110 and the facial sensor 120 measure a specific consonant uttered by the speaker as a speech feature 220. This consonant speech feature 220 corresponds to the phoneme unit index 361 for consonants and vowels in the database unit 350, and the data interpretation unit 200 identifies it as the language data 310, a bilabial stop.
FIG. 26 is a diagram illustrating experimental results in which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention handles a voiced bilabial stop.
As shown in FIG. 26, the oral tongue sensor 110 and the facial sensor 120 measure specific consonants uttered by the speaker as speech features 220. These consonant speech features 220 correspond to the phoneme unit index 361 for consonants and vowels in the database unit 350, and the data interpretation unit 200 identified them as the language data 310: the voiced bilabial stop /버/ (/beo/) and the voiceless bilabial stop /퍼/ (/peo/).
FIGS. 27 and 28 are diagrams illustrating a case in which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention handles a voiced labiodental fricative.
As shown in FIGS. 27 and 28, the oral tongue sensor 110, the facial sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the tooth sensor 150 measure a specific consonant uttered by the speaker as a speech feature 220. This consonant speech feature 220 corresponds to the phoneme unit index 361 for consonants and vowels in the database unit 350, and the data interpretation unit 200 identifies it as the language data 310, a voiced labiodental fricative.
FIG. 29 is a diagram illustrating the interworking of the data interpretation unit and the database of the speech intention expression system according to the second embodiment of the present invention.
As shown in FIG. 29, the imaging sensor 160 recognizes and processes, as language data 310, the articulator change information 161 of the head and neck, the facial expression change information 162 of the head and neck, and the non-verbal expression information 163 that arise when the speaker utters while using one or more of the oral tongue sensor 110, the facial sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the tooth sensor 150.
In particular, the facial sensor located on one surface of the head and neck provides its own position from the potential difference between the anode sensor 122 and the cathode sensor 123 with respect to the reference sensor 121. This position is delivered to the data conversion unit 300 as language data 310, together with the physical articulator change information 161 of the head and neck, the facial expression change information 162 of the head and neck, and the non-verbal expression information 163 obtained by the imaging sensor 160 through imaging.
FIG. 30 is a diagram illustrating a case in which the data interpretation unit of the speech intention expression system according to the second embodiment of the present invention identifies a specific word.
As shown in FIG. 30, the oral tongue sensor 110, the facial sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the tooth sensor 150 measure specific consonants and vowels uttered by the speaker, and the data interpretation unit 200 identifies the consonants and vowels as speech features 220. The speech features 220 of these phonemes, [b], [i], and [f], each correspond to entries in the phoneme unit index 361 for consonants and vowels in the database unit 350, and the data interpretation unit identifies them as the word /beef/, that is, [bif].
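A minimal sketch of this lookup step, under the assumption that the phoneme unit index and word unit index can be represented as a set and a dictionary, is:

```python
# Minimal sketch: matching a recognized phoneme sequence against a word-unit
# index, as in the /beef/ example. The index contents are illustrative assumptions.

PHONEME_INDEX = {"b", "i", "f", "k", "ŋ"}            # phoneme unit index (361), assumed entries
WORD_INDEX = {("b", "i", "f"): ("/beef/", "[bif]")}  # word unit index (363), assumed entries

def phonemes_to_word(phonemes: list[str]):
    """Validate each phoneme against the phoneme index, then look up the word."""
    if not all(p in PHONEME_INDEX for p in phonemes):
        return None
    return WORD_INDEX.get(tuple(phonemes))

if __name__ == "__main__":
    print(phonemes_to_word(["b", "i", "f"]))   # ('/beef/', '[bif]')
```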
FIG. 31 is a diagram illustrating the database unit of the speech intention expression system according to the second embodiment of the present invention.
As shown in FIG. 31, the language data index 360 of the database unit 350 consists of a phoneme unit index 361 for consonants and vowels, a syllable unit index 362, a word unit index 363, a phrase unit index 364, a sentence unit index 365, a continuous speech index 366, and a pronunciation pitch index 367.
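One way to picture this layered index is as a single record holding the seven sub-indexes. The following sketch uses assumed field names and example entries; it is not the patent's schema.

```python
# Minimal sketch of the layered language data index (360) described for the
# database unit (350). Field names and example entries are assumptions.

from dataclasses import dataclass, field

@dataclass
class LanguageDataIndex:
    phonemes: set[str] = field(default_factory=set)            # phoneme unit index (361)
    syllables: set[str] = field(default_factory=set)            # syllable unit index (362)
    words: dict[str, list[str]] = field(default_factory=dict)   # word unit index (363): word -> phonemes
    phrases: list[str] = field(default_factory=list)            # phrase unit index (364)
    sentences: list[str] = field(default_factory=list)          # sentence unit index (365)
    continuous: list[str] = field(default_factory=list)         # continuous speech index (366)
    pitch: dict[str, str] = field(default_factory=dict)         # pronunciation pitch index (367)

index = LanguageDataIndex(
    phonemes={"b", "i", "f"},
    syllables={"bif"},
    words={"beef": ["b", "i", "f"]},
)
print(index.words["beef"])   # ['b', 'i', 'f']
```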
FIG. 32 is a diagram illustrating the speech intention expression system according to the third embodiment of the present invention.
As shown in FIG. 32, when one or more of the data interpretation unit 200 and the data expression unit (500 in FIG. 34) are located externally, the system includes a communication unit 400 through which they can communicate in conjunction with each other. The communication unit 400 may be implemented in wired or wireless form; in the wireless case, various methods such as Bluetooth, Wi-Fi, 3G, 4G, and NFC may be used.
FIGS. 33 and 34 are diagrams each illustrating an actual form of the database unit of the speech intention expression system according to the third embodiment of the present invention.
As shown in FIGS. 33 and 34, the database unit 350, working in conjunction with the data interpretation unit 200, uses the language data index to identify, as language data 310, the speech features 220 of actual utterances, the speaker's voice 230, the utterance history information 240, and the speech variations 250.
FIG. 33 shows actual data of the database unit 350 in which the sensor unit 100 has measured, and the data interpretation unit 200 has reflected, various consonant and vowel speech features including the high front tense vowel and the high back tense vowel of FIG. 23, the alveolar sounds of FIG. 24, and the voiceless labiodental fricative of FIG. 27.
FIG. 34 shows actual data of the database unit 350 in which the sensor unit 100 has measured, and the data interpretation unit 200 has reflected, various consonant and vowel speech features including the high front lax vowel of FIG. 23, the alveolar sounds of FIG. 24, and the bilabial stop sounds of FIG. 25.
FIG. 35 is a diagram illustrating the speech intention expression system according to the fourth embodiment of the present invention, and FIG. 36 is a diagram illustrating the interworking of the sensor unit, the data interpretation unit, the data expression unit, and the database unit of the speech intention expression system according to the fourth embodiment of the present invention.
As shown in FIG. 35, the speech intention expression system according to the fourth embodiment of the present invention includes a sensor unit 100, a data interpretation unit 200, a data conversion unit 300, a database unit 350, and a data expression unit 500 that operate in organic conjunction with one another.
As shown in FIG. 36, the sensor unit 100 is positioned on the actual articulators, measures the physical characteristics of the articulators according to the speaker's utterance, and delivers them to the data interpretation unit 200, which interprets them as language data. The interpreted language data is delivered to the data expression unit 500, and the database unit 350 operates in conjunction with both the interpretation and the expression of the language data.
FIGS. 37 to 41 are diagrams each illustrating means by which the data expression unit of the speech intention expression system according to the fourth embodiment of the present invention expresses language data.
As shown in FIGS. 37 to 41, the physical characteristics of the speaker's head and neck articulators obtained by the sensor unit 100 are identified by the data interpretation unit 200 as the sensor unit position 210, the speech features 220, the speaker's voice 230, the utterance history information 240, and the speech variations 250.
The imaging sensor 160 captures the external changes of the speaker's head and neck articulators, and through this the data interpretation unit 200 identifies the change information 161 of the speaker's head and neck articulators and the facial expression change information 162 of the head and neck.
This information is then converted into language data 310 by the data interpretation unit 200 and expressed externally by the data expression unit 500.
Here, FIG. 37 shows the data expression unit 500 expressing the language data 310 audibly. FIG. 38 shows that, when the data expression unit 500 expresses the language data 310 visually, the physical characteristics of the speaker's articulators measured by the data interpretation unit 200 are compared with the language data index 360 of the database unit 350, and a broad transcription (broad description) of the actual standard pronunciation is provided together with measured values for one or more of the correctness of stress, the degree of similarity, and the speech intention.
FIG. 39 shows that, when the data expression unit 500 expresses the language data 310 visually and audibly, the physical characteristics of the speaker's articulators measured by the data interpretation unit 200 are compared with the language data index 360 of the database unit 350, and a narrow transcription (narrow description) of the actual standard pronunciation is provided together with measured values for one or more of the correctness of stress, the degree of similarity, and the speech intention.
FIG. 40 shows that, when the data expression unit 500 expresses the language data 310 visually, the physical characteristics of the speaker's articulators measured by the data interpretation unit 200 are compared with the language data index 360 of the database unit 350, and a narrow transcription of the actual standard pronunciation is provided together with measured values for one or more of the correctness of stress, the degree of similarity, and the speech intention, and, when the language data 310 corresponds as a word to the word unit index 363, the corresponding image is provided as well.
FIG. 41 shows that, when the data expression unit 500 expresses the language data 310 visually and audibly, the physical characteristics of the speaker's articulators measured by the data interpretation unit 200 are compared with the language data index 360 of the database unit 350, a narrow transcription of the actual standard pronunciation is provided together with measured values for one or more of the correctness of stress, the degree of similarity, and the speech intention, and a speech correction image showing how to produce the pronunciation is provided so that the speaker can correct and reinforce the language data 310.
FIG. 42 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data visually and audibly.
As shown in FIG. 42, when the data expression unit 500 visualizes the language data 310 as text and renders it audible as sound, the physical characteristics of the speaker's articulators measured by the data interpretation unit 200 are compared with one or more language data indexes 360 of the database unit 350: the phoneme unit index 361 for consonants and vowels, the syllable unit index 362, the word unit index 363, the phrase unit index 364, and the sentence unit index 365.
The data expression unit 500 then provides this language data 310 as text and sound, together with a narrow transcription of the actual standard pronunciation related to the speaker's language data 310 and measured values for one or more of the correctness of stress, the degree of similarity, and the speech intention, helping the speaker correct and reinforce the language data 310. A sketch of such a feedback record follows.
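The Python sketch below shows one possible shape for such a feedback record, combining a transcription pair with the three measured values. Field names, score ranges, and the rendering format are illustrative assumptions, not the patent's data format.

```python
# Minimal sketch: packaging the feedback the data expression unit (500) provides,
# i.e. a standard transcription plus measured scores. All values are illustrative.

from dataclasses import dataclass

@dataclass
class SpeechFeedback:
    broad_transcription: str      # e.g. /kiŋ/
    narrow_transcription: str     # e.g. [kʰiŋ]
    stress_correctness: float     # 0..1
    similarity: float             # 0..1 versus the standard pronunciation
    intent_confidence: float      # 0..1

    def render(self) -> str:
        return (f"{self.broad_transcription} {self.narrow_transcription} | "
                f"stress {self.stress_correctness:.0%}, "
                f"similarity {self.similarity:.0%}, "
                f"intent {self.intent_confidence:.0%}")

print(SpeechFeedback("/kiŋ/", "[kʰiŋ]", 0.72, 0.46, 0.88).render())
```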
FIG. 43 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data visually.
As shown in FIG. 43, the data expression unit 500 visualizes and provides the language data 310 as one or more of text, pictures, photographs, and video.
At this time, the data interpretation unit 200 compares the measured physical characteristics of the speaker's articulators with one or more language data indexes 360 of the database unit 350: the phoneme unit index 361 for consonants and vowels, the syllable unit index 362, the word unit index 363, the phrase unit index 364, and the sentence unit index 365.
In addition, when the data is visualized as text, both a narrow transcription and a broad transcription of the actual standard pronunciation are provided. In this way, the data expression unit 500 provides the language data 310 as text and sound, together with the narrow and broad transcriptions of the actual standard pronunciation related to the speaker's language data 310 and measured values for one or more of the correctness of stress, the degree of similarity, and the speech intention, helping the speaker correct and reinforce the language data 310.
FIG. 44 is a diagram illustrating another case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data visually.
As shown in FIG. 44, when the data expression unit 500 visualizes and provides the language data 310 as text, the data interpretation unit 200 compares the measured physical characteristics of the speaker's articulators with one or more language data indexes 360 of the database unit 350: the phoneme unit index 361 for consonants and vowels, the syllable unit index 362, the word unit index 363, the phrase unit index 364, the sentence unit index 365, and the continuous speech index 366. The data expression unit 500 provides this language data 310 in continuous speech units as text and sound, together with the narrow and broad transcriptions of the actual standard pronunciation related to the speaker's language data 310 and measured values for one or more of the correctness of stress, the degree of similarity, and the speech intention, helping the speaker correct and reinforce the language data 310.
FIG. 45 is a diagram illustrating a case in which the data expression unit of the speech intention expression system according to the fourth embodiment expresses language data in continuous speech units.
As shown in FIG. 45, when the data expression unit 500 visualizes the language data 310 as text and renders it audible as sound, the data interpretation unit 200 compares the measured physical characteristics of the speaker's articulators with one or more language data indexes 360 of the database unit 350: the phoneme unit index 361 for consonants and vowels, the syllable unit index 362, the word unit index 363, the phrase unit index 364, the sentence unit index 365, the continuous speech index 366, and the pronunciation pitch index 367. The data expression unit 500 provides this language data 310 as text and sound, together with the narrow and broad transcriptions of the actual standard pronunciation related to the speaker's language data 310 and measured values for one or more of the correctness of stress, the degree of similarity, and the speech intention, helping the speaker correct and reinforce the language data 310.
FIG. 46 is a diagram illustrating the confusion matrix used by the speech intention expression system according to the fourth embodiment of the present invention, and FIG. 47 is a diagram illustrating the same confusion matrix expressed in percentages.
As shown in FIGS. 46 and 47, in identifying the language data 310, the data interpretation unit 200 representatively uses one or more feature extraction algorithms based on the variance in the time domain, the cepstral coefficient in the frequency domain, and the linear predictive coding (LPC) coefficient.
The variance, which represents the degree of dispersion of the data, is calculated according to Equation 1 below, where $n$ is the size of the population of collected samples, $\bar{x}$ is the mean of the population of collected articulator physical-characteristic data, and $x_i$ denotes each collected articulator physical-characteristic datum.
[Equation 1]
$$\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}$$
The cepstral coefficient is calculated according to Equation 2 below in order to characterize the strength of the frequency components. Here, $F^{-1}$ denotes the inverse Fourier transform and $X(f)$ denotes the frequency spectrum of the signal. In this example, the cepstral coefficient value at $n = 0$ was used.
[Equation 2]
$$c(n) = F^{-1}\left[\log\left|X(f)\right|\right]$$
The linear predictive coding (LPC) coefficient represents the linear characteristics of the frequency content and is calculated according to Equation 3 below, where $n$ denotes the number of samples and $a_i$ are the linear predictive coding coefficients. In this example, the first coefficient $a_1$ was used.
[Equation 3]
$$\hat{x}(k) = \sum_{i=1}^{n} a_{i}\,x(k-i)$$
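As a rough illustration of these three extractors, the following Python sketch computes the time-domain variance, the cepstral coefficient at n = 0, and the first LPC coefficient from a one-dimensional signal. It assumes NumPy and solves the LPC normal equations directly; it is not the implementation used in the experiment.

```python
# Minimal sketch of the three named feature extractors, assuming the sensor
# signal arrives as a 1-D NumPy array.

import numpy as np

def variance(x: np.ndarray) -> float:
    """Equation 1: population variance of the collected samples."""
    return float(np.mean((x - x.mean()) ** 2))

def cepstral_c0(x: np.ndarray) -> float:
    """Equation 2: real cepstrum c(n) = IFFT(log|X(f)|), evaluated at n = 0."""
    spectrum = np.fft.fft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small constant avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real
    return float(cepstrum[0])

def lpc_a1(x: np.ndarray, order: int = 2) -> float:
    """Equation 3: LPC coefficients from the autocorrelation normal equations;
    returns a_1, the first coefficient mentioned above."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return float(a[0])

if __name__ == "__main__":
    signal = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * np.random.randn(256)
    print(variance(signal), cepstral_c0(signal), lpc_a1(signal))
```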
In addition, an artificial neural network (ANN) was used that groups the articulator physical-characteristic data according to similarity, generates prediction data, and classifies each datum. Through this, the speaker can grasp, for the initial utterance, the correctness, the degree of similarity, and the intention of his or her own utterance relative to the standard utterance. Based on this, the speaker obtains feedback on the content of the utterance and repeatedly re-utters it for speech correction. Through this iterative input of articulator physical-characteristic data, a large amount of such data accumulates and the accuracy of the ANN increases.
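A minimal sketch of this classification step, assuming scikit-learn is available, is shown below: a small multilayer perceptron maps feature vectors to the five places of articulation and per-class accuracy is reported. The synthetic data stands in for the collected articulator measurements.

```python
# Minimal sketch of the ANN step: train a small classifier mapping feature
# vectors (e.g. variance, cepstral c0, LPC a1) to places of articulation.
# Synthetic data only; not the experiment's dataset.

import numpy as np
from sklearn.neural_network import MLPClassifier

PLACES = ["bilabial", "alveolar", "palatal", "velar", "glottal"]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # stand-in feature vectors
y = rng.integers(0, len(PLACES), size=500)    # stand-in place-of-articulation labels

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X, y)

pred = clf.predict(X)
for k, place in enumerate(PLACES):
    mask = y == k
    acc = (pred[mask] == k).mean() if mask.any() else float("nan")
    print(f"{place}: {acc:.0%} of intended utterances classified as {place}")
```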
Here, the articulator physical characteristics used as input data were selected as ten consonants, which in the extraction process were classified into five places of articulation: bilabial, alveolar, palatal, velar, and glottal. To this end, the ten consonants corresponding to the five places of articulation were pronounced 100 times each in order, 1,000 times in total, and 50 times each at random, 500 times in total.
Accordingly, as shown in FIG. 46, a confusion matrix for consonant classification was formed. Taking into account that the number of consonants uttered differs for each place of articulation, the results were also expressed as percentages, as shown in FIG. 47 and as sketched below.
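The following sketch shows the row-wise normalization step: each intended place of articulation becomes a row whose counts are divided by the number of attempts for that place. The counts are illustrative, not the experimental data.

```python
# Minimal sketch: building a consonant confusion matrix and normalizing each
# row to percentages. The counts below are illustrative only.

import numpy as np

PLACES = ["bilabial", "alveolar", "palatal", "velar", "glottal"]

def confusion_matrix(intended: list[int], recognized: list[int], n: int) -> np.ndarray:
    m = np.zeros((n, n), dtype=int)
    for i, r in zip(intended, recognized):
        m[i, r] += 1
    return m

def to_percent(m: np.ndarray) -> np.ndarray:
    row_sums = m.sum(axis=1, keepdims=True)
    out = np.zeros(m.shape, dtype=float)
    np.divide(100.0 * m, row_sums, out=out, where=row_sums > 0)
    return out

intended = [2] * 100                     # 100 attempts at palatal consonants (illustrative)
recognized = [2] * 83 + [1] * 17         # 17 misrecognized as alveolar (illustrative)
m = confusion_matrix(intended, recognized, len(PLACES))
print(np.round(to_percent(m), 1))
```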
From this, it can be seen that the speaker fails to utter properly the consonants at the palatal place of articulation, for which the correctness and the degree of similarity of the pronunciation are low compared with the standard utterance. In addition, as shown in FIG. 46, in 17% of cases the speaker intended to utter a palatal consonant but erroneously uttered an alveolar consonant. This means that the speaker does not clearly perceive the difference between palatal and alveolar consonants.
FIG. 48 is a diagram illustrating a case in which the speech intention expression system according to the fourth embodiment of the present invention helps the speaker with speech correction and guidance through a screen.
As shown in FIG. 48, a Korean speaker whose native language is not English uttered [kiŋ] with that pronunciation intended, and the sensor unit 100 captured the articulator physical characteristics of the utterance.
However, the speaker was unfamiliar with how to articulate and utter [ŋ], a velar nasal sound that does not exist in Korean.
Accordingly, the data interpretation unit 200 identifies the [ŋ] that the speaker failed to utter properly through comparison with the standard speech feature matrix 205. The data expression unit 500 then provided the correctness and the degree of similarity of the speaker's utterance, and the result was only 46%. The data expression unit 500 then helps the speaker pronounce [kiŋ] correctly through a screen or the like.
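A rough sketch of such a comparison is given below, using cosine similarity between a measured feature vector and a reference row of the standard speech feature matrix. The reference values, feature dimensions, and the choice of cosine similarity are assumptions made for illustration; the distance measure actually used is not specified here.

```python
# Minimal sketch: scoring a measured feature vector against a row of the
# standard speech feature matrix (205). Reference values are illustrative.

import numpy as np

STANDARD_FEATURES = {                       # assumed rows of the matrix (205)
    "ŋ": np.array([1.0, 0.0, 1.0, 1.0]),    # e.g. tongue-back raised, nasal airflow, ...
    "n": np.array([0.0, 1.0, 1.0, 1.0]),
}

def similarity(measured: np.ndarray, phoneme: str) -> float:
    """Cosine similarity between the measured vector and the standard row."""
    ref = STANDARD_FEATURES[phoneme]
    num = float(measured @ ref)
    den = float(np.linalg.norm(measured) * np.linalg.norm(ref)) or 1.0
    return num / den

measured = np.array([0.4, 0.2, 0.9, 0.3])   # speaker's attempt at [ŋ] (illustrative)
print(f"similarity to standard [ŋ]: {similarity(measured, 'ŋ'):.0%}")
```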
At this time, the data expression unit 500 provides speech guidance (an image) to show the speaker intuitively which articulators must be operated and how. The speech guidance presented by the data expression unit 500 performs speech correction and guidance based on the sensor units attached to or adjacent to the articulators used to utter [ŋ]. For example, in the case of [kiŋ], [k] must be uttered voicelessly through the mouth as /크/ (/keu/), producing a plosive by raising the back of the tongue (tongue body and root) toward the velum (soft palate), making contact, and then releasing it, without vibration of the vocal folds.
Here, the oral tongue sensor 110 measures the back of the tongue rising toward the velum, making contact, and releasing to produce the plosive. In the case of [i], a high front tense vowel, the oral tongue sensor 110 likewise detects the height and frontness of the tongue. In addition, when [i] is uttered, the two corners of the lips are pulled toward the cheeks, which the facial sensor 120 detects. In the case of [ŋ], the back of the tongue (tongue body and tongue root) must be raised toward the velum and the sound must resonate through the nose; therefore the oral tongue sensor 110 again detects the height and the front/backness of the tongue.
In addition, since this sound is a nasal, the nose and the muscles around it vibrate. This can be detected by attaching the facial sensor 120 around the nose.
FIG. 49 is a diagram illustrating a case in which the speech intention expression system according to the fourth embodiment of the present invention images and identifies the external appearance of the head and neck articulators.
As shown in FIG. 49, the imaging sensor 160 captures the external changes of the speaker's head and neck articulators according to the utterance, and through this the data interpretation unit 200 identifies the change information 161 of the speaker's head and neck articulators and the facial expression change information 162 of the head and neck. At this time, the data interpretation unit 200 also takes into account the speaker's speech features 220 identified through the oral tongue sensor 110, the facial sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the tooth sensor 150 of the sensor unit 100.
FIG. 50 is a diagram illustrating a case in which the speech intention expression system according to the fourth embodiment of the present invention combines mutual information through the standard speech feature matrix.
As shown in FIG. 50, the oral tongue sensor 110, the facial sensor 120, the voice acquisition sensor 130, the vocal cord sensor 140, and the tooth sensor 150 of the sensor unit 100 identify the speaker's speech features 220, and the imaging sensor 160 identifies the change information 161 of the speaker's head and neck articulators and the facial expression change information 162 of the head and neck. Based on the standard speech feature matrix 205, the data interpretation unit 200 then combines the speech features corresponding to the articulator change information 161 and the facial expression change information 162 of the head and neck.
FIG. 51 is a diagram illustrating the speech intention expression system according to the fifth embodiment of the present invention.
As shown in FIG. 51, based on the sensor unit position 210 measured by the sensor unit 100 and the head and neck facial expression change information 162 obtained by the imaging sensor 160, the data conversion unit 300 generates the first basis data 211 of the object head and neck data 320. Based on the first basis data 211, the data matching unit 600 generates static base coordinates 611, among the matching positions 610 at which the object head and neck data can be matched to one or more objects 20 such as the head and neck 21 of an image object or the head and neck 22 of a robot object, and performs the matching.
In addition, based on the speaker's (10) speech features 220 measured by the sensor unit 100 and the articulator change information 161 and head and neck facial expression change information 162 obtained by the imaging sensor 160, the data conversion unit 300 generates the second basis data 221 of the object head and neck data 320. Based on the second basis data 221, the data matching unit 600 generates and matches dynamic variable coordinates 621 in order to implement the dynamic movements of the head and neck that change as one or more objects 20, such as the head and neck 21 of the image object or the head and neck 22 of the robot object, utter speech.
FIG. 52 is a diagram illustrating a case in which the speech intention expression system according to the fifth embodiment of the present invention matches the object head and neck data to the head and neck of an image object based on static base coordinates, and FIG. 53 is a diagram illustrating static base coordinates based on the positions of the facial sensor used by the speech intention expression system according to the fifth embodiment.
As shown in FIGS. 52 and 53, in order for the data matching unit 600 to match the object head and neck data 320 to the head and neck 21 of the image object, it generates the static base coordinates 611 using the first basis data 211, which is the position of the facial sensor 120 attached to the speaker's head and neck.
At this time, as described above, the position is determined using the potential difference of the facial sensor 120. The reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the facial sensor 120, attached while the speaker is not speaking, each have a reference position such as (0, 0). These positions become the static base coordinates 611.
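A minimal sketch of this registration step, assuming each electrode is simply registered at the origin of its local frame while at rest, is:

```python
# Minimal sketch: registering static base coordinates (611) for the facial
# sensor electrodes in the non-speaking state. Names are illustrative.

def static_base_coordinates(electrodes: list[str]) -> dict[str, tuple[float, float]]:
    """In the non-speaking state every electrode of the facial sensor is
    registered at the (0, 0) base coordinate; later readings are expressed
    as offsets from this baseline."""
    return {name: (0.0, 0.0) for name in electrodes}

print(static_base_coordinates(["reference", "anode", "cathode"]))
# {'reference': (0.0, 0.0), 'anode': (0.0, 0.0), 'cathode': (0.0, 0.0)}
```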
FIG. 54 is a diagram illustrating a case in which the speech intention expression system according to the fifth embodiment of the present invention matches the object head and neck data to the head and neck of an image object based on dynamic variable coordinates, and FIG. 55 is a diagram illustrating dynamic variable coordinates based on the voltage differences of the facial sensor used by the speech intention expression system according to the fifth embodiment.
As shown in FIGS. 54 and 55, in order to match the object head and neck data 320 to the head and neck 21 of the image object, the data matching unit 600 generates the dynamic variable coordinates 621 using the second basis data 221, which is the potential difference of the facial sensor 120 attached to the speaker's head and neck and produced by the action of the head and neck muscles during the speaker's utterance.
At this time, as described above, the facial sensor 120 measures the electromyogram of the head and neck moving with the speaker's utterance and interprets it as a physical characteristic of the head and neck articulators. The reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the facial sensor 120, attached while the speaker is speaking, take on variable positions such as (0, -1), (-1, -1), and (1, -1) as they capture the electromyograms of the head and neck muscles changing with the utterance. These positions become the dynamic variable coordinates 621.
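The sketch below turns per-electrode displacement estimates, derived from the EMG potential differences, into dynamic variable coordinates as offsets from the static base coordinates. The scaling and axis mapping are assumptions made only for illustration.

```python
# Minimal sketch: dynamic variable coordinates (621) as offsets from the static
# base coordinates, driven by per-electrode EMG-derived displacements.

def dynamic_coordinates(base: dict[str, tuple[float, float]],
                        displacements: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """displacements holds per-electrode (horizontal, vertical) offsets derived
    from the EMG potential difference to the reference electrode."""
    return {name: (base[name][0] + dx, base[name][1] + dy)
            for name, (dx, dy) in displacements.items()}

base = {"reference": (0.0, 0.0), "anode": (0.0, 0.0), "cathode": (0.0, 0.0)}
during_speech = {"reference": (0.0, -1.0), "anode": (-1.0, -1.0), "cathode": (1.0, -1.0)}
print(dynamic_coordinates(base, during_speech))
# {'reference': (0.0, -1.0), 'anode': (-1.0, -1.0), 'cathode': (1.0, -1.0)}
```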
FIG. 56 is a diagram illustrating a case in which the speech intention expression system according to the fifth embodiment of the present invention matches the object head and neck data to the actuators of the head and neck of a robot object based on static base coordinates, and FIG. 57 is a diagram illustrating static base coordinates based on the voltage differences of the facial sensor used by the speech intention expression system according to the fifth embodiment.
As shown in FIGS. 56 and 57, in order for the data matching unit 600 to match the object head and neck data 320 to the actuators 30 of the head and neck 22 of the robot object, it generates the static base coordinates 611 using the first basis data 211, which is the position of the facial sensor 120 attached to the speaker's head and neck.
At this time, as described above, the position is determined using the potential difference of the facial sensor 120. The reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the facial sensor 120, attached while the speaker is not speaking, each have a reference position such as (0, 0) on the actuators 30 of the head and neck 22 of the robot object. These positions become the static base coordinates 611.
FIG. 58 is a diagram illustrating a case in which the speech intention expression system according to the fifth embodiment of the present invention matches the object head and neck data to the actuators of the head and neck of a robot object based on dynamic variable coordinates, and FIG. 59 is a diagram illustrating dynamic variable coordinates based on the voltage differences of the facial sensor used by the speech intention expression system according to the fifth embodiment.
As shown in FIGS. 58 and 59, in order to match the object head and neck data 320 to the actuators 30 of the head and neck 22 of the robot object, the data matching unit 600 generates the dynamic variable coordinates 621 using the second basis data 221, which is the potential difference of the facial sensor 120 attached to the speaker's head and neck and produced by the action of the head and neck muscles during the speaker's utterance.
At this time, as described above, the facial sensor 120 measures the electromyogram of the head and neck moving with the speaker's utterance and interprets it as a physical characteristic of the head and neck articulators. By capturing, through the reference sensor 121, the anode sensor 122, and the cathode sensor 123 of the facial sensor 120 attached while the speaker is speaking, the electromyograms of the head and neck muscles changing with the utterance, the actuators 30 of the head and neck 22 of the robot object take on variable positions such as (0, -1), (-1, -1), and (1, -1) and are moved accordingly. These positions become the dynamic variable coordinates 621.
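A minimal sketch of driving the matched actuators from these coordinates follows; converting a coordinate offset into a servo angle with a fixed gain is an arbitrary illustrative choice, not the patented control scheme.

```python
# Minimal sketch: driving robot head-and-neck actuators (30) from dynamic
# variable coordinates (621). Gain and mapping are illustrative assumptions.

def coordinates_to_servo_angles(coords: dict[str, tuple[float, float]],
                                gain_deg: float = 30.0) -> dict[str, float]:
    """Convert each electrode's vertical offset into a servo angle command for
    the actuator matched to that facial region."""
    return {name: gain_deg * dy for name, (dx, dy) in coords.items()}

coords = {"reference": (0.0, -1.0), "anode": (-1.0, -1.0), "cathode": (1.0, -1.0)}
print(coordinates_to_servo_angles(coords))
# {'reference': -30.0, 'anode': -30.0, 'cathode': -30.0}
```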
FIGS. 60 and 61 are diagrams each illustrating the operation of the actuators of the head and neck of the robot object of the speech intention expression system according to the fifth embodiment of the present invention, and FIG. 62 is a diagram illustrating the actuators of the head and neck of the robot object of the speech intention expression system according to the fifth embodiment of the present invention.
As shown in FIG. 60, the data matching unit 600 delivers the object head and neck data 320 obtained from the data interpretation unit 200 and the data conversion unit 300 to one or more actuators 30 of the head and neck 22 of the robot object and matches them accordingly. The actuators 30, serving as the artificial musculoskeletal system of the head and neck 22 of the robot object, may be driven by motors including DC motors, step motors, and servo motors, and may operate by protruding and retracting in a pneumatic or hydraulic manner. In this way, the actuators 30 can implement various dynamic movements of one or more of the articulation, utterance, and facial expression of the head and neck 22 of the robot object.
As shown in FIG. 61, the actuator 30 may be driven by a motor including a DC motor, a step motor, or a servo motor, or operated in a pneumatic or hydraulic manner, and is characterized in that it can contract and relax like a tensile element.
As shown in FIG. 62, the actuators 30 may be located on the head and neck 22 of the robot object.
In addition to the methods described in the drawings, the sensor unit 100 may include the following.
1. Pressure sensors: MEMS sensors, piezoelectric (pressure-voltage) methods, piezoresistive (pressure-resistance) methods, capacitive methods, pressure-sensitive rubber methods, force sensing resistor (FSR) methods, inner particle deformation methods, and buckling measurement methods.
2. Friction sensors: micro hair array methods and friction temperature measurement methods.
3. Electrostatic sensors: electrostatic dissipation methods and electrostatic generation methods.
4. Electrical resistance sensors: DC resistance measurement methods, AC resistance measurement methods, MEMS, lateral electrode array methods, layered electrode methods, and field effect transistor (FET) methods (including organic FET, metal-oxide-semiconductor FET, and piezoelectric-oxide-semiconductor FET).
5. Tunnel-effect tactile sensors: quantum tunnel composite methods, electron tunneling methods, and electroluminescent light methods.
6. Thermal resistance sensors: thermal conductivity measurement methods and thermoelectric methods.
7. Optical sensors: light intensity measurement methods and refractive index measurement methods.
8. Magnetism-based sensors: Hall-effect measurement methods and magnetic flux measurement methods.
9. Ultrasound-based sensors: acoustic resonance frequency methods, surface noise methods, and ultrasonic emission measurement methods.
10. Soft material sensors: pressure, stress, or strain measurement methods and stimuli-responsive methods using materials such as rubber, powder, porous materials, sponge, hydrogel, aerogel, carbon fiber, nano-carbon materials, carbon nanotubes, graphene, graphite, composites, nanocomposites, metal-polymer composites, ceramic-polymer composites, and conductive polymers.
11. Piezoelectric material sensors: ceramic materials such as quartz and PZT (lead zirconate titanate), polymer materials such as PVDF, PVDF copolymers, and PVDF-TrFE, and nanomaterial methods such as cellulose and ZnO nanowires.

Claims (28)

  1. A speech intention expression system comprising:
    a sensor unit that is adjacent to one surface of a speaker's head and neck and measures physical characteristics of the articulators;
    a data interpretation unit that identifies the speaker's speech features based on the position of the sensor unit and the physical characteristics of the articulators;
    a data conversion unit that converts the position of the sensor unit and the speech features into language data; and
    a data expression unit that expresses the language data externally,
    wherein the sensor unit includes an oral tongue sensor corresponding to the oral tongue.
  2. The speech intention expression system of claim 1, wherein the oral tongue sensor is fixed to one side of the oral tongue, wraps around the surface of the oral tongue, or is inserted into the oral tongue, and identifies the amount of change over time in vector quantity along the x-axis, y-axis, and z-axis directions of the oral tongue according to the utterance, thereby identifying at least one physical characteristic among the height, the front/backness, the flexion, the extension, the rotation, the tension, the contraction, the relaxation, and the vibration of the oral tongue.
  3. The speech intention expression system of claim 1, wherein the oral tongue sensor is fixed to one side of the oral tongue, wraps around the surface of the oral tongue, or is inserted into the oral tongue, and identifies the amount of change in the angle of rotation per unit time along the x-axis, y-axis, and z-axis directions of the oral tongue according to the utterance, thereby identifying the physical characteristics of the articulators including the oral tongue.
  4. The speech intention expression system of claim 1, wherein the oral tongue sensor is fixed to one side of the oral tongue or wraps around the surface of the oral tongue, and identifies the degree of bending of the oral tongue through a piezoelectric element that generates an electrical signal corresponding to the polarization caused by changes in crystal structure under the physical forces produced by the contraction and relaxation of the oral tongue during utterance, thereby identifying at least one physical characteristic among the height, the front/backness, the flexion, the extension, the rotation, the tension, the contraction, the relaxation, and the vibration of the oral tongue.
  5. The speech intention expression system of claim 1, wherein the sensor unit includes a triboelectric element (triboelectric generator) that identifies at least one physical characteristic among the degree of plosion, friction, resonance, and approximation corresponding to the approach and contact of the oral tongue arising from its interaction with other articulators inside and outside the head and neck.
  6. The speech intention expression system of claim 1, wherein the data interpretation unit identifies at least one speech feature among the consonants and vowels, the lexical stress, and the tonic (sentence-level) stress uttered by the speaker, through the physical characteristics of the oral tongue and the other articulators measured by the sensor unit.
  7. The speech intention expression system of claim 6, wherein, in identifying the speech features from the physical characteristics of the articulators measured by the sensor unit, the data interpretation unit measures at least one speech feature among the correctness, the degree of similarity, and the speech intention of the speaker's pronunciation and stress, based on a standard speech feature matrix composed of numerical values including binary numbers or real numbers.
  8. The speech intention expression system of claim 7, wherein, in identifying the speech features from the physical characteristics of the articulators measured by the sensor unit, the data interpretation unit identifies the speech features through the steps of: recognizing the physical characteristics of the articulators as patterns in units of individual consonants and vowels; extracting the features of the patterns in units of consonants and vowels and classifying the extracted features of the patterns according to similarity; recombining the classified features of the patterns in units of consonants and vowels; and interpreting the physical characteristics of the articulators as the speech features.
  9. The speech intention expression system of claim 7, wherein the data interpretation unit measures, from the physical characteristics of the articulators measured by the sensor unit, a speech variation that is at least one secondary articulation phenomenon among assimilation, dissimilation, elision, attachment, and stress of consonants and vowels, and aspiration, syllabic consonant formation, flapping, tensification, labialization, velarization, dentalization, palatalization, nasalization, stress shift, and lengthening caused by reduction.
  10. The speech intention expression system of claim 1, wherein the oral tongue sensor includes a circuit unit for sensor operation, a capsule unit surrounding the circuit unit, and an adhesive unit attached to one surface of the oral tongue.
  11. The speech intention expression system of claim 10, wherein the oral tongue sensor is in the form of a film having a thin-film circuit and operates adjacent to the oral tongue.
  12. The speech intention expression system of claim 1, wherein the sensor unit includes a facial sensor composed of at least one reference sensor that generates a reference potential for measuring the nerve signals of the head and neck muscles, and at least one anode sensor and at least one cathode sensor that measure the nerve signals of the head and neck muscles.
  13. The speech intention expression system of claim 12, wherein, in acquiring the position of the sensor unit based on the facial sensor, the data interpretation unit identifies the position of the facial sensor by determining the potential differences of the at least one anode sensor and the at least one cathode sensor with respect to the reference sensor.
  14. The speech intention expression system of claim 12, wherein, in acquiring the speaker's speech features based on the facial sensor, the data interpretation unit identifies the speech features resulting from the physical characteristics of the articulators occurring in the speaker's head and neck by determining the potential differences of the at least one anode sensor and the at least one cathode sensor with respect to the reference sensor.
  15. 제 1 항에 있어서,The method of claim 1,
    상기 센서부는, 상기 화자의 두경부 중 성대에 인접하여 성대의 근전도 내지 떨림을 파악하여, 상기 화자의 발화 시작, 발화 정지, 발화 종료 중 적어도 하나의 발화 내역 정보를 파악하는 성대 센서를 포함하는 발화 의도 표현 시스템. The sensor unit, the vocal intention including a vocal cord sensor to grasp the EMG or tremor of the vocal cords adjacent to the vocal cords of the head and neck of the speaker, to grasp at least one of the utterance history information of the utterance start, ignition stop, utterance Expression system.
  16. 제 1 항에 있어서,The method of claim 1,
    상기 센서부는, 치아의 일면에 인접하여 상기 구강설 및 아랫 입술의 접촉에 기인하여 발생하는 전기적 용량 변화에 따른 신호발생 위치를 파악하는 치아센서를 포함하는 발화 의도 표현 시스템. The sensor unit, a speech intention expression system including a tooth sensor to determine the signal generation position according to the change in the electrical capacity generated due to the contact between the oral tongue and the lower lip adjacent to one surface of the tooth.
  17. The speech intention expression system of claim 1,
    wherein the data analysis unit acquires the speaker's voice produced during speech through a voice acquisition sensor adjacent to one surface of the speaker's head and neck.
  18. The speech intention expression system of claim 1,
    wherein the sensor unit comprises an imaging sensor that images the speaker's head and neck to capture at least one of change information of the articulators of the speaker's head and neck, change information of the speaker's facial expression, and nonverbal expressions of the head and neck, thorax, upper limbs, and lower limbs that move according to the speaker's speech intention.
  19. The speech intention expression system of claim 1,
    further comprising a power supply unit that supplies power to at least one of the oral tongue sensor, the facial sensor, the voice acquisition sensor, the vocal cord sensor, the tooth sensor, and the imaging sensor of the sensor unit.
  20. The speech intention expression system of claim 1,
    further comprising a wired or wireless communication unit that enables interworking communication when the data analysis unit and the database unit are located and operated externally.
  21. The speech intention expression system of claim 1,
    wherein the data analysis unit interworks with a database unit that includes at least one language data index corresponding to the position of the sensor unit, the speech features of the speaker, and the voice of the speaker.
  22. The speech intention expression system of claim 21,
    wherein the database unit constructs at least one language data index among a phoneme-unit index of consonants and vowels, a syllable-unit index, a word-unit index, a phrase-unit index, a sentence-unit index, a continuous-speech-unit index, and a pitch index of pronunciation, based on at least one item of information among the duration of speech, the frequency of speech, the amplitude of speech, the electromyography of the head and neck muscles during speech, the positional change of the head and neck muscles during speech, and the positional change due to bending and rotation of the oral tongue.
  23. The speech intention expression system of claim 1,
    wherein the data expression unit, interworking with the language data index of the database unit, presents the speech features of the speaker as at least one speech expression among phoneme units of consonants and vowels, at least one word unit, at least one phrase unit (citation forms), at least one sentence unit, and consecutive speech units.
  24. The speech intention expression system of claim 23,
    wherein the speech expression presented by the data expression unit is visualized as at least one of characters, pictures, special symbols, and numerals, or is rendered audible as sound, and is provided to the speaker and a listener.
  25. The speech intention expression system of claim 23,
    wherein the speech expression presented by the data expression unit is provided to the speaker and a listener by at least one tactile method among vibration, snoozing, tapping, pressure, and relaxation.
  26. The speech intention expression system of claim 1,
    wherein the data conversion unit converts the position of the sensor unit and the facial expression change information of the head and neck into first basis data, converts the speech features, the change information of the articulators, and the facial expression change information of the head and neck into second basis data, and generates object head-and-neck data required for at least one object among the head and neck of an image object and the head and neck of a robot object.
  27. The speech intention expression system of claim 26,
    further comprising a data matching unit that, in expressing the object head-and-neck data processed by the data analysis unit on the head and neck of the image object or the head and neck of the robot object, sets static base coordinates based on the first basis data of the data conversion unit, sets dynamic variable coordinates based on the second basis data, and generates matching positions.
  28. The speech intention expression system of claim 27,
    wherein the object head-and-neck data is transmitted by the data matching unit to an actuator located on one surface of the head and neck of the robot object, and the actuator implements movement of the head and neck of the robot object, including at least one of articulation, speech, and facial expression, according to the object head-and-neck data.
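The differential measurement recited in claims 13 and 14 (positive- and negative-electrode potentials taken against a reference sensor) is not spelled out computationally in the claims. The following Python sketch shows one conventional way such a bipolar surface-EMG channel could be formed and scored; the function names, the RMS activity measure, and the synthetic signals are illustrative assumptions, not part of the patent.

```python
import numpy as np

def differential_emg(reference: np.ndarray,
                     positive: np.ndarray,
                     negative: np.ndarray) -> np.ndarray:
    """Return the bipolar EMG signal of one facial channel.

    Both electrodes are re-referenced to the reference sensor and their
    difference is taken, so potentials common to both contacts (baseline
    drift, interference reaching both electrodes equally) cancel out.
    """
    return (positive - reference) - (negative - reference)

def channel_activity(reference: np.ndarray,
                     positive: np.ndarray,
                     negative: np.ndarray) -> float:
    """Root-mean-square amplitude of the differential signal.

    Comparing this value across facial channels indicates which electrode
    pair sits over the most active articulation muscle, which is one simple
    way to relate electrode placement to measured activity.
    """
    diff = differential_emg(reference, positive, negative)
    return float(np.sqrt(np.mean(diff ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 1000)
    ref = 0.05 * rng.standard_normal(t.size)        # reference electrode
    pos = ref + 0.4 * np.sin(2 * np.pi * 80 * t)    # electrode over an active muscle
    neg = ref + 0.1 * rng.standard_normal(t.size)   # quieter counterpart electrode
    print(f"channel RMS activity: {channel_activity(ref, pos, neg):.3f}")
```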
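Claim 15 links the detection of speech start, stop, and end to vocal-cord electromyography or tremor, without specifying the segmentation method. Below is a minimal envelope-thresholding sketch of how such speech history information could be derived; the window length, threshold, and test signal are arbitrary choices made here for illustration.

```python
import numpy as np

def speech_segments(emg: np.ndarray, fs: float,
                    win_s: float = 0.05, threshold: float = 0.1):
    """Detect (start, end) times of speech activity in a vocal-cord EMG trace.

    The signal is rectified and smoothed into an amplitude envelope with a
    moving average; samples whose envelope exceeds `threshold` are treated
    as active speech, and the rising/falling edges of that mask give the
    start and end of each utterance.
    """
    win = max(1, int(win_s * fs))
    envelope = np.convolve(np.abs(emg), np.ones(win) / win, mode="same")
    active = envelope > threshold
    edges = np.diff(active.astype(int))
    starts = np.flatnonzero(edges == 1) + 1
    ends = np.flatnonzero(edges == -1) + 1
    if active[0]:
        starts = np.r_[0, starts]
    if active[-1]:
        ends = np.r_[ends, active.size]
    return [(s / fs, e / fs) for s, e in zip(starts, ends)]

if __name__ == "__main__":
    fs = 1000.0
    t = np.arange(0, 3, 1 / fs)
    burst = ((t > 0.5) & (t < 1.2)) | ((t > 1.8) & (t < 2.4))
    noise = 0.02 * np.random.default_rng(1).standard_normal(t.size)
    emg = noise + 0.5 * burst * np.sin(2 * np.pi * 120 * t)
    for start, end in speech_segments(emg, fs):
        print(f"speech from {start:.2f}s to {end:.2f}s")
```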
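Claims 21 and 22 describe a database unit whose language data index is built from measured quantities such as speech duration, frequency, amplitude, and tongue pose, and keyed at phoneme, syllable, word, phrase, sentence, and continuous-speech levels. The toy data structure below sketches one way such an index could be organized; the field names and levels shown are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SpeechObservation:
    """One measured speech event, reduced to a few of the quantities listed in claim 22."""
    duration_s: float
    frequency_hz: float
    amplitude: float
    tongue_bend_deg: float

@dataclass
class LanguageDataIndex:
    """Toy database unit keyed by linguistic level and label."""
    entries: Dict[Tuple[str, str], List[SpeechObservation]] = field(default_factory=dict)

    def add(self, level: str, label: str, obs: SpeechObservation) -> None:
        # level is one of "phoneme", "syllable", "word", "phrase", "sentence", "continuous"
        self.entries.setdefault((level, label), []).append(obs)

    def lookup(self, level: str, label: str) -> List[SpeechObservation]:
        return self.entries.get((level, label), [])

if __name__ == "__main__":
    db = LanguageDataIndex()
    db.add("phoneme", "b", SpeechObservation(0.08, 110.0, 0.6, 4.0))
    db.add("word", "ball", SpeechObservation(0.42, 105.0, 0.7, 9.5))
    print(len(db.lookup("phoneme", "b")), "observation(s) indexed for phoneme 'b'")
```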
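Claims 23 through 25 have the data expression unit present the decoded speech visually, audibly, or tactilely to the speaker and listener. A minimal dispatcher over these output modalities might look like the following; the callback names and the print-based rendering are placeholders standing in for real display, speech-synthesis, and haptic back ends.

```python
from typing import Callable, Dict, List

def visualize(text: str) -> None:
    print(f"[display] {text}")

def audialize(text: str) -> None:
    print(f"[speaker] synthesizing audio for: {text}")

def tactile(text: str) -> None:
    print(f"[haptics] vibrating pattern for {len(text)} characters")

# Map each output modality named in claims 24-25 to a rendering callback.
MODALITIES: Dict[str, Callable[[str], None]] = {
    "visual": visualize,
    "auditory": audialize,
    "tactile": tactile,
}

def express(text: str, modalities: List[str]) -> None:
    """Present one decoded speech expression through the requested channels."""
    for name in modalities:
        MODALITIES[name](text)

if __name__ == "__main__":
    express("hello", ["visual", "auditory", "tactile"])
```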
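Claims 26 through 28 split the conversion into static base coordinates (from the first basis data) and dynamic variable coordinates (from the second basis data) that are matched onto actuators of the robot head and neck. The sketch below illustrates that split with simple 3-D offsets; the data classes, coordinate values, and actuator names are invented for this example and do not reflect the patent's actual data formats.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class FirstBasisData:
    """Static sensor positions on the head and neck (claim 26)."""
    sensor_positions: Dict[str, Vec3]

@dataclass
class SecondBasisData:
    """Per-frame articulator/expression displacements (claim 26)."""
    displacements: Dict[str, Vec3]

def match_actuator_targets(first: FirstBasisData, second: SecondBasisData) -> Dict[str, Vec3]:
    """Combine static base coordinates with dynamic variable coordinates.

    Following the split described in claims 27-28, the first basis data fixes
    where each actuator sits on the robot head and neck, and the second basis
    data supplies the frame-by-frame offset that the actuator should realize.
    """
    targets = {}
    for name, (x, y, z) in first.sensor_positions.items():
        dx, dy, dz = second.displacements.get(name, (0.0, 0.0, 0.0))
        targets[name] = (x + dx, y + dy, z + dz)
    return targets

if __name__ == "__main__":
    static = FirstBasisData({"jaw": (0.0, -3.0, 1.0), "lip_corner_left": (-2.0, -1.5, 2.0)})
    dynamic = SecondBasisData({"jaw": (0.0, -0.6, 0.0)})  # jaw opens slightly this frame
    for actuator, target in match_actuator_targets(static, dynamic).items():
        print(f"{actuator}: move to {target}")
```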
PCT/KR2018/004325 2017-04-13 2018-04-13 Speech intention expression system using physical characteristics of head and neck articulator WO2018190668A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/605,361 US20200126557A1 (en) 2017-04-13 2018-04-13 Speech intention expression system using physical characteristics of head and neck articulator

Applications Claiming Priority (16)

Application Number Priority Date Filing Date Title
KR20170048010 2017-04-13
KR10-2017-0048010 2017-04-13
KR1020170126469A KR20180115602A (en) 2017-04-13 2017-09-28 Imaging Element and Apparatus for Recognition Speech Production and Intention Using Derencephalus Action
KR10-2017-0126049 2017-09-28
KR10-2017-0126470 2017-09-28
KR10-2017-0126469 2017-09-28
KR10-2017-0126048 2017-09-28
KR1020170126470A KR20180115603A (en) 2017-04-13 2017-09-28 The Articulatory Physical Features and Sound Synchronization for the Speech Production and its Expression Based on Speech Intention and its Recognition Using Derencephalus Action
KR1020170126048A KR20180115600A (en) 2017-04-13 2017-09-28 The Expression System for Speech Production and Intention Using Derencephalus Action
KR1020170125765A KR20180115599A (en) 2017-04-13 2017-09-28 The Guidance and Feedback System for the Improvement of Speech Production and Recognition of its Intention Using Derencephalus Action
KR1020170126049A KR20180115601A (en) 2017-04-13 2017-09-28 The Speech Production and Facial Expression Mapping System for the Visual Object Using Derencephalus Action
KR10-2017-0125765 2017-09-28
KR10-2017-0126769 2017-09-29
KR10-2017-0126770 2017-09-29
KR1020170126769A KR20180115604A (en) 2017-04-13 2017-09-29 The Articulatory Physical Features and Text Synchronization for the Speech Production and its Expression Based on Speech Intention and its Recognition Using Derencephalus Action
KR1020170126770A KR20180115605A (en) 2017-04-13 2017-09-29 The Speech Production and Facial Expression Mapping System for the Robot Using Derencephalus Action

Publications (1)

Publication Number Publication Date
WO2018190668A1 true WO2018190668A1 (en) 2018-10-18

Family

ID=63792694

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/004325 WO2018190668A1 (en) 2017-04-13 2018-04-13 Speech intention expression system using physical characteristics of head and neck articulator

Country Status (1)

Country Link
WO (1) WO2018190668A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100007512A1 (en) * 2005-10-31 2010-01-14 Maysam Ghovanloo Tongue Operated Magnetic Sensor Based Wireless Assistive Technology
US20120259554A1 (en) * 2011-04-08 2012-10-11 Sony Computer Entertainment Inc. Tongue tracking interface apparatus and method for controlling a computer program
KR20140068080A (en) * 2011-09-09 2014-06-05 아티큘레이트 테크놀로지스, 인코포레이티드 Intraoral tactile biofeedback methods, devices and systems for speech and language training
US20140342324A1 (en) * 2013-05-20 2014-11-20 Georgia Tech Research Corporation Wireless Real-Time Tongue Tracking for Speech Impairment Diagnosis, Speech Therapy with Audiovisual Biofeedback, and Silent Speech Interfaces
US20160027441A1 (en) * 2014-07-28 2016-01-28 Ching-Feng Liu Speech recognition system, speech recognizing device and method for speech recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHIN, JIN HO ET AL.: "Korean Consonant Classification Based on Physical Sensor according to the Articulation Position for the Silent Speech Recognition", THE JOURNAL OF KOREAN INSTITUTE OF NEXT GENERATION COMPUTING, 21 October 2016 (2016-10-21) *
SHIN, JIN HO ET AL.: "Korean Consonant Recognition Based on Multiple Motion Sensor according to the Articulation Position", PROCEEDINGS OF THE 2016 KOREAN INSTITUTE OF NEXT GENERATION COMPUTING SPRING CONFERENCE, 28 May 2016 (2016-05-28) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117752307A (en) * 2023-12-21 2024-03-26 新励成教育科技股份有限公司 Oral expression analysis system based on multisource biological signal acquisition

Similar Documents

Publication Publication Date Title
KR102196099B1 (en) Imaging Element and Apparatus for Recognition Speech Production and Intention Using Derencephalus Action
Denby et al. Silent speech interfaces
US5536171A (en) Synthesis-based speech training system and method
WO2015099464A1 (en) Pronunciation learning support system utilizing three-dimensional multimedia and pronunciation learning support method thereof
JPH075807A (en) Device for training conversation based on synthesis
Chen Elements of human voice
WO2022080774A1 (en) Speech disorder assessment device, method, and program
WO2017082447A1 (en) Foreign language reading aloud and displaying device and method therefor, motor learning device and motor learning method based on foreign language rhythmic action detection sensor, using same, and electronic medium and studying material in which same is recorded
Perrier Control and representations in speech production
WO2018190668A1 (en) Speech intention expression system using physical characteristics of head and neck articulator
Kröger et al. Neural modeling of speech processing and speech learning
KR102071421B1 (en) The Assistive Speech and Listening Management System for Speech Discrimination, irrelevant of an Environmental and Somatopathic factors
KR102364032B1 (en) The Articulatory Physical Features and Sound-Text Synchronization for the Speech Production and its Expression Based on Speech Intention and its Recognition Using Derencephalus Action
Simpson et al. Detecting larynx movement in non-pulmonic consonants using dual-channel electroglottography
Seong et al. A study on the voice security system using sensor technology
Altalmas et al. Quranic Letter Pronunciation Analysis based on Spectrogram Technique: Case Study on Qalqalah Letters.
Stone A silent-speech interface using electro-optical stomatography
Vescovi et al. Control of a modified two-mass model for anthropomorphic synthesis
JPH0830190A (en) Conversation training device and method basically consisting of synthesis
WO2015019835A1 (en) Electric artificial larynx device
Deorukhakar et al. Speech Recognition for People with Disfluency: A
Johnson Speech recognition techniques applied to speech therapy
Tran Silent Communication: whispered speech-to-clear speech conversion
Guangpu Articulatory Phonetic Features for Improved Speech Recognition
JPH034919B2 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18783668

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18783668

Country of ref document: EP

Kind code of ref document: A1