KR20190030083A - Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein - Google Patents
Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein
- Publication number
- KR20190030083A (application number KR1020170117367A)
- Authority
- KR
- South Korea
- Prior art keywords
- user
- speaker identification
- speech
- voice information
- speaker
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 48
- 238000013473 artificial intelligence Methods 0.000 title abstract description 11
- 230000001419 dependent effect Effects 0.000 claims abstract description 16
- 230000007257 malfunction Effects 0.000 abstract description 5
- 238000010586 diagram Methods 0.000 description 6
- 238000004891 communication Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 230000014509 gene expression Effects 0.000 description 2
- 238000001228 spectrum Methods 0.000 description 2
- 238000010276 construction Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Telephonic Communication Services (AREA)
Abstract
Description
The present invention relates to a speaker identification method in an artificial intelligence assistant service and to a speech recognition apparatus used therein. More particularly, it relates to a speaker identification method that prevents the artificial intelligence assistant service from being used by a person without legitimate authority, individually identifies the plurality of users of an artificial intelligence speaker so as to prevent malfunctions caused by the inability to distinguish between them, and at the same time provides a personalized service for each user, and to a speech recognition apparatus used therefor.
Recently, artificial intelligence assistant services based on voice recognition technology have been widely launched both at home and abroad; the world market for artificial intelligence speakers is expected to reach about 2.5 trillion won by 2020, and the related market is expected to grow sharply.
However, the artificial intelligence speaker according to the prior art cannot prevent unauthorized use by an unregistered user who has no legitimate right of use, and cannot distinguish individual users when there are a plurality of persons, such as family members, who have legitimate rights of use; malfunctions therefore occur frequently, resulting in customer complaints and damages.
In addition, the artificial intelligence speaker according to the prior art has the technical limitation that it cannot provide a personalized service for each user, because it cannot distinguish individual users.
Accordingly, it is an object of the present invention to provide a speaker identification method in an artificial intelligence assistant service, and a speech recognition apparatus used therein, which prevent a user without legitimate authority from receiving the artificial intelligence assistant service through an artificial intelligence speaker, prevent malfunctions caused by the inability to distinguish users, and provide a personalized service for each user.
According to an aspect of the present invention, there is provided a speaker identification method comprising the steps of: (a) a voice recognition device storing, in a user's directory, the user's utterance voice information for a predetermined call word; (b) the voice recognition device identifying the user based on the call-word utterance voice information received from the user, and thereby cumulatively storing, in the user's directory, the voice information of the atypical (free-form) natural language command following the user's call-word utterance; and (c) the voice recognition device generating user voice parameters for context-independent speaker identification based on the command utterance voice information cumulatively stored in the user's directory.
Preferably, the method further comprises the step of: (d) the voice recognition device, when the user utters an atypical natural language command together with the call word, performing context-dependent (text-dependent) speaker identification based on the call-word utterance voice information, and performing context-independent (text-independent) speaker identification based on the atypical natural language command utterance voice information.
The method may further comprise the step of: (e) performing the speaker identification for the user based on the result of the context-dependent speaker identification and the result of the context-independent speaker identification.
According to another aspect of the present invention, there is provided a speech recognition apparatus comprising: a storage unit which stores, in a user's directory, the user's utterance voice information for a predetermined call word; and a speaker identification unit which identifies the user based on the call-word utterance voice information received from the user. When the speaker identification unit identifies the user based on the call-word utterance voice information, the voice information of the atypical natural language command following the user's call-word utterance is cumulatively stored in that user's directory, and the speaker identification unit generates user voice parameters for context-independent speaker identification based on the command utterance voice information cumulatively stored in the directory.
Preferably, when the user utters the atypical natural language command together with the call word, the speaker identification unit performs context-dependent speaker identification based on the call-word utterance voice information and performs context-independent speaker identification based on the atypical natural language command utterance voice information.
The speaker identification unit may perform speaker identification for the user based on the result of the context dependent speaker identification and the result of the context independent speaker identification.
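As an aid to reading the method and apparatus summarized above, the following minimal Python sketch illustrates the enrolment-and-fusion flow of steps (a) through (e). Every name here (SpeakerDirectory, identify, cosine), the feature representation, and the fixed-weight cosine-similarity scoring are illustrative assumptions, not the claimed implementation.

```python
# Minimal sketch (assumed, not the patented algorithm) of the hybrid
# text-dependent + text-independent speaker identification flow.
import numpy as np


class SpeakerDirectory:
    """Per-user 'directory': call-word enrolments plus accumulated free-form speech."""

    def __init__(self):
        self.call_word_templates = []   # step (a): call-word utterance features
        self.free_speech_features = []  # step (b): accumulated command-speech features
        self.voice_params = None        # step (c): text-independent voice parameters

    def enroll_call_word(self, features):
        self.call_word_templates.append(np.asarray(features, dtype=float))

    def accumulate_command_speech(self, features):
        self.free_speech_features.append(np.asarray(features, dtype=float))

    def update_voice_params(self):
        # step (c): (re)generate the user's voice parameters from all stored speech
        if self.free_speech_features:
            self.voice_params = np.mean(self.free_speech_features, axis=0)


def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def identify(directory, call_word_feat, command_feat, w_dep=0.5):
    # step (d): text-dependent score from the call word ...
    dep_score = max((cosine(call_word_feat, t) for t in directory.call_word_templates),
                    default=0.0)
    # ... and text-independent score from the free-form command
    indep_score = (cosine(command_feat, directory.voice_params)
                   if directory.voice_params is not None else 0.0)
    # step (e): fuse both results into a single speaker score
    return w_dep * dep_score + (1.0 - w_dep) * indep_score
```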
According to the present invention, it is possible to prevent the artificial intelligence assistant service from being used by a person who has no legitimate right of use.
In addition, according to the present invention, a plurality of users sharing the artificial intelligence speaker can be identified individually, which prevents malfunctions caused by the inability to distinguish users and at the same time makes it possible to provide a personalized service for each user.
FIG. 1 is a configuration diagram of an artificial intelligence assistant service providing system according to a first embodiment of the present invention.
FIG. 2 is a functional block diagram illustrating the structure of a speech recognition apparatus according to the first embodiment of the present invention.
FIG. 3 is a flowchart illustrating a speaker identification method in the speech recognition apparatus according to the first embodiment of the present invention.
FIG. 4 is a signal flow diagram illustrating an execution procedure of an artificial intelligence assistant service providing method according to a second embodiment of the present invention.
Hereinafter, the present invention will be described in detail with reference to the drawings. It is to be noted that the same elements among the drawings are denoted by the same reference numerals whenever possible. In the following description, well-known functions or constructions are not described in detail since they would obscure the invention in unnecessary detail.
FIG. 1 is a configuration diagram of the artificial intelligence assistant service providing system according to the first embodiment of the present invention. Referring to FIG. 1, the artificial intelligence assistant service providing system according to the first embodiment of the present invention includes a voice recognition device 100 and a service providing server 200.
The
The
More sophisticatedly, the
In addition, the
On the other hand, when the
On the other hand, the
FIG. 2 is a functional block diagram illustrating the structure of the voice recognition device 100 according to the first embodiment of the present invention.
First, the
The
Meanwhile, in the
The
On the other hand, the
FIG. 3 is a flowchart illustrating a speaker identification method in the speech recognition apparatus according to the first embodiment of the present invention. Hereinafter, the speaker identification method in the speech recognition apparatus according to the first embodiment will be described with reference to FIG. 3.
The
Then, the user inputs his or her user ID through an input panel separately provided on the voice recognition device 100.
Meanwhile, the ID entered by the user in step S210 is the ID that the user provided when subscribing to the artificial intelligence assistant service according to the present invention, and is therefore the same as the ID stored in the
In addition, the above-described user registration procedure is performed repeatedly for each of the plurality of users (for example, family members) who will use the voice recognition device 100 together.
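As a rough sketch of this registration step (reusing the SpeakerDirectory class from the sketch above): the set of subscription-time IDs, the per-user dictionary, and the ID 'LEE01' are hypothetical; 'KIM08' is borrowed from the example used later in the description.

```python
# Sketch of user registration (S210); ID set and storage layout are assumptions.
REGISTERED_IDS = {"KIM08", "LEE01"}   # IDs issued when subscribing to the service
directories = {}                      # user ID -> SpeakerDirectory, one per family member


def register_user(user_id, call_word_recordings):
    """Verify the entered ID against the subscription record and create the
    user's directory from repeated call-word utterances."""
    if user_id not in REGISTERED_IDS:
        return False                  # entered ID does not match the subscription ID
    d = directories.setdefault(user_id, SpeakerDirectory())
    for features in call_word_recordings:
        d.enroll_call_word(features)  # store call-word speech in the user's directory
    return True
```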
After the user registration in step S210 is completed, the user first utters the call word in order to use the artificial intelligence assistant service according to the present invention, and based on the call-word voice information uttered by the user, the voice recognition device 100 identifies the corresponding user.
That is, whenever the user utters the call word followed by unstructured natural language commands in order to use the artificial intelligence assistant service according to the present invention, the voice information of the atypical natural language commands spoken by the user is accumulated, through the procedure described above, in the directory of the corresponding user (S240).
When the voice information of the atypical natural language commands accumulated in the corresponding user's directory reaches a certain amount (for example, 30 seconds or more of net speech), the speaker identification unit generates user-specific voice parameters for context-independent speaker identification (frequency bandwidth, amplitude spectrum, and the like), and the generated parameter values are stored in the corresponding user's directory (S250).
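The accumulate-then-generate behaviour of steps S240 and S250 can be sketched as follows, again using the directory object from the earlier sketch. The 30-second figure comes from the example in the text; the coarse spectral profile is only an assumed stand-in for the frequency-bandwidth and amplitude-spectrum parameters mentioned above.

```python
# Sketch of S240-S250: accumulate command speech, then derive voice parameters.
import numpy as np

NET_SPEECH_THRESHOLD_SEC = 30.0   # example threshold of net speech from the description


def spectral_profile(samples, n_bands=32):
    """Coarse amplitude-spectrum summary used here as a stand-in feature."""
    spectrum = np.abs(np.fft.rfft(np.asarray(samples, dtype=float)))
    return np.array([band.mean() for band in np.array_split(spectrum, n_bands)])


def accumulate_and_update(directory, samples, sample_rate, accumulated_sec):
    """Store one command utterance in the identified user's directory and, once
    enough net speech has been collected, (re)generate the voice parameters."""
    directory.accumulate_command_speech(spectral_profile(samples))
    accumulated_sec += len(samples) / float(sample_rate)
    if accumulated_sec >= NET_SPEECH_THRESHOLD_SEC:
        directory.update_voice_params()
    return accumulated_sec
```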
Accordingly, the
When a specific user utters the call word and atypical natural language commands in sequence in order to use the artificial intelligence assistant service according to the present invention, the
The
Herein, the final speaker identification method of the
In the meantime, in implementing the present invention, the voice information of the atypical natural language commands is cumulatively stored in the user's directory in step S270, so that the user-specific voice parameters generated in step S250 are regenerated and refined; the accuracy of the context-independent speaker identification therefore keeps improving as the device is used.
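The cumulative-learning loop described in this passage can be sketched as below, reusing the identify helper from the earlier sketch: each identified command utterance is stored back into the winning user's directory and the text-independent parameters are regenerated, so accuracy grows with use. The argmax-then-update logic is an illustrative choice, not the patented procedure.

```python
# Sketch of the identify-then-learn feedback loop around S270.
def identify_and_learn(directories, call_word_feat, command_feat):
    best_user, best_score = None, float("-inf")
    for user_id, d in directories.items():
        score = identify(d, call_word_feat, command_feat)  # fused dep./indep. score
        if score > best_score:
            best_user, best_score = user_id, score
    if best_user is not None:
        d = directories[best_user]
        d.accumulate_command_speech(command_feat)  # S270: cumulative storage
        d.update_voice_params()                    # S250 repeated: refine parameters
    return best_user, best_score
```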
FIG. 4 is a signal flow diagram illustrating the execution procedure of the artificial intelligence assistant service providing method according to the second embodiment of the present invention. Hereinafter, the execution procedure of this method will be described with reference to FIGS. 1, 2, and 4.
FIG. 4 assumes a state in which the user registration procedure of the speaker identification method according to the first embodiment shown in FIG. 3 has been completed, and in which the hybrid speaker identification method can be executed through cumulative learning in the voice recognition device 100.
First, a user who wishes to use the artificial intelligence assistant service according to the present invention utters a predetermined call word (for example, 'silos') and then, in succession, a service request voice (for example, 'recommendation') (S310).
Accordingly, the
Meanwhile, in the present invention, even if a third party knows the call word or happens to utter it, a third party who has not gone through the user registration and per-user directory creation process of FIG. 3 described above cannot be identified as a registered speaker (i.e., authentication fails), and the use of the artificial intelligence assistant service according to the present invention is thereby restricted.
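A sketch of this gating behaviour, reusing identify and the per-user directories from the earlier sketches; the acceptance threshold is an assumption, since the text does not specify a value.

```python
# Sketch: reject speakers whose voice matches no enrolled directory closely enough,
# even if they spoke the correct call word.
MIN_ACCEPT_SCORE = 0.7   # assumed decision threshold


def authenticate(directories, call_word_feat, command_feat):
    if not directories:
        return None
    scores = {uid: identify(d, call_word_feat, command_feat)
              for uid, d in directories.items()}
    best_user = max(scores, key=scores.get)
    return best_user if scores[best_user] >= MIN_ACCEPT_SCORE else None
```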
After completing the speaker identification procedure, the
Meanwhile, when the
Then, the
Specifically, the
In the meantime, in the present invention, the same information as shown in Table 1 may be stored in the
Meanwhile, if the user ID included in the service use permission authentication request message in step S340 is 'KIM08' and the requested service content included in that message is 'viewing adult movie content', whether that user is permitted to use the requested service is determined by referring to the user information in Table 1.
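The permission lookup implied by this passage can be sketched as a simple table check. The user-to-service mappings below, including whether 'KIM08' may view adult movie content, are invented for illustration only, since the outcome of the original example is not recoverable from this text.

```python
# Sketch of a Table-1-style per-user service permission check (contents hypothetical).
PERMISSION_TABLE = {
    "KIM08": {"music streaming", "weather briefing"},
    "PARK01": {"music streaming", "weather briefing", "viewing adult movie content"},
}


def check_service_permission(user_id, requested_service):
    """Answer the service use permission authentication request (S340)."""
    return requested_service in PERMISSION_TABLE.get(user_id, set())


# Example: check_service_permission("KIM08", "viewing adult movie content") -> False
```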
Accordingly, the
Meanwhile, when the
In the case where the user information in Table 1 is stored in the
In executing the service provision in the above-described step S390, the
Specifically, in step S320, the
Specifically, in performing step S390, the
Accordingly, the
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, terms such as "comprises" or "having" specify the presence of a feature, number, step, operation, element, component, or combination thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments; it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.
100: voice recognition device, 200: service providing server.
Claims (6)
A speaker identification method comprising the steps of: (a) a voice recognition device storing, in a user's directory, the user's utterance voice information for a predetermined call word; (b) the voice recognition device identifying the user based on the call-word utterance voice information from the user, and thereby cumulatively storing, in the user's directory, the voice information of the atypical natural language command following the user's call-word utterance; and
(c) the voice recognition device generating user voice parameters for context-independent speaker identification based on the utterance voice information of the atypical natural language commands cumulatively stored in the user's directory.
(d) when the user utters an atypical natural language command together with the call word, the voice recognition device performing context-dependent speaker identification based on the call-word utterance voice information, and performing context-independent speaker identification based on the atypical natural language command utterance voice information.
(e) performing the speaker identification for the user based on the result of the context-dependent speaker identification and the result of the context-independent speaker identification.
A speech recognition apparatus comprising: a storage unit which stores, in a user's directory, the user's utterance voice information for a predetermined call word; and a speaker identification unit which identifies the user based on the call-word utterance voice information from the user,
wherein, when the speaker identification unit identifies the user based on the call-word utterance voice information, the voice information of the atypical natural language command following the user's call-word utterance is cumulatively stored in the user's directory, and
wherein the speaker identification unit generates user voice parameters for context-independent speaker identification based on the utterance voice information of the atypical natural language commands cumulatively stored in the user's directory.
Wherein, when the user utters the atypical natural language command together with the call word, the speaker identification unit performs context-dependent speaker identification based on the call-word utterance voice information and performs context-independent speaker identification based on the atypical natural language command utterance voice information.
Wherein the speaker identification unit performs speaker identification for the user based on the result of the context dependent speaker identification and the result of the context independent speaker identification.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020170117367A KR101993827B1 (en) | 2017-09-13 | 2017-09-13 | Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein |
PCT/KR2018/010225 WO2019054680A1 (en) | 2017-09-13 | 2018-09-03 | Speaker identification method in artificial intelligence secretarial service in which context-dependent speaker identification and context-independent speaker identification are converged, and voice recognition device used therefor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020170117367A KR101993827B1 (en) | 2017-09-13 | 2017-09-13 | Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20190030083A (en) | 2019-03-21 |
KR101993827B1 (en) | 2019-06-27 |
Family
ID=65723990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020170117367A KR101993827B1 (en) | 2017-09-13 | 2017-09-13 | Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR101993827B1 (en) |
WO (1) | WO2019054680A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116711006A (en) | 2021-02-23 | 2023-09-05 | 三星电子株式会社 | Electronic device and control method thereof |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004294755A (en) * | 2003-03-27 | 2004-10-21 | Secom Co Ltd | Device and program for speaker authentication |
KR20100027865A (en) * | 2008-09-03 | 2010-03-11 | 엘지전자 주식회사 | Speaker recognition and speech recognition apparatus and method thereof |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20080090034A (en) * | 2007-04-03 | 2008-10-08 | 삼성전자주식회사 | Voice speaker recognition method and apparatus |
KR20100073178A (en) * | 2008-12-22 | 2010-07-01 | 한국전자통신연구원 | Speaker adaptation apparatus and its method for a speech recognition |
US10127911B2 (en) * | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
- 2017-09-13: KR KR1020170117367A patent/KR101993827B1/en active IP Right Grant
- 2018-09-03: WO PCT/KR2018/010225 patent/WO2019054680A1/en active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004294755A (en) * | 2003-03-27 | 2004-10-21 | Secom Co Ltd | Device and program for speaker authentication |
KR20100027865A (en) * | 2008-09-03 | 2010-03-11 | 엘지전자 주식회사 | Speaker recognition and speech recognition apparatus and method thereof |
Also Published As
Publication number | Publication date |
---|---|
WO2019054680A1 (en) | 2019-03-21 |
KR101993827B1 (en) | 2019-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11887590B2 (en) | Voice enablement and disablement of speech processing functionality | |
US11386905B2 (en) | Information processing method and device, multimedia device and storage medium | |
CN110661927B (en) | Voice interaction method and device, computer equipment and storage medium | |
US11763808B2 (en) | Temporary account association with voice-enabled devices | |
AU2016216737B2 (en) | Voice Authentication and Speech Recognition System | |
US10714085B2 (en) | Temporary account association with voice-enabled devices | |
KR102087202B1 (en) | Method for Providing Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein | |
KR20170088997A (en) | Method and apparatus for processing voice information | |
CN110858841B (en) | Electronic device and method for registering new user through authentication of registered user | |
KR20180046780A (en) | Method for providing of voice recognition service using double wakeup and apparatus thereof | |
CN111684521A (en) | Method for processing speech signal for speaker recognition and electronic device implementing the same | |
US10866948B2 (en) | Address book management apparatus using speech recognition, vehicle, system and method thereof | |
Maskeliunas et al. | Voice-based human-machine interaction modeling for automated information services | |
KR101993827B1 (en) | Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein | |
KR20210042520A (en) | An electronic apparatus and Method for controlling the electronic apparatus thereof | |
US11461779B1 (en) | Multi-speechlet response | |
KR102415694B1 (en) | Massage Chair Controllable by User's Voice | |
EP3502938B1 (en) | A conversational registration method for client devices | |
KR20220118698A (en) | Electronic device for supporting artificial intelligence agent services to talk to users | |
CN112513845A (en) | Transient account association with voice-enabled devices | |
US20240312455A1 (en) | Transferring actions from a shared device to a personal device associated with an account of a user | |
EP4428854A1 (en) | Method for providing voice synthesis service and system therefor | |
US11076018B1 (en) | Account association for voice-enabled devices | |
US20220406324A1 (en) | Electronic device and personalized audio processing method of the electronic device | |
US20240119930A1 (en) | Artificial intelligence device and operating method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right |