KR101993827B1 - Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein - Google Patents

Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein Download PDF

Info

Publication number
KR101993827B1
KR101993827B1
Authority
KR
South Korea
Prior art keywords
user
speaker identification
voice information
voice
natural language
Prior art date
Application number
KR1020170117367A
Other languages
Korean (ko)
Other versions
KR20190030083A (en)
Inventor
정희석
진세훈
이형엽
임형택
Original Assignee
(주)파워보이스
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)파워보이스 filed Critical (주)파워보이스
Priority to KR1020170117367A priority Critical patent/KR101993827B1/en
Priority to PCT/KR2018/010225 priority patent/WO2019054680A1/en
Publication of KR20190030083A publication Critical patent/KR20190030083A/en
Application granted granted Critical
Publication of KR101993827B1 publication Critical patent/KR101993827B1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/22Interactive procedures; Man-machine interfaces
    • G10L17/24Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Disclosed are a speaker identification method for an artificial intelligence assistant service in which context-dependent speaker identification and context-independent speaker identification are fused, and a speech recognition apparatus used therein. According to the present invention, the speech recognition apparatus stores a user's utterance voice information for a predetermined call word in the user's directory, identifies the user based on call-word utterance voice information received from the user, cumulatively stores in the user's directory the utterance voice information of the unstructured natural-language commands that follow the call word, and generates user voice parameters for context-independent speaker identification based on the utterance voice information of the unstructured natural-language commands cumulatively stored in the user's directory. According to the present invention, the artificial intelligence assistant service can be prevented from being provided to a person who has no legitimate authority to use it. In addition, a plurality of users sharing the artificial intelligence speaker can be individually identified, thereby preventing malfunctions caused by failure to distinguish users and providing a personalized service for each user.

Description

TECHNICAL FIELD. The present invention relates to a speaker identification method for an artificial intelligence secretary service in which context-dependent speaker identification and context-independent speaker identification are fused, and to a speech recognition apparatus used therein (Speaker Identification Method Converged with Text Dependent Speaker Recognition and Text Independent Speaker Recognition, and Voice Recognition Device Used Therein).

The present invention relates to a speaker identification method for an artificial intelligence assistant service and a speech recognition apparatus used therein. More particularly, it relates to a speaker identification method that prevents the artificial intelligence assistant service from being provided to persons without legitimate authority to use it, individually identifies each of the multiple users sharing an artificial intelligence speaker so as to prevent malfunctions caused by failure to distinguish users, and at the same time enables a personalized service for each user, and to a speech recognition apparatus used therefor.

Recently, artificial intelligence secretary services based on voice recognition technology have been widely launched at home and abroad. The global market for artificial intelligence speakers is expected to reach about 2.5 trillion won by 2020, and the size of related markets is expected to grow sharply.

However, an artificial intelligence speaker according to the prior art cannot prevent unauthorized use by an unregistered user who has no legitimate right of use. Moreover, when there are multiple persons with legitimate rights of use, such as family members, it cannot distinguish the individual users, so malfunctions occur frequently, resulting in customer complaints and damages.

In addition, because the artificial intelligence speaker according to the prior art cannot distinguish individual users, it has the technical limitation of being unable to provide a personalized service for each user.

Accordingly, it is an object of the present invention to provide a speaker identification method for an artificial intelligence secretary service, and a speech recognition apparatus used therein, which can prevent the artificial intelligence assistant service from being provided to a user without legitimate authority to use it, can prevent malfunctions caused by failure to identify users, and can provide a personalized service for each user.

According to an aspect of the present invention, there is provided a speaker identification method comprising the steps of: (a) storing, by a speech recognition apparatus, a user's utterance voice information for a predetermined call word in the user's directory; (b) identifying, by the speech recognition apparatus, the user based on call-word utterance voice information from the user, and cumulatively storing in the user's directory the utterance voice information of the unstructured natural-language command that follows the user's call-word utterance; and (c) generating, by the speech recognition apparatus, user voice parameters for context-independent speaker identification based on the utterance voice information of the unstructured natural-language commands cumulatively stored in the user's directory.

Preferably, the method further comprises the step of: (d) when the user utters an unstructured natural-language command together with the call word, performing, by the speech recognition apparatus, context-dependent speaker identification based on the call-word utterance voice information and context-independent speaker identification based on the unstructured natural-language command utterance voice information.

The method may further comprise the step of: (e) performing speaker identification for the user based on the result of the context-dependent speaker identification and the result of the context-independent speaker identification.
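The per-user enrollment flow of steps (a) through (c) can be sketched as a minimal data structure. This is an illustrative sketch only: the class and method names below are assumptions and do not appear in the patent.

```python
class UserDirectory:
    """Minimal sketch of a per-user directory: call-word enrollment,
    cumulative command storage, and parameter generation (steps (a)-(c))."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.call_word_utterance = None   # step (a): enrolled call-word audio
        self.natural_utterances = []      # step (b): accumulated command audio
        self.voice_parameters = None      # step (c): text-independent model

    def register_call_word(self, audio):
        # Step (a): store the call-word utterance for this user.
        self.call_word_utterance = audio

    def accumulate_command(self, audio):
        # Step (b): cumulatively store a natural-language command utterance.
        self.natural_utterances.append(audio)

    def update_parameters(self, extractor):
        # Step (c): (re)generate voice parameters from accumulated speech.
        self.voice_parameters = extractor(self.natural_utterances)
```

Here `extractor` stands in for any voiceprint-parameter routine, such as the frequency-bandwidth and amplitude-spectrum analysis described later in the specification.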

According to another aspect of the present invention, there is provided a speech recognition apparatus comprising: a storage unit for storing a user's utterance voice information for a predetermined call word in the user's directory; and a speaker identification unit for identifying the user based on call-word utterance voice information from the user. When the speaker identification unit identifies the user based on the call-word utterance voice information, the storage unit cumulatively stores the utterance voice information of the unstructured natural-language command that follows the user's call-word utterance, and the speaker identification unit generates user voice parameters for context-independent speaker identification based on the utterance voice information of the unstructured natural-language commands cumulatively stored in the user's directory.

Preferably, when the user utters an unstructured natural-language command together with the call word, the speaker identification unit performs context-dependent speaker identification based on the call-word utterance voice information and context-independent speaker identification based on the unstructured natural-language command utterance voice information.

The speaker identification unit may perform speaker identification for the user based on the result of the context dependent speaker identification and the result of the context independent speaker identification.

According to the present invention, the artificial intelligence assistant service can be prevented from being provided to a person who has no legitimate authority to use it.

In addition, according to the present invention, a plurality of users sharing the artificial intelligence speaker can be individually identified, thereby preventing malfunctions caused by failure to distinguish users and providing a personalized service for each user.

FIG. 1 is a configuration diagram of an artificial intelligence assistant service providing system according to a first embodiment of the present invention.
FIG. 2 is a functional block diagram illustrating the structure of a speech recognition apparatus according to the first embodiment of the present invention.
FIG. 3 is a flowchart illustrating a speaker identification method in the speech recognition apparatus according to the first embodiment of the present invention.
FIG. 4 is a signal flow diagram illustrating an execution procedure of an artificial intelligence assistant service providing method according to a second embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the drawings. It is to be noted that the same elements among the drawings are denoted by the same reference numerals whenever possible. In the following description, well-known functions or constructions are not described in detail since they would obscure the invention in unnecessary detail.

FIG. 1 is a configuration diagram of an artificial intelligence assistant service providing system according to the first embodiment of the present invention. Referring to FIG. 1, the system includes a speech recognition apparatus 100 and a service providing server 200.

The voice recognition apparatus 100 is an artificial intelligence speaker equipped with a voice recognition function, installed in a space where users are present, such as a living room. In implementing the present invention, the voice recognition apparatus 100 may also be implemented as a device such as a smart phone.

The voice recognition apparatus 100 guides the user to utter a predetermined call word (for example, 'Silo') during the user registration procedure of the artificial intelligence secretary service, performs voice registration for each user according to the user's utterance, and, after voice registration for each user is completed, authenticates a user who utters the call word based on the call-word voice information through its voice recognition function.

More specifically, the speech recognition apparatus 100 implements a speaker identification method that fuses text-dependent speaker recognition via the call-word keyword with text-independent speaker recognition based on unstructured natural-language commands.

In addition, the speech recognition apparatus 100 generates and stores parameter values for voiceprint analysis, such as frequency bandwidth and amplitude spectrum, from the user's call-word utterance signal and unstructured natural-language command (service request) utterance signals. When a service request voice is subsequently input, the apparatus compares the voice parameter values in the service request voice with the previously stored parameter values, thereby performing an authentication procedure for the speaker through the text-independent speaker recognition method.
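As one illustration of such voiceprint parameters, the amplitude spectrum and an energy-based frequency bandwidth of a voice frame could be computed as follows. The 95% energy fraction is an assumption: the patent names the parameters but does not specify how bandwidth is measured.

```python
import numpy as np

def amplitude_spectrum(signal, sample_rate):
    """Magnitude spectrum of a real-valued voice frame via FFT."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, spectrum

def frequency_bandwidth(freqs, spectrum, energy_fraction=0.95):
    """Lowest frequency below which the given fraction of spectral
    energy lies (one possible bandwidth definition, assumed here)."""
    energy = np.cumsum(spectrum ** 2)
    energy = energy / energy[-1]
    idx = np.searchsorted(energy, energy_fraction)
    return freqs[min(idx, len(freqs) - 1)]
```

A stored user model could then be a vector of such values, compared against the same values extracted from an incoming service request voice.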

Meanwhile, when the speech recognition apparatus 100 performs the speaker authentication procedure using voiceprint-analysis parameters, various conventional methods, such as that of Korean Patent Laid-Open No. 10-2012-72906, may be used.

Meanwhile, the service providing server 200 is a server installed and operated by the company that manufactures and sells the voice recognition apparatus 100, such as an artificial intelligence speaker. The service providing server 200 stores, for each user subscribed to the artificial intelligence secretary service, personal information such as ID, age, and gender, preferred content information provided by the user at the time of subscription, and information on the scope of service use for each user.

FIG. 2 is a functional block diagram illustrating the structure of the speech recognition apparatus 100 according to an embodiment of the present invention. Referring to FIG. 2, the speech recognition apparatus 100 includes an input unit 110, an output unit 130, a speaker identification unit 150, a determination unit 170, a storage unit 180, and a communication unit 190.

First, the input unit 110 of the voice recognition apparatus 100 is implemented by a microphone module or the like. When a user utters the call word, the user's voice is input through the input unit 110.

The output unit 130 of the voice recognition apparatus 100 may be implemented as a speaker module or the like. It outputs guidance voice for the user registration procedure (for example, 'When the beep sounds, say Silo') and outputs the results of a registered user's subsequent service requests (such as a service-unavailable notice or requested information, e.g., the weather) to the user as voice.

Meanwhile, the storage unit 180 of the voice recognition apparatus 100 stores the call-word information set by the manufacturer or the purchaser (user) of the voice recognition apparatus 100. In addition, during the user registration procedure, each user's call-word utterance voice is stored in association with that user's ID.

The speaker identification unit 150 of the voice recognition apparatus 100 performs, as its speaker identification method, context-dependent speaker identification based on the call-word keyword and context-independent speaker identification based on unstructured natural-language commands.

Meanwhile, the determination unit 170 of the voice recognition apparatus 100 determines the customized service content to be provided to the user based on the ID information of the speaker identified by the speaker identification unit 150, and the communication unit 190 of the voice recognition apparatus 100 performs data communication with the service providing server 200 or with an external server that provides the service content requested by the user.

FIG. 3 is a flowchart illustrating a speaker identification method in the speech recognition apparatus according to the first embodiment of the present invention. Hereinafter, the speaker identification method in the speech recognition apparatus according to the first embodiment will be described with reference to FIG. 3.

The speech recognition apparatus 100 outputs, through the output unit 130, a guidance voice prompting the user to register his or her voice for the call word (for example, 'Silo') in order to use the artificial intelligence assistant service according to the present invention. The user utters the call word according to the voice guidance of the voice recognition apparatus 100, and the user's voice is thereby input through the input unit 110 of the voice recognition apparatus 100.

Then, the user inputs his or her user ID through an input panel separately provided on the input unit 110. As a result, a directory for the user is created in the storage unit 180 of the voice recognition apparatus 100, and the user's ID and the call-word voice information input by the user are stored in association with each other (S210).

Meanwhile, the ID input by the user in step S210 is the ID provided by the user at the time of subscription to the artificial intelligence secretary service according to the present invention, and is therefore preferably the same as the ID stored in the service providing server 200.

In addition, the above-described user registration procedure is repeatedly performed, with the same call word, for each of the plurality of users (for example, family members) who will share the voice recognition apparatus 100.

After the user registration in step S210 is completed, the user first utters the call word in order to use the artificial intelligence assistant service according to the present invention, and the speaker identification unit 150 performs speaker identification for the user based on the uttered call-word voice information (S220). The voice information of the unstructured natural-language command uttered following the call word in order to use the artificial intelligence assistant service is then stored in the user's directory (S230).

That is, each time the user utters the call word followed by an unstructured natural-language command in order to use the artificial intelligence secretary service according to the present invention, the voice information of the unstructured natural-language command uttered by the user is cumulatively stored in that user's directory through the above-described procedure (S240).

When the voice information of unstructured natural-language commands accumulated in the user's directory exceeds a certain amount (for example, 30 seconds or more of net speech), the speaker identification unit 150 generates user-specific voiceprint-analysis parameter values (frequency bandwidth, amplitude spectrum, and the like) from it, and the generated user-specific parameter values are stored in the corresponding user's directory (S250).
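The accumulation threshold gating step S250 could be checked as follows. This is a sketch under stated assumptions: the function name is invented, and utterances are assumed to be arrays of audio samples.

```python
NET_SPEECH_THRESHOLD_SEC = 30.0  # example threshold from the description

def ready_for_parameter_generation(utterances, sample_rate,
                                   threshold_sec=NET_SPEECH_THRESHOLD_SEC):
    """True once the accumulated net speech in a user's directory
    reaches the threshold, i.e. parameter generation may run (S250)."""
    total_samples = sum(len(u) for u in utterances)
    return total_samples / sample_rate >= threshold_sec
```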

Accordingly, the speaker identification unit 150 can independently perform context-dependent speaker identification based on the call-word keyword and context-independent speaker identification based on the parameter values generated for each user.

When a specific user sequentially utters the call word and an unstructured natural-language command in order to use the artificial intelligence assistant service according to the present invention, the speaker identification unit 150 first executes first speaker identification (context-dependent speaker identification) through the call-word keyword (S260), and then continuously executes second speaker identification (context-independent speaker identification) through the unstructured natural-language command (S270).

The speaker identification unit 150 calculates the sum of a value obtained by applying a predetermined weight to the result of the first speaker identification performed through the context-dependent speaker identification method and a value obtained by applying a predetermined weight to the result of the second speaker identification performed through the context-independent speaker identification method, and finally identifies the speaker based on this value (S280).
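The weighted fusion of step S280 could be sketched as below. The weight values and the acceptance threshold are illustrative assumptions, since the patent only calls them 'predetermined'.

```python
def fused_speaker_score(dependent_score, independent_score,
                        w_dependent=0.6, w_independent=0.4):
    """Weighted sum of the context-dependent (call-word) and
    context-independent (natural-language) identification scores (S280)."""
    return w_dependent * dependent_score + w_independent * independent_score

def identify_speaker(scores_by_user, threshold=0.7):
    """Pick the best-scoring enrolled user, or None when no fused
    score reaches the acceptance threshold (authentication fails)."""
    user, score = max(scores_by_user.items(), key=lambda kv: kv[1])
    return user if score >= threshold else None
```

Rejecting speakers whose best fused score falls below the threshold is what keeps unregistered third parties out, even if they know the call word.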

Hereinafter, the final speaker identification method of the speaker identification unit 150 described above will be referred to as a hybrid speaker identification method, in which the context-dependent speaker identification method and the context-independent speaker identification method are fused.

Meanwhile, in implementing the present invention, the utterance voice information of the unstructured natural-language command in step S270 is also cumulatively stored in the user's directory, so that the user-specific voice parameters generated in step S250 are continually regenerated; it is desirable that the accuracy of context-independent speaker identification in the speaker identification unit 150 thereby improves continuously.

FIG. 4 is a signal flow diagram illustrating an execution procedure of an artificial intelligence assistant service providing method according to a second embodiment of the present invention. Hereinafter, the execution procedure will be described with reference to FIGS. 1, 2, and 4.

The description of FIG. 4 assumes a state in which the user registration procedure of the speaker identification method according to the first embodiment shown in FIG. 3 has been completed, and in which the speaker identification unit 150 can execute the hybrid speaker identification method through cumulative learning.

First, a user who wishes to use the artificial intelligence assistant service according to the present invention utters the predetermined call word (for example, 'Silo') and then successively utters a service request voice (for example, a content recommendation request) (S310).

Accordingly, the speaker identification unit 150 of the voice recognition apparatus 100 identifies the user ID of the corresponding user by executing the hybrid speaker identification method of steps S260 through S280 (S320).

Meanwhile, according to the present invention, even if a third party knows the call-word information or happens to utter the call word, a third party who has not completed the per-user registration and directory creation procedures of FIG. 3 described above fails speaker identification (i.e., authentication is denied), so that use of the artificial intelligence assistant service according to the present invention is restricted.

After the speaker identification procedure is completed, the speaker identification unit 150 of the voice recognition apparatus 100 analyzes the user's service request voice (S330), and the communication unit 190 of the voice recognition apparatus 100 transmits a service use authority authentication request message, including the user's ID information and the requested service content, to the service providing server 200 (S340).

Meanwhile, when the speaker identification unit 150 performs the service request voice analysis and recognition in step S330, voice analysis and recognition techniques from various conventional voice recognition services may be used.

Then, the service providing server 200 executes an authentication procedure for the right to use the requested service, based on the user's ID information and the requested service content information included in the service use authority authentication request message received from the voice recognition apparatus 100 (S350).

Specifically, the service providing server 200 may store the user-specific information provided by each user at the time of subscription to the artificial intelligence assistant service according to the present invention, as shown in Table 1 below.

Meanwhile, in the present invention, the same information as in Table 1 may also be stored in the storage unit 180 of the voice recognition apparatus 100.

[Table 1]
User ID | Date of Birth | Gender | Restricted Service
KIM77 | December 7, 1977 | Male | None
PARK78 | May 1, 1978 | Female | Paid content
KIM08 | December 15, 2008 | Male | Adult content

Meanwhile, if the user ID included in the service use authority authentication request message of step S340 is 'KIM08' and the requested service content in the message is 'viewing adult movie content', the service providing server 200 does not authenticate the right to use the service, based on the user information in Table 1, and transmits a service use authority rejection message to the voice recognition apparatus 100 (S360).

Accordingly, the output unit 130 of the voice recognition apparatus 100 outputs a service-unavailable guidance voice, such as 'The requested service is unavailable', to the corresponding user.

Meanwhile, when the service providing server 200 authenticates the right to use the service in step S350, it transmits a service use authority authentication completion message to the voice recognition apparatus 100 (S380), and the voice recognition apparatus 100 accordingly provides the service requested in step S310 (S390).
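The authorization decision of steps S350 through S390 against the user information of Table 1 could be sketched as follows. The table contents mirror the example above; the function name and restriction-category strings are assumptions for illustration.

```python
# User information mirroring Table 1 (restriction strings are illustrative).
USER_TABLE = {
    "KIM77": {"restricted": set()},
    "PARK78": {"restricted": {"paid content"}},
    "KIM08": {"restricted": {"adult content"}},
}

def authorize(user_id, requested_category):
    """Return True when the identified user may use the requested
    service category (S350); unregistered speakers always fail."""
    user = USER_TABLE.get(user_id)
    if user is None:
        return False  # no directory, no authentication
    return requested_category not in user["restricted"]
```

The same check could run either on the service providing server 200 or, when Table 1 is mirrored in the storage unit 180, locally in the determination unit 170.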

Meanwhile, in the case where the user information in Table 1 is also stored in the storage unit 180 of the voice recognition apparatus 100 according to the present invention, the above-described authentication procedure may be executed by the voice recognition apparatus 100 itself, through its determination unit 170.

In providing the service in step S390 described above, the voice recognition apparatus 100 may provide a personalized service based on the user ID of the speaker identified in step S320 and the related information in Table 1.

Specifically, when the speaker identification unit 150 of the voice recognition apparatus 100 identifies the speaker of the service request voice of step S310, such as 'I'm bored and there is nothing interesting', as 'PARK78' in step S320, the determination unit 170 of the voice recognition apparatus 100 may determine the personalized content for 'PARK78' to be 'American dramas', based on the user information in Table 1 and the voice analysis result for the service request voice.

More specifically, in performing step S390, the determination unit 170 of the voice recognition apparatus 100 may refer to the preferred content information ('American dramas / family movies / latest songs') provided by 'PARK78' at the time of service subscription and additionally stored, together with the user information of Table 1, in the service providing server 200 and the storage unit 180 of the voice recognition apparatus 100, and may determine 'American dramas', the content with relatively high preference among other female users in the age range of 'PARK78', as the customized content for 'PARK78'.

Accordingly, the determination unit 170 of the voice recognition apparatus 100 generates a customized service proposal message, such as 'Would you like to watch the American drama that Silo recommends?', and the output unit 130 outputs the message to the user as voice.
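The personalized-content decision sketched above for 'PARK78' might look like the following. The preference lists and cohort scores are invented placeholders for data the service provider would hold; the patent does not define this algorithm.

```python
# Preferred content each user supplied at subscription time (illustrative).
PREFERENCES = {
    "PARK78": ["American dramas", "family movies", "latest songs"],
}

# Assumed precomputed preference scores within the user's cohort
# (e.g., female users in the same age range as 'PARK78').
COHORT_SCORES = {"American dramas": 0.8, "family movies": 0.5,
                 "latest songs": 0.3}

def pick_customized_content(user_id):
    """Choose the user's preferred item that scores highest in the cohort."""
    candidates = PREFERENCES.get(user_id, [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: COHORT_SCORES.get(c, 0.0))
```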

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, terms such as "comprises" or "having" are intended to specify the presence of the features, numbers, steps, operations, elements, components, or combinations thereof described in the specification, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments. It will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.

100: voice recognition device, 200: service providing server.

Claims (6)

1. A speaker identification method comprising the steps of:
(a) completing user registration by a speech recognition apparatus storing a user's utterance voice information for a predetermined call word in association with the user's ID in the user's directory;
(b) identifying, by the speech recognition apparatus, the user based on the voice information of the call word uttered by the user when the user utters an unstructured natural-language command together with the call word in order to use the voice recognition service;
(c) cumulatively storing, by the speech recognition apparatus, when the user is identified based on the voice information of the call word uttered by the user, the utterance voice information of the unstructured natural-language command following the user's call-word utterance in the user's directory;
(d) generating, by the speech recognition apparatus, user voice parameters for context-independent speaker identification based on the utterance voice information of the unstructured natural-language commands cumulatively stored in the user's directory; and
(e) performing, by the speech recognition apparatus, when the user utters an unstructured natural-language command together with the call word, context-dependent speaker identification based on the call-word utterance voice information and context-independent speaker identification based on the unstructured natural-language command utterance voice information.
2. (Deleted)

3. The method according to claim 1, further comprising the step of:
(f) performing, by the speech recognition apparatus, speaker identification for the user based on the result of the context-dependent speaker identification and the result of the context-independent speaker identification.
A storage unit for completing a user registration by associating the user's voice information of a predetermined call word with the user's ID in the user's directory; And
A speaker identification unit for identifying the user based on voice information of a call word uttered by the user when the user utters the atypical natural language command together with the call word to use the voice recognition service,
/ RTI >
When the speaker identification unit identifies the user based on the voice information of the call word uttered by the user, the storage unit stores the utterance voice information of the atypical natural language instruction following the call utterance of the user in the user's directory Cumulative storage,
Wherein the speaker identification unit generates a user voice parameter for context independent speaker identification based on utterance voice information of the atypical natural language instruction cumulatively stored in the user's directory,
wherein, when the user utters an atypical natural language command together with the call word, the speaker identification unit performs context-dependent speaker identification based on the call word utterance voice information and performs context-independent speaker identification based on the atypical natural language command utterance voice information.
delete
The apparatus of claim 4, wherein the speaker identification unit performs speaker identification for the user based on the result of the context-dependent speaker identification and the result of the context-independent speaker identification.
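The apparatus claims above describe a flow of registration, call-word identification, and cumulative storage of command speech. A minimal sketch of that flow, in which the class name, data layout, and the equality-based call-word matcher are all hypothetical stand-ins for the claimed storage unit and speaker identification unit:

```python
# Illustrative sketch only (assumptions, not the claimed device): register a
# call word per user, identify the speaker on the call word, then
# cumulatively store the free-form command speech in that user's directory
# for later context-independent modeling.

class VoiceRecognitionDevice:
    def __init__(self):
        # user_id -> {"call_word": voice info, "commands": accumulated speech}
        self.directories = {}

    def register(self, user_id: str, call_word_voice):
        """Storage unit role: associate call-word voice info with a user ID."""
        self.directories[user_id] = {"call_word": call_word_voice,
                                     "commands": []}

    def on_utterance(self, call_word_voice, command_voice):
        """Speaker identification unit role: context-dependent match on the
        call word; on success, accumulate the command speech."""
        user_id = self._match_call_word(call_word_voice)
        if user_id is not None:
            self.directories[user_id]["commands"].append(command_voice)
        return user_id

    def _match_call_word(self, voice):
        # Hypothetical matcher: exact equality stands in for a real
        # text-dependent speaker-verification score comparison.
        for uid, d in self.directories.items():
            if d["call_word"] == voice:
                return uid
        return None
```

Each successful call-word match grows that user's directory, which is what lets the context-independent voice parameter improve over time without a separate enrollment session.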
KR1020170117367A 2017-09-13 2017-09-13 Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein KR101993827B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020170117367A KR101993827B1 (en) 2017-09-13 2017-09-13 Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein
PCT/KR2018/010225 WO2019054680A1 (en) 2017-09-13 2018-09-03 Speaker identification method in artificial intelligence secretarial service in which context-dependent speaker identification and context-independent speaker identification are converged, and voice recognition device used therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020170117367A KR101993827B1 (en) 2017-09-13 2017-09-13 Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein

Publications (2)

Publication Number Publication Date
KR20190030083A KR20190030083A (en) 2019-03-21
KR101993827B1 true KR101993827B1 (en) 2019-06-27

Family

ID=65723990

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020170117367A KR101993827B1 (en) 2017-09-13 2017-09-13 Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein

Country Status (2)

Country Link
KR (1) KR101993827B1 (en)
WO (1) WO2019054680A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004294755A (en) * 2003-03-27 2004-10-21 Secom Co Ltd Device and program for speaker authentication

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080090034A (en) * 2007-04-03 2008-10-08 삼성전자주식회사 Voice speaker recognition method and apparatus
KR20100027865A (en) * 2008-09-03 2010-03-11 엘지전자 주식회사 Speaker recognition and speech recognition apparatus and method thereof
KR20100073178A (en) * 2008-12-22 2010-07-01 한국전자통신연구원 Speaker adaptation apparatus and its method for a speech recognition
US10127911B2 (en) * 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques

Also Published As

Publication number Publication date
WO2019054680A1 (en) 2019-03-21
KR20190030083A (en) 2019-03-21

Similar Documents

Publication Publication Date Title
US10803869B2 (en) Voice enablement and disablement of speech processing functionality
KR102087202B1 (en) Method for Providing Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein
US11763808B2 (en) Temporary account association with voice-enabled devices
AU2016216737B2 (en) Voice Authentication and Speech Recognition System
US11386905B2 (en) Information processing method and device, multimedia device and storage medium
US10714085B2 (en) Temporary account association with voice-enabled devices
CN110661927B (en) Voice interaction method and device, computer equipment and storage medium
US20160372116A1 (en) Voice authentication and speech recognition system and method
US20030182119A1 (en) Speaker authentication system and method
US11687526B1 (en) Identifying user content
CN106796788A (en) Automatic speech recognition is improved based on user feedback
JPWO2008007688A1 (en) Call terminal having voice recognition function, update support apparatus and update method for voice recognition dictionary thereof
KR20180046780A (en) Method for providing of voice recognition service using double wakeup and apparatus thereof
KR102394912B1 (en) Apparatus for managing address book using voice recognition, vehicle, system and method thereof
CN110858841B (en) Electronic device and method for registering new user through authentication of registered user
Maskeliunas et al. Voice-based human-machine interaction modeling for automated information services
KR101993827B1 (en) Speaker Identification Method Converged with Text Dependant Speaker Recognition and Text Independant Speaker Recognition in Artificial Intelligence Secretary Service, and Voice Recognition Device Used Therein
US11461779B1 (en) Multi-speechlet response
EP3502938B1 (en) A conversational registration method for client devices
KR102415694B1 (en) Massage Chair Controllable by User's Voice
WO2019236745A1 (en) Temporary account association with voice-enabled devices
US11450325B1 (en) Natural language processing
US11076018B1 (en) Account association for voice-enabled devices
US20240119930A1 (en) Artificial intelligence device and operating method thereof
US20220406324A1 (en) Electronic device and personalized audio processing method of the electronic device

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right