WO2019054681A1 - Method for providing artificial intelligence secretarial service, and voice recognition device used therefor

Info

Publication number: WO2019054681A1
Application number: PCT/KR2018/010229
Authority: WO
Inventors: 정희석; 진세훈; 이형엽; 임형택
Original assignee: (주)파워보이스
Priority date: 2017-09-13
Filing date: 2018-09-03
Publication date: 2019-03-21
Also published as: KR20190030081A; KR102087202B1

Abstract

Disclosed are a method for providing an artificial intelligence secretarial service, and a voice recognition device used therefor. The present invention is implemented through a process performed by a voice recognition device, the process comprising: receiving an input of a call word voice from a user; determining whether the call word input by the user matches a preconfigured call word; and, when it is determined that the call word matches the preconfigured call word and a service request voice has been input by a user, authenticating the speaker by comparing the service request voice with a preconfigured parameter for analysis of a voice print of a user. According to the present invention, a user can continuously use an artificial intelligence secretarial service without needing to repeatedly input a predetermined call word. In addition, according to the present invention, speaker authentication, which is separately performed in response to a service request from a user, can prevent an erroneous operation caused by an unauthorized third party's voice.

Description

A method for providing an artificial intelligence secretary service, and a speech recognition apparatus used therein

The present invention relates to a method for providing an artificial intelligence secretary service and a speech recognition apparatus used therein. More specifically, the invention relates to a method, and a voice recognition device used therefor, that enable a user to use the artificial intelligence secretarial service continuously without repeatedly inputting a predetermined call word; that prevent malfunctions caused by an unauthorized third party's voice by separately executing a speaker authentication procedure for each service request of a user; and that process service requests separately for each user when service request voices are cumulatively input from a plurality of authorized users.

Recently, artificial intelligence secretary services using voice recognition technology have been widely launched at home and abroad. The world market for artificial intelligence speakers is expected to reach about 2.5 trillion won in 2020, and the size of the related market is expected to grow sharply.

Meanwhile, an artificial intelligence speaker according to the related art requires the user to utter a predetermined call word in order to switch from an operation standby state to an active (wake-up) mode. When the user requests a service while the speaker is in the activated state, the speaker recognizes the request voice and provides the service.

The call word utterance required to activate an artificial intelligence speaker according to the prior art is not required only once. Even when the same user requests an additional artificial intelligence service after a time interval, the user must again go through the cumbersome procedure of uttering the call word to return the speaker to the active state.

In addition, once the call word is recognized, the artificial intelligence speaker according to the related art provides a service in response to the user's subsequent service request voice without performing any authentication of the user.

For this reason, when there are a large number of users (A, B, C, D, ...) in the space where the artificial intelligence speaker is installed, a malfunction occurs: after user A inputs the call word, the speaker may recognize a voice uttered by user B in the same space as the service request voice of user A.

In addition, since the artificial intelligence speaker according to the related art cannot distinguish and recognize the voices of a plurality of users A, B, C, and D, it has a technical limitation: when a service request of user A, a service request of user B, and further service requests are made in sequence, the requests cannot be divided by user for processing, nor can each request be processed in parallel.

Accordingly, it is an object of the present invention to provide a method for providing an artificial intelligence secretary service, and a speech recognition device used therein, that enable a user to use the artificial intelligence secretarial service continuously without repeatedly inputting a predetermined call word, that prevent malfunctions caused by the voice of an unauthorized person by separately executing a speaker authentication procedure for each service request of a user, and that classify and process service requests by user when service request voices are cumulatively input from a plurality of authorized users.

According to an aspect of the present invention, there is provided a method for providing an artificial intelligence secretary service, comprising the steps of: (a) receiving, by a voice recognition device, a call word voice from a user; (b) determining, by the voice recognition device, whether the call word input by the user matches a preset call word; and (c) when the voice recognition device determines that the call word matches the preset call word and a service request voice is input from the user, authenticating the speaker by comparing the service request voice with preset parameters for voiceprint analysis of the user's voice.

Preferably, the method further includes, before step (a), outputting, by the voice recognition device, a voice that guides the user to utter the call word.

In addition, the set call word is a call word arbitrarily selected by the user.

The method may further include (d) outputting, by the voice recognition device, a voice announcing the result of the service usage right authentication executed on the basis of the ID of the speaker authenticated in step (c).

The method further includes (d) determining, by the voice recognition apparatus, a service content to be provided to the user based on the ID of the speaker authenticated in step (c).

According to another aspect of the present invention, there is provided a speech recognition apparatus comprising: an input unit for receiving a call word voice from a user; and a speaker authentication unit for determining whether the call word input by the user matches a preset call word and, when it is determined that the call word matches the preset call word and a service request voice is input from the user, authenticating the speaker by comparing the service request voice with preset parameters for voiceprint analysis of the user's voice.

Preferably, the speech recognition apparatus further includes an output unit for outputting a voice that guides the user to utter the call word.

In addition, the set call word is a call word arbitrarily selected by the user.

The apparatus further includes an output unit outputting a voice for guiding an authentication result of the service utilization right executed based on the ID of the authenticated speaker.

The apparatus further includes a determination unit for determining a service content to be provided to the user based on the ID of the authenticated speaker.

According to the present invention, the user can continuously use the AI secretarial service without having to repeatedly input a predetermined call word.

In addition, according to the present invention, it is possible to prevent a malfunction caused by an unauthorized third party's voice by executing a speaker authentication separately for a service request of a user.

In addition, according to the present invention, when a service request voice is cumulatively input from a plurality of authorized users, the service request can be divided and processed for each user.

FIG. 1 is a configuration diagram of an artificial intelligence secretary service providing system according to a first embodiment of the present invention.

FIG. 2 is a functional block diagram illustrating the structure of a speech recognition apparatus according to the first embodiment of the present invention.

FIG. 3 is a flowchart illustrating a speaker authentication method in the speech recognition apparatus according to the first embodiment of the present invention.

FIG. 4 is a signal flow diagram illustrating an execution procedure of the artificial intelligence secretary service providing method according to a second embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the drawings. It is to be noted that the same elements among the drawings are denoted by the same reference numerals whenever possible. In the following description, well-known functions or constructions are not described in detail since they would obscure the invention in unnecessary detail.

FIG. 1 is a configuration diagram of an artificial intelligence secretary service providing system according to an embodiment of the present invention. Referring to FIG. 1, the artificial intelligence secretary service providing system according to an embodiment of the present invention includes a voice recognition apparatus 100 and a service providing server 200.

The voice recognition apparatus 100 is an artificial intelligence speaker equipped with a voice recognition function, and is installed in a space where users reside, such as a living room. In implementing the present invention, the voice recognition apparatus 100 may also be a smartphone.

The voice recognition apparatus 100 guides the user to utter a predetermined call word (for example, 'silo') as part of the user registration procedure for the artificial intelligence secretary service, and registers each user's voice from that utterance. After voice registration is complete for each user, when a user utters the call word, the apparatus authenticates the user on the basis of the call word voice information through its voice recognition function.

More specifically, the speech recognition apparatus 100 can implement a speaker identification method that combines text-dependent speaker recognition using the call word keyword with text-independent speaker recognition based on unstructured natural language commands.

In addition, the speech recognition apparatus 100 generates and stores parameter values for voiceprint analysis, such as frequency bandwidth and amplitude spectrum, from the user's call word voice signal and unstructured natural language command (service request) voice signal. When a service request voice is subsequently input by the user, the apparatus compares the voiceprint parameter values in that service request voice with the previously stored parameter values, thereby authenticating the speaker through the text-independent speaker recognition method.

When the speech recognition apparatus 100 performs the speaker authentication procedure using the parameters for voiceprint analysis, various conventional methods, such as that of Korean Patent Laid-Open No. 10-2012-72906, may be used.

The service providing server 200 is a server installed and operated by a company that manufactures and sells the voice recognition apparatus 100, such as an artificial intelligence speaker. The service providing server 200 stores, for each user subscribed to the artificial intelligence secretary service, the personal information provided by the user at the time of subscription, such as ID, age, sex, and preferred content information, together with information on the scope of service use permitted for each user.

FIG. 2 is a functional block diagram illustrating the structure of the speech recognition apparatus 100 according to an embodiment of the present invention. Referring to FIG. 2, the speech recognition apparatus 100 according to an embodiment of the present invention includes an input unit 110, an output unit 130, a speaker authentication unit 150, a determination unit 170, a storage unit 180, and a communication unit 190.

First, the input unit 110 of the voice recognition apparatus 100 is implemented by a microphone module or the like. When a user speaks a call word, the voice of the user is input through the input unit 110.

The output unit 130 of the voice recognition apparatus 100 may be implemented as a speaker module or the like. It outputs a voice guiding the user to utter the call word (for example, "Say 'silo' when you hear the beep"), and outputs to the user the result of a registered user's subsequent service request (for example, a service-unavailable notice, or requested information such as the weather).

Meanwhile, the storage unit 180 of the voice recognition apparatus 100 stores the call word information set by the manufacturer or the purchaser (user) of the voice recognition apparatus 100. In the user registration procedure, each user's call word voice is stored in association with that user's ID. Thereafter, the parameter values for voiceprint analysis in the call word voice signal input by the user to activate the voice recognition apparatus 100, and the parameter values for voiceprint analysis in the service request voice signal, are cumulatively stored in association with the user ID.

As its speaker identification method, the speaker authentication unit 150 of the voice recognition apparatus 100 performs context-dependent speaker identification based on the call word keyword and context-independent speaker identification based on unstructured natural language commands.

In addition, when a service request voice is input from the user, the speaker authentication unit 150 compares the parameter values for voiceprint analysis in the service request voice signal with the user's voiceprint analysis parameter values stored in the storage unit 180, and thereby performs an authentication procedure for the speaker requesting the service.

That is, the speaker authentication unit 150 separately performs the authentication procedure for the service request voice of the user using the text-independent speaker recognition method.

The determination unit 170 of the voice recognition apparatus 100 determines the customized service content to be provided to the user on the basis of the ID information of the speaker identified by the speaker authentication unit 150. The communication unit 190 of the voice recognition apparatus 100 performs data communication with the service providing server 200 or with an external server that provides the service content requested by the user.

FIG. 3 is a flowchart illustrating a speaker authentication method in the speech recognition apparatus according to the first embodiment of the present invention. Hereinafter, the speaker authentication method in the speech recognition apparatus according to the first embodiment of the present invention will be described with reference to FIG. 3.

To allow a user to use the artificial intelligence secretary service according to the present invention, the voice recognition apparatus 100 outputs, through the output unit 130, a voice guiding the user to register his or her voice by uttering the call word (for example, 'silo'). The user utters the call word as guided, and the user's call word voice is thereby input through the input unit 110 of the voice recognition apparatus 100.

Then, the user inputs his or her user ID through an input panel separately provided in the input unit 110. As a result, a directory for the user is created in the storage unit 180 of the voice recognition apparatus 100, and the ID of the user and the call word voice information input by the user are stored in association with each other (S210).

Meanwhile, the ID input by the user in step S210 is the ID provided by the user when subscribing to the artificial intelligence secretary service according to the present invention, so that it is the same as the ID stored in the service providing server 200.

In addition, the above-described user registration procedure is repeated for each of a plurality of users (for example, family members) who will share the voice recognition apparatus 100 using the same call word.

After the user registration in step S210 is completed, the user first utters the call word to use the artificial intelligence secretary service according to the present invention, and the speaker authentication unit 150 performs speaker identification for the user on the basis of the call word voice information uttered by the user (S220). The voice information of the unstructured natural language command uttered after the call word to use the service is then stored in the directory of the identified user (S230).

That is, each time the user subsequently utters the call word followed by an unstructured natural language command in order to use the artificial intelligence secretarial service according to the present invention, the voice information of the unstructured natural language command uttered by the user is accumulated in the directory of the corresponding user through the procedure described above (S240).

When the voice information of unstructured natural language commands accumulated in the corresponding user's directory reaches a certain amount or more (for example, 30 seconds or more of net speech), the speaker authentication unit 150 generates user-specific parameter values for voiceprint analysis (frequency bandwidth, amplitude spectrum, and the like) from it, and the user-specific parameter values thus generated are stored in the corresponding user's directory (S250).
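The accumulation rule of steps S240 and S250 can be illustrated with a minimal Python sketch. The class name `UserDirectory`, its fields, and the constant `MIN_NET_SPEECH_SECONDS` are hypothetical names chosen for illustration; only the 30-second figure comes from the example in the description, and the actual generation of voiceprint parameters is not modeled.

```python
from dataclasses import dataclass, field
from typing import List

# Example threshold taken from the description ("30 seconds or more of net speech").
MIN_NET_SPEECH_SECONDS = 30.0


@dataclass
class UserDirectory:
    """Hypothetical per-user directory held in the storage unit."""
    user_id: str
    command_durations: List[float] = field(default_factory=list)
    voiceprint_ready: bool = False

    def add_command(self, net_speech_seconds: float) -> bool:
        """Accumulate one natural language command (S240).

        Returns True once enough speech has been collected for
        user-specific voiceprint parameters to be generated (S250).
        """
        self.command_durations.append(net_speech_seconds)
        if sum(self.command_durations) >= MIN_NET_SPEECH_SECONDS:
            self.voiceprint_ready = True
        return self.voiceprint_ready
```

In this sketch, text-independent identification would only become available for a user after `add_command` first returns `True`.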

Accordingly, the speaker authentication unit 150 can independently perform the context-dependent speaker identification based on the call word keyword and the context-independent speaker identification based on the parameter values generated for each user.

When a specific user sequentially utters the call word and an unstructured natural language command in order to use the artificial intelligence secretary service according to the present invention, the speaker authentication unit 150 executes the first speaker identification (context-dependent speaker identification) using the call word keyword (S260), and then executes the second speaker identification (context-independent speaker identification) using the unstructured natural language command (S270).

The speaker authentication unit 150 computes the sum of a value obtained by applying a predetermined weight to the first speaker identification result from the context-dependent speaker identification and a value obtained by applying a predetermined weight to the second speaker identification result from the context-independent speaker identification, and finally identifies the speaker on the basis of that sum (S280).

In the present specification, the final speaker identification method of the speaker authentication unit 150 will be referred to as a hybrid speaker identification method in which the context-dependent speaker identification method and the context-independent speaker identification method are fused.
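The weighted fusion of steps S260 through S280 can be sketched as follows. This is only an illustration under stated assumptions: the function name `identify_speaker`, the per-user score dictionaries, the weight values, and the acceptance threshold are all hypothetical, since the description specifies only that predetermined weights are applied and the results summed.

```python
from typing import Dict, Optional


def identify_speaker(
    dependent_scores: Dict[str, float],    # call word scores per user ID (S260)
    independent_scores: Dict[str, float],  # natural language command scores (S270)
    w_dependent: float = 0.4,              # assumed weight, not from the source
    w_independent: float = 0.6,            # assumed weight, not from the source
    threshold: float = 0.7,                # assumed acceptance threshold
) -> Optional[str]:
    """Return the user ID with the highest fused score, or None (S280)."""
    best_id: Optional[str] = None
    best_score = threshold
    # Only users scored by both identification methods are considered.
    for user_id in sorted(dependent_scores.keys() & independent_scores.keys()):
        fused = (w_dependent * dependent_scores[user_id]
                 + w_independent * independent_scores[user_id])
        if fused >= best_score:
            best_id, best_score = user_id, fused
    return best_id
```

Returning `None` when no fused score reaches the threshold corresponds to the case in which an unregistered speaker cannot be identified.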

In implementing the present invention, it is desirable that the voice information of the unstructured natural language command in step S270 also be cumulatively stored in the user's directory, so that the user-specific parameters generated in step S250 are additionally generated and the accuracy of the context-independent speaker identification in the speaker authentication unit 150 continuously improves.

FIG. 4 is a signal flow diagram illustrating an execution procedure of the artificial intelligence secretary service providing method according to the second embodiment of the present invention. Hereinafter, the execution procedure of the artificial intelligence secretary service providing method according to an embodiment of the present invention will be described with reference to FIG. 1, FIG. 2, and FIG. 4.

In describing FIG. 4, it is assumed that the user registration procedure in the speaker authentication method according to the first embodiment of the present invention shown in FIG. 3 has been completed, and that the speaker authentication unit 150 can execute the hybrid speaker identification method as a result of cumulative learning.

First, a user who wishes to use the artificial intelligence secretary service according to the present invention utters the predetermined call word (for example, 'silo') and then successively utters a service request voice (for example, a request for a recommendation) (S310).

Accordingly, the speaker authentication unit 150 of the voice recognition apparatus 100 can identify the user ID of the corresponding user by executing the hybrid speaker identification method of steps S260 through S280 (S320).

Meanwhile, in the present invention, even if a third party knows the call word or utters it by accident, a third party who has not gone through the user registration and directory creation process of FIG. 3 described above cannot be identified as a speaker (that is, authentication fails), and is thereby restricted from using the artificial intelligence secretary service according to the present invention.

After completing the speaker identification process, the voice recognition apparatus 100 analyzes and recognizes the service request voice of step S310 and provides the related service.

Thereafter, the user inputs a voice requesting a desired service through the input unit 110 of the voice recognition apparatus 100 without uttering the call word again (S330).

Accordingly, the speaker authentication unit 150 of the voice recognition apparatus 100 cumulatively stores the parameter values for voiceprint analysis in the service request voice signal in the storage unit 180 (S335). It then compares the voiceprint analysis parameter values previously stored for the user with the voiceprint analysis parameter values in the service request voice signal stored in step S335 and determines whether they match, thereby executing the parameter authentication procedure, which is an additional authentication procedure for the user (S340).

If it is determined that the parameter values do not match, the voice recognition apparatus 100 outputs to the user a voice guiding the user to utter the call word, such as 'Please say the call word first to use the service' (S345).
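The parameter authentication of steps S340 and S345 can be sketched as a comparison between a stored parameter vector and the vector extracted from a new service request. The description does not state how a match is decided; the cosine similarity measure, the 0.9 threshold, and the names `cosine_similarity` and `authenticate_request` are assumptions made only for this illustration.

```python
import math
from typing import Optional, Sequence, Tuple

CALL_WORD_PROMPT = "Please say the call word first to use the service"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length parameter vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def authenticate_request(
    stored_params: Sequence[float],
    request_params: Sequence[float],
    threshold: float = 0.9,  # assumed match threshold, not from the source
) -> Tuple[bool, Optional[str]]:
    """Return (True, None) on a match (S340), else (False, prompt) (S345)."""
    if cosine_similarity(stored_params, request_params) >= threshold:
        return True, None
    return False, CALL_WORD_PROMPT
```

A matching request proceeds without a new call word, while a mismatch yields the guidance message that asks for the call word.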

As described above, according to the present invention, an additional authentication procedure determines whether the parameter values for voiceprint analysis in the user's service request voice signal match the previously stored parameter values for voiceprint analysis. This prevents the voice recognition apparatus 100 from erroneously recognizing a voice uttered by user B in the same space, without any intention of requesting a service, as a service request voice of user A.

In addition, in implementing the present invention, when the user who made the service request in step S330 (the primary service request) requests a service again (a secondary service request), the speaker authentication unit 150 cumulatively stores the parameter values for voiceprint analysis in the secondary service request voice signal in the storage unit 180. It then determines whether they match the voiceprint analysis parameter values of the primary service request voice signal stored in step S335, or the voiceprint analysis parameter values generated and stored in step S250, thereby executing the parameter authentication procedure for the user.

Thus, according to the present invention, for a secondary service request from the same user, user authentication (context-independent speaker authentication) is performed by comparison with the voiceprint analysis parameter values stored at the time of the previous service request (or with the voiceprint analysis parameter values generated and stored in step S250). For a tertiary service request, user authentication is likewise performed by comparison with the voiceprint analysis parameter values stored at the time of a previous (primary or secondary) service request, or with those generated and stored in step S250. As a result, a user who uttered the call word once in step S310 can continue to use the artificial intelligence service through the voice recognition apparatus by way of the context-independent speaker authentication procedure, without having to utter the call word repeatedly.

In addition, in the present invention, the storage unit 180 of the voice recognition apparatus 100 stores, for each ID of a plurality of users, the parameter values for voiceprint analysis in the corresponding user's call word voice and service request voice. Therefore, even when a plurality of users who input the call word voice in step S315 make service requests sporadically, the speaker authentication unit 150 of the voice recognition apparatus 100 can recognize the service requests separately for each user.

As a result, if, for example, a primary service request of user A, a service request of user B, and a secondary service request of user A are made in sequence, the voice recognition apparatus can link the primary service request of user A with the secondary service request of user A and process them together.
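The per-user linking described above can be sketched as grouping an interleaved stream of already-identified requests by user ID. The function name `link_requests` and the `(user_id, request)` tuple layout are hypothetical, and the sketch assumes speaker identification has already labeled each request.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def link_requests(
    identified_requests: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group service requests by speaker ID, preserving arrival order."""
    per_user: Dict[str, List[str]] = defaultdict(list)
    for user_id, request in identified_requests:
        per_user[user_id].append(request)
    return dict(per_user)
```

For the sequence in the example, user A's primary and secondary requests end up in one list, separate from user B's request.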

If it is determined in step S340 that the parameter values for voiceprint analysis match, the speaker authentication unit 150 of the voice recognition apparatus 100 recognizes and analyzes the user's service request voice, and transmits a service usage right authentication request message including the user ID information and the requested service content to the service providing server 200 (S350).

Meanwhile, when the speaker authentication unit 150 performs the service request speech analysis and the speech recognition, the speech analysis and recognition technology in various speech recognition services according to the related art may be used.

Then, the service providing server 200 executes the authentication procedure for the usage right of the requested service based on the user's ID information and the requested service content information included in the service usage right request message received from the voice recognition device 100 (S355).

Specifically, the service providing server 200 may store user-specific information provided by the user in the step of joining the artificial intelligence assistant service according to the present invention as shown in Table 1 below.

In the meantime, in the present invention, the same information as shown in Table 1 may be stored in the storage unit 180 of the voice recognition apparatus 100 as well.

[Table 1]
User ID	Date of birth	Gender	Access-restricted service
KIM77	1977.08.12	Male	None
PARK78	1978.05.01	Female	Paid content
KIM08	2008.12.15	Male	Adult content

For example, if the user ID included in the service usage right authentication request message in step S350 is 'KIM08' and the requested service content included in the message is 'viewing adult movie content', the service providing server 200 does not authenticate the usage right for the service, on the basis of the user information for 'KIM08' in Table 1, and transmits a usage right rejection message to the voice recognition apparatus 100 (S360). Accordingly, the output unit 130 of the voice recognition apparatus 100 outputs a service-unavailable guidance voice, such as 'The requested service is unavailable', to the user.

Meanwhile, when the service providing server 200 authenticates the service usage right in step S355, it transmits a service usage authorization completion message to the voice recognition apparatus 100 (S370), and the voice recognition apparatus 100 accordingly provides the service requested in step S330 (S375).
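The usage right check of steps S355 through S370 can be sketched against the contents of Table 1. The dictionary layout, the function name `authorize`, the category strings, and the choice to reject unknown user IDs are assumptions made for illustration; the description only gives the table and the 'KIM08' example.

```python
from typing import Dict, Set

# Access-restricted services per user ID, following Table 1.
RESTRICTED_SERVICES: Dict[str, Set[str]] = {
    "KIM77": set(),
    "PARK78": {"paid content"},
    "KIM08": {"adult content"},
}


def authorize(user_id: str, service_category: str) -> bool:
    """Return True if the user may use the requested service category.

    Unknown user IDs are rejected (an assumption, not stated in the source).
    """
    restricted = RESTRICTED_SERVICES.get(user_id)
    if restricted is None:
        return False
    return service_category not in restricted
```

A `False` result corresponds to the rejection message of step S360, and a `True` result to the completion message of step S370.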

In the present invention, when the user information in Table 1 is stored in the storage unit 180 of the voice recognition apparatus 100, the authentication of the service usage right may be executed by the voice recognition apparatus 100 itself through its determination unit 170.

In providing the service in step S375, the voice recognition apparatus 100 may provide a personalized service based on the user ID of the speaker identified in step S325 and the related information in Table 1.

Specifically, when the speaker authentication unit 150 of the voice recognition apparatus 100 has identified the speaker as 'PARK78' in step S320 and has analyzed the service request voice of step S330, the determination unit 170 of the voice recognition apparatus 100 may determine 'American drama' as the personalized content for 'PARK78' on the basis of the user information in Table 1 and the voice analysis result for the service request voice.

More specifically, in step S375, the determination unit 170 of the voice recognition apparatus 100 refers to the preferred content information of 'PARK78' ('American drama / family movie / latest song'), which is additionally stored together with the user information of Table 1 in the service providing server 200 and in the storage unit 180 of the voice recognition apparatus 100. From among these, it can determine 'American drama', the content with a relatively high preference among other female members in the age range of 'PARK78', as the customized content for 'PARK78'.

Accordingly, the determination unit 170 of the voice recognition apparatus 100 generates a customized service proposal message such as 'Would you like to watch an American drama recommended by silo?', and the output unit 130 outputs it to the user as a voice message.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, the terms "comprises" or "having" and the like are used to specify that there is a feature, a number, a step, an operation, an element, a component, or a combination thereof described in the specification, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or combinations thereof.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention.

The present invention is recognized as being industrially applicable in the field of speech recognition service industry.

Claims

1. A method for providing an artificial intelligence secretary service, the method comprising the steps of:

(a) receiving, by a voice recognition device, a call word voice from a user;

(b) determining, by the voice recognition device, whether the call word input by the user matches a preset call word; and

(c) when the voice recognition device determines that the call word matches the preset call word and a service request voice is input from the user, authenticating the speaker by comparing the service request voice with preset parameters for voiceprint analysis of the user's voice.

2. The method according to claim 1, further comprising, before step (a), outputting, by the voice recognition device, a voice guiding utterance of the call word.

3. The method according to claim 1, wherein the preset call word is a call word arbitrarily selected by the user.

4. The method according to claim 1, further comprising (d) outputting, by the voice recognition device, a voice announcing the authentication result for the service usage right, the authentication being executed on the basis of the ID of the speaker authenticated in step (c).

5. A voice recognition device comprising:

an input unit for receiving a call word voice from a user; and

a speaker authentication unit for determining whether the call word input by the user matches a preset call word and, when it is determined that the call word matches the preset call word and a service request voice is input from the user, authenticating the speaker by comparing the service request voice with preset parameters for voiceprint analysis of the user's voice.

6. The voice recognition device according to claim 5, further comprising an output unit for outputting a voice guiding utterance of the call word.

7. The voice recognition device according to claim 5, wherein the preset call word is a call word arbitrarily selected by the user.

8. The voice recognition device according to claim 5, further comprising an output unit for outputting a voice announcing the authentication result for the service usage right, the authentication being executed on the basis of the ID of the authenticated speaker.