WO2023128586A1 - Artificial intelligence-based dialogue situation prediction and intention classification system, and method thereof - Google Patents

Artificial intelligence-based dialogue situation prediction and intention classification system, and method thereof

Info

Publication number
WO2023128586A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
unit
voice
utterance
text
Prior art date
Application number
PCT/KR2022/021461
Other languages
French (fr)
Korean (ko)
Inventor
정호영
김준우
윤혜경
윤은지
Original Assignee
경북대학교 산학협력단
Priority date
Filing date
Publication date
Priority claimed from KR1020220041966A external-priority patent/KR20230100543A/en
Application filed by 경북대학교 산학협력단
Publication of WO2023128586A1 publication Critical patent/WO2023128586A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 - Speech to text systems

Definitions

  • the present invention relates to a conversation situation prediction and intention classification system and method based on artificial intelligence.
  • an artificial intelligence speaker can recognize a voice provided by a user and generate and output a response based on a built-in algorithm. Users can conveniently access various information using artificial intelligence speakers.
  • the artificial intelligence speaker cannot accurately recognize the user's voice and may provide inaccurate information to the user.
  • the present invention was derived from research conducted as part of the Information, Communication and Broadcasting Innovation Talent Fostering (R&D) program of the Ministry of Science and ICT (task identification number: 1711125907, task number: 2020-0-01808-002, research project title: research on complex-information-based predictive intelligence innovative technology, project management agency: Institute of Information & Communications Technology Planning & Evaluation, project executing agency: Kyungpook National University Industry-University Cooperation Foundation, research period: 2021.01.01.-2021.12.31.). Meanwhile, the Korean government holds no property interest in any aspect of the present invention.
  • a technical problem to be solved by the present invention is to provide a conversation situation prediction and intention classification system and method based on artificial intelligence that recognizes a user's voice according to the user's gender and age.
  • a technical problem to be solved by the present invention is to provide a conversation situation prediction and intention classification system and method based on artificial intelligence for determining a user's speech intention based on a user's voice.
  • a technical problem to be solved by the present invention is to provide a dialogue situation prediction and intention classification system and method based on artificial intelligence that predicts the user's speech after the user's speech based on the user's speech intention.
  • a technical problem to be solved by the present invention is to provide a dialogue situation prediction and intention classification system and method based on artificial intelligence that generates a response based on the user's speech intention and the predicted user's speech.
  • the at least one processor recognizes the user's request utterance uttered from one point in time to another point in time, and includes:
  • a voice determination unit that determines the gender and age of the user based on the request utterance;
  • a voice processing unit that converts the user's request utterance into text to determine the user's utterance intention, and determines the user's predicted utterance after the other point in time; and
  • a response generation unit that generates a response to the user's request utterance based on the user's utterance intention and the predicted utterance.
  • the voice determination unit includes: a voice extraction unit that extracts first voice data based on the request utterance at a certain point in time;
  • a first deep learning unit that inputs the first voice data as an input value of a pre-stored first decision algorithm, calculates the user's gender and age as first probability values, and determines the user's gender and age based on the first probability values; and
  • a voice recognizer distribution unit that recognizes the request utterance and extracts second voice data.
  • the first deep learning unit determines the gender and age corresponding to the highest probability value among the first probability values as the gender and age of the user.
  • the voice recognizer distribution unit includes a plurality of different voice recognizers for recognizing a user's requested utterance and generating second voice data corresponding to the determined gender and age.
  • the voice processing unit includes: a voice-to-text conversion unit that converts the second voice data into first converted text;
  • a second deep learning unit that inputs the first converted text as an input value of a pre-stored first deep learning algorithm to determine the first request text corresponding to the user's request utterance;
  • a speech intention classification unit that inputs the first request text as an input value of a pre-stored intention classification algorithm to generate speech intention data for the utterance intention; and
  • a speech intention prediction unit that inputs the speech intention data as an input value of a pre-stored intention prediction algorithm to generate predicted speech data for the predicted utterance.
  • the second deep learning unit includes a converted text input unit that receives the first converted text, a first voice model deep learning unit that inputs the first converted text as an input value of the pre-stored first deep learning algorithm to generate the first request text, and a request text output unit that outputs the first request text.
  • the second deep learning unit further includes a reference text storage unit that stores a first reference text corresponding to the user's request utterance, and a first error rate calculation unit that calculates a first error rate value between the first converted text and the first request text.
  • the second deep learning unit further includes a second voice model deep learning unit that inputs the first reference text as an input value of a pre-stored second deep learning algorithm to perform deep learning, a second error rate calculation unit that calculates a second error rate value between the first reference text and a second reference text that is the output value of the second voice model deep learning unit, and a weight value calculation unit that calculates a weight value for deep learning of the first voice model deep learning unit based on the first error rate value and the second error rate value.
  • the first voice model deep learning unit sets the weight value as the weight value of the pre-stored first deep learning algorithm and inputs the first converted text as an input value to perform deep learning.
  • the speech intention classification unit inputs the first request text as an input value of the intention classification algorithm stored in advance to generate speech intention data obtained by calculating the speech intention as a second probability value.
  • the speech intention classification unit determines the speech intention having the highest probability value among the second probability values as the user's speech intention.
  • the speech intention prediction unit inputs speech intention data as an input value of an intention prediction algorithm stored in advance to generate predicted speech data obtained by calculating predicted speech as a third probability value.
  • the speech intention prediction unit determines the predicted utterance having the highest probability value among the third probability values as the user's predicted utterance.
  • the response generation unit includes a response text generation unit that inputs the speech intention data and the predicted speech data as input values of a pre-stored response algorithm to generate a response text in response to the user's request utterance, and a text-to-speech conversion unit that converts the response text into voice data.
  • the at least one processor includes a voice determination unit, a voice processing unit, and a response generation unit, and the method includes: determining, by the voice determination unit, the user's gender and age based on the user's request utterance from one point in time to another point in time; converting, by the voice processing unit, the user's request utterance into text to determine the user's utterance intention, and determining the user's predicted utterance after the other point in time; and generating, by the response generation unit, a response to the user's request utterance based on the user's utterance intention and the predicted utterance.
  • it includes a computer-readable non-transitory recording medium on which a program for executing a voice processing system based on artificial intelligence according to an embodiment of the present invention is recorded.
  • the dialogue situation prediction and intention classification system based on artificial intelligence according to the present invention can accurately recognize a user's voice according to the user's gender and age.
  • the dialogue situation prediction and intention classification system based on artificial intelligence can determine the user's speech intention based on the user's voice.
  • the conversation situation prediction and intention classification system based on artificial intelligence can predict the user's utterance after the user's utterance based on the user's utterance intention.
  • the dialogue situation prediction and intention classification system based on artificial intelligence may generate a response based on the user's speech intention and the predicted user's speech.
  • FIG. 1 is a diagram illustrating a dialogue situation prediction and intention classification system based on artificial intelligence according to an embodiment of the present invention.
  • FIG. 2 is a diagram explaining a process of determining the gender and age of a user and extracting second voice data according to an embodiment of the present invention.
  • FIG. 3 is a diagram explaining a user's request utterance and first converted text according to an embodiment of the present invention.
  • FIG. 4 is a diagram illustrating a second deep learning unit according to an embodiment of the present invention.
  • FIG. 5 is a diagram explaining a process of determining a user's utterance intention and predicted utterance and generating a response according to an embodiment of the present invention.
  • FIG. 6 is a diagram illustrating a method for predicting conversation situations and classifying intentions based on artificial intelligence according to an embodiment of the present invention.
  • FIG. 7 is a diagram explaining a method of adjusting a first deep learning algorithm according to an embodiment of the present invention.
  • the expression "the same” in the description may mean “substantially the same”. That is, it may be the same to the extent that a person with ordinary knowledge can understand that it is the same.
  • Other expressions should likewise be understood as expressions from which "substantially" has been omitted.
  • The term '~unit' used in this specification refers to a unit that processes at least one function or operation, and may mean, for example, software, an FPGA, or a hardware component. Functions provided by a '~unit' may be performed separately by a plurality of components or may be integrated with other additional components. A '~unit' in this specification is not necessarily limited to software or hardware; it may be configured to reside in an addressable storage medium or to run on one or more processors.
  • FIG. 1 is a diagram illustrating a dialogue situation prediction and intention classification system based on artificial intelligence according to an embodiment of the present invention.
  • a dialogue situation prediction and intention classification system 1 based on artificial intelligence includes at least one processor, and the at least one processor may implement or include a voice determination unit 10, a voice processing unit 20, and a response generation unit 30.
  • the voice determination unit 10 may include a voice extraction unit 100 , a first deep learning unit 110 and a voice recognizer distribution unit 120 .
  • the voice processing unit 20 may include a voice-to-text conversion unit 200, a second deep learning unit 210, a speech intention classification unit 220, and a speech intention prediction unit 230.
  • the response generator 30 may include a response text generator 300 and a text-to-speech converter 310 .
  • the artificial intelligence speaker 2 may recognize voice based on the user 3's speech.
  • the artificial intelligence speaker 2 may output a response corresponding to the user 3's speech. Through this, the user 3 can obtain various information from the artificial intelligence speaker 2.
  • hereinafter, the speech of the user 3 input to the artificial intelligence speaker 2 will be referred to as a 'request utterance'.
  • the voice determination unit 10 may recognize the requested utterance of the user 3 from one point in time to another point in time.
  • the voice extracting unit 100 may extract voice data at any point in time based on the requested utterance of the user 3 .
  • the voice data of the user 3 extracted by the voice extractor 100 at any point in time will be referred to as first voice data.
  • the voice extraction unit 100 may extract first voice data including 'Air,' which is a requested utterance of the user 3 at any point in time.
  • the first deep learning unit 110 may calculate the gender and age of the user 3 as probability values by using the first voice data as an input value of the first decision algorithm stored in advance.
  • the first deep learning unit 110 may determine the user's gender and age based on the probability value.
  • the first deep learning unit 110 inputs the first voice data as an input value of the pre-stored first decision algorithm and calculates, as probability values, the probabilities that the gender of the user 3 is male or female and that the age is adult, elderly, or child.
  • the probability values calculated by the first deep learning unit 110 by inputting the first voice data as an input value of the pre-stored first decision algorithm will be referred to as first probability values.
  • the first deep learning unit 110 may determine the gender and age corresponding to the highest probability value among the calculated first probability values as the gender and age of the user 3.
  • the process by which the first deep learning unit 110 determines the gender and age of the user 3 by inputting the first voice data into the pre-stored decision algorithm will be described in detail with reference to FIG. 2 below.
  • the voice recognizer distribution unit 120 may include a plurality of different voice recognizers that recognize the requested utterance of the user 3 from one point in time to another point in time according to the gender and age of the user 3 and generate voice data based on the requested utterance.
  • the voice recognizer distribution unit 120 determines one voice recognizer corresponding to the user's gender and age, and may recognize the user's requested utterance from one point in time to another point in time using that voice recognizer.
  • Any one of the voice recognizers may recognize a user's requested utterance and extract voice data based on it.
  • voice data recognized and extracted by any one voice recognizer included in the voice recognizer distribution unit 120 will be referred to as second voice data.
  • if the user is determined to be an adult male, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing an adult male's requested utterance, and that recognizer may recognize the adult male's requested utterance and generate second voice data.
  • if the user is determined to be an adult female, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing an adult female's requested utterance, and that recognizer may recognize the adult female's requested utterance and generate second voice data.
  • if the user is determined to be an elderly male, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing an elderly male's requested utterance, and that recognizer may recognize the elderly male's requested utterance and generate second voice data.
  • if the user is determined to be an elderly female, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing an elderly female's requested utterance, and that recognizer may recognize the elderly female's requested utterance and generate second voice data.
  • if the user is determined to be a male child, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing a male child's requested utterance, and that recognizer may recognize the male child's requested utterance and generate second voice data.
  • if the user is determined to be a female child, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing a female child's requested utterance, and that recognizer may recognize the female child's requested utterance and generate second voice data.
  • in this way, the voice recognizer distribution unit 120 may determine one voice recognizer for recognizing the requested utterance of the user 3 corresponding to one gender (male or female) and one age group (adult, elderly, or child).
  • the selected voice recognizer may recognize the requested utterance of the user 3 from one point in time to another point in time, and may extract the second voice data based on that requested utterance; a minimal selection sketch follows.
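  • As an illustration only (the patent discloses no code), the recognizer-per-demographic selection could look like the following Python sketch; all class, variable, and function names are assumptions.

```python
# Illustrative sketch of the voice recognizer distribution unit (120):
# one recognizer per (gender, age-group) pair, as in the six cases above.

class SpeechRecognizer:
    """Stand-in for a recognizer trained on one demographic group."""
    def __init__(self, group: str) -> None:
        self.group = group

    def recognize(self, request_utterance_audio: bytes) -> bytes:
        # In the patent this step would produce the "second voice data";
        # a real implementation would run a group-specific acoustic model.
        raise NotImplementedError

# One recognizer per (gender, age) combination.
RECOGNIZERS = {
    (gender, age): SpeechRecognizer(f"{gender}-{age}")
    for gender in ("male", "female")
    for age in ("adult", "elderly", "child")
}

def select_recognizer(gender: str, age: str) -> SpeechRecognizer:
    """Return the single recognizer matching the determined gender and age."""
    return RECOGNIZERS[(gender, age)]
```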
  • the voice processing unit 20 may determine the user's speech intention by converting the user's requested speech from one point in time to another point in time into text.
  • the voice processing unit 20 may determine the user's predicted utterance after another point in time.
  • the voice-to-text conversion unit 200 may convert the second voice data extracted from any one voice recognizer included in the voice recognizer distribution unit 120 into text. At this time, the text may contain errors.
  • text converted by the speech-to-text conversion unit 200 and including errors will be referred to as first converted text.
  • one of the voice recognizers may misrecognize the requested utterance of the user 3 due to various external or internal factors, such as the inclusion of external noise in the process of recognizing the requested utterance of the user 3.
  • for example, based on an adult man's requested utterance 'Air, tell me today's weather', the voice recognizer that recognizes the adult man's requested utterance may misrecognize it as 'Air, tell me the schedule for May'.
  • one of the voice recognizers may extract second voice data composed of 'Air tell me the schedule for May' based on this.
  • the voice-to-text conversion unit 200 may convert the second voice data into first converted text, and the first converted text may be composed of 'Air, let me know the schedule for May' including an error.
  • the second deep learning unit 210 may generate the first request text by inputting the first converted text as an input value of the first deep learning algorithm stored in advance.
  • the first deep learning algorithm when the first converted text is input as an input value of the first deep learning algorithm stored in advance in the second deep learning unit 210, the first deep learning algorithm outputs the request text corresponding to the user's speech request as an output value.
  • the request text generated by the first deep learning algorithm previously stored in the second deep learning unit 210 will be referred to as a first request text.
  • the first deep learning algorithm of the second deep learning unit 210 is composed of an input layer, a hidden layer, and an output layer, and may mean the process of inputting the first converted text as an input value of the input layer and outputting the first request text from the output layer.
  • the first converted text input to the input layer may be converted into the first request text as an output value by adding predetermined weight values in the hidden layer.
  • the predetermined weight values added in the hidden layer of the second deep learning unit 210 may be reset by a weight value computed from the first error rate and the second error rate described with reference to FIG. 4 below, and through this process the first deep learning algorithm can be tuned; a toy forward pass is sketched below.
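  • A toy forward pass matching the input/hidden/output description above; the shapes, the tanh activation, and all names are assumptions for illustration, not the patent's algorithm.

```python
import numpy as np

# Toy forward pass through input -> hidden -> output. W_hidden stands for the
# "predetermined weight values" added in the hidden layer that fine-tuning resets.
def forward(x: np.ndarray, W_hidden: np.ndarray, W_out: np.ndarray) -> np.ndarray:
    hidden = np.tanh(x @ W_hidden)   # hidden layer applies the adjustable weights
    return hidden @ W_out            # output layer yields first-request-text scores

# Example with arbitrary dimensions: 8 input features, 16 hidden units, 4 outputs.
rng = np.random.default_rng(0)
scores = forward(rng.normal(size=(1, 8)),
                 rng.normal(size=(8, 16)),
                 rng.normal(size=(16, 4)))
```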
  • the second deep learning unit 210 may calculate an error rate (or loss value) between the first converted text and the first request text output from the first deep learning algorithm.
  • the error rate (or loss value) calculated between the first converted text in the second deep learning unit 210 and the first request text output from the first deep learning algorithm is the first error rate (or the first loss value). ) to be named.
  • the first request text output from the first deep learning algorithm may not correspond exactly to the user's request utterance.
  • in this case, the second deep learning unit 210 may calculate a first error rate (or first loss value) between the first request text output from the first deep learning algorithm and the first converted text input as the input value of the first deep learning algorithm.
  • the second deep learning unit 210 inputs the first reference text (or first transcription text) as an input value of the pre-stored second deep learning algorithm, and the second deep learning algorithm may be fine-tuned so that its output value is a second reference text (or second transcription text) identical to the first reference text (or first transcription text).
  • the first reference text (or transcription text) is a text that exactly corresponds to the user's utterance and, in this example, is 'Air, tell me today's weather' without errors.
  • the second deep learning unit 210 may calculate a second error rate (or second loss value) between the first reference text (or first transcription text) and the second reference text (or second transcription text) output as the output value of the pre-stored second deep learning algorithm.
  • the second deep learning unit 210 may fine-tune the second deep learning algorithm based on the second error rate (or second loss value) and perform deep learning.
  • the second deep learning unit 210 may calculate a weight value using a first error rate (or a first loss value) and a second error rate (or a second loss value). The second deep learning unit 210 may fine-tune the previously stored first deep learning algorithm using the generated weight values.
  • through this, the first deep learning algorithm of the second deep learning unit 210 can accurately recognize the user's requested utterance and generate the corresponding text (that is, the first request text).
  • the speech intention classification unit 220 may generate speech intention data for the user's speech intention by inputting the first request text as an input value of the intention classification algorithm stored in advance.
  • the speech intention classification unit 220 may calculate the user's speech intention as a probability value by inputting the first request text 'Air, tell me the weather today' as an input value of a pre-stored intention classification algorithm.
  • the probability value calculated by the speech intention classification unit 220 will be referred to as a second probability value.
  • the speech intention classification unit 220 may determine the speech intention corresponding to the highest probability value among the calculated second probability values as the user's speech intention. For example, the speech intention classification unit 220 may determine that the user's speech intention is 'weather' of 'today'.
  • the speech intention prediction unit 230 may generate predicted speech data for the user's predicted speech by inputting speech intention data as an input value of an intention prediction algorithm stored in advance.
  • the speech intention prediction unit 230 may calculate the user's predicted speech as a probability value by inputting 'today' and 'weather', which are speech intention data, as input values of an intention prediction algorithm stored in advance.
  • the probability value calculated by the speech intention prediction unit 230 will be referred to as a third probability value.
  • the utterance intention prediction unit 230 may determine the predicted utterance corresponding to the highest probability value among the calculated third probability values as the user's predicted utterance.
  • the speech intention prediction unit 230 may determine 'clothes' as the user's predicted utterance at a first point in time, which is after another point in time when the user 3's requested utterance ends. In addition, the speech intention prediction unit 230 may determine the user's predicted speech at a second time point after the first time point as a 'place'.
  • the speech intention classification unit 220 may determine the user's speech intention from one point in time to another point in time based on the first request text and generate speech intention data.
  • the speech intention prediction unit 230 may determine the user's predicted speech after another point in time when the user 3's speech request ends, and generate predicted speech data.
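  • A minimal sketch of the two arg-max steps above, assuming the stored algorithms can be treated as callables returning probability tables; these interfaces are illustrative assumptions, not disclosed in the patent.

```python
from typing import Callable, Dict

def classify_intention(request_text: str,
                       intent_classifier: Callable[[str], Dict[str, float]]) -> str:
    """Pick the utterance intention with the highest second probability value."""
    second_probs = intent_classifier(request_text)   # e.g. {"weather": 0.9, "clothes": 0.1}
    return max(second_probs, key=second_probs.get)

def predict_utterance(intention: str,
                      intent_predictor: Callable[[str], Dict[str, float]]) -> str:
    """Pick the predicted utterance with the highest third probability value."""
    third_probs = intent_predictor(intention)        # e.g. {"clothes": 0.6, "place": 0.4}
    return max(third_probs, key=third_probs.get)
```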
  • the response generation unit 30 generates a response to the user's requested utterance based on the user's speech intention determined by the speech intention classification unit 220 and the user's predicted speech determined by the speech intention prediction unit 230. can do.
  • the response text generator 300 may generate a response to the user's requested utterance by inputting utterance intention data and predicted utterance data as input values of a response algorithm stored in advance.
  • the response text generation unit 300 may input the speech intention data including 'today' and 'weather' and the predicted speech data including 'clothes' and 'place' as input values of the pre-stored response algorithm.
  • the response text generation unit 300 may then generate a response text such as 'Today's weather is hot. It is recommended to wear thin and long clothes when going outside to prevent air-conditioning sickness.'
  • the text-to-speech conversion unit 310 may convert the response text generated by the response text generation unit 300 into voice data.
  • the text-to-speech conversion unit 310 may transmit voice data to the artificial intelligence speaker 2 and output the result as a response to the user 3's requested utterance.
  • FIG. 2 is a diagram explaining a process of determining the gender and age of a user and extracting second voice data according to an embodiment of the present invention.
  • the voice extraction unit 100 may extract first voice data ('Air') at any one point in time based on the requested utterance of the user 3.
  • the first deep learning unit 110 may calculate the gender and age of the user 3 as probability values by using the first voice data ('Air') as an input value of the pre-stored first decision algorithm.
  • assume that the voice extraction unit 100 extracts the first voice data ('Air') based on the user 3's requested utterance, and that the first deep learning unit 110 inputs the first voice data ('Air') as an input value of the pre-stored first decision algorithm.
  • the first probability value that the user 3 is male and an adult may then be calculated as 0.9 as an output value; that is, the first deep learning unit 110 may determine that the first probability value for male gender and adult age is 0.9.
  • a first probability value that the user 3 has a female gender and an adult age can be calculated as 0.02 as an output value.
  • the first deep learning unit 110 may calculate a first probability value that the user 3 has a female gender and an adult age as 0.02.
  • a first probability value that the user 3 is a male and an elderly person can be calculated as 0.03 as an output value. That is, the first deep learning unit 110 may calculate a first probability value of 0.03 when the gender of the user 3 is male and the age is an elderly person.
  • a first probability value that the user 3 is a woman and an elderly person can be calculated as 0.02 as an output value. That is, the first deep learning unit 110 may calculate a first probability value that the user 3 has a female gender and an elderly age as 0.02.
  • a first probability value that the user 3 is male and the age is a child can be calculated as 0.02 as an output value.
  • the first deep learning unit 110 may calculate a first probability value that the user 3 has a male gender and a child age as 0.02.
  • a first probability value that the user 3 has a female gender and a child age can be calculated as 0.01 as an output value.
  • the first deep learning unit 110 may calculate a first probability value that the user 3 has a female gender and a child age as 0.01.
  • the first deep learning unit 110 may determine the gender (male) and age (adult) corresponding to the highest probability value (0.9) among the first probability values as the gender and age of the user 3 .
  • the voice recognizer distribution unit 120 may determine any one voice recognizer for recognizing an adult male's requested utterance. Any one voice recognizer selected by the voice recognizer distribution unit 120 may recognize a user's requested utterance from a certain point in time to another point in time, and extract second voice data based thereon.
  • the selected voice recognizer recognizes the requested utterance of the adult male user 3 and extracts the second voice data based on it; the arg-max step is restated as code below.
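  • The FIG. 2 example restated as data, with the arg-max picking (male, adult) at 0.9; only the probability values come from the document, the dict layout is an assumption.

```python
# First probability values over the six (gender, age) classes from FIG. 2.
first_probs = {
    ("male", "adult"): 0.90, ("female", "adult"): 0.02,
    ("male", "elderly"): 0.03, ("female", "elderly"): 0.02,
    ("male", "child"): 0.02, ("female", "child"): 0.01,
}
gender, age = max(first_probs, key=first_probs.get)
assert (gender, age) == ("male", "adult")  # highest value: 0.9
```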
  • FIG. 3 is a diagram explaining a user's request utterance and first converted text according to an embodiment of the present invention.
  • the voice recognizer distribution unit 120 may determine one of the voice recognizers for recognizing the requested utterance of the user 3, and that voice recognizer may recognize the requested utterance of the user 3 and generate the first converted text based thereon.
  • any one voice recognizer may misrecognize the requested utterance of the user 3 due to various external or internal factors, such as the inclusion of external noise in the process of recognizing the requested utterance of the user 3 .
  • one voice recognizer may extract second voice data including 'Air, make a reservation for May'. Any one of the voice recognizers may convert the second voice data into first converted text consisting of 'Air, make a reservation for May'.
  • any one voice recognizer may extract second voice data including 'Air ## Tell it to Genie'. Any one of the voice recognizers may convert the second voice data into first converted text composed of 'Air ## Tell it to Genie'.
  • when the user 3 is a male child, one of the voice recognizers may misrecognize the user 3's requested utterance 'Air, play Pororo at 6 o'clock'.
  • for example, the voice recognizer may extract second voice data including 'play the air lottery game at 6 o'clock' and convert it into first converted text consisting of 'play the air lottery game at 6 o'clock'.
  • as another example, one of the voice recognizers may extract second voice data including 'Air ## do not play YouTube to study' and convert it into first converted text consisting of 'Don't play ##StudyingYouTube.'
  • in this way, any one voice recognizer may misrecognize the requested utterance of the user 3 due to various external or internal factors, such as external noise included in the process of recognizing the requested utterance, and may extract second voice data and generate first converted text accordingly.
  • FIG. 4 is a diagram illustrating a second deep learning unit according to an embodiment of the present invention.
  • the second deep learning unit 210 includes a converted text input unit 211, a first speech model deep learning unit 212, a request text output unit 213, a reference text storage unit 214, and a second speech model deep learning unit. 215, a weight value calculation unit 216, a first error rate calculation unit 2120, and a second error rate calculation unit 2150 may be included.
  • the converted text input unit 211 may receive the first converted text converted by any one voice recognizer.
  • the converted text input unit 211 may receive a first converted text consisting of 'Air, make a reservation for May'.
  • the first converted text may be input as an input value of a first deep learning algorithm stored in advance.
  • the first deep learning algorithm previously stored in the first voice model deep learning unit 212 may generate the first request text as an output value.
  • the first voice model deep learning unit 212 may adjust the first deep learning algorithm using the weight values provided by the weight value calculation unit 216 to be described below.
  • the first voice model deep learning unit 212 may perform deep learning by inputting the first converted text as an input value of the adjusted first deep learning algorithm and outputting the first request text.
  • the first error rate calculation unit 2120 may calculate a first error rate (or first loss value) between the first converted text input to the first voice model deep learning unit 212 and the first request text output from the first deep learning algorithm, and may provide the first error rate (or first loss value) to the weight value calculation unit 216.
  • the request text output unit 213 may transmit the first request text ('Air, tell me today's weather') generated by the fine-tuned first deep learning algorithm pre-stored in the first voice model deep learning unit 212 to the response generation unit 30 (see FIG. 1).
  • the reference text storage unit 214 may store the first reference text (or first transcription text) in advance.
  • the reference text storage unit 214 stores in advance a first reference text (or first transcription text) consisting of 'Air, tell me today's weather', which exactly corresponds to the user's requested utterance and does not include an error.
  • the second voice model deep learning unit 215 may receive the first reference text (or first transcription text) from the reference text storage unit 214 .
  • the second voice model deep learning unit 215 inputs the first reference text (or first transcription text) as an input value of the pre-stored second deep learning algorithm, and the second deep learning algorithm may be fine-tuned to output a second reference text (or second transcription text) identical to the first reference text (or first transcription text), thereby performing deep learning.
  • in other words, the second voice model deep learning unit 215 uses the first reference text (or first transcription text) provided from the reference text storage unit 214 as the input value of the pre-stored second deep learning algorithm, and performs deep learning so that the output value of the pre-stored second deep learning algorithm is a second reference text (or second transcription text) identical to the first reference text (or first transcription text).
  • the second error rate calculation unit 2150 may calculate a second error rate between the first reference text (or first transcription text) and the second reference text (or second transcription text) output from the pre-stored second deep learning algorithm after the first reference text (or first transcription text) is input.
  • the second error rate calculation unit 2150 may provide the second error rate to the weight value calculation unit 216 .
  • the weight value calculation unit 216 may calculate weight values for fine-tuning the first deep learning algorithm of the first voice model deep learning unit 212 based on the first error rate value and the second error rate value.
  • the weight value may be expressed as [Equation 1] below.
  • Weight value = a * (first error rate value) + b * (second error rate value)
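  • [Equation 1] written out as a function; the coefficients a and b are left unspecified by the patent, so any concrete values are assumptions.

```python
def weight_value(first_error_rate: float, second_error_rate: float,
                 a: float, b: float) -> float:
    """[Equation 1]: weighted sum of the two error rate values."""
    return a * first_error_rate + b * second_error_rate

# e.g. weight_value(0.25, 0.10, a=0.5, b=0.5) -> 0.175 (illustrative numbers)
```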
  • the weight value calculation unit 216 may provide the weight values to the first voice model deep learning unit 212 .
  • the weight value provided from the weight value calculation unit 216 is set as the weight value of the first voice model deep learning unit 212 to adjust the first deep learning algorithm, and the first voice model deep learning unit 212 may perform deep learning by inputting the first converted text as an input value of the adjusted first deep learning algorithm.
  • for example, the weight value provided from the weight value calculation unit 216 may be set as the weight value added in the hidden layer.
  • when the first voice model deep learning unit 212 inputs the first converted text containing errors as an input value of the pre-stored first deep learning algorithm, the errors are removed and the first request text exactly corresponding to the user's request utterance can be determined.
  • that is, the first voice model deep learning unit 212 may fine-tune the first deep learning algorithm by using as its weight value the weight value computed from the first error rate value (or first loss value), calculated between the error-containing first converted text and the first request text, and the second error rate value, calculated with the first reference text (or first transcription text) as the input value and the second reference text (or second transcription text) as the output value.
  • the first voice model deep learning unit 212 may then input the first converted text as an input value of the adjusted first deep learning algorithm and generate a first request text exactly corresponding to the user's request utterance; one fine-tuning step is sketched below.
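  • A hedged sketch of one step of this FIG. 4 feedback loop, assuming a toy character-level error-rate proxy and model objects with an update_weight method; none of these interfaces appears in the patent, which specifies only the two error rates and [Equation 1].

```python
def char_error_rate(reference: str, hypothesis: str) -> float:
    """Toy proxy for an error rate: fraction of mismatched character positions."""
    n = max(len(reference), len(hypothesis), 1)
    mismatches = sum(r != h for r, h in zip(reference, hypothesis))
    return (mismatches + abs(len(reference) - len(hypothesis))) / n

def fine_tune_step(first_model, second_model, converted_text: str,
                   reference_text: str, a: float = 0.5, b: float = 0.5) -> float:
    request_text = first_model(converted_text)                # first request text
    err1 = char_error_rate(converted_text, request_text)      # first error rate value
    second_reference = second_model(reference_text)           # second reference text
    err2 = char_error_rate(reference_text, second_reference)  # second error rate value
    weight = a * err1 + b * err2                              # [Equation 1]
    first_model.update_weight(weight)                         # adjust hidden-layer weights
    return weight
```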
  • FIG. 5 is a diagram explaining a process of determining a user's utterance intention and predicted utterance and generating a response according to an embodiment of the present invention.
  • the speech intention classification unit 220 may receive a first request text corresponding to the user's requested speech.
  • the speech intention classification unit 220 may calculate the user's speech intention as a second probability value by inputting the first request text as an input value of the intention classification algorithm stored in advance.
  • the speech intention classification unit 220 may determine the speech intention based on the user's speech request based on the second probability value.
  • the speech intention classification unit 220 inputs the first request text ('Let me know the weather today') as an input value of a pre-stored intention classification algorithm, and converts the user's speech intention into a second probability value. can be calculated
  • the speech intention classification unit 220 may calculate the second probability value that the user's speech intention is 'weather' as 0.9.
  • the speech intention classification unit 220 may calculate a second probability value that the user's speech intention is 'clothes' as 0.1.
  • the speech intention classification unit 220 may determine the speech intention ('weather') corresponding to the highest probability value (0.9) among the second probability values as the speech intention based on the user's speech request.
  • the speech intention classification unit 220 may generate speech intention data including 'weather' based on the user's speech intention.
  • the speech intention prediction unit 230 may receive speech intention data corresponding to the user's speech intention.
  • the speech intention prediction unit 230 may calculate the user's predicted speech as a third probability value by inputting the speech intention data as an input value of the intention prediction algorithm stored in advance.
  • the speech intention prediction unit 230 may determine the user's predicted speech based on the third probability value.
  • the speech intention prediction unit 230 may input the speech intention data ('weather') as an input value of the pre-stored intention prediction algorithm to calculate, as third probability values, the user's predicted utterance at a first time point after the other point in time.
  • the speech intention prediction unit 230 may calculate the third probability value that the user's predicted utterance at the first point in time is 'clothes' as 0.6.
  • the speech intention prediction unit 230 may calculate the third probability value that the user's predicted utterance at the first time point is 'place' as 0.4.
  • the utterance intention prediction unit 230 may determine the predicted utterance ('clothing') corresponding to the highest probability value (0.6) among the third probability values as the user's predicted utterance at the first time point.
  • the speech intention prediction unit 230 may generate first predicted speech data including 'clothes' based on the user's predicted speech.
  • the speech intention prediction unit 230 inputs the first predicted speech data ('clothes') as an input value of the intention prediction algorithm stored in advance to predict the user's predicted speech at the second time point after the first time point as a third probability value. can be calculated as
  • the speech intention prediction unit 230 may calculate the third probability value that the user's predicted utterance at the second point in time is 'song' as 0.3.
  • the utterance intention prediction unit 230 may calculate a third probability value that the predicted utterance of the user at the second time point is 'place' as 0.7.
  • the utterance intention prediction unit 230 may determine the predicted utterance ('place') corresponding to the highest probability value (0.7) among the third probability values as the user's predicted utterance at the second point in time.
  • the speech intention prediction unit 230 may generate second predicted speech data including 'place' based on the user's predicted speech.
  • the speech intention classification unit 220 and the speech intention prediction unit 230 may provide the speech intention data ('weather') and the predicted speech data ('clothes', 'place') to the response generation unit 30.
  • the response text generator 300 may generate a response text in response to a user's requested utterance by inputting utterance intention data and predicted utterance data as input values of a previously stored response algorithm.
  • for example, the response text generation unit 300 may generate, as an output, a response text consisting of 'Today's weather is hot. We recommend wearing thin and long clothes to prevent air-conditioning sickness.'
  • the text-to-speech conversion unit 310 may convert response text ('The weather is hot today. We recommend wearing thin and long clothes to prevent air-conditioning sickness.') into voice data.
  • the text-to-speech conversion unit 310 may transmit the voice data to the artificial intelligence speaker 2.
  • the artificial intelligence speaker 2 can output voice data ('The weather is hot today. We recommend wearing thin and long clothes to prevent air-conditioning sickness') as a response to the user's request.
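  • The FIG. 5 walk-through restated as a feedback chain, reusing the example probability values above; the table-driven stand-in for the intention prediction algorithm is an assumption for illustration.

```python
# Each predicted utterance is fed back to predict the next time point.
PREDICTION_TABLE = {
    "weather": {"clothes": 0.6, "place": 0.4},   # first time point
    "clothes": {"song": 0.3, "place": 0.7},      # second time point
}

def predict_chain(intention: str, steps: int = 2) -> list:
    chain = []
    for _ in range(steps):
        probs = PREDICTION_TABLE.get(intention)
        if probs is None:
            break
        intention = max(probs, key=probs.get)    # highest third probability value
        chain.append(intention)
    return chain

assert predict_chain("weather") == ["clothes", "place"]
```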
  • FIG. 6 is a diagram illustrating a voice processing method based on artificial intelligence according to an embodiment of the present invention.
  • the voice extraction unit may extract first voice data based on the user's requested utterance at any point in time.
  • that is, the voice extraction unit 100 may extract the first voice data based on the user's requested utterance at a certain point in time.
  • the first deep learning unit may determine the gender and age of the user by inputting the first voice data as an input value of the first determination algorithm stored in advance.
  • the first deep learning unit 110 may calculate the gender and age of the user 3 as first probability values by using the first voice data as an input value of the pre-stored first decision algorithm. At this time, the first deep learning unit 110 may determine the gender and age corresponding to the highest probability value among the calculated first probability values as the gender and age of the user 3.
  • the voice recognizer distribution unit may determine one voice recognizer corresponding to the user's gender and age.
  • the voice recognizer distribution unit 120 recognizes the requested utterance of the user 3 from one point to another according to the gender and age of the user 3, and a plurality of different voice recognizers that generate voice data. can include The voice recognizer distribution unit 120 may determine one voice recognizer corresponding to the user's gender and age.
  • any one voice recognizer may extract second voice data based on the user's requested utterance from one point in time to another point in time.
  • the voice recognizer distribution unit 120 determines any one voice recognizer corresponding to the user's gender and age, and that voice recognizer may recognize the requested utterance of the user 3 and extract the second voice data based on it.
  • the voice-to-text converter may convert the second voice data into the first converted text.
  • the voice-to-text conversion unit 200 may convert second voice data extracted from any one voice recognizer included in the voice recognizer distribution unit 120 into first converted text.
  • the first requested text may be generated by inputting the first converted text as the input value of the first deep learning algorithm stored in advance in the first voice model deep learning unit.
  • the first request text may be input as an input value of the intention classification algorithm previously stored in the speech intention classification unit to determine the speech intention and generate speech intention data.
  • the speech intention classification unit 220 may calculate the user's speech intention as a second probability value by inputting the first request text as an input value of the intention classification algorithm stored in advance.
  • the speech intention classification unit 220 may determine the speech intention corresponding to the highest probability value among the calculated second probability values as the user's speech intention.
  • the predicted speech may be determined by inputting speech intention data as an input value of the intention prediction algorithm stored in advance in the speech intention prediction unit, and predicted speech data may be generated.
  • the speech intention prediction unit 230 may calculate the user's predicted speech as a third probability value by inputting speech intention data as an input value of the intention prediction algorithm stored in advance.
  • the utterance intention prediction unit 230 may determine the predicted utterance corresponding to the highest probability value among the calculated third probability values as the user's predicted utterance.
  • a response may be generated by inputting speech intention data and predicted speech data as input values of the response algorithm previously stored in the response text generator.
  • the response text generator 300 may generate a response to the user's requested utterance by inputting utterance intention data and predicted utterance data as input values of a response algorithm stored in advance.
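  • The FIG. 6 method restated as one hedged end-to-end sketch; every field of Units is a stand-in callable for the corresponding numbered unit, and none of these interfaces is disclosed in the document itself.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Units:
    voice_extractor: Callable          # 100: audio -> first voice data
    first_deep_learning: Callable      # 110: first voice data -> (gender, age)
    recognizer_distribution: Callable  # 120: (gender, age) -> recognizer
    voice_to_text: Callable            # 200: second voice data -> first converted text
    first_voice_model: Callable        # 212: first converted text -> first request text
    intent_classifier: Callable        # 220: request text -> speech intention data
    intent_predictor: Callable         # 230: intention data -> predicted speech data
    response_generator: Callable       # 300: (intention, predicted) -> response text
    text_to_speech: Callable           # 310: response text -> voice data

def respond(request_audio, units: Units):
    first_voice = units.voice_extractor(request_audio)
    gender, age = units.first_deep_learning(first_voice)
    recognizer = units.recognizer_distribution(gender, age)
    second_voice = recognizer(request_audio)
    converted_text = units.voice_to_text(second_voice)
    request_text = units.first_voice_model(converted_text)
    intention = units.intent_classifier(request_text)
    predicted = units.intent_predictor(intention)
    response_text = units.response_generator(intention, predicted)
    return units.text_to_speech(response_text)
```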
  • FIG. 7 is a diagram explaining a method of adjusting a first deep learning algorithm according to an embodiment of the present invention.
  • the request text output unit may provide the first converted text and the first request text to the first error rate calculator.
  • the request text output unit 213 may provide the first converted text and the first request text to the first error rate calculator 2120 .
  • the first error rate calculation unit 2120 may calculate a first error rate (or a first loss value) between the first converted text and the first request text.
  • the reference text storage unit 214 may provide the first reference text (or first transcription text) to the second voice model deep learning unit 215.
  • the first reference text (or first transcription text) may be input as an input value of the second deep learning algorithm previously stored in the second voice model deep learning unit 215, and an output value may be output.
  • the second error rate calculation unit 2150 may calculate a second error rate between the first reference text (or first transcription text) and the second reference text (or second transcription text).
  • the second voice model deep learning unit 215 may fine-tune the second deep learning algorithm and perform deep learning so that it outputs a second reference text (or second transcription text) identical to the first reference text (or first transcription text).
  • the second error rate calculation unit 2150 may calculate a second error rate (or second loss value) between the first reference text (or first transcription text) provided from the reference text storage unit 214 and the output value of the pre-stored second deep learning algorithm.
  • the second voice model deep learning unit 215 may fine-tune the second deep learning algorithm using the second error rate (or second loss value) until the output value of the second deep learning algorithm becomes a second reference text (or second transcription text) identical to the first reference text (or first transcription text).
  • the weight value calculation unit 216 may calculate the weight value using the first error rate (or the first loss value) and the second error rate (or the second loss value).
  • that is, the weight value calculation unit 216 may calculate the weight value for fine-tuning the first deep learning algorithm of the first voice model deep learning unit 212 based on the first error rate value (or first loss value) and the second error rate value (or second loss value).
  • the weight value calculation unit may provide the weight values to the first speech model deep learning unit.
  • the first voice model deep learning unit may fine-tune the pre-stored first deep learning algorithm based on the weight value.
  • the weight value provided by the weight value calculation unit 216 is used as the weight value of the first voice model deep learning unit 212 to fine-tune the first deep learning algorithm, and the first voice model deep learning unit 212 may perform deep learning by inputting the first converted text as an input value of the fine-tuned first deep learning algorithm.
  • the embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware components and software components.
  • the devices, methods, and components described in the embodiments may be implemented using, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field-programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.
  • a processing device may run an operating system and one or more software applications running on the operating system.
  • a processing device may also access, store, manipulate, process, and generate data in response to execution of software.
  • it will be understood that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements.
  • a processing device may include a plurality of processors or a processor and a controller. Also, other processing configurations are possible, such as a parallel processor.
  • Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may, independently or collectively, command the processing device.
  • Software and/or data may be embodied in any tangible machine, component, physical device, virtual equipment, or computer storage medium or device, so as to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable media.
  • the method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer readable medium.
  • Computer readable media may include program instructions, data files, data structures, etc. alone or in combination.
  • Program commands recorded on the medium may be specially designed and configured for the embodiment or may be known and usable to those skilled in computer software.
  • Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; and ROM, RAM, and flash memory.
  • the hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

An artificial intelligence-based dialogue situation prediction and intention classification system according to the present invention comprises at least one processor that determines the gender and age of a user on the basis of a user's request utterance uttered from one point in time to another point in time, determines the utterance intention of the user by converting the request utterance of the user to text, predicts an expected utterance of the user after the other point in time, and generates a response to the user's request utterance on the basis of the predicted expected utterance and the utterance intention of the user.

Description

Dialogue situation prediction and intention classification system based on artificial intelligence, and method thereof
The present invention relates to a dialogue situation prediction and intention classification system based on artificial intelligence, and a method thereof.
Recently, with the development of artificial intelligence technology, demand for artificial intelligence speakers has been increasing.
In general, an artificial intelligence speaker may recognize a voice provided by a user and may generate and output a response based on a built-in algorithm. Using an artificial intelligence speaker, a user can conveniently access various kinds of information.
However, in a noisy environment, or due to various external or internal factors, an artificial intelligence speaker may fail to accurately recognize the user's voice and may provide inaccurate information to the user.
In addition, since a response is generated based only on the user's utterance, the information that can be provided to the user may be limited. Accordingly, the user may have to use the artificial intelligence speaker several times to obtain a desired response, which lowers efficiency in terms of information acquisition.
Accordingly, there is a need for a technology that can operate in various environments and efficiently provide a variety of information to users.
The present invention was derived from research conducted as part of the Information, Communication, and Broadcasting Innovation Talent Development (R&D) program of the Ministry of Science and ICT (project identification number: 1711125907, project number: 2020-0-01808-002, research project title: Research on complex-information-based predictive intelligence innovation technology, project management agency: Information and Communications Technology Planning and Evaluation Institute, project executing agency: Kyungpook National University Industry-Academic Cooperation Foundation, research period: 2021.01.01.–2021.12.31.). The Korean government holds no property interest in any aspect of the present invention.
A technical problem to be solved by the present invention is to provide a dialogue situation prediction and intention classification system based on artificial intelligence, and a method thereof, that recognize a user's voice according to the user's gender and age.
Another technical problem to be solved by the present invention is to provide a dialogue situation prediction and intention classification system based on artificial intelligence, and a method thereof, that determine a user's utterance intention based on the user's voice.
Another technical problem to be solved by the present invention is to provide a dialogue situation prediction and intention classification system based on artificial intelligence, and a method thereof, that predict the user's subsequent utterance based on the user's utterance intention.
Another technical problem to be solved by the present invention is to provide a dialogue situation prediction and intention classification system based on artificial intelligence, and a method thereof, that generate a response based on the user's utterance intention and the predicted utterance.
An artificial intelligence-based dialogue situation prediction and intention classification system according to an embodiment of the present invention includes at least one processor, and the at least one processor includes: a voice determination unit that determines the gender and age of a user based on the user's request utterance from one point in time to another point in time; a voice processing unit that converts the user's request utterance into text to determine the user's utterance intention and determines the user's predicted utterance after the other point in time; and a response generation unit that generates a response to the user's request utterance based on the user's utterance intention and the predicted utterance.
In addition, the voice determination unit according to an embodiment of the present invention includes: a voice extraction unit that extracts first voice data based on the request utterance at the one point in time; a first deep learning unit that inputs the first voice data as an input value of a pre-stored first decision algorithm, calculates the user's gender and age as first probability values, and determines the user's gender and age based on the first probability values; and a voice recognizer distribution unit that recognizes the user's request utterance from the one point in time to the other point in time based on the determined gender and age and extracts second voice data.
In addition, the first deep learning unit according to an embodiment of the present invention determines the gender and age corresponding to the highest of the first probability values as the user's gender and age.
In addition, the voice recognizer distribution unit according to an embodiment of the present invention includes a plurality of different voice recognizers that recognize the user's request utterance and generate the second voice data in correspondence with the determined gender and age.
In addition, the voice processing unit according to an embodiment of the present invention includes: a voice-to-text conversion unit that converts the second voice data into first converted text; a second deep learning unit that inputs the first converted text as an input value of a pre-stored first deep learning algorithm and determines first request text corresponding to the user's request utterance; an utterance intention classification unit that inputs the first request text as an input value of a pre-stored intention classification algorithm and generates utterance intention data on the utterance intention; and an utterance intention prediction unit that inputs the utterance intention data as an input value of a pre-stored intention prediction algorithm and generates predicted utterance data on the predicted utterance.
In addition, the second deep learning unit according to an embodiment of the present invention includes: a converted text input unit that receives the first converted text; a first voice model deep learning unit that inputs the first converted text as an input value of the pre-stored first deep learning algorithm and generates the first request text; and a request text output unit that outputs the first request text.
In addition, the second deep learning unit according to an embodiment of the present invention further includes: a reference text storage unit that stores first reference text corresponding to the user's request utterance; and a first error rate calculation unit that calculates a first error rate value between the first converted text and the first request text.
In addition, the second deep learning unit according to an embodiment of the present invention further includes: a second voice model deep learning unit that performs deep learning by inputting the first reference text as an input value of a pre-stored second deep learning algorithm; a second error rate calculation unit that calculates a second error rate value between the first reference text and second reference text, which is an output value of the second voice model deep learning unit; and a weight value calculation unit that calculates a weight value for deep learning of the first voice model deep learning unit based on the first error rate value and the second error rate value.
In addition, the first voice model deep learning unit according to an embodiment of the present invention performs deep learning by using the weight value as the weight value of the pre-stored first deep learning algorithm and inputting the first converted text as an input value.
In addition, the utterance intention classification unit according to an embodiment of the present invention inputs the first request text as an input value of the pre-stored intention classification algorithm and generates utterance intention data in which the utterance intention is calculated as second probability values.
In addition, the utterance intention classification unit according to an embodiment of the present invention determines the utterance intention having the highest of the second probability values as the user's utterance intention.
In addition, the utterance intention prediction unit according to an embodiment of the present invention inputs the utterance intention data as an input value of the pre-stored intention prediction algorithm and generates predicted utterance data in which the predicted utterance is calculated as third probability values.
In addition, the utterance intention prediction unit according to an embodiment of the present invention determines the predicted utterance having the highest of the third probability values as the user's predicted utterance.
In addition, the response generation unit according to an embodiment of the present invention includes: a response text generation unit that generates response text for the user's request utterance by using the utterance intention data and the predicted utterance data as input values of a pre-stored response algorithm; and a text-to-voice conversion unit that converts the response text into voice data.
In addition, in an artificial intelligence-based voice processing method performed by an artificial intelligence-based voice processing system including at least one processor according to an embodiment of the present invention, the at least one processor includes a voice determination unit, a voice processing unit, and a response generation unit, and the method includes: determining, by the voice determination unit, the gender and age of a user based on the user's request utterance from one point in time to another point in time; converting, by the voice processing unit, the user's request utterance into text to determine the user's utterance intention and determining the user's predicted utterance after the other point in time; and generating, by the response generation unit, a response to the user's request utterance based on the user's utterance intention and the predicted utterance.
In addition, an embodiment of the present invention includes a computer-readable non-transitory recording medium on which a program for executing the artificial intelligence-based voice processing system is recorded.
The artificial intelligence-based dialogue situation prediction and intention classification system according to the present invention can accurately recognize a user's voice according to the user's gender and age.
In addition, the system can determine the user's utterance intention based on the user's voice.
In addition, the system can predict the user's subsequent utterance based on the user's utterance intention.
In addition, the system can generate a response based on the user's utterance intention and the predicted utterance.
FIG. 1 is a diagram illustrating an artificial intelligence-based dialogue situation prediction and intention classification system according to an embodiment of the present invention.
FIG. 2 is a diagram explaining a process of determining a user's gender and age and extracting second voice data according to an embodiment of the present invention.
FIG. 3 is a diagram explaining a user's request utterance and first converted text according to an embodiment of the present invention.
FIG. 4 is a diagram explaining a second deep learning unit according to an embodiment of the present invention.
FIG. 5 is a diagram explaining a process of determining a user's utterance intention and predicted utterance and generating a response according to an embodiment of the present invention.
FIG. 6 is a diagram explaining an artificial intelligence-based dialogue situation prediction and intention classification method according to an embodiment of the present invention.
FIG. 7 is a diagram explaining a method of adjusting a first deep learning algorithm according to an embodiment of the present invention.
Hereinafter, various embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice them. The present invention may be embodied in many different forms and is not limited to the embodiments set forth herein.
To clearly describe the present invention, parts irrelevant to the description are omitted, and the same reference numerals are assigned to the same or similar components throughout the specification. Accordingly, reference numerals described earlier may also be used in other drawings.
In addition, since the size and thickness of each component shown in the drawings are arbitrarily represented for convenience of description, the present invention is not necessarily limited to what is shown. In the drawings, thicknesses may be exaggerated to clearly express various layers and regions.
In addition, the expression "the same" in the description may mean "substantially the same", that is, the same to an extent that a person of ordinary skill would accept as the same. Other expressions may likewise be expressions from which "substantially" has been omitted.
In addition, when a part is said to "include" a certain component in the description, this means that it may further include other components rather than excluding them, unless otherwise stated. The term "unit" used in this specification refers to a unit that processes at least one function or operation, and may mean, for example, software, an FPGA, or a hardware component. The functions provided by a "unit" may be performed separately by a plurality of components or may be integrated with other additional components. A "unit" in this specification is not necessarily limited to software or hardware, and may be configured to reside in an addressable storage medium or configured to operate one or more processors. Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a diagram illustrating an artificial intelligence-based dialogue situation prediction and intention classification system according to an embodiment of the present invention.
The artificial intelligence-based dialogue situation prediction and intention classification system 1 according to an embodiment of the present invention includes at least one processor, and the at least one processor may implement or include a voice determination unit 10, a voice processing unit 20, and a response generation unit 30.
The voice determination unit 10 may include a voice extraction unit 100, a first deep learning unit 110, and a voice recognizer distribution unit 120. The voice processing unit 20 may include a voice-to-text conversion unit 200, a second deep learning unit 210, an utterance intention classification unit 220, and an utterance intention prediction unit 230. The response generation unit 30 may include a response text generation unit 300 and a text-to-voice conversion unit 310.
The artificial intelligence speaker 2 may recognize a voice based on an utterance of the user 3 and output a response corresponding to the utterance. Through this, the user 3 can obtain various information from the artificial intelligence speaker 2. Hereinafter, an utterance of the user 3 input to the artificial intelligence speaker 2 will be referred to as a 'request utterance'.
The voice determination unit 10 may recognize the request utterance of the user 3 from one point in time to another point in time.
Hereinafter, in FIG. 1, it is assumed that the user 3 starts the request utterance at one point in time and utters the request 'Air, tell me today's weather' to the artificial intelligence speaker 2 from the one point in time to the other point in time. Here, 'Air' is assumed to be the initial command for activating the artificial intelligence speaker 2.
The voice extraction unit 100 may extract voice data at the one point in time based on the request utterance of the user 3. Hereinafter, the voice data of the user 3 extracted by the voice extraction unit 100 at the one point in time will be referred to as first voice data.
Specifically, the voice extraction unit 100 may extract first voice data including 'Air', the request utterance of the user 3 at the one point in time.
The first deep learning unit 110 may calculate the gender and age of the user 3 as probability values by using the first voice data as an input value of the pre-stored first decision algorithm, and may determine the user's gender and age based on the probability values.
Specifically, the first deep learning unit 110 may input the first voice data as an input value of the pre-stored first decision algorithm and calculate, as probability values, the probability that the gender of the user 3 is male or female and that the age is adult, elderly, or child.
Hereinafter, the probability values calculated in the first deep learning unit 110 by inputting the first voice data as an input value of the pre-stored first decision algorithm will be referred to as first probability values.
At this time, the first deep learning unit 110 may determine the gender and age corresponding to the highest of the calculated first probability values as the gender and age of the user 3.
The process in which the first deep learning unit 110 determines the gender and age of the user 3 by inputting the first voice data as an input value of the pre-stored decision algorithm will be described in detail with reference to FIG. 2 below.
The voice recognizer distribution unit 120 may include a plurality of different voice recognizers that recognize the request utterance of the user 3 from the one point in time to the other point in time according to the gender and age of the user 3 and generate voice data based on the request utterance.
The voice recognizer distribution unit 120 may determine one voice recognizer corresponding to the user's gender and age, and may recognize the user's request utterance from the one point in time to the other point in time using that voice recognizer.
The selected voice recognizer may recognize the user's request utterance and extract voice data based on it.
Hereinafter, the voice data recognized and extracted by the voice recognizer selected in the voice recognizer distribution unit 120 will be referred to as second voice data.
Specifically, when the first deep learning unit 110 determines that the user 3 is an adult male, the voice recognizer distribution unit 120 may determine the voice recognizer for recognizing an adult male's request utterance, and that recognizer may recognize the adult male's request utterance and generate second voice data.
When the first deep learning unit 110 determines that the user 3 is an adult female, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing an adult female's request utterance, and that recognizer may recognize the adult female's request utterance and generate second voice data.
When the first deep learning unit 110 determines that the user 3 is an elderly male, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing an elderly male's request utterance, and that recognizer may recognize the elderly male's request utterance and generate second voice data.
When the first deep learning unit 110 determines that the user 3 is an elderly female, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing an elderly female's request utterance, and that recognizer may recognize the elderly female's request utterance and generate second voice data.
When the first deep learning unit 110 determines that the user 3 is a male child, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing a male child's request utterance, and that recognizer may recognize the male child's request utterance and generate second voice data.
When the first deep learning unit 110 determines that the user 3 is a female child, the voice recognizer distribution unit 120 may select the voice recognizer for recognizing a female child's request utterance, and that recognizer may recognize the female child's request utterance and generate second voice data.
That is, the voice recognizer distribution unit 120 may determine one voice recognizer for recognizing the request utterance of the user 3, who corresponds to one of male or female and to one of elderly, adult, or child.
In addition, the selected voice recognizer may recognize the request utterance of the user 3 from the one point in time to the other point in time and extract second voice data based on it. This demographic dispatch can be sketched as shown below.
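For illustration only, the dispatch described above can be viewed as a lookup from a (gender, age group) pair to a dedicated recognizer; a minimal Python sketch follows. The make_recognizer and dispatch names and the byte-level interface are assumptions of this sketch, not part of the disclosure.

from typing import Callable, Dict, Tuple

# One recognizer per (gender, age-group) pair, mirroring the six recognizers
# held by the voice recognizer distribution unit 120.
Recognizer = Callable[[bytes], bytes]  # utterance audio in, second voice data out

def make_recognizer(label: str) -> Recognizer:
    def recognize(audio: bytes) -> bytes:
        # A real recognizer would run an acoustic/language model tuned to `label`.
        print(f"recognizing with {label} model")
        return audio
    return recognize

RECOGNIZERS: Dict[Tuple[str, str], Recognizer] = {
    (gender, age): make_recognizer(f"{age}-{gender}")
    for gender in ("male", "female")
    for age in ("adult", "elderly", "child")
}

def dispatch(gender: str, age_group: str, audio: bytes) -> bytes:
    # The distribution unit selects the recognizer matching the decided demographics.
    return RECOGNIZERS[(gender, age_group)](audio)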
The voice processing unit 20 may determine the user's utterance intention by converting the user's request utterance from the one point in time to the other point in time into text, and may determine the user's predicted utterance after the other point in time.
The voice-to-text conversion unit 200 may convert the second voice data extracted by the voice recognizer selected in the voice recognizer distribution unit 120 into text. At this time, the text may contain errors. Hereinafter, the text converted by the voice-to-text conversion unit 200, which may contain errors, will be referred to as first converted text.
Specifically, a voice recognizer may misrecognize the request utterance of the user 3 due to various external or internal factors, such as external noise picked up while recognizing the request utterance.
For example, the voice recognizer that recognizes an adult male's request utterance may misrecognize the adult male's request utterance 'Air, tell me today's weather' as 'Air, tell me the schedule for May', and may extract second voice data consisting of 'Air, tell me the schedule for May' based on it.
At this time, the voice-to-text conversion unit 200 may convert the second voice data into first converted text, and the first converted text may consist of the erroneous 'Air, tell me the schedule for May'.
The second deep learning unit 210 may generate first request text by inputting the first converted text as an input value of the pre-stored first deep learning algorithm.
Specifically, when the first converted text is input as an input value of the first deep learning algorithm pre-stored in the second deep learning unit 210, the first deep learning algorithm may generate, as an output value, request text corresponding to the user's utterance request.
Hereinafter, the request text generated by the first deep learning algorithm pre-stored in the second deep learning unit 210 will be referred to as first request text.
The second deep learning unit 210 is composed of an input layer, a hidden layer, and an output layer, and the first deep learning algorithm may refer to the series of processes in which the first converted text is input as an input value of the input layer and the first request text is output from the output layer.
At this time, the first converted text input to the input layer may be converted into the first request text, the output value, by applying predetermined weight values in the hidden layer.
The predetermined weight values applied in the hidden layer of the second deep learning unit 210 may be reset by a weight value reflecting the first error rate and the second error rate described with reference to FIG. 4 below, and the first deep learning algorithm can be adjusted through this process. A schematic sketch of this layered mapping is given below.
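As a schematic sketch only (the patent does not disclose the network architecture), the input layer / hidden layer / output layer pipeline above can be illustrated with a tiny feed-forward pass in NumPy. Text is stood in for by a fixed-size feature vector, and all dimensions are assumptions.

import numpy as np

rng = np.random.default_rng(0)

IN_DIM, HID_DIM, OUT_DIM = 16, 32, 16
W1 = rng.normal(size=(IN_DIM, HID_DIM))   # input -> hidden weights
W2 = rng.normal(size=(HID_DIM, OUT_DIM))  # hidden -> output weights

def forward(x: np.ndarray) -> np.ndarray:
    # The hidden layer applies the "predetermined weight values" described above.
    h = np.tanh(x @ W1)
    # The output layer produces the corrected representation (the first request text).
    return h @ W2

x = rng.normal(size=IN_DIM)  # stand-in for an encoded first converted text
y = forward(x)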
The second deep learning unit 210 may calculate an error rate (or loss value) between the first converted text and the first request text output from the first deep learning algorithm.
Hereinafter, the error rate (or loss value) calculated in the second deep learning unit 210 between the first converted text and the first request text output from the first deep learning algorithm will be referred to as a first error rate (or first loss value).
Specifically, when the first converted text is input as an input value of the first deep learning algorithm of the second deep learning unit 210, the first request text output from the first deep learning algorithm may not correspond exactly to the user's utterance request.
Therefore, in order to fine-tune the first deep learning algorithm as described later, the second deep learning unit 210 may calculate the first error rate (or first loss value) between the first request text output from the first deep learning algorithm and the first converted text input as the input value of the first deep learning algorithm. One common way to compute such a text error rate is sketched below.
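The disclosure does not specify the error-rate metric; as one common choice, a character error rate based on edit distance could serve, as in this hedged sketch.

def edit_distance(a: str, b: str) -> int:
    # Standard Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def char_error_rate(reference: str, hypothesis: str) -> float:
    # Edits needed to turn one text into the other, normalized by reference length.
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

# e.g. a first error rate between the first converted text and the first request text
print(char_error_rate("Air, tell me the schedule for May", "Air, tell me today's weather"))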
The second deep learning unit 210 may fine-tune the pre-stored second deep learning algorithm so that, when the first reference text (or first transcription text) is input as its input value, it outputs second reference text (or second transcription text) identical to the first reference text. Here, the first reference text (or transcription text) is the text that corresponds exactly to the user's utterance intention and, free of errors, reads 'Air, tell me today's weather'.
Specifically, the second deep learning unit 210 may calculate a second error rate (or second loss value) between the first reference text (or first transcription text) and the second reference text (or second transcription text) output as the output value of the pre-stored second deep learning algorithm.
In addition, the second deep learning unit 210 may fine-tune the second deep learning algorithm based on the second error rate (or second loss value) and perform deep learning.
The second deep learning unit 210 may calculate a weight value using the first error rate (or first loss value) and the second error rate (or second loss value), and may fine-tune the pre-stored first deep learning algorithm using the calculated weight value. An illustrative combination, under stated assumptions, is sketched below.
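The exact function that maps the two error rates to a weight value is not disclosed; purely as an illustration, one might blend them convexly, with the coefficient alpha being an assumption of this sketch.

def combined_weight(first_error_rate: float, second_error_rate: float,
                    alpha: float = 0.5) -> float:
    # Hypothetical combination: a convex blend of the two error rates.
    # The disclosure only states that both rates feed the weight value.
    return alpha * first_error_rate + (1.0 - alpha) * second_error_rate

# The resulting value could, for example, scale a gradient step when
# fine-tuning the hidden-layer weights of the first deep learning algorithm.
learning_rate = 0.01
step_scale = learning_rate * combined_weight(0.21, 0.05)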
Through the above process, even in a noisy environment or when a voice recognizer misrecognizes the user's utterance, the first deep learning algorithm of the second deep learning unit 210 can accurately recognize the user's request utterance and convert it into text (the first request text).
The utterance intention classification unit 220 may generate utterance intention data on the user's utterance intention by inputting the first request text as an input value of the pre-stored intention classification algorithm.
Specifically, the utterance intention classification unit 220 may calculate the user's utterance intention as probability values by inputting the first request text, 'Air, tell me today's weather', as an input value of the pre-stored intention classification algorithm.
Hereinafter, the probability values calculated by the utterance intention classification unit 220 will be referred to as second probability values.
At this time, the utterance intention classification unit 220 may determine the utterance intention corresponding to the highest of the calculated second probability values as the user's utterance intention. For example, the utterance intention classification unit 220 may determine that the user's utterance intention concerns the 'weather' of 'today'. This highest-probability selection can be sketched as follows.
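For illustration, selecting the highest second probability value reduces to an argmax over intent labels; the label set and scores below are invented for this sketch, since the patent does not disclose them.

# Hypothetical output of the intention classification algorithm.
intent_scores = {
    ("today", "weather"): 0.87,
    ("today", "schedule"): 0.08,
    ("music", "play"): 0.05,
}

# The classification unit keeps the intention with the highest second probability value.
best_intent = max(intent_scores, key=intent_scores.get)
print(best_intent)  # ('today', 'weather')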
The process in which the utterance intention classification unit 220 calculates the user's utterance intention as probability values by using the first request text as an input value of the pre-stored intention classification algorithm, and determines the user's utterance intention based on them, will be described in detail with reference to FIG. 5 below.
The utterance intention prediction unit 230 may generate predicted utterance data on the user's predicted utterance by inputting the utterance intention data as an input value of the pre-stored intention prediction algorithm.
Specifically, the utterance intention prediction unit 230 may calculate the user's predicted utterance as probability values by inputting the utterance intention data, 'today' and 'weather', as input values of the pre-stored intention prediction algorithm. Hereinafter, the probability values calculated by the utterance intention prediction unit 230 will be referred to as third probability values.
At this time, the utterance intention prediction unit 230 may determine the predicted utterance corresponding to the highest of the calculated third probability values as the user's predicted utterance.
For example, the utterance intention prediction unit 230 may determine the user's predicted utterance at a first point in time, after the other point in time at which the request utterance of the user 3 ended, to be 'clothes', and may determine the user's predicted utterance at a second point in time, after the first point in time, to be 'place'. A sketch of such stepwise prediction follows.
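As a hedged sketch of the prediction step, the classified intent can index a table of likely follow-up topics, applied once per future time point; the transition table here is invented for illustration.

# Hypothetical follow-up table for the intention prediction algorithm.
FOLLOW_UPS = {
    ("today", "weather"): [("clothes", 0.62), ("place", 0.25), ("music", 0.13)],
    ("clothes",): [("place", 0.70), ("shopping", 0.30)],
}

def predict_next(context: tuple) -> str:
    # Keep the candidate with the highest third probability value.
    return max(FOLLOW_UPS[context], key=lambda t: t[1])[0]

first = predict_next(("today", "weather"))  # -> 'clothes' (first point in time)
second = predict_next((first,))             # -> 'place' (second point in time)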
The process in which the utterance intention prediction unit 230 calculates the user's predicted utterance as probability values by using the utterance intention data as an input value of the pre-stored intention prediction algorithm, and determines the user's predicted utterance based on them, will be described in detail with reference to FIG. 5 below.
That is, as described above, the utterance intention classification unit 220 may determine the user's utterance intention from the one point in time to the other point in time based on the first request text and generate utterance intention data, and the utterance intention prediction unit 230 may determine the user's predicted utterance after the other point in time, at which the utterance request of the user 3 ended, based on the utterance intention data and generate predicted utterance data.
The response generation unit 30 may generate a response to the user's request utterance based on the user's utterance intention determined by the utterance intention classification unit 220 and the user's predicted utterance determined by the utterance intention prediction unit 230.
Specifically, the response text generation unit 300 may generate a response to the user's request utterance by inputting the utterance intention data and the predicted utterance data as input values of the pre-stored response algorithm.
For example, since the user's utterance intention is determined to concern the 'weather' of 'today' and the user's predicted utterances are determined to concern 'clothes' and 'place', the response text generation unit 300 may input the utterance intention data including 'today' and 'weather' and the predicted utterance data including 'clothes' and 'place' as input values of the pre-stored response algorithm.
Based on this, the pre-stored response algorithm may generate, as an output value, the response text 'Today's weather is hot. To avoid air-conditioning sickness, we recommend wearing thin, long clothes when you go outside.'
That is, the response text generation unit 300 may generate the response text 'Today's weather is hot. To avoid air-conditioning sickness, we recommend wearing thin, long clothes when you go outside.'
The text-to-voice conversion unit 310 may convert the response text generated by the response text generation unit 300 into voice data, transmit the voice data to the artificial intelligence speaker 2, and output it as a response to the request utterance of the user 3. A template-style sketch of the response step is shown below.
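The response algorithm itself is not disclosed; as a stand-in, a simple template keyed on the intent and predicted topics can illustrate how the intention data and predicted utterance data combine into response text. The function name and the weather value are assumptions of this sketch.

def build_response(intent: tuple, predicted: list, weather: str = "hot") -> str:
    # Combine the utterance intention data with the predicted utterance data.
    if intent == ("today", "weather") and "clothes" in predicted:
        return (f"Today's weather is {weather}. To avoid air-conditioning sickness, "
                "we recommend wearing thin, long clothes when you go outside.")
    return "Sorry, I did not understand the request."

text = build_response(("today", "weather"), ["clothes", "place"])
# `text` would then be handed to the text-to-voice conversion unit 310 for playback.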
FIG. 2 is a diagram explaining a process of determining a user's gender and age and extracting second voice data according to an embodiment of the present invention.
The voice extraction unit 100 may extract the first voice data ('Air') at the one point in time based on the request utterance of the user 3, and the first deep learning unit 110 may calculate the gender and age of the user 3 as probability values by using the first voice data ('Air') as an input value of the pre-stored first decision algorithm.
Hereinafter, in FIG. 2, it is assumed that the voice extraction unit 100 extracts the first voice data ('Air') based on the request utterance of the user 3 and that the first voice data ('Air') is input to the first deep learning unit 110 as an input value of the pre-stored first decision algorithm.
When the first voice data ('Air') is input as an input value of the pre-stored first decision algorithm, the algorithm may calculate, as an output value, a first probability value of 0.9 that the user 3 is an adult male. That is, the first deep learning unit 110 may determine the first probability value that the user 3 is an adult male to be 0.9.
Likewise, the first deep learning unit 110 may calculate the first probability value that the user 3 is an adult female as 0.02.
The first deep learning unit 110 may calculate the first probability value that the user 3 is an elderly male as 0.03.
The first deep learning unit 110 may calculate the first probability value that the user 3 is an elderly female as 0.02.
The first deep learning unit 110 may calculate the first probability value that the user 3 is a male child as 0.02.
The first deep learning unit 110 may calculate the first probability value that the user 3 is a female child as 0.01.
At this time, the first deep learning unit 110 may determine the gender (male) and age (adult) corresponding to the highest first probability value (0.9) as the gender and age of the user 3, as in the sketch below.
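Using the example numbers above, the decision reduces to an argmax over the six (gender, age) classes; the dictionary keys are an assumed labeling of the decision algorithm's outputs.

# The six first probability values from the example above.
first_probs = {
    ("male", "adult"): 0.90, ("female", "adult"): 0.02,
    ("male", "elderly"): 0.03, ("female", "elderly"): 0.02,
    ("male", "child"): 0.02, ("female", "child"): 0.01,
}

# The first deep learning unit keeps the highest-probability (gender, age) pair.
gender, age = max(first_probs, key=first_probs.get)
print(gender, age)  # male adult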
The voice recognizer distribution unit 120 may determine the voice recognizer for recognizing an adult male's request utterance, and the voice recognizer selected by the voice recognizer distribution unit 120 may recognize the user's request utterance from the one point in time to the other point in time and extract second voice data based on it.
That is, when the first deep learning unit 110 determines that the user 3 is an adult male, the selected voice recognizer may recognize the request utterance of the adult male user 3 and extract second voice data based on it.
FIG. 3 is a diagram explaining a user's request utterance and first converted text according to an embodiment of the present invention.
Based on the gender and age of the user 3, the voice recognizer distribution unit 120 may determine the voice recognizer for recognizing the request utterance of the user 3, and that voice recognizer may recognize the request utterance of the user 3 and convert it into first converted text.
At this time, the voice recognizer may misrecognize the request utterance of the user 3 due to various external or internal factors, such as external noise picked up while recognizing the request utterance.
Referring to FIG. 3, when the request utterance of the adult male user 3 is 'Air, tell me today's weather', the voice recognizer may misrecognize it as 'Air, book the schedule for May'.
Based on this, the voice recognizer may extract second voice data including 'Air, book the schedule for May' and convert the second voice data into first converted text consisting of 'Air, book the schedule for May'.
Alternatively, when the request utterance of the adult female user 3 is 'Air, call my daughter Jinhee', the voice recognizer may misrecognize it as 'Air, ## tell Genie'.
Based on this, the voice recognizer may extract second voice data including 'Air, ## tell Genie' and convert the second voice data into first converted text consisting of 'Air, ## tell Genie'.
Alternatively, when the request utterance of the male child user 3 is 'Air, play the Pororo rerun at 6 o'clock', the voice recognizer may misrecognize it as 'Air, play the lottery game at 6 o'clock'.
Based on this, the voice recognizer may extract second voice data including 'Air, play the lottery game at 6 o'clock' and convert the second voice data into first converted text consisting of 'Air, play the lottery game at 6 o'clock'.
Alternatively, when the request utterance of the female child user 3 is 'Air, add this to my English-study YouTube playlist', the voice recognizer may misrecognize it as 'Air, ## don't play the study YouTube'.
Based on this, the voice recognizer may extract second voice data including 'Air, ## don't play the study YouTube' and convert the second voice data into first converted text consisting of 'Air, ## don't play the study YouTube'.
As described above, a voice recognizer may misrecognize the request utterance of the user 3 due to various external or internal factors, such as external noise picked up during recognition, and may extract second voice data and generate first converted text based on the misrecognized utterance.
FIG. 4 is a diagram illustrating the second deep learning unit according to an embodiment of the present invention.
Hereinafter, in FIGS. 4 and 5, it is assumed that the gender and age of the user 3 are male and adult, and that the requested utterance of the user 3 is 'Air, tell me the weather today'.
The second deep learning unit 210 may include a converted text input unit 211, a first voice model deep learning unit 212, a request text output unit 213, a reference text storage unit 214, a second voice model deep learning unit 215, a weight value calculation unit 216, a first error rate calculation unit 2120, and a second error rate calculation unit 2150.
The converted text input unit 211 may receive the first converted text converted by any one of the voice recognizers.
Specifically, the converted text input unit 211 may receive the first converted text consisting of 'Air, reserve a schedule for May'.
In the first voice model deep learning unit 212, the first converted text may be input as an input value of a first deep learning algorithm stored in advance. The first deep learning algorithm stored in advance in the first voice model deep learning unit 212 may generate a first request text as an output value.
The first voice model deep learning unit 212 may adjust the first deep learning algorithm using a weight value provided by the weight value calculation unit 216, which will be described below. The first voice model deep learning unit 212 may perform deep learning by inputting the first converted text as an input value of the adjusted first deep learning algorithm and outputting the first request text.
The first error rate calculation unit 2120 may calculate a first error rate (or first loss value) between the first converted text input to the first voice model deep learning unit 212 and the first request text output from the first deep learning algorithm. The first error rate calculation unit 2120 may provide the first error rate (or first loss value) to the weight value calculation unit 216.
The request text output unit 213 may provide the first request text ('Air, tell me the weather today') generated by the fine-tuned first deep learning algorithm stored in advance in the first voice model deep learning unit 212 to the response generator 30 (see FIG. 1).
The reference text storage unit 214 may store a first reference text (or first transcription text) in advance.
Specifically, the reference text storage unit 214 may store in advance a first reference text (or first transcription text) consisting of 'Air, tell me the weather today', which corresponds to the user's requested utterance and contains no error.
The second voice model deep learning unit 215 may receive the first reference text (or first transcription text) from the reference text storage unit 214. The second voice model deep learning unit 215 may input the first reference text (or first transcription text) as an input value of a second deep learning algorithm stored in advance, and may fine-tune the second deep learning algorithm and perform deep learning so that it outputs a second reference text (or second transcription text) identical to the first reference text (or first transcription text).
Specifically, the second voice model deep learning unit 215 may perform deep learning by using the first reference text (or first transcription text) provided from the reference text storage unit 214 as the input value of the second deep learning algorithm stored in advance, and the second reference text (or second transcription text), identical to the first reference text (or first transcription text), as the output value of that algorithm.
The second error rate calculation unit 2150 may calculate a second error rate between the first reference text (or first transcription text) input to the second deep learning algorithm stored in advance in the second voice model deep learning unit 215 and the second reference text (or second transcription text) output from that algorithm.
The second error rate calculation unit 2150 may provide the second error rate to the weight value calculation unit 216.
The weight value calculation unit 216 may calculate a weight value for fine-tuning the first deep learning algorithm of the first voice model deep learning unit 212 based on the first error rate value and the second error rate value.
Specifically, the weight value may be expressed as [Equation 1] below:
[Equation 1] weight value = a × (first error rate value) + b × (second error rate value)
where a and b are each 0.5.
The weight value calculation unit 216 may provide the weight value to the first voice model deep learning unit 212. The weight value provided by the weight value calculation unit 216 may be determined as the weight value of the first voice model deep learning unit 212 to adjust the first deep learning algorithm, and the first converted text may be input as an input value of the adjusted first deep learning algorithm so that the first voice model deep learning unit 212 performs deep learning.
That is, the weight value provided from the weight value calculation unit 216 may be determined as the weight values added in the hidden layer. Through this process, the first deep learning algorithm can be fine-tuned.
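A minimal sketch of [Equation 1] follows. The constants a = b = 0.5 come from the text above; everything else (the example error rates and the additive hidden-layer adjustment) is an assumed illustration, not the patent's implementation.

```python
# Sketch of [Equation 1]: weight value = a * (first error rate) + b * (second error rate),
# with a = b = 0.5 as stated above. How the scalar enters the hidden layer is an
# assumption here: it is modeled as a value added to each existing hidden-layer weight.

A, B = 0.5, 0.5

def weight_value(first_error_rate: float, second_error_rate: float) -> float:
    return A * first_error_rate + B * second_error_rate

def adjust_hidden_layer(weights: list[float], w: float) -> list[float]:
    # Assumed interpretation of "weight values added in the hidden layer".
    return [v + w for v in weights]

w = weight_value(0.3, 0.1)   # e.g. 30% error on the converted text, 10% on the reference text
print(round(w, 3))           # 0.2
print(adjust_hidden_layer([0.5, -0.2, 0.9], w))
```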
As described above with reference to FIG. 4, the first voice model deep learning unit 212 according to an embodiment of the present invention may input the error-containing first converted text as an input value of the first deep learning algorithm stored in advance, and determine a first request text from which the error has been removed and which corresponds exactly to the user's requested utterance.
In addition, the first voice model deep learning unit 212 according to an embodiment of the present invention may fine-tune the first deep learning algorithm by using, as its weight value, a weight value that combines the first error rate value (or first loss value) calculated between the error-containing first converted text and the first request text with the second error rate value calculated by using the first reference text (or first transcription text) as an input value and the second reference text (or second transcription text) as an output value.
In addition, the first voice model deep learning unit 212 may input the first converted text as an input value of the adjusted first deep learning algorithm and generate a first request text that corresponds exactly to the user's requested utterance.
Through this, the accuracy of the first deep learning algorithm can be increased, and even if a first converted text containing an error is input, the error is removed and a first request text that accurately corresponds to the user's requested utterance can be determined.
FIG. 5 is a diagram illustrating a process of determining a user's utterance intention and predicted utterance and generating a response according to an embodiment of the present invention.
The utterance intention classification unit 220 may receive the first request text corresponding to the user's requested utterance. The utterance intention classification unit 220 may input the first request text as an input value of an intention classification algorithm stored in advance to calculate the user's utterance intention as second probability values, and may determine the utterance intention underlying the user's utterance request based on the second probability values.
Referring to FIG. 5, the utterance intention classification unit 220 may input the first request text ('Air, tell me the weather today') as an input value of the intention classification algorithm stored in advance to calculate the user's utterance intention as second probability values.
Specifically, when the first request text is input as an input value of the intention classification algorithm stored in advance, the utterance intention classification unit 220 may calculate the second probability value that the user's utterance intention is 'weather' as 0.9, and the second probability value that the user's utterance intention is 'clothes' as 0.1.
The utterance intention classification unit 220 may determine the utterance intention ('weather') corresponding to the highest probability value (0.9) among the second probability values as the utterance intention underlying the user's utterance request, and may generate utterance intention data including 'weather' based on the user's utterance intention.
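A minimal sketch of this argmax selection, with the classification algorithm itself reduced to a placeholder returning the probabilities from the example above:

```python
# Minimal sketch of the intention classification step: the pre-stored algorithm is
# treated as a black box returning class probabilities (the 0.9 / 0.1 figures mirror
# the example above); only the highest-probability selection is shown concretely.

def classify_intention(request_text: str) -> dict[str, float]:
    # Placeholder for the pre-stored intention classification algorithm.
    return {"weather": 0.9, "clothes": 0.1}

probs = classify_intention("Air, tell me the weather today")
intention = max(probs, key=probs.get)   # highest second probability value wins
intention_data = {"intention": intention}
print(intention_data)                   # {'intention': 'weather'}
```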
The utterance intention prediction unit 230 may receive the utterance intention data corresponding to the user's utterance intention. The utterance intention prediction unit 230 may input the utterance intention data as an input value of an intention prediction algorithm stored in advance to calculate the user's predicted utterance as third probability values, and may determine the user's predicted utterance based on the third probability values.
Referring to FIG. 5, the utterance intention prediction unit 230 may input the utterance intention data ('weather') as an input value of the intention prediction algorithm stored in advance to calculate, as third probability values, the user's predicted utterance at a first time point after the next time point.
Specifically, when the utterance intention data is input as an input value of the intention prediction algorithm stored in advance, the utterance intention prediction unit 230 may calculate the third probability value that the user's predicted utterance at the first time point is 'clothes' as 0.6, and the third probability value that it is 'place' as 0.4.
The utterance intention prediction unit 230 may determine the predicted utterance ('clothes') corresponding to the highest probability value (0.6) among the third probability values as the user's predicted utterance at the first time point, and may generate first predicted utterance data including 'clothes' based on the user's predicted utterance.
In addition, the utterance intention prediction unit 230 may input the first predicted utterance data ('clothes') as an input value of the intention prediction algorithm stored in advance to calculate, as third probability values, the user's predicted utterance at a second time point after the first time point.
Specifically, when the first predicted utterance data is input as an input value of the intention prediction algorithm stored in advance, the utterance intention prediction unit 230 may calculate the third probability value that the user's predicted utterance at the second time point is 'song' as 0.3, and the third probability value that it is 'place' as 0.7.
The utterance intention prediction unit 230 may determine the predicted utterance ('place') corresponding to the highest probability value (0.7) among the third probability values as the user's predicted utterance at the second time point, and may generate second predicted utterance data including 'place' based on the user's predicted utterance.
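The two prediction steps chain naturally: each predicted utterance is fed back as the next input. A minimal sketch follows, with assumed stand-in probability tables mirroring the figures above:

```python
# Minimal sketch of the chained prediction step: the output at each time point is
# fed back as the input for the next one. The transition tables are stand-ins for
# the pre-stored intention prediction algorithm, mirroring the example figures.

TRANSITIONS = {
    "weather": {"clothes": 0.6, "place": 0.4},   # first time point
    "clothes": {"song": 0.3, "place": 0.7},      # second time point
}

def predict_chain(intention: str, steps: int = 2) -> list[str]:
    current, predicted = intention, []
    for _ in range(steps):
        probs = TRANSITIONS.get(current, {})
        if not probs:
            break
        current = max(probs, key=probs.get)  # highest third probability value
        predicted.append(current)
    return predicted

print(predict_chain("weather"))  # ['clothes', 'place']
```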
The utterance intention classification unit 220 may provide the utterance intention data ('weather') and the predicted utterance data ('clothes' and 'place') to the response generator 30.
The response text generation unit 300 may generate a response text for the user's requested utterance by inputting the utterance intention data and the predicted utterance data as input values of a response algorithm stored in advance.
Specifically, when the utterance intention data ('weather') and the predicted utterance data ('clothes' and 'place') are input as input values of the response algorithm stored in advance, a response text consisting of 'The weather is hot today. To prevent air-conditioning sickness, thin and long clothing is recommended.' may be generated as the output.
That is, the response text generation unit 300 may generate a response text consisting of 'The weather is hot today. To prevent air-conditioning sickness, thin and long clothing is recommended.'
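A minimal sketch of this response step, under the assumption (not stated in the patent) that the stored response algorithm can be approximated by a template lookup keyed on the intention and the first predicted utterance:

```python
# Minimal sketch of response generation: the pre-stored response algorithm is
# approximated here by a template lookup; a production system would presumably
# use a learned generator instead.

RESPONSE_TEMPLATES = {
    ("weather", "clothes"): ("The weather is hot today. To prevent air-conditioning "
                             "sickness, thin and long clothing is recommended."),
}

def generate_response(intention: str, predictions: list[str]) -> str:
    key = (intention, predictions[0]) if predictions else (intention, "")
    return RESPONSE_TEMPLATES.get(key, "Sorry, I did not understand the request.")

print(generate_response("weather", ["clothes", "place"]))
```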
The text-to-speech conversion unit 310 may convert the response text ('The weather is hot today. To prevent air-conditioning sickness, thin and long clothing is recommended.') into voice data, and may transmit the voice data to the artificial intelligence speaker 2.
The artificial intelligence speaker 2 may output the voice data ('The weather is hot today. To prevent air-conditioning sickness, thin and long clothing is recommended.') as the response to the user's requested utterance.
FIG. 6 is a diagram illustrating an artificial intelligence-based voice processing method according to an embodiment of the present invention.
In step S10, the voice extraction unit may extract first voice data based on the user's requested utterance at one time point.
Specifically, when the user 3 makes a requested utterance to the artificial intelligence speaker 2 from one time point to another time point, the voice extraction unit 100 may extract the first voice data based on the user's requested utterance at the one time point.
In step S11, the first deep learning unit may input the first voice data as an input value of a first determination algorithm stored in advance to determine the user's gender and age.
Specifically, the first deep learning unit 110 may calculate the gender and age of the user 3 as first probability values by using the first voice data as an input value of the first determination algorithm stored in advance. At this time, the first deep learning unit 110 may determine the gender and age corresponding to the highest probability value among the calculated first probability values as the gender and age of the user 3.
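A minimal sketch of step S11, with the first determination algorithm reduced to a placeholder over joint (gender, age) classes; the class set, the features, and the probabilities are assumptions:

```python
# Minimal sketch of step S11: a classifier over joint (gender, age) classes.
# Feature extraction and model weights are placeholders; the patent specifies
# only that the highest first probability value decides the class.

CLASSES = [("male", "adult"), ("female", "adult"), ("male", "child"), ("female", "child")]

def first_determination_algorithm(voice_data: bytes) -> list[float]:
    # Placeholder for the pre-stored first determination algorithm.
    return [0.7, 0.1, 0.15, 0.05]

def determine_gender_age(voice_data: bytes) -> tuple[str, str]:
    probs = first_determination_algorithm(voice_data)
    return CLASSES[probs.index(max(probs))]  # argmax over first probability values

print(determine_gender_age(b"\x00\x01"))  # ('male', 'adult')
```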
In step S12, the voice recognizer distribution unit may determine one voice recognizer corresponding to the user's gender and age.
Specifically, the voice recognizer distribution unit 120 may include a plurality of different voice recognizers that recognize the requested utterance of the user 3 from one time point to another time point according to the gender and age of the user 3 and generate voice data. The voice recognizer distribution unit 120 may determine the one voice recognizer corresponding to the user's gender and age.
In step S13, the one voice recognizer may extract second voice data based on the user's requested utterance from the one time point to the other time point.
Specifically, the requested utterance of the user 3 from the one time point to the other time point may be recognized using the one voice recognizer corresponding to the user's gender and age. The one voice recognizer may recognize the requested utterance of the user 3 and extract the second voice data based on it.
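Steps S12 and S13 amount to a dispatch table from (gender, age) to a dedicated recognizer. A minimal sketch, with placeholder recognizer functions standing in for demographic-specific ASR models:

```python
# Minimal sketch of steps S12-S13: a dispatch table mapping (gender, age) to a
# dedicated recognizer. The recognizer functions are placeholders; in practice
# each would be an ASR model trained on speech from that demographic group.

def adult_male_asr(audio: bytes) -> str: return "transcript (adult male model)"
def adult_female_asr(audio: bytes) -> str: return "transcript (adult female model)"
def child_asr(audio: bytes) -> str: return "transcript (child model)"

RECOGNIZERS = {
    ("male", "adult"): adult_male_asr,
    ("female", "adult"): adult_female_asr,
    ("male", "child"): child_asr,
    ("female", "child"): child_asr,
}

def recognize(audio: bytes, gender: str, age: str) -> str:
    recognizer = RECOGNIZERS[(gender, age)]  # one recognizer per demographic group
    return recognizer(audio)                 # yields the recognized second voice data

print(recognize(b"\x00\x01", "male", "adult"))
```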
In step S14, the voice-to-text conversion unit may convert the second voice data into the first converted text.
Specifically, the voice-to-text conversion unit 200 may convert the second voice data extracted by the one voice recognizer included in the voice recognizer distribution unit 120 into the first converted text.
In step S15, the first converted text may be input as an input value of the first deep learning algorithm stored in advance in the first voice model deep learning unit to generate the first request text.
In step S16, the first request text may be input as an input value of the intention classification algorithm stored in advance in the utterance intention classification unit to determine the utterance intention and generate the utterance intention data.
Specifically, the utterance intention classification unit 220 may input the first request text as an input value of the intention classification algorithm stored in advance to calculate the user's utterance intention as second probability values, and may determine the utterance intention corresponding to the highest probability value among the calculated second probability values as the user's utterance intention.
In step S17, the utterance intention data may be input as an input value of the intention prediction algorithm stored in advance in the utterance intention prediction unit to determine the predicted utterance and generate the predicted utterance data.
Specifically, the utterance intention prediction unit 230 may input the utterance intention data as an input value of the intention prediction algorithm stored in advance to calculate the user's predicted utterance as third probability values. At this time, the utterance intention prediction unit 230 may determine the predicted utterance corresponding to the highest probability value among the calculated third probability values as the user's predicted utterance.
In step S18, the utterance intention data and the predicted utterance data may be input as input values of the response algorithm stored in advance in the response text generation unit to generate a response.
Specifically, the response text generation unit 300 may generate a response to the user's requested utterance by inputting the utterance intention data and the predicted utterance data as input values of the response algorithm stored in advance.
FIG. 7 is a diagram illustrating a method of adjusting the first deep learning algorithm according to an embodiment of the present invention.
In step S20, the request text output unit may provide the first converted text and the first request text to the first error rate calculation unit.
Specifically, the first converted text, which is the input value of the first deep learning algorithm learned and stored in advance in the first voice model deep learning unit 212, and the first request text, which is its output value, may be provided to the request text output unit 213. At this time, the request text output unit 213 may provide the first converted text and the first request text to the first error rate calculation unit 2120.
In step S21, the first error rate calculation unit 2120 may calculate the first error rate (or first loss value) between the first converted text and the first request text.
In step S22, the reference text storage unit 214 may provide the first reference text (or first transcription text) to the second voice model deep learning unit 215.
In step S23, the first reference text (or first transcription text) may be input as an input value of the second deep learning algorithm stored in advance in the second voice model deep learning unit 215, and an output value may be produced.
In step S24, the second error rate calculation unit 2150 may calculate the second error rate between the first reference text (or first transcription text) and the second reference text (or second transcription text).
Specifically, the second voice model deep learning unit 215 may fine-tune the second deep learning algorithm and perform deep learning so that it outputs a second reference text (or second transcription text) identical to the first reference text (or first transcription text).
Specifically, the second error rate calculation unit 2150 may calculate the second error rate (or second loss value) between the first reference text (or first transcription text) provided from the reference text storage unit 214, which serves as the input value of the second deep learning algorithm stored in advance, and the corresponding output value.
The second voice model deep learning unit 215 may use the second error rate (or second loss value) to fine-tune the second deep learning algorithm until its output value becomes a second reference text (or second transcription text) identical to the first reference text (or first transcription text).
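A minimal sketch of this fine-tune-until-identical loop. The toy model and its update rule are assumptions made purely for illustration; the patent states only that tuning continues until the output reproduces the reference:

```python
# Minimal sketch of the fine-tuning loop in steps S23-S24: the second deep learning
# algorithm is tuned until its output reproduces the first reference text. The toy
# model and its update rule are assumptions made for illustration only.

class ToyTextModel:
    """Stands in for the second deep learning algorithm."""
    def __init__(self):
        self.fidelity = 0.0          # 0.0 = garbles everything, 1.0 = perfect copy

    def run(self, text: str) -> str:
        keep = int(len(text) * self.fidelity)
        return text[:keep] + "#" * (len(text) - keep)  # garble the uncopied tail

    def update(self, error: float):
        self.fidelity = min(1.0, self.fidelity + 0.5 * error)  # assumed update rule

def error_rate(a: str, b: str) -> float:
    return sum(x != y for x, y in zip(a, b)) / max(len(a), 1)

model, reference = ToyTextModel(), "air tell me the weather today"
for _ in range(100):                       # fine-tune until output matches the input
    second_error = error_rate(reference, model.run(reference))
    if second_error == 0.0:
        break
    model.update(second_error)
print(model.run(reference) == reference)   # True once tuning has converged
```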
In step S25, the weight value calculation unit 216 may calculate the weight value using the first error rate (or first loss value) and the second error rate (or second loss value).
Specifically, the weight value calculation unit 216 may calculate the weight value for fine-tuning the first voice model deep learning unit 212 based on the first error rate value (or first loss value) and the second error rate value (or second loss value).
In step S26, the weight value calculation unit may provide the weight value to the first voice model deep learning unit.
In step S27, the first voice model deep learning unit may fine-tune the first deep learning algorithm stored in advance based on the weight value.
Specifically, the weight value provided by the weight value calculation unit 216 may be used as the weight value of the first voice model deep learning unit 212 to adjust the first deep learning algorithm, and the first converted text may be input as an input value of the fine-tuned first deep learning algorithm so that the first voice model deep learning unit 212 performs deep learning.
The drawings referred to so far and the detailed description of the invention are merely illustrative of the present invention; they are used only for the purpose of explaining the present invention and are not intended to limit its meaning or the scope of the present invention set forth in the claims. Therefore, those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible therefrom. Accordingly, the true technical protection scope of the present invention should be determined by the technical spirit of the appended claims.
The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor (DSP), a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.
A processing device may run an operating system and one or more software applications executed on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For convenience of understanding, a single processing device is sometimes described as being used, but those of ordinary skill in the art will understand that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements.
For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible. Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct a processing device independently or collectively.
Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.
The method according to the embodiments may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software.
Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.
Although the embodiments have been described with reference to a limited set of examples and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are combined or assembled in a form different from the described method, or are replaced or substituted by other components or equivalents. Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims (16)

  1. An artificial intelligence-based voice processing system comprising at least one processor,
    wherein the at least one processor comprises:
    a voice determination unit configured to determine a gender and an age of a user based on a requested utterance of the user from one time point to another time point;
    a voice processing unit configured to convert the requested utterance of the user into text, determine an utterance intention of the user, and determine a predicted utterance of the user after the other time point; and
    a response generator configured to generate a response to the requested utterance of the user based on the utterance intention of the user and the predicted utterance.
  2. The artificial intelligence-based voice processing system of claim 1, wherein the voice determination unit comprises:
    a voice extraction unit configured to extract first voice data based on the requested utterance at the one time point;
    a first deep learning unit configured to calculate the gender and the age of the user as first probability values by using the first voice data as an input value of a first determination algorithm stored in advance, and to determine the gender and the age of the user based on the first probability values; and
    a voice recognizer distribution unit configured to recognize the requested utterance of the user from the one time point to the other time point based on the determined gender and age, and to extract second voice data.
  3. The artificial intelligence-based voice processing system of claim 2, wherein the first deep learning unit determines a gender and an age corresponding to a highest probability value among the first probability values as the gender and the age of the user.
  4. The artificial intelligence-based voice processing system of claim 2, wherein the voice recognizer distribution unit comprises a plurality of different voice recognizers configured to recognize the requested utterance of the user and to generate the second voice data in correspondence with the determined gender and age.
  5. The artificial intelligence-based voice processing system of claim 2, wherein the voice processing unit comprises:
    a voice-to-text conversion unit configured to convert the second voice data into a first converted text;
    a second deep learning unit configured to input the first converted text as an input value of a first deep learning algorithm stored in advance and to determine a first request text corresponding to the requested utterance of the user;
    an utterance intention classification unit configured to input the first request text as an input value of an intention classification algorithm stored in advance and to generate utterance intention data for the utterance intention; and
    an utterance intention prediction unit configured to input the utterance intention data as an input value of an intention prediction algorithm stored in advance and to generate predicted utterance data for the predicted utterance.
  6. The artificial intelligence-based voice processing system of claim 5, wherein the second deep learning unit comprises:
    a converted text input unit configured to receive the first converted text;
    a first voice model deep learning unit configured to generate the first request text by inputting the first converted text as an input value of the first deep learning algorithm stored in advance; and
    a request text output unit configured to output the first request text.
  7. The artificial intelligence-based voice processing system of claim 6, wherein the second deep learning unit further comprises:
    a reference text storage unit configured to store a first reference text corresponding to the requested utterance of the user; and
    a first error rate calculation unit configured to calculate a first error rate value between the first converted text and the first request text.
  8. The artificial intelligence-based voice processing system of claim 7, wherein the second deep learning unit further comprises:
    a second voice model deep learning unit configured to perform deep learning by inputting the first reference text as an input value of a second deep learning algorithm stored in advance;
    a second error rate calculation unit configured to calculate a second error rate value between the first reference text and a second reference text that is an output value of the second voice model deep learning unit; and
    a weight value calculation unit configured to calculate a weight value for deep learning of the first voice model deep learning unit based on the first error rate value and the second error rate value.
  9. The artificial intelligence-based voice processing system of claim 8, wherein the first voice model deep learning unit performs deep learning by using the weight value as a weight value of the first deep learning algorithm stored in advance and inputting the first converted text as an input value.
  10. The artificial intelligence-based voice processing system of claim 9, wherein the utterance intention classification unit generates the utterance intention data by inputting the first request text as an input value of the intention classification algorithm stored in advance and calculating the utterance intention as a second probability value.
  11. The artificial intelligence-based voice processing system of claim 10, wherein the utterance intention classification unit determines an utterance intention having a highest probability value among the second probability values as the utterance intention of the user.
  12. The artificial intelligence-based voice processing system of claim 10, wherein the utterance intention prediction unit generates the predicted utterance data by inputting the utterance intention data as an input value of the intention prediction algorithm stored in advance and calculating the predicted utterance as a third probability value.
  13. The artificial intelligence-based voice processing system of claim 12, wherein the utterance intention prediction unit determines a predicted utterance having a highest probability value among the third probability values as the predicted utterance of the user.
  14. The artificial intelligence-based voice processing system of claim 6, wherein the response generator comprises:
    a response text generation unit configured to generate a response text for the requested utterance of the user by using the utterance intention data and the predicted utterance data as input values of a response algorithm stored in advance; and
    a text-to-speech conversion unit configured to convert the response text into voice data.
  15. An artificial intelligence-based voice processing method performed by an artificial intelligence-based voice processing system comprising at least one processor, the at least one processor comprising a voice determination unit, a voice processing unit, and a response generator, the method comprising:
    determining, by the voice determination unit, a gender and an age of a user based on a requested utterance of the user from one time point to another time point;
    converting, by the voice processing unit, the requested utterance of the user into text to determine an utterance intention of the user and to determine a predicted utterance of the user after the other time point; and
    generating, by the response generator, a response to the requested utterance of the user based on the utterance intention of the user and the predicted utterance.
  16. A non-transitory computer-readable recording medium on which a program for executing the artificial intelligence-based voice processing system of claim 1 is recorded.
PCT/KR2022/021461 2021-12-28 2022-12-28 Artificial intelligence-based dialogue situation prediction and intention classification system, and method thereof WO2023128586A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2021-0190154 2021-12-28
KR20210190154 2021-12-28
KR1020220041966A KR20230100543A (en) 2021-12-28 2022-04-05 System and method for conversational situation prediction and intention classification based on artificial intelligence
KR10-2022-0041966 2022-04-05

Publications (1)

Publication Number Publication Date
WO2023128586A1 true WO2023128586A1 (en) 2023-07-06

Family

ID=86999564

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/021461 WO2023128586A1 (en) 2021-12-28 2022-12-28 Artificial intelligence-based dialogue situation prediction and intention classification system, and method thereof

Country Status (1)

Country Link
WO (1) WO2023128586A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100119250A * 2009-04-30 2010-11-09 Samsung Electronics Co., Ltd. Apparatus for detecting voice using motion information and method thereof
JP2011108055A (en) * 2009-11-19 2011-06-02 Nippon Telegr & Teleph Corp <Ntt> Interactive system, interactive method, and interactive program
US20180174580A1 (en) * 2016-12-19 2018-06-21 Samsung Electronics Co., Ltd. Speech recognition method and apparatus
JP2018169494A * 2017-03-30 2018-11-01 Toyota Motor Corp. Utterance intention estimation device and utterance intention estimation method
KR20200013152A * 2018-07-18 2020-02-06 Samsung Electronics Co., Ltd. Electronic device and method for providing artificial intelligence services based on pre-gathered conversations

Similar Documents

Publication Publication Date Title
WO2020060325A1 (en) Electronic device, system, and method for using voice recognition service
WO2020230926A1 (en) Voice synthesis apparatus for evaluating quality of synthesized voice by using artificial intelligence, and operating method therefor
WO2020085794A1 (en) Electronic device and method for controlling the same
WO2020040595A1 (en) Electronic device for processing user utterance, and control method therefor
WO2020226213A1 (en) Artificial intelligence device for providing voice recognition function and method for operating artificial intelligence device
WO2019151802A1 (en) Method of processing a speech signal for speaker recognition and electronic apparatus implementing same
WO2022035183A1 Device for recognizing user's voice input and method for operating same
WO2020096218A1 (en) Electronic device and operation method thereof
EP3841460A1 (en) Electronic device and method for controlling the same
WO2014163231A1 (en) Speech signal extraction method and speech signal extraction apparatus to be used for speech recognition in environment in which multiple sound sources are outputted
WO2018056779A1 (en) Method of translating speech signal and electronic device employing the same
WO2020138662A1 (en) Electronic device and control method therefor
WO2021246812A1 (en) News positivity level analysis solution and device using deep learning nlp model
WO2021040490A1 (en) Speech synthesis method and apparatus
WO2023128586A1 (en) Artificial intelligence-based dialogue situation prediction and intention classification system, and method thereof
WO2023113502A1 (en) Electronic device and method for recommending speech command therefor
WO2023085584A1 (en) Speech synthesis device and speech synthesis method
WO2022177224A1 (en) Electronic device and operating method of electronic device
WO2022055107A1 (en) Electronic device for voice recognition, and control method therefor
WO2022149693A1 (en) Electronic device and method for processing user utterance in the electronic device
WO2022114451A1 (en) Artificial neural network training method, and pronunciation evaluation method using same
WO2023048359A1 (en) Speech recognition device and operation method therefor
WO2022186435A1 Electronic device for correcting user's voice input and method for operating same
WO2023106649A1 (en) Electronic device for performing voice recognition by using recommended command
WO2024076214A1 (en) Electronic device for performing voice recognition, and operating method therefor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22916727

Country of ref document: EP

Kind code of ref document: A1