WO2021132786A1 - Learning data processing system of a human care robot for the elderly (고령자를 위한 휴먼케어 로봇의 학습 데이터 처리 시스템) - Google Patents
- Publication number
- WO2021132786A1 (PCT application PCT/KR2019/018759)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- text
- voice
- text data
- user terminal
- Prior art date
Classifications
- G10L 15/063: Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G06F 16/61: Information retrieval of audio data; indexing; data structures and storage structures therefor
- G06N 3/08: Learning methods for computing arrangements based on biological models (neural networks)
- G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation or stress or intonation determination
- G10L 15/06: Speech recognition; creation of reference templates; training of speech recognition systems
- G10L 15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models (HMMs)
- G10L 15/16: Speech classification or search using artificial neural networks
- G10L 15/183: Speech classification or search using natural language modelling, with context dependencies, e.g. language models
- G10L 15/26: Speech-to-text systems
Description
- Embodiments of the present invention relate to a system for processing learning data of a human care robot for the elderly, and more particularly to a system for collecting data used to train an artificial neural network for voice processing of such a robot.
- A human care robot can provide personalized care services based on psychological and emotional rapport with the elderly, and can be combined with existing u-Healthcare technology to support active medical, health, and life support services.
- The present invention aims to solve the above problems and to implement a human care robot better suited to the care of the elderly.
- A method for collecting voice data used to train an artificial neural network for voice processing of a human care robot for the elderly includes: transmitting, to a user terminal, first data including first text data corresponding to a voice to be acquired; receiving, from the user terminal, second data including reading data of the first text data; verifying validity of the reading data by comparing second text data generated from the reading data with the first text data; and storing the reading data determined to be valid as learning data by matching it with the first text data.
- The first data further includes voice data obtained by converting the first text data into speech using a trained first artificial neural network, and the user terminal provides at least one of the first text data and the voice data to the user.
- the first artificial neural network may be a neural network trained to convert input text data into voice data corresponding to the input text data.
- the second data further includes metadata for a user who uses the user terminal, and the storing in correspondence includes storing the reading data determined to be valid, the first text data, and the metadata in correspondence with each other.
- the method for collecting voice data may further include, after the storing as the learning data, transmitting the updated voice data collection status to the user terminal.
- Transmitting the updated voice data collection status to the user terminal may include: checking a time length of the reading data; calculating an accumulated time length by adding the checked time length to the time length of previously acquired reading data; and generating the voice data collection status including the accumulated time length and a target time length.
- the verifying of the validity may include: generating the second text data from the reading data using a learned second artificial neural network; calculating a similarity between the first text data and the second text data; and determining, as valid data, the reading data having the similarity greater than or equal to a predetermined threshold similarity.
- the second artificial neural network may be a neural network that has been trained to convert input reading data into text data corresponding to the input reading data.
- The determining may include determining reading data whose similarity is lower than the predetermined threshold similarity to be judgment-pending reading data.
- First text data corresponding to reading data determined to be invalid as a result of the validity determination may be transmitted to the user terminal again, and rereading data for the first text data may be received.
- The first data transmitted to the user terminal may include a font size control signal for controlling the size at which the first text is displayed on the user terminal.
- Using the learning data, a first artificial neural network may be trained to convert input text data into voice data corresponding to the input text data, and a second artificial neural network may be trained to convert input reading data into text data corresponding to the input reading data.
- Data determined to be invalid is separately provided to the manager, thereby minimizing human intervention in data collection while maintaining high quality.
- FIG. 1 is a diagram schematically illustrating the configuration of an artificial neural network learning system of a human care robot for the elderly according to an embodiment of the present invention.
- FIG. 2 is a diagram schematically illustrating the configuration of a voice data collection device 110 provided in the server 100 according to an embodiment of the present invention.
- FIGS. 3 and 4 are flowcharts illustrating a voice data collection method performed by the server 100 including the voice data collection device 110.
- FIG. 5 is an example of a screen 500 on which first text data is displayed on the user terminal 200 according to an embodiment of the present invention.
- FIG. 6 is a diagram illustrating an exemplary screen 600 displayed on the manager terminal 300 .
- A method for collecting voice data used to train an artificial neural network for voice processing of a human care robot for the elderly includes: transmitting, to a user terminal, first data including first text data corresponding to a voice to be acquired; receiving, from the user terminal, second data including reading data of the first text data; verifying validity of the reading data by comparing second text data generated from the reading data with the first text data; and storing the reading data determined to be valid as learning data by matching it with the first text data.
- FIG. 1 is a diagram schematically illustrating the configuration of an artificial neural network learning system of a human care robot for the elderly according to an embodiment of the present invention.
- the artificial neural network learning system may transmit some of the learning data of the artificial neural network to the user terminal, and receive response data that is the remaining part of the learning data from the user terminal to generate completed learning data.
- the system according to an embodiment of the present invention transmits text data to the user terminal and receives reading data obtained by the user reading the text data in response thereto, thereby generating learning data including the text data and the reading data.
- Such an artificial neural network learning system may include a server 100 , a user terminal 200 , a manager terminal 300 , and a communication network 400 as shown in FIG. 1 .
- The user terminal 200 and the manager terminal 300 may be various types of devices that mediate between a person and the server 100 so that the user and the manager can each use the various services provided by the server 100.
- the user terminal 200 may display text data received from the server 100 on a screen, and allow the user to read the text data displayed on the screen.
- reading data may be acquired according to the user's reading of text data and transmitted back to the server 100 .
- the manager terminal 300 may receive, display and/or reproduce the judgment pending reading data from the server 100 , obtain the manager's input thereto, and transmit it to the server 100 .
- The terminals 200 and 300 may be portable terminals 201, 202, 203, like the user terminal 200 shown in FIG. 1, or may be a computer 204.
- the terminals 200 and 300 may include a display means for displaying content and the like in order to perform the above-described functions, and an input means for obtaining a user's input for such content.
- the input means and the display means may be configured in various ways.
- the input means may include, but is not limited to, a keyboard, a mouse, a trackball, a microphone, a button, a touch panel, and the like.
- Although both the user terminal 200 and the manager terminal 300 are illustrated as singular in FIG. 1, these quantities are exemplary and the spirit of the present invention is not limited thereto; accordingly, there may be a plurality of user terminals 200 and manager terminals 300.
- the communication network 400 may refer to a communication network that mediates data transmission/reception between each component of the system.
- The communication network 400 may include wired networks such as Local Area Networks (LANs), Wide Area Networks (WANs), Metropolitan Area Networks (MANs), and Integrated Services Digital Networks (ISDNs), as well as wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of the present invention is not limited thereto.
- the server 100 transmits some of the training data of the artificial neural network to the user terminal 200 and receives the response data that is the remaining part of the training data from the user terminal 200 to complete the training data.
- For example, the server 100 may transmit text data to the user terminal 200 and, in response, receive reading data obtained by the user reading the text data, thereby generating learning data that includes both the text data and the reading data.
- The server 100 may transmit judgment-pending reading data requiring the manager's confirmation to the manager terminal 300, and may receive a validity determination result for the judgment-pending reading data from the manager terminal 300.
- the voice data collection apparatus 110 may include a communication unit 111 , a control unit 112 , and a memory 113 . Also, although not shown in the drawings, the voice data collection apparatus 110 according to the present embodiment may further include an input/output unit, a program storage unit, and the like.
- The communication unit 111 may be a device including the hardware and software necessary for the voice data collection device 110 to transmit and receive signals, such as control signals or data signals, over wired/wireless connections with other network devices such as the terminals 200 and 300.
- the controller 112 may include any type of device capable of processing data, such as a processor.
- the 'processor' may refer to a data processing device embedded in hardware, for example, having a physically structured circuit to perform a function expressed as a code or a command included in a program.
- Examples include processing devices such as a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field-programmable gate array (FPGA), but the scope of the present invention is not limited thereto.
- the memory 113 performs a function of temporarily or permanently storing data processed by the voice data collection device 110 .
- the memory may include a magnetic storage medium or a flash storage medium, but the scope of the present invention is not limited thereto.
- the memory 113 may temporarily and/or permanently store the generated learning data.
- the server 100 may transmit first data including the first text data to the user terminal 200 ( S310 ).
- The first text data is text corresponding to the voice (or the user's utterance) to be acquired.
- For example, the server may transmit first data including the text "Hello" to the user terminal 200.
- the first data transmitted in step S310 may further include voice data obtained by converting the first text data into voice.
- the voice data may be generated by the server 100 in various ways.
- the server 100 may generate voice data from the first text data by using the first artificial neural network trained to convert the input text into voice data corresponding to the input text.
- the server 100 may generate voice data based on data obtained by another user or administrator reading the first text data.
- the first data transmitted in step S310 may further include a font size control signal for controlling the size at which the first text data is displayed in the user terminal 200 .
- the first data may include a font size control signal for controlling the first text data to be displayed at 30 points or more.
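- For illustration, the first data of step S310 can be pictured as a small structured payload. The sketch below is a minimal Python assumption of how such a payload might be assembled on the server side; the field names and the tts_model.synthesize() helper are hypothetical and are not taken from the disclosure.

```python
# Hypothetical sketch of assembling the "first data" payload (S310).
# Field names and the tts_model.synthesize() helper are assumptions,
# not part of the patent itself.
import base64
import json

def build_first_data(first_text: str, tts_model=None, font_size_pt: int = 30) -> str:
    """Package the first text data, optional example speech, and a font size control signal."""
    payload = {
        "first_text": first_text,                       # text the user should read aloud
        "font_size_control": {"min_pt": font_size_pt},  # e.g. display at 30 points or more
    }
    if tts_model is not None:
        # First artificial neural network: text -> speech (a trained TTS model).
        wav_bytes = tts_model.synthesize(first_text)
        payload["example_voice"] = base64.b64encode(wav_bytes).decode("ascii")
    return json.dumps(payload, ensure_ascii=False)
```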
- the user terminal 200 may provide the user with the first data received from the server 100 in step S310. Also, the user terminal 200 may generate the reading data of the first text data. (S320)
- FIG. 5 is an example of a screen 500 on which first text data is displayed on the user terminal 200 according to an embodiment of the present invention.
- The screen 500 may include an area 510 displaying identification information of the work in progress, an interface 520 for listening to voice data for the first text data, a display area 530 for the first text data, a voice data collection status display area 540, an interface 550 for starting a reading, and an area 560 in which guidance is displayed.
- the user terminal 200 may display the first text data on the area 530 of the screen 500 so that the user reads the text.
- the user terminal 200 may adjust the size of the text displayed according to the first text data according to the font size control signal included in the first data.
- the user terminal 200 may obtain a user's input for the interface 520 and provide voice data to the user.
- The user may operate the interface 520 before recording starts to hear an example voice for the text to be read. Through this process, reading data can be acquired even from a user who has vision problems or cannot read text.
- the user may generate reading data for the first text by referring to the first text data displayed in the area 530 and/or voice data provided according to an input to the interface 520 .
- the user may start recording by performing an input to the interface 550 for starting the reading, and may generate the reading data by reading the text.
- the user may also generate the reading data by referring to the guidance displayed on the area 560 where the guidance is displayed.
- the user terminal 200 may transmit the second data including the reading data generated according to the above-described process to the server 100 .
- the server 100 may receive the second data including the reading data of the first text data from the user terminal 200 (S330).
- the second data may further include metadata about a user who uses the user terminal 200 .
- the metadata may include various items that can represent the characteristics of the user, such as the user's age, the user's gender, the user's residential area, and the user's education level.
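- A minimal sketch of what such second data might look like follows; the structure and field names are illustrative assumptions, with the metadata items mirroring the examples above.

```python
# Hypothetical sketch of the "second data" returned by the user terminal (S330).
# All field names and metadata values are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class SecondData:
    reading_wav: bytes                  # recording of the user reading the first text data
    metadata: dict = field(default_factory=dict)

second = SecondData(
    reading_wav=b"...",                 # raw audio captured by the terminal's microphone
    metadata={"age": 72, "gender": "F", "region": "Seoul", "education": "high school"},
)
```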
- The server 100 may verify the validity of the reading data received in step S330 (S340). For example, the server 100 may generate second text data from the reading data and verify the validity of the reading data by comparing the second text data with the first text data.
- the server 100 may generate the second text data from the reading data by using the learned second artificial neural network.
- the second artificial neural network may be a neural network that has been trained to convert the input reading data into text data corresponding to the input reading data.
- the server 100 may calculate a degree of similarity between the first text data and the second text data, and determine the reading data having a calculated similarity greater than or equal to a predetermined threshold similarity as valid data.
- For example, if the server 100 calculates the similarity between the two text data as 47%, and the calculated similarity is less than the threshold similarity (assuming a threshold similarity of 80%), the server 100 may determine the reading data to be invalid.
- the server 100 may calculate the similarity between two text data in various ways. For example, the server 100 may generate a feature vector for each text, and calculate a similarity based on a distance between the generated vectors. However, such a method is exemplary and the spirit of the present invention is not limited thereto.
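- As one concrete realization of the exemplary vector-based comparison above, the following sketch builds character-bigram feature vectors and compares them by cosine similarity. The featurization is an assumption, since the disclosure leaves the exact method open.

```python
# A minimal sketch of the similarity check: character-bigram feature vectors
# compared by cosine similarity. The featurization is an illustrative assumption.
from collections import Counter
import math

def bigram_vector(text: str) -> Counter:
    """Build a character-bigram count vector for a text (spaces ignored)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + 2] for i in range(len(chars) - 1))

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

THRESHOLD = 0.80  # the 80% threshold similarity from the example above

def classify_reading(first_text: str, second_text: str) -> str:
    """Valid if the STT transcript is close enough to the prompted text."""
    sim = cosine_similarity(bigram_vector(first_text), bigram_vector(second_text))
    return "valid" if sim >= THRESHOLD else "judgment-pending"
```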
- When the calculated similarity is less than the threshold similarity, the server 100 may determine the corresponding reading data to be judgment-pending reading data (S370). A specific method by which the server 100 processes judgment-pending reading data will be described later with reference to steps S390 and S400.
- the server 100 may store the reading data as learning data by matching the reading data with the first text data (S350).
- the server 100 may store the user's meta data of the user terminal 200 as learning data in addition to the reading data and the first text data.
- the metadata may be data included in the second data received in step S330 described above.
- The server 100 may update the voice data collection status in consideration of the learning data generated in step S350 (S360), and may transmit the updated voice data collection status to the user terminal 200 (S380).
- the server 100 may check the length of time of the reading data included in the learning data. For example, the server 100 may check the length of time of reading data for "hello" as 1 second.
- The server 100 may calculate an accumulated time length by adding the checked time length to the time length of previously acquired reading data. For example, if the total time length of the readings previously performed by the user of the user terminal 200 is 3 hours 20 minutes 50 seconds, the server 100 may add 1 second to that total and calculate 3 hours 20 minutes 51 seconds as the accumulated time length.
- the server 100 may accumulate and manage the accumulated time length for each user.
- the server 100 may manage the voice data collection status by the number of syllables rather than the length of time. In this case, the server 100 may manage the voice data collection status, such as '230 out of 300 progress'.
- the voice data collection status transmitted to the user terminal 200 may be provided to the user.
- the user terminal 200 may display the voice data collection status in the voice data collection status display area 540 of FIG. 5 and provide it to the user.
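- The bookkeeping of steps S360 and S380 can be sketched as follows. The structure, field names, and the 4-hour/300-item targets are illustrative assumptions; the time-length variant matches the 3 h 20 m 50 s example above and the item-count variant matches the '230 out of 300' style of progress.

```python
# A minimal sketch of the collection-status bookkeeping (S360/S380).
# Targets and field names are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class CollectionStatus:
    accumulated_seconds: int = 0
    target_seconds: int = 4 * 3600      # assumed collection target (e.g. 4 hours)
    collected_items: int = 0
    target_items: int = 300             # e.g. '230 out of 300 progress'

    def add_reading(self, duration_seconds: int) -> None:
        """Accumulate a newly validated reading into the running totals."""
        self.accumulated_seconds += duration_seconds
        self.collected_items += 1

status = CollectionStatus(accumulated_seconds=3 * 3600 + 20 * 60 + 50)  # 3 h 20 m 50 s so far
status.add_reading(1)   # the 1-second "hello" reading from the example above
assert status.accumulated_seconds == 3 * 3600 + 20 * 60 + 51
```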
- The server 100 may transmit the judgment-pending reading data determined in step S370 to the manager terminal 300 (S390).
- the manager terminal 300 may provide the decision pending reading data received from the server 100 to the manager, obtain an input corresponding to the manager's validity determination result, and transmit it back to the server. Accordingly, the server 100 may receive the validity determination result for the determination pending reading data from the manager terminal 300 (S400).
- FIG. 6 is a diagram illustrating an exemplary screen 600 displayed on the manager terminal 300 .
- The screen 600 may include an area 610 in which the manager's identification information is displayed, and an area 620 in which a list of judgment-pending reading data is displayed.
- The list displayed in the list display area 620 may include a second text data item 621 generated from the user's reading data, a reading data item 622, a similarity item 623, and a decision item 624.
- The manager may, for example, read the second text data item 621 and listen to the reading data item 622 to determine the validity of the corresponding judgment-pending reading data.
- The manager may determine that a given judgment-pending reading data is valid by selecting 'use' in the decision item 624, or may decide that the data is invalid and re-recording is necessary by selecting 're-recording'.
- the present invention minimizes human intervention in the collection of voice data while maintaining high quality.
- The server 100 may determine the validity of the corresponding judgment-pending reading data based on the judgment result received in step S400 (S410).
- For example, when the manager's input indicates 'use', the server 100 may determine the judgment-pending reading data to be valid.
- Conversely, when the manager's input indicates 're-recording', the server 100 may determine that the judgment-pending reading data is invalid.
- The server 100 may again transmit the first data, including the first text data corresponding to the judgment-pending reading data determined to be invalid in step S410, to the user terminal 200 (S420), and may receive rereading data for the first text data from the user terminal 200 (S430).
- The server 100 may include, in the first data transmitted to the user terminal 200 in step S420, a guide message indicating that the first text data is text to be reread.
- In this case, the user terminal 200 may provide the reread guide message to the user.
- The server 100 may store the rereading data received in step S430, or the reading data determined to be valid in step S410, as learning data by matching it with the first text data (S440).
- In other words, the server 100 may match the reading data determined to be valid in step S410 with the first text data corresponding thereto and store the pair as learning data. Also, the server 100 may store the rereading data obtained in step S430 together with the first text data corresponding thereto as learning data.
- the server 100 may update the voice data collection status (S450), and transmit the updated voice data collection status to the user terminal 200 (S460).
- A detailed description of steps S450 and S460 is replaced by the description of steps S360 and S380 above.
- the server 100 may use the learning data generated by the above-described process to learn an artificial neural network for voice processing of the human care robot for the elderly.
- The server 100 may train the first artificial neural network to convert input text data into voice data corresponding to the input text data. Also, the server 100 may train the second artificial neural network to convert input reading data into text data corresponding to the input reading data.
- the first artificial neural network may be the same neural network as the neural network that generates the voice data transmitted together with the first text data in step S310, or may be a separate neural network.
- The second artificial neural network may be the same neural network as the neural network that generates the second text data used to determine the validity of the reading data in step S340, or may be a separate neural network.
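- Schematically, training the two networks on the stored learning data could look like the PyTorch-style sketch below. Real text-to-speech and speech-to-text architectures are far more involved; the placeholder models, toy features, and pairing here are assumptions for illustration only.

```python
# A schematic training loop for the two networks on the matched
# (first text, reading) learning data. Placeholder models and random
# toy features stand in for real TTS/STT architectures.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder models: first ANN (text -> voice), second ANN (voice -> text).
tts_net = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 80))
stt_net = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 128))

# Toy stand-ins for featurized (text, reading) pairs from the learning data.
text_feats = torch.randn(32, 128)    # e.g. embedded first text data
audio_feats = torch.randn(32, 80)    # e.g. mel features of the reading data
loader = DataLoader(TensorDataset(text_feats, audio_feats), batch_size=8)

loss_fn = nn.MSELoss()
opt = torch.optim.Adam(list(tts_net.parameters()) + list(stt_net.parameters()), lr=1e-3)

for text_batch, audio_batch in loader:
    # First ANN learns text -> voice; second ANN learns voice -> text.
    loss = loss_fn(tts_net(text_batch), audio_batch) \
         + loss_fn(stt_net(audio_batch), text_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
```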
- the embodiment according to the present invention described above may be implemented in the form of a computer program that can be executed through various components on a computer, and such a computer program may be recorded in a computer-readable medium.
- The medium may store a program executable by a computer. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape, optical recording media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and media configured to store program instructions, including ROM, RAM, flash memory, and the like.
- the computer program may be specially designed and configured for the present invention, or may be known and used by those skilled in the computer software field.
- Examples of the computer program may include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.
- The connecting lines or connection members between the components shown in the drawings exemplarily represent functional connections and/or physical or circuit connections; in an actual device, they may be represented as various replaceable or additional functional connections, physical connections, or circuit connections.
- Moreover, unless a component is specifically described with terms such as "essential" or "important", it may not be a necessary component for the application of the present invention.
Claims (10)
- A method for collecting voice data used to train an artificial neural network for voice processing of a human care robot for the elderly, the method comprising: transmitting, to a user terminal, first data including first text data corresponding to a voice to be acquired; receiving, from the user terminal, second data including reading data of the first text data; verifying validity of the reading data by comparing second text data generated from the reading data with the first text data; and storing the reading data determined to be valid as learning data by matching it with the first text data.
- The method of claim 1, wherein the first data further includes voice data obtained by converting the first text data into speech using a trained first artificial neural network; the user terminal provides at least one of the first text data and the voice data to a user; and the first artificial neural network is a neural network trained to convert input text data into voice data corresponding to the input text data.
- The method of claim 1, wherein the second data further includes metadata about a user who uses the user terminal, and the storing comprises storing the reading data determined to be valid, the first text data, and the metadata in correspondence with each other.
- The method of claim 1, further comprising, after the storing as the learning data, transmitting an updated voice data collection status to the user terminal.
- The method of claim 4, wherein the transmitting of the updated voice data collection status to the user terminal comprises: checking a time length of the reading data; calculating an accumulated time length by adding the checked time length to a time length of previously acquired reading data; and generating the voice data collection status including the accumulated time length and a target time length.
- The method of claim 1, wherein the verifying of the validity comprises: generating the second text data from the reading data using a trained second artificial neural network; calculating a similarity between the first text data and the second text data; and determining reading data whose similarity is greater than or equal to a predetermined threshold similarity to be valid data, wherein the second artificial neural network is a neural network trained to convert input reading data into text data corresponding to the input reading data.
- The method of claim 6, wherein the determining comprises determining reading data whose similarity is less than the predetermined threshold similarity to be judgment-pending reading data, and the method further comprises, after the storing as the learning data, transmitting the judgment-pending reading data, together with first text data corresponding to the judgment-pending reading data, to a manager terminal, and receiving a validity determination result from the manager terminal.
- The method of claim 7, further comprising, after the receiving of the validity determination result, transmitting, to the user terminal, first text data corresponding to reading data determined to be invalid as a result of the validity determination, and receiving rereading data for the first text data.
- The method of claim 1, wherein the first data transmitted to the user terminal includes a font size control signal for controlling a size at which the first text is displayed on the user terminal.
- The method of claim 1, further comprising, after the storing as the learning data: training, using the learning data, a first artificial neural network that converts input text data into voice data corresponding to the input text data; and training a second artificial neural network that converts input reading data into text data corresponding to the input reading data.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2019-0173476 | 2019-12-23 | ||
KR1020190173476A KR102330811B1 (ko) | 2019-12-23 | 2019-12-23 | Learning data processing system of a human care robot for the elderly |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021132786A1 (ko) | 2021-07-01 |
Family
ID=76574820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2019/018759 (WO2021132786A1) | Learning data processing system of a human care robot for the elderly | 2019-12-23 | 2019-12-31 |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR102330811B1 (ko) |
WO (1) | WO2021132786A1 (ko) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118071564A (zh) * | 2024-04-22 | 2024-05-24 | 江西七叶莲科技有限公司 | A big-data-based home elderly care service platform |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR19990009682A (ko) * | 1997-07-11 | 1999-02-05 | 김유승 | Speaker-recognition remote client account verification system and speaker verification method |
US9984679B2 * | 2011-05-09 | 2018-05-29 | Nuance Communications, Inc. | System and method for optimizing speech recognition and natural language parameters with user feedback |
KR101901920B1 (ko) * | 2018-03-07 | 2018-11-14 | 주식회사 아크로노드 | System and method for providing reverse transcription between voice and text for deep learning of AI speech recognition |
KR20190085882A (ko) * | 2018-01-11 | 2019-07-19 | 네오사피엔스 주식회사 | Text-to-speech synthesis method and apparatus using machine learning, and computer-readable storage medium |
KR20190087353A (ko) * | 2019-07-05 | 2019-07-24 | 엘지전자 주식회사 | Apparatus and method for speech recognition verification |
- 2019-12-23: KR application KR1020190173476A filed; granted as KR102330811B1 (ko), status: active (IP Right Grant)
- 2019-12-31: PCT application PCT/KR2019/018759 filed as WO2021132786A1 (ko), status: active (Application Filing)
Also Published As
Publication number | Publication date |
---|---|
KR102330811B1 (ko) | 2021-11-25 |
KR20210081186A (ko) | 2021-07-01 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19957667; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19957667; Country of ref document: EP; Kind code of ref document: A1 |
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.01.2023) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19957667; Country of ref document: EP; Kind code of ref document: A1 |