WO2023214740A1

WO2023214740A1 - Audio output system and method

Info

Publication number: WO2023214740A1
Application number: PCT/KR2023/005815
Authority: WO
Inventors: 박지희; 최기운
Original assignee: 주식회사 코코지
Priority date: 2022-05-06
Filing date: 2023-04-27
Publication date: 2023-11-09
Also published as: CN117055835A; KR102547972B1; KR20230156673A

Abstract

The present invention provides an audio output system and method. The audio output system comprises: an audio output device, which outputs sound content for auditory stimulation of infants; a user terminal into which voice data of a user is input; a content management server which receives the voice data of the user and which provides the voice data of the user to the audio output device; and a book recommendation server which receives the voice data of the user, and which uses the voice data of the user so as to recommend a book associated with the voice data of the user, wherein the audio output device outputs the voice data of the user as the sound content, and the book recommendation server generates a first keyword cluster on the basis of a plurality of pieces of book text data stored in a database, generates a second keyword cluster on the basis of the voice data of the user, and compares the first keyword cluster to the second keyword cluster so as to recommend the book associated with the voice data of the user.

Description

Audio output system and method

The present invention relates to an audio output system and method that provide an operating environment in which users, including infants and young children, can easily select various sounds, and a convenient control environment in which sound content can be changed even when the audio output device is offline.

The content described in this section simply provides background information for this embodiment and does not constitute prior art.

With the advancement and miniaturization of electronic devices such as smartphones and tablets, the screen time of users of digital video devices is increasing. Screen time refers to time spent sitting or lying down while using digital video devices, excluding time for physical activity and learning activities. This increase in screen time can have a negative impact on the health of adult users and an even more negative impact on infant users. In particular, long-term exposure to visual information may have undesirable effects on the cerebral development of infants and toddlers under 24 months of age. The World Health Organization (WHO) has also issued screen exposure guidelines for infants and toddlers, stating that children under 1 year of age should not be exposed to electronic device screens and suggesting that exposure be limited to 1 hour per day for children ages 2 to 5.

On the other hand, auditory stimulation can have a beneficial effect on the growth and development of infants and young children, who begin to feel and learn about the world through auditory stimulation from the fetal stage. Various auditory stimuli have a significant impact on the language development of infants and young children as well as on their creativity and imagination. However, although guardians want to deliver auditory stimulation to infants and toddlers, there is a lack of means that stimulate the curiosity of the infants and toddlers who are the actual users and that provide them with an easy operating environment. As a result, guardians ultimately choose digital video devices (TVs, smartphones) and end up providing auditory stimulation together with visual information.

Accordingly, there is a need for a system that stimulates the curiosity of infants and young children, provides an easy operating environment in which they can select various sounds without going through a guardian's digital video device, and provides them with various auditory stimuli, thereby having a beneficial effect on their growth and development.

The object of the present invention is to provide an audio output system and method that includes an operating environment that allows users, including infants and young children, to easily select various sounds.

In addition, another object of the present invention is to provide an audio output system and method that uses sound content recorded by a guardian's voice.

The objects of the present invention are not limited to the objects mentioned above, and other objects and advantages of the present invention that are not mentioned can be understood by the following description and will be more clearly understood by the examples of the present invention. Additionally, it will be readily apparent that the objects and advantages of the present invention can be realized by the means and combinations thereof indicated in the patent claims.

An audio output system according to some embodiments of the present invention for solving the above problems includes: an audio output device that outputs sound content for auditory stimulation of infants and young children; a user terminal into which a user's voice data is input; a content management server that receives the user's voice data and provides the user's voice data to the audio output device; and a book recommendation server that receives the user's voice data and uses the user's voice data to recommend a book associated with the user's voice data. The audio output device outputs the user's voice data as the sound content, and the book recommendation server generates a first keyword cluster based on a plurality of book text data stored in a database, generates a second keyword cluster based on the user's voice data, and compares the first keyword cluster with the second keyword cluster to recommend a book associated with the user's voice data.

In addition, the audio output device may include a sound doll and an audio output station including a docking space in which the sound doll is docked, and the audio output station may recognize the sound doll docked in the docking space and output sound content corresponding to the recognized sound doll.

In addition, when the content management server receives the user's voice data, it may transmit a message about a content update to the audio output device and, upon receiving a request signal for the user's voice data from the audio output device, transmit the user's voice data to the audio output device.

In addition, the book recommendation server may include: a communication unit that receives book selection data and the user's voice data from the user terminal and transmits a recommended book list and first book text data associated with the book selection data to the user terminal; a voice analysis unit that converts the user's voice data into user text data; a text analysis engine that generates the first keyword cluster and the second keyword cluster; and a book matching unit that compares the first and second keyword clusters and generates the recommended book list.

In addition, the text analysis engine may include a natural language processing unit that receives the user text data and the plurality of book text data, performs natural language processing on them, and extracts keywords, and a keyword cluster creation unit that creates the first and second keyword clusters based on the distribution of the keywords.

An audio output method according to some embodiments of the present invention for solving the above problems includes: generating a first keyword cluster by analyzing a plurality of book text data stored in a database included in a server; receiving a user's voice data from a user terminal and analyzing the user's voice data to generate a second keyword cluster; matching the first and second keyword clusters to generate a recommended book list; transmitting the recommended book list to the user terminal; receiving book selection data for the recommended book list from the user terminal; and transmitting first book text data corresponding to the book selection data to the user terminal.

In addition, the step of generating the first keyword cluster may include receiving the plurality of book text data from the database, extracting book keywords for each of the plurality of book text data through natural language processing, generating the first keyword cluster according to the distribution of the book keywords, and storing the first keyword cluster in the database.

In addition, the step of generating the second keyword cluster may include converting the user's voice data into user text data, extracting user keywords from the user text data through natural language processing, and generating the second keyword cluster according to the distribution of the user keywords.

An audio output system according to some embodiments of the present invention for solving the above problems includes: an audio output device that outputs sound content for auditory stimulation of infants and young children; a user terminal that includes a microphone into which a user's voice data is input and that edits the user's voice data; a content management server that provides the user's voice data to the audio output device; and a book recommendation server that recommends a book associated with the user's voice data using the user's voice data. The book recommendation server converts the user's voice data into user text data in the form of text, performs natural language processing on the user text data, extracts keywords from the user text data, creates a keyword cluster based on the distribution of the keywords, and recommends a book associated with the user's voice data using the keyword cluster.

An audio output system according to some embodiments of the present invention for solving the above problems includes: a first sound doll corresponding to first sound content; a second sound doll corresponding to second sound content different from the first sound content; an audio output station that outputs sound content corresponding to a sound doll; a user terminal into which a user's voice data is input; and a sound doll recommendation server that recommends a sound doll associated with the user's voice data. The sound doll recommendation server analyzes first text data for the first sound content to generate a first keyword cluster, analyzes second text data for the second sound content to generate a second keyword cluster, analyzes the user's voice data to generate a third keyword cluster, and compares the first to third keyword clusters to determine a sound doll associated with the user's voice data.

The audio output system and method of the present invention can stimulate the curiosity of infants and young children and provide an environment in which infants and young children can select various sounds on their own without going through their guardian's digital video device. In other words, it is possible to have a beneficial effect on the growth and development of infants and young children by providing them with a variety of auditory stimulation while minimizing exposure to digital imaging devices.

Additionally, the audio output system and method according to embodiments of the present invention can provide emotional stability to infants and young children by using, as sound content, the voice of their guardian, which they have been hearing since the fetal stage.

In addition to the above-described content, specific effects of the present invention are described below while explaining specific details for carrying out the invention.

FIG. 1 is a schematic diagram illustrating an audio output system according to some embodiments of the present invention.

FIG. 2 is a block diagram for explaining the audio output device of FIG. 1.

FIG. 3 is an example diagram for explaining the audio output station of FIG. 2.

FIG. 4 is an example diagram for explaining the audio output station and sound doll of FIG. 2.

FIG. 5 is a block diagram for explaining the server of FIG. 1.

FIG. 6 is a block diagram for explaining the relationship between the audio output station of FIG. 2 and the content management server of FIG. 5.

FIG. 7 is a block diagram for explaining the book recommendation server of FIG. 5.

FIG. 8 is a block diagram for explaining the text analysis engine of FIG. 7.

FIGS. 9A and 9B are exemplary diagrams for explaining the book matching unit of FIG. 7.

FIG. 10 is a flowchart for explaining an audio output method according to some embodiments of the present invention.

FIG. 11 is a flowchart for explaining the first keyword cluster generation step of FIG. 10.

FIG. 12 is a flowchart for explaining the second keyword cluster generation step of FIG. 10.

FIG. 13 is a schematic diagram for explaining an audio output system according to some embodiments of the present invention.

FIG. 14 is a block diagram for explaining a server of an audio output system according to some embodiments of the present invention.

Terms or words used in this specification and patent claims should not be construed as limited to their general or dictionary meaning. According to the principle that an inventor can define terms or word concepts in order to explain his or her invention in the best way, they should be interpreted with a meaning and concept consistent with the technical idea of the present invention. In addition, the embodiments described in this specification and the configurations shown in the drawings are only one embodiment of the present invention and do not completely represent the technical idea of the present invention, so it should be understood that there may be various equivalents, variations, and applicable examples that can replace them at the time of filing the present application.

Terms such as first, second, A, and B used in the present specification and claims may be used to describe various components, but the components should not be limited by the terms. The above terms are used only for the purpose of distinguishing one component from another. For example, a first component may be named a second component, and similarly, the second component may also be named a first component without departing from the scope of the present invention. The term 'and/or' includes any of a plurality of related stated items or a combination of a plurality of related stated items.

The terms used in the specification and claims are merely used to describe specific embodiments and are not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "include" or "have" should be understood as not precluding the existence or possible addition of features, numbers, steps, operations, components, parts, or combinations thereof other than those described in the specification.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by a person of ordinary skill in the technical field to which the present invention pertains.

Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related technology and, unless clearly defined in the present application, should not be interpreted in an ideal or excessively formal sense.

Additionally, each configuration, process, procedure, or method included in each embodiment of the present invention may be shared among embodiments to the extent that they are not technically contradictory to each other.

Hereinafter, an audio output system and method according to some embodiments of the present invention will be described with reference to FIGS. 1 to 13.

Referring to FIG. 1, an audio output system 10 according to some embodiments of the present invention includes an audio output device 100, a server 200, and a user terminal 300.

The audio output device 100 is configured to output sound content for auditory stimulation of infants and young children. The audio output device 100 can download and update sound content by exchanging information with the server 200 and the user terminal 300.

The server 200 provides a service environment in which the user terminal 300 controls the audio output device 100. Additionally, the server 200 may provide sound content output by the audio output device 100. That is, the server 200 may provide, to the audio output device 100, sound content matching the information recognized through the audio output device 100.

The user terminal 300 is connected to the audio output device 100 and can control the audio output device 100. Additionally, the user terminal 300 may input the user's voice data. The user terminal 300 is a terminal that can use applications or web services provided through the server 200. The user terminal 300 may be, for example, a user's personal computer or smartphone.

Here, the user of the audio output device 100 and the user of the user terminal 300 may include infants and toddlers and their guardians. The main user of the audio output device 100 may be an infant or young child, and the main user of the user terminal 300 may be a guardian of an infant or young child. However, this embodiment is not limited to this.

The communication network can connect the audio output device 100, the server 200, and the user terminal 300. Communication networks may include networks based on wired Internet technology, wireless Internet technology, and short-distance communication technology. Wired Internet technology may include, for example, at least one of a local area network (LAN) and a wide area network (WAN).

Wireless Internet technologies may include, for example, at least one of Wireless LAN (WLAN), DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), WiMAX (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), IEEE 802.16, Long Term Evolution (LTE), LTE-A (Long Term Evolution-Advanced), Wireless Mobile Broadband Service (WMBS), and 5G NR (New Radio) technology. However, this embodiment is not limited to this.

Short-range communication technologies may include, for example, at least one of Bluetooth, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra-Wideband (UWB), ZigBee, Near Field Communication (NFC), Ultrasound Communication (USC), Visible Light Communication (VLC), Wi-Fi, Wi-Fi Direct, and 5G NR (New Radio). However, this embodiment is not limited to this.

The audio output device 100, server 200, and user terminal 300 that communicate through a communication network can comply with technical standards and standard communication methods for mobile communication. For example, standard communication methods may include at least one of GSM (Global System for Mobile communication), CDMA (Code Division Multiple Access), CDMA2000 (Code Division Multiple Access 2000), EV-DO (Evolution-Data Optimized or Evolution-Data Only), Wideband CDMA (WCDMA), High Speed Downlink Packet Access (HSDPA), High Speed Uplink Packet Access (HSUPA), Long Term Evolution (LTE), Long Term Evolution-Advanced (LTE-A), and 5G New Radio (NR). However, this embodiment is not limited to this.

Hereinafter, an audio output device of an audio output system according to some embodiments of the present invention will be described with reference to FIGS. 2 to 4.

FIG. 2 is a block diagram for explaining the audio output device of FIG. 1, and FIG. 3 is an example diagram for explaining the audio output station of FIG. 2. FIG. 4 is an example diagram for explaining the audio output station and sound doll of FIG. 2.

Referring to FIG. 2, the audio output device 100 of the audio output system 10 according to some embodiments of the present invention may include an audio output station 110 and a sound doll 120.

The audio output station 110 may recognize the sound doll 120 and output sound content corresponding to the sound doll 120. Here, the sound content may include children's songs, traditional fairy tales, English fairy tales, etc. recorded for auditory stimulation of infants and young children. Alternatively, referring to FIG. 1 , sound content may include the user's voice data input through the user terminal 300. Sound content may be stored in the audio output station 110 or in the sound doll 120.

The audio output station 110 may include a data processing unit that checks the overall operating state of the audio output station 110, a data storage unit that stores sound content, and a speaker unit that outputs sound content. Here, the data processor may check whether the sound doll 120 is recognized, manage power of the audio output station 110, and check whether sound content is stored.

The sound doll 120 may be a device that the user has the audio output station 110 recognize in order to play sound content. The sound doll 120 may correspond to sound content for auditory stimulation of infants and young children. For example, sound content may include content for language learning, such as counting, bilingual repetition, onomatopoeia repetition, and mimetic-word repetition, and content that enhances the user's imagination and creativity, such as melodies, sound theater, and folk tales. In addition, the sound content may be at least one of content that allows the user to perform physical activities through movement, such as animal nursery rhymes and children's rhythmic songs, and content in which a guardian records a voice for the user's emotional development.

Referring to FIG. 3, the audio output station 110 may include, in its external appearance, a docking space (D) in which the sound doll 120 can be seated, a speaker (S) through which sound content is output, a volume control device (V) that adjusts the volume of the speaker, an operating status light (L) that indicates the operating state of the audio output station through color change, and a playback track controller (C) that changes the output sound track. For example, the operating status light (L) may be turned on or off depending on whether the sound doll 120 is docked. Additionally, the audio output station 110 may have an appearance that stimulates the curiosity of infants and young children.

For example, the audio output station 110 may have the appearance of a house in which the docking space (D) is formed, the volume control device (V) may be formed as the chimney of the house, and the playback track controller (C) may be formed as part of the roof of the house. Through this shape, infant users can naturally perceive the sound doll 120 as residing in the audio output station 110. Such a house-shaped audio output station 110 can provide emotional stability and satisfaction to infant and young-child users.

Referring to FIG. 4 , infant users can play sound content by placing the sound doll 120 in the docking space (D) of the audio output station 110. At this time, the audio output station 110 can identify the sound doll 120 placed in the docking space (D) through near field communication (NFC). Additionally, the docking space (D) of the audio output station 110 and the sound doll 120 may each include a magnet, and the docking space (D) and the sound doll 120 may be fixed by magnetic force. However, this embodiment is not limited to this.

The sound doll 120 may be sized so that infants and young children can hold it with one hand. Accordingly, the docking space D may be formed in a size that can sufficiently accommodate the sound doll 120. The sound doll 120 may be configured in the shape of an animal or a character that can stimulate the curiosity of infants and young children. Additionally, the sound doll 120 may be rounded without sharp parts for the safety of infants and young children. At this time, the sound doll 120 may be composed of a plurality of sound dolls 120 with different appearances, and different sound content may correspond to each sound doll 120.

For example, the first sound doll 120A may correspond to content recorded by the user's guardian, and the second sound doll 120B may correspond to animal nursery rhyme content. The third sound doll 120C, the fourth sound doll 120D, and the fifth sound doll 120E may each correspond to traditional fairy tales, children's songs, and language learning content. Accordingly, when the user wants to play the desired sound content, the sound doll 120 corresponding to the sound content can be recognized by the audio output station 110 and the sound content can be played. However, the specific appearance of the sound doll 120 and the corresponding sound content may vary depending on need.

The user can safely select the sound doll 120 with one hand and easily place it in the docking space (D) of the audio output station 110. Additionally, the user may remove the mounted sound doll 120 with one hand. Through this, when the user docks another sound doll 120, the audio output station 110 can output sound content corresponding to the newly docked sound doll 120.

The sound content output process of the audio output device 100 is comprised of a simple docking and detachment process, so even infants and young children can sufficiently operate it. Additionally, sound content may be stored in the audio output station 110 in advance. Accordingly, output of sound content corresponding to the sound doll 120 may be possible only through the audio output device 100 even without connection to other devices.
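As a non-limiting illustration, the docking-based operation described above can be sketched as follows. The class name AudioOutputStation, the doll identifiers, and the file names are hypothetical and are not taken from the specification; the sketch assumes only that the station keeps sound content in local storage keyed by the identifier read from the docked sound doll, so that no network connection is needed at playback time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AudioOutputStation:
    """Hypothetical model of the station: content is stored locally, keyed by doll ID."""

    # Maps a doll identifier (for example, one read over NFC) to its stored tracks.
    stored_content: Dict[str, List[str]] = field(default_factory=dict)
    docked_doll: Optional[str] = None

    def dock(self, doll_id: str) -> List[str]:
        """Recognize a docked doll and return the tracks to play (empty if none stored)."""
        self.docked_doll = doll_id
        return self.stored_content.get(doll_id, [])

    def undock(self) -> None:
        """After the doll is removed, no doll is recognized."""
        self.docked_doll = None

    def status_light_on(self) -> bool:
        """In this sketch the operating status light simply follows the docking state."""
        return self.docked_doll is not None


# Example: two dolls with different content, swapped by the user.
station = AudioOutputStation(
    stored_content={"120A": ["guardian_story.mp3"], "120B": ["animal_rhymes.mp3"]}
)
tracks = station.dock("120A")   # content for the first doll
station.undock()
tracks = station.dock("120B")   # content for the newly docked doll
```

Because the lookup uses only local storage, replacing one sound doll with another changes the output content without any exchange with the server 200 or the user terminal 300.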

Hereinafter, with reference to FIGS. 5 and 6, the server 200 of the audio output system 10 according to some embodiments of the present invention will be described.

FIG. 5 is a block diagram for explaining the server of FIG. 1, and FIG. 6 is a block diagram for explaining the relationship between the audio output station of FIG. 2 and the content management server of FIG. 5.

Referring to FIG. 5, the server 200 of the audio output system 10 according to some embodiments of the present invention may include a content management server 210 and a book recommendation server 220.

The user's voice data (D_voice) may be input through the user terminal 300 and transmitted to the server 200 through a communication network. Here, voice data (D_voice) may be data recorded by a guardian through the user terminal 300 for the emotional development of infants and young children. For example, voice data (D_voice) may be recorded data about a nursery rhyme sung by a guardian to an infant or toddler, a fairy tale read by a guardian to an infant or toddler, or a message delivered by a guardian to an infant or toddler. Voice data (D_voice) transmitted to the server 200 may be delivered to the content management server 210 and the book recommendation server 220.

The content management server 210 can store and manage sound content. Referring to FIG. 2, when the user first has the audio output station 110 recognize the sound doll 120, the content management server 210 may transmit sound content corresponding to the sound doll 120 to the audio output station 110.

Additionally, the user can access the service environment using the user terminal 300 and change or edit sound content stored in the content management server 210. For example, the user can change at least one of the number, type, and order of sound tracks stored in the content management server 210 through the user terminal 300. Additionally, the user can select part of the voice data (D_voice) and edit it, such as using it as sound content. However, in this specification, it is assumed that voice data (D_voice) input from the user is used as sound content.

According to some embodiments, the sound doll 120 may include a first sound doll and a second sound doll. The sound content corresponding to the first sound doll can be re-recorded, whereas the sound content corresponding to the second sound doll may be fixed and not changeable. That is, the corresponding sound content may be changed while the first sound doll is docked at the audio output station 110, but may not be changed while the second sound doll is docked at the audio output station 110. However, the description of the first sound doll and the second sound doll is only illustrative, and the embodiments are not limited thereto. Of course, the second sound doll can also be implemented so that its sound content can be changed. For example, sound content recorded by the user may be uploaded for the first sound doll, and the second sound doll may be implemented so that its sound content is changed through a subscription to, or a download from, a content server. Hereinafter, for convenience of explanation, it is assumed that the sound doll 120 is a first sound doll capable of outputting sound content recorded by a user.

Referring to FIG. 6 , the content management server 210 may receive voice data (D_voice) and transmit it to the audio output station 110.

The user can generate voice data (D_voice) using the user terminal 300. The user may provide voice data (D_voice) to the content management server 210 through the user terminal 300. The user can access the service environment using the user terminal 300 and change the sound content stored in the content management server 210 to include voice data (D_voice). When the sound content stored in the content management server 210 is changed to include voice data (D_voice), that is, when the sound content managed by the content management server 210 is updated, the content management server 210 may transmit an update message (MSG_update) to the audio output station 110. Here, the update message (MSG_update) may include information indicating that the sound content managed by the content management server 210 has been updated.

Subsequently, the audio output station 110 may provide a voice data request signal (RQ_D_voice) to the content management server 210. In other words, the content management server 210 may receive a voice data request signal (RQ_D_voice) from the audio output station 110. Here, the voice data request signal (RQ_D_voice) may include a request for a download URL for voice data (D_voice) or a direct request for voice data (D_voice).

Subsequently, the content management server 210 may transmit a voice data return signal (RT_D_voice) to the audio output station 110. The voice data return signal (RT_D_voice) may be a signal including a URL from which voice data (D_voice) can be downloaded, or may be a signal including the voice data (D_voice) itself.
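As a non-limiting illustration, the exchange of MSG_update, RQ_D_voice, and RT_D_voice described above can be sketched as follows. The signal names come from the description; the class names, the dictionary-based message layout, the "mode" field, and the placeholder URL are assumptions made only for this sketch, and the network transport is replaced by direct method calls.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ContentManagementServer:
    """Hypothetical stand-in for the content management server 210."""

    voice_data: Optional[bytes] = None
    base_url: str = "https://example.invalid/content"  # assumed placeholder URL

    def upload(self, d_voice: bytes) -> Dict[str, str]:
        """Store D_voice and return the update message (MSG_update) for the station."""
        self.voice_data = d_voice
        return {"type": "MSG_update"}

    def handle_request(self, rq: Dict[str, str]) -> Dict[str, Any]:
        """Answer RQ_D_voice with RT_D_voice: a download URL or the data itself."""
        if rq.get("type") != "RQ_D_voice" or self.voice_data is None:
            raise ValueError("no voice data available for this request")
        if rq.get("mode") == "url":
            return {"type": "RT_D_voice", "url": f"{self.base_url}/d_voice"}
        return {"type": "RT_D_voice", "data": self.voice_data}


@dataclass
class Station:
    """Hypothetical stand-in for the audio output station 110."""

    stored: List[bytes] = field(default_factory=list)

    def on_message(self, msg: Dict[str, str], server: ContentManagementServer) -> None:
        """On MSG_update, request the voice data directly and store it locally."""
        if msg.get("type") == "MSG_update":
            rt = server.handle_request({"type": "RQ_D_voice", "mode": "direct"})
            self.stored.append(rt["data"])


# Example: a guardian uploads a recording and the station pulls it.
server = ContentManagementServer()
station = Station()
station.on_message(server.upload(b"lullaby"), server)
```

In this sketch the server only announces the update and the station initiates the transfer, which mirrors the description in which the audio output station 110 sends the request signal after receiving the update message.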

Hereinafter, the book recommendation server 220 of the audio output system 10 according to some embodiments of the present invention will be described with reference to FIGS. 7 to 9B.

FIG. 7 is a block diagram for explaining the book recommendation server of FIG. 5, and FIG. 8 is a block diagram for explaining the text analysis engine of FIG. 7. FIG. 9A is an example diagram for explaining the book matching unit of FIG. 7.

Referring to FIG. 7, the book recommendation server 220 of the audio output system 10 according to some embodiments of the present invention may include a communication unit 221, a voice analysis unit 222, a text analysis engine 223, a book matching unit 224, and a database (DB).

The communication unit 221 may receive voice data (D_voice) and book selection data (D_select) from the user terminal 300. Additionally, the communication unit 221 may transmit a recommended book list (LoB) and first book text data (S_ToB) to the user terminal 300. Through this, the book recommendation server 220 can transmit and receive information with the user terminal 300.

The database (DB) can store book text data (ToB). Here, book text data (ToB) may include the text of various books that may be helpful in the development of infants and toddlers. Although not shown in the drawing, the database (DB) may update book text data (ToB) through a communication network. Additionally, the database (DB) may transmit first book text data (S_ToB) corresponding to book selection data (D_select) to the communication unit 221. The first book text data (S_ToB) may be included in the book text data (ToB). Through this, the user can check the text of the selected book on the user terminal 300.

The database (DB) can transmit book text data (ToB) to the text analysis engine 223. Subsequently, the database (DB) may receive the first keyword cluster (KC_B) and store it together with the book text data (ToB).

The voice analysis unit 222 may receive voice data (D_voice) and convert the voice data (D_voice), which is in audio form, into user text data (D_text), which is in text form. For example, voice data (D_voice) may be a recording of a fairy tale that the user wants to tell to infants and toddlers. In that case, the user text data (D_text) may be the text of that fairy tale.

The voice analysis unit 222 may transmit user text data (D_text) to the text analysis engine 223. The voice analysis unit 222 may receive the second keyword cluster (KC_T) from the text analysis engine 223. Additionally, the voice analysis unit 222 may transmit the second keyword cluster (KC_T) to the book matching unit 224.

The text analysis engine 223 may receive user text data (D_text) from the voice analysis unit 222. Additionally, the text analysis engine 223 may receive book text data (ToB) from the database (DB).

The text analysis engine 223 can generate keyword clusters by analyzing text data. The text analysis engine 223 may generate a first keyword cluster (KC_B) by analyzing book text data (ToB). According to some embodiments, the text analysis engine 223 may select keywords from book text data (ToB). The text analysis engine 223 may generate a first keyword cluster (KC_B) based on the selected keywords.

Similarly, the text analysis engine 223 may generate a second keyword cluster (KC_T) by analyzing user text data (D_text). According to some embodiments, the text analysis engine 223 may select keywords from user text data (D_text). The text analysis engine 223 may generate a second keyword cluster (KC_T) based on the selected keywords.

The first keyword cluster (KC_B) generated by the text analysis engine 223 may be provided to the database (DB). Additionally, the second keyword cluster (KC_T) generated by the text analysis engine 223 may be provided to the book matching unit 224. Although FIG. 7 illustrates the second keyword cluster (KC_T) generated by the text analysis engine 223 being provided to the book matching unit 224 through the voice analysis unit 222, the embodiments are not limited thereto. For example, the second keyword cluster (KC_T) generated by the text analysis engine 223 may be provided to the book matching unit 224 through a separate path.

The book matching unit 224 may receive the first keyword cluster (KC_B) from the database (DB). Additionally, the book matching unit 224 may receive the second keyword cluster (KC_T) from the text analysis engine 223 through the voice analysis unit 222. However, as described above, the embodiments are not limited to this, and the book matching unit 224 may receive the second keyword cluster (KC_T) from the text analysis engine 223 without going through the voice analysis unit 222. The book matching unit 224 may compare the received first keyword cluster (KC_B) and second keyword cluster (KC_T) to generate a recommended book list (LoB). Additionally, the book matching unit 224 may generate publisher information (PI) by comparing the first keyword cluster (KC_B) and the second keyword cluster (KC_T). The book matching unit 224 may transmit the recommended book list (LoB) and/or the publisher information (PI) to the user terminal 300 through the communication unit 221.

Referring to FIG. 8, the text analysis engine 223 of the audio output system 10 according to some embodiments of the present invention may include a natural language processing unit 223_1 and a keyword cluster generating unit 223_2.

The natural language processing unit 223_1 can identify words and extract keywords from text data through natural language processing. The natural language processing unit 223_1 may generate a book keyword (Keyword_1) by applying natural language processing to the book text data (ToB) and extracting keywords. Here, the book keyword (Keyword_1) may be the result of analyzing the number of occurrences and the frequency of specific words in the book text data (ToB).

Additionally, the natural language processing unit 223_1 may generate a user keyword (Keyword_2) by applying natural language processing to the user text data (D_text) and extracting keywords. Here, the user keyword (Keyword_2), like the book keyword (Keyword_1), may be the result of analyzing the number of occurrences and the frequency of specific words in the user text data (D_text).
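As a non-limiting illustration, the analysis of occurrence counts and frequencies described above might be sketched as follows. The function name `keyword_stats` and the simple English-only tokenization rule are assumptions made for this example; an actual natural language processing unit would likely use language-specific analysis that is not specified here.

```python
import re
from collections import Counter


def keyword_stats(text: str) -> dict[str, tuple[int, float]]:
    """Map each word to (occurrence count, share of all words in the text)."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter(words)
    # An empty text yields an empty dict, so no division by zero occurs.
    return {word: (count, count / total) for word, count in counts.items()}
```

The same function could be applied to book text data (ToB) to obtain a book keyword (Keyword_1) and to user text data (D_text) to obtain a user keyword (Keyword_2).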

The keyword cluster generator 223_2 may generate a first keyword cluster (KC_B) based on the distribution of the book keyword (Keyword_1). For the first keyword cluster (KC_B), words may be selected from the book keyword (Keyword_1) based on a predetermined number of occurrences (e.g., 5 or more times) or a predetermined frequency (e.g., 1 or more occurrences per 10 words). Alternatively, for the first keyword cluster (KC_B), the five words with the highest number of occurrences or the highest frequency may be selected from the book keyword (Keyword_1). However, this embodiment is not limited to this, and the specific method of generating the first keyword cluster (KC_B) may vary.
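As a non-limiting illustration, the three selection rules mentioned above (a minimum number of occurrences, a minimum frequency, and the top five words) might be sketched as follows. The function names are hypothetical, and the default thresholds merely mirror the examples given in the text.

```python
from collections import Counter


def select_by_count(counts: Counter, min_count: int = 5) -> set[str]:
    """Keep words that occur at least min_count times."""
    return {word for word, count in counts.items() if count >= min_count}


def select_by_frequency(counts: Counter, min_freq: float = 0.1) -> set[str]:
    """Keep words whose share of all words is at least min_freq (0.1 = 1 per 10)."""
    total = sum(counts.values())
    if total == 0:
        return set()
    return {word for word, count in counts.items() if count / total >= min_freq}


def select_top_n(counts: Counter, n: int = 5) -> list[str]:
    """Keep the n most frequent words, most frequent first."""
    return [word for word, _ in counts.most_common(n)]
```

This sketch does not filter out function words such as articles; whether and how such words are excluded is left open by the description.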

Additionally, the keyword cluster generator 223_2 may generate a second keyword cluster (KC_T) based on the user keyword (Keyword_2). The second keyword cluster (KC_T) can be created in a similar way to the first keyword cluster (KC_B). However, the embodiments are not limited to this, and when the number of words in the user text data (D_text) is significantly different from the number of words in the book text data (ToB), the method for generating the second keyword cluster (KC_T) may differ from the method for generating the first keyword cluster (KC_B).

Referring to FIG. 9A, the book matching unit 224 may match the first keyword cluster (KC_B) against the second keyword cluster (KC_T) and accordingly select some of the books stored in the database (DB) to create a recommended book list (LoB).

For example, when the 2-1 keyword cluster (KC_T1) is generated as 'animal, princess, English', the book matching unit 224 may determine that the 2-1 keyword cluster (KC_T1) has a high matching rate with the 1-1 keyword cluster (KC_B1) and the 1-2 keyword cluster (KC_B2). The 1-1 keyword cluster (KC_B1) may include the keywords 'animal, prince, princess', and the 1-2 keyword cluster (KC_B2) may include the keywords 'animal, princess, love'. Accordingly, the book matching unit 224 may create a recommended book list (LoB) that includes The Frog Prince (ToB1), corresponding to the 1-1 keyword cluster (KC_B1), and The Little Mermaid (ToB2), corresponding to the 1-2 keyword cluster (KC_B2).
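As a non-limiting illustration, the example above can be reproduced with set intersection. The function name `recommend`, the threshold of two shared keywords, and the third book ('A City Train Ride' with its keywords) are assumptions added for this sketch; only KC_T1, KC_B1, and KC_B2 come from the example.

```python
def recommend(
    user_cluster: set[str],
    book_clusters: dict[str, set[str]],
    min_shared: int = 2,  # assumed cut-off for a "high" matching rate
) -> list[str]:
    """Return titles whose keyword cluster shares enough keywords with the user's."""
    return [
        title
        for title, cluster in book_clusters.items()
        if len(user_cluster & cluster) >= min_shared
    ]


kc_t1 = {"animal", "princess", "English"}
clusters = {
    "The Frog Prince": {"animal", "prince", "princess"},   # KC_B1
    "The Little Mermaid": {"animal", "princess", "love"},  # KC_B2
    "A City Train Ride": {"car", "train", "city"},         # hypothetical
}
recommended = recommend(kc_t1, clusters)
```

Here both source books share 'animal' and 'princess' with KC_T1, so both are placed on the recommended book list (LoB), while the hypothetical third book shares nothing and is left out.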

Referring to FIG. 9B, the book matching unit 224 may match the first keyword cluster (KC_B) against the second keyword cluster (KC_T) to generate publisher information (PI) for the book related to the second keyword cluster (KC_T). According to some embodiments, even if the content is the same, different keywords may be used depending on the publisher. Therefore, by comparing the first keyword cluster (KC_B) and the second keyword cluster (KC_T), it is possible to determine which publisher's book was used to create the voice data (D_voice) recorded by the user.

For example, the database (DB) may include first text data (ToB4) of the book 'The Frog Prince' published by a first publisher (P1), second text data (ToB5) of the book 'The Frog Prince' published by a second publisher (P2), and third text data (ToB6) of the book 'The Frog Prince' published by a third publisher (P3). The 1-4 keyword cluster (KC_B4) derived from the first text data (ToB4) may be 'animal, prince, princess'. Additionally, the 1-5 keyword cluster (KC_B5) derived from the second text data (ToB5) may be 'frog, prince, princess'. Additionally, the 1-6 keyword cluster (KC_B6) derived from the third text data (ToB6) may be 'animal, prince, princess'. In this way, keyword clusters generated for the same title, 'The Frog Prince', may differ from publisher to publisher. At this time, the book matching unit 224 may compare the first keyword cluster (KC_B) and the second keyword cluster (KC_T) and generate information on which publisher's book the voice data (D_voice) related to the second keyword cluster (KC_T) originates from. In other words, the book matching unit 224 may generate publisher information (PI) related to the voice data (D_voice).

If the book matching unit 224 fails to generate publisher information for the voice data (D_voice) by comparing the first keyword cluster (KC_B) and the second keyword cluster (KC_T), the book matching unit 224 may request the user to directly enter the publisher information (PI) of the book the user used.
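As a non-limiting illustration, publisher identification with a fallback to user input might be sketched as follows. The function name `identify_publisher` is hypothetical, and treating a tie or a zero overlap as a failure to generate publisher information (PI) is an assumed reading of the description, not a stated rule.

```python
from typing import Optional


def identify_publisher(
    user_cluster: set[str], editions: dict[str, set[str]]
) -> Optional[str]:
    """Return the publisher whose cluster is the unique best match, else None."""
    scores = {pub: len(user_cluster & cluster) for pub, cluster in editions.items()}
    best = max(scores.values(), default=0)
    winners = [pub for pub, score in scores.items() if score == best]
    if best == 0 or len(winners) != 1:
        # Assumed failure case: the caller would ask the user to enter PI directly.
        return None
    return winners[0]


# Keyword clusters of the three 'The Frog Prince' editions from the example.
editions = {
    "P1": {"animal", "prince", "princess"},  # KC_B4
    "P2": {"frog", "prince", "princess"},    # KC_B5
    "P3": {"animal", "prince", "princess"},  # KC_B6
}
```

With these clusters, a recording that yields 'frog, prince, princess' points uniquely to P2, whereas one that yields 'animal, prince, princess' matches P1 and P3 equally, which is the kind of case in which the user might be asked to enter the publisher information (PI).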

The book matching unit 224 may transmit a recommended book list (LoB) and publisher information (PI) to the user terminal 300 through the communication unit 221. Through the recommended book list (LoB), users can receive recommendations for fairy tales similar to the one they recorded. When the guardian user selects one item from the recommended book list (LoB), that is, when book selection data (D_select) is provided to the communication unit 221 through the user terminal 300, the guardian user can check, on the user terminal 300, the first book text data (S_ToB) provided by the communication unit 221. Through this, guardian users can easily access and record a variety of fairy tales that are helpful for the emotional development of infants and toddlers or that cover topics preferred by infants and toddlers, and infant users can listen to various fairy tales recorded in the guardian's voice.

As described above, users can create sound content from their own recorded data. At this time, if profits are generated through sound content recorded by the user, problems may arise regarding distribution of the profits. This is because the text that the user read when recording may be copyrighted by a specific publisher. Therefore, the book matching unit 224 can provide the user with publisher information (PI) if a copyright problem may occur with the corresponding voice data (D_voice), and if the book matching unit 224 cannot generate the publisher information (PI), the user can be asked to directly enter the publisher information (PI). Users can use the publisher information (PI) to share profits generated from the sound content with the publisher identified by the publisher information (PI). This profit sharing may be performed through this system or through a separate system.

Hereinafter, an audio output method according to some embodiments of the present invention will be described with further reference to FIGS. 10 to 12. Parts that overlap with the above-described embodiments are omitted or simplified.

FIG. 10 is a flowchart for explaining a book recommendation method according to some embodiments of the present invention, and FIG. 11 is a flowchart for explaining the first keyword cluster generation step of FIG. 10. FIG. 12 is a flowchart for explaining the second keyword cluster creation step of FIG. 10.

Referring to FIG. 10, the book recommendation server 220 analyzes the database and creates a first keyword cluster (S100).

In detail, referring to FIG. 11, the text analysis engine 223 receives book text data from the database (S110).

According to some embodiments, the text analysis engine 223 may include a natural language processing unit 223_1 and a keyword cluster generating unit 223_2. The natural language processing unit 223_1 may receive book text data (ToB) from the database (DB).

Next, the text analysis engine 223 can extract book keywords from book text data through natural language processing (S120).

According to some embodiments, the natural language processing unit 223_1 may analyze book text data (ToB) and extract a book keyword (Keyword_1) based on the number of occurrences and frequencies of words.

A first keyword cluster is created according to the distribution of book keywords (S130).

According to some embodiments, the keyword cluster generator 223_2 may receive a book keyword (Keyword_1). Subsequently, the keyword cluster generator 223_2 may generate a first keyword cluster (KC_B) in which keywords related to the topic are selected based on the distribution of the book keyword (Keyword_1).

The text analysis engine 223 stores the first keyword cluster in the database (S140).

According to some embodiments, the keyword cluster generator 223_2 may store the first keyword cluster (KC_B) in the database (DB).

Referring again to FIG. 10, the book recommendation server 220 receives voice data from the user terminal, analyzes the voice data, and generates a second keyword cluster (S200).

In detail, referring to FIG. 12, the voice analysis unit 222 converts the user's voice data into user text data (S210).

According to some embodiments, the voice analysis unit 222 may receive the user's voice data (D_voice) from the communication unit 221. The voice analysis unit 222 may convert voice data (D_voice) into user text data (D_text) in the form of text data.

Next, the text analysis engine 223 extracts user keywords from user text data through natural language processing (S220).

According to some embodiments, the natural language processing unit 223_1 may receive user text data (D_text). The natural language processing unit 223_1 may analyze user text data (D_text) and extract user keywords (Keyword_2) based on the number of occurrences and frequencies of words.

A second keyword cluster is created according to the distribution of user keywords (S230).

According to some embodiments, the keyword cluster generator 223_2 may receive a user keyword (Keyword_2). Subsequently, the keyword cluster generator 223_2 may generate a second keyword cluster (KC_T) in which keywords related to the topic are selected based on the distribution of the user keyword (Keyword_2).

The text analysis engine 223 provides the second keyword cluster to the book matching unit (S240).

According to some embodiments, the keyword cluster generator 223_2 may provide the second keyword cluster (KC_T) to the book matching unit 224 to generate a recommended book list (LoB).
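As a non-limiting illustration, steps S210 to S230 might be composed as in the sketch below. The speech-to-text step is passed in as a function because this description does not name a particular recognizer; the function name `second_keyword_cluster`, the English-only tokenization, and the top-three selection are assumptions for this example.

```python
import re
from collections import Counter
from typing import Callable


def second_keyword_cluster(
    voice: bytes, transcribe: Callable[[bytes], str], top_n: int = 3
) -> list[str]:
    """Turn D_voice into a keyword cluster (KC_T) via text and word counts."""
    text = transcribe(voice)                          # S210: D_voice -> D_text
    words = re.findall(r"[a-z']+", text.lower())      # S220: assumed tokenizer
    counts = Counter(words)                           # S220: occurrence counts
    return [word for word, _ in counts.most_common(top_n)]  # S230: select keywords
```

The returned list would then be handed to the book matching unit 224 (S240).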

Again, referring to FIG. 10, the book recommendation server 220 generates a recommended book list by matching the first keyword cluster and the second keyword cluster (S300).

According to some embodiments, the book matching unit 224 may receive a first keyword cluster (KC_B) from the database (DB) and a second keyword cluster (KC_T) from the voice analysis unit 222. At this time, the book matching unit 224 may obtain a matching rate by comparing the first keyword cluster (KC_B) and the second keyword cluster (KC_T).

For example, the matching rate may be obtained by comparing the 2-2 keyword cluster (KC_T2) with each of the 1-1 to 1-5 keyword clusters (KC_B1 to KC_B5). When the 2-2 keyword cluster (KC_T2) includes 'Hangul, prince, stepmother', its matching rates with the 1-1 to 1-5 keyword clusters (KC_B1 to KC_B5) may be 33%, 0%, 0%, 66%, and 0%, respectively. Accordingly, the book matching unit 224 may generate a recommended book list (LoB) ordered from the highest matching rate.
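As a non-limiting illustration, a matching rate consistent with the 33% and 66% figures above can be obtained by dividing the number of shared keywords by the size of the user's keyword cluster and truncating to a whole percent. This formula is an assumption, since the description does not define the matching rate; the function names and the book clusters labeled hypothetical below are likewise illustrative only.

```python
def matching_rate(user_cluster: set[str], book_cluster: set[str]) -> int:
    """Percent of the user's keywords found in the book's cluster (truncated)."""
    if not user_cluster:
        return 0
    return 100 * len(user_cluster & book_cluster) // len(user_cluster)


def rank_books(
    user_cluster: set[str], book_clusters: dict[str, set[str]]
) -> list[tuple[str, int]]:
    """Return (title, matching rate) pairs ordered from the highest rate."""
    rated = [
        (title, matching_rate(user_cluster, cluster))
        for title, cluster in book_clusters.items()
    ]
    return sorted(rated, key=lambda item: item[1], reverse=True)


kc_t2 = {"Hangul", "prince", "stepmother"}
books = {
    "Book A": {"animal", "prince", "princess"},   # same keywords as KC_B1
    "Book B": {"animal", "princess", "love"},     # same keywords as KC_B2
    "Book C": {"prince", "stepmother", "apple"},  # hypothetical cluster
}
ranking = rank_books(kc_t2, books)
```

One shared keyword out of three gives 33% and two out of three give 66%, so the hypothetical Book C is ranked first in this sketch.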

The book recommendation server 220 transmits a list of recommended books to the user terminal (S400).

According to some embodiments, the book matching unit 224 may transmit a recommended book list (LoB), generated based on the matching rate of the first keyword cluster (KC_B) and the second keyword cluster (KC_T), to the user terminal 300 through the communication unit 221. At this time, the recommended book list (LoB) may include the title of each recommended book and its matching rate, and items expected to be preferred by the user may be displayed at the top.

The book recommendation server 220 receives book selection data from the user terminal (S500).

According to some embodiments, the communication unit 221 may receive book selection data (D_select) in which the user selects one from the recommended book list (LoB). In other words, the user can check the recommended book list (LoB) on the user terminal 300 and select at least one of the recommended book list (LoB). The communication unit 221 may transmit book selection data (D_select) to the database (DB).

The book recommendation server 220 transmits book text data corresponding to the book selection data to the user terminal (S600).

According to some embodiments, the database (DB) may transmit first book text data (S_ToB) corresponding to book selection data (D_select) to the communication unit 221. The communication unit 221 may transmit the first book text data (S_ToB) to the user terminal 300. Through this, users can easily access other fairy tales with similar themes to the fairy tale they recorded, and use them to further arouse the child's interest and develop their emotions.
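As a non-limiting illustration, steps S500 and S600 might be sketched as a lookup from book selection data (D_select) to the corresponding book text. The function name `handle_selection`, the use of titles as keys, and the check that a selected title was actually on the recommended book list (LoB) are assumptions for this example; the sample texts are placeholders.

```python
def handle_selection(
    selected_titles: list[str],
    recommended: list[str],
    book_texts: dict[str, str],
) -> dict[str, str]:
    """Return the stored text (S_ToB) for each selected, recommended title."""
    allowed = set(recommended)
    return {
        title: book_texts[title]
        for title in selected_titles
        if title in allowed and title in book_texts
    }


# Placeholder database contents and recommended book list.
texts = {"The Frog Prince": "(text of book 1)", "The Little Mermaid": "(text of book 2)"}
lob = ["The Frog Prince", "The Little Mermaid"]
```

Because the user may select at least one item from the list, the sketch accepts several titles and returns one text per valid selection.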

Hereinafter, with reference to FIG. 13, an audio output system according to some embodiments of the present invention will be described.

Figure 13 is a schematic diagram for explaining an audio output system according to some embodiments of the present invention. For convenience of explanation, content that is the same or similar to the content described above is omitted or briefly explained.

Referring to FIG. 13, the audio output system 10 according to some embodiments of the present invention may include a microphone 400.

The microphone 400 may be a device through which the user's voice data is input.

Referring to <A1> in FIG. 13, the microphone 400 may be connected to the user terminal 300 by wire or wirelessly. For example, the microphone 400 may be connected to the user terminal 300 via Bluetooth, but this is only an example and the embodiments are not limited thereto. The user can generate voice data (D_voice) through the microphone 400 and provide it to the user terminal 300. The user terminal 300 may provide voice data (D_voice) generated by the microphone 400 to the server 200 through a communication network.

Referring to <A2> in FIG. 13, the microphone 400 may be directly connected to the server 200 through a communication network. The user can generate voice data (D_voice) through the microphone 400. Voice data (D_voice) generated by the microphone 400 may be provided to the server 200 through a communication network.

According to some embodiments, a user can produce recorded sound content with clearer sound quality by using the microphone 400.

Hereinafter, with reference to FIG. 14, the server of the audio output system according to some other embodiments of the present invention will be described. Parts that overlap with the above-described embodiments are omitted or simplified.

Figure 14 is a block diagram for explaining a server of an audio output system according to some other embodiments of the present invention.

Referring to FIG. 14, the first server 201 of the audio output system according to some other embodiments of the present invention may include a sound doll recommendation server 230.

The sound doll recommendation server 230 may receive the user's voice data (D_voice). The sound doll recommendation server 230 may analyze the received voice data (D_voice) and recommend the sound doll 120 associated with the voice data (D_voice).

The sound doll recommendation server 230 may analyze a plurality of sound contents corresponding to a plurality of sound dolls. Specifically, the sound doll recommendation server 230 may analyze text data for a plurality of sound contents and generate a keyword cluster for each sound content. For example, the sound doll recommendation server 230 may receive text data, such as song lyrics and fairy tale text, for the sound content included in each of the plurality of sound dolls 120, and analyze the text data to create a third keyword cluster for each of the plurality of sound dolls 120.

Additionally, the sound doll recommendation server 230 may analyze the voice data (D_voice) and generate a second keyword cluster (KC_T) for the voice data (D_voice). The sound doll recommendation server 230 may compare the second keyword cluster (KC_T) and the third keyword cluster and recommend a sound doll associated with the voice data (D_voice).

For example, the sound doll recommendation server 230 may analyze the first sound content corresponding to the first sound doll. At this time, the sound doll recommendation server 230 may analyze the first text data for the first sound content and generate the 3-1 keyword cluster.

Additionally, the sound doll recommendation server 230 may analyze the second sound content corresponding to the second sound doll. At this time, the second sound content may be different from the first sound content. The sound doll recommendation server 230 may analyze the second text data for the second sound content and generate a 3-2 keyword cluster.

When receiving the user's voice data (D_voice) in the same manner as shown in FIG. 7, the sound doll recommendation server 230 may convert the voice data (D_voice) into user text data (D_text) in the form of text data. The sound doll recommendation server 230 may analyze user text data (D_text) and generate a fourth keyword cluster.

Next, the sound doll recommendation server 230 may compare the 3-1, 3-2, and 4th keyword clusters and determine the sound doll with the higher keyword cluster matching rate as the recommended sound doll. For example, when the matching rate between the 3-1 keyword cluster and the 4th keyword cluster is higher than the matching rate between the 3-2 keyword cluster and the 4th keyword cluster, the sound doll recommendation server 230 may determine the first sound doll as the recommended sound doll and provide information about the first sound doll to the user terminal 300. At this time, information about the first sound doll may include the product name, sound content, purchase link, etc. of the sound doll. Through this, the user can be encouraged to purchase a sound doll that corresponds to sound content that matches well with the user's personality.
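As a non-limiting illustration, choosing the recommended sound doll might be sketched as picking the doll whose keyword cluster shares the most keywords with the cluster derived from the user's voice data. The `SoundDoll` record, its fields, the function name `recommend_doll`, and the rule of returning nothing when no keyword is shared are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SoundDoll:
    """Hypothetical record for a sound doll and its content keywords."""
    name: str
    keywords: frozenset
    purchase_link: str


def recommend_doll(user_cluster: set, dolls: list) -> Optional[SoundDoll]:
    """Return the doll with the most keywords shared with the user's cluster."""
    if not dolls:
        return None
    best = max(dolls, key=lambda doll: len(user_cluster & doll.keywords))
    if not (user_cluster & best.keywords):
        return None  # assumed: no recommendation when nothing matches
    return best


# Hypothetical dolls standing in for the first and second sound dolls.
first = SoundDoll("Doll 1", frozenset({"animal", "song", "farm"}), "https://example.invalid/1")
second = SoundDoll("Doll 2", frozenset({"prince", "castle", "dragon"}), "https://example.invalid/2")
```

The returned record carries the product name and purchase link that could be provided to the user terminal 300.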

According to the embodiments of the present invention described so far, by providing an audio output system that operates by docking a sound doll to an audio station, it is possible to provide an environment in which infants and young children can select various sounds on their own. Additionally, it is possible to provide infants and young children with a variety of auditory stimulation while minimizing their exposure to digital imaging devices. In particular, it is possible to provide emotional stability to infants and toddlers by using sound content using the voice of the guardian. Additionally, by recommending content similar to the sound content input by the user, it is possible to provide an environment in which infants and young children can easily access various sound content.

The above description is merely an illustrative explanation of the technical idea of the present embodiment, and those skilled in the art will be able to make various modifications and variations without departing from the essential characteristics of the present embodiment. Accordingly, the present embodiments are not intended to limit the technical idea of the present embodiment, but rather to explain it, and the scope of the technical idea of the present embodiment is not limited by these examples. The scope of protection of this embodiment should be interpreted in accordance with the claims below, and all technical ideas within the equivalent scope should be interpreted as being included in the scope of rights of this embodiment.

Claims

An audio output device that outputs sound content for auditory stimulation of infants and young children;

A user terminal into which the user's voice data is input;

a content management server that receives the user's voice data and provides the user's voice data to the audio output device; and

A book recommendation server that receives the user's voice data and recommends books associated with the user's voice data using the user's voice data,

The audio output device outputs the user's voice data as the sound content,

The book recommendation server is,

Generate a first keyword cluster based on a plurality of book text data stored in the database,

Generating a second keyword cluster based on the user's voice data,

Comparing the first keyword cluster and the second keyword cluster to recommend a book related to the user's voice data

Audio output system.
According to claim 1,

The audio output device includes an audio output station including a sound doll and a docking space where the sound doll is docked,

The audio output station recognizes a sound doll docked in the docking space and outputs sound content corresponding to the recognized sound doll.

Audio output system.
According to claim 1,

The content management server is,

When receiving the user's voice data, transmitting a message about a content update to the audio output device,

When receiving a request signal for the user's voice data from the audio output device, transmitting the user's voice data to the audio output device

Audio output system.
According to claim 1,

The book recommendation server is,

a communication unit that receives book selection data and the user's voice data from the user terminal, and transmits a recommended book list and first book text data associated with the book selection data to the user terminal;

a voice analysis unit that converts the user's voice data into user text data;

a text analysis engine that generates the first keyword cluster and the second keyword cluster;

Comprising a book matching unit that compares the first and second keyword clusters and generates the recommended book list.

Audio output system.
According to claim 4,

The text analysis engine is,

a natural language processing unit that receives the user text data and the plurality of book text data, processes them into natural language, and extracts keywords;

A keyword cluster generator that generates the first and second keyword clusters based on the distribution of the keywords.

Audio output system.
In an audio output method that is performed on a server linked to a user terminal and an audio output device and outputs different sound content depending on the sound doll,

Generating a first keyword cluster by analyzing a plurality of book text data stored in a database included in the server;

Receiving the user's voice data from the user terminal and analyzing the user's voice data to generate a second keyword cluster;

Generating a recommended book list by matching the first and second keyword clusters;

Transmitting the recommended book list to the user terminal;

Receiving book selection data for the recommended book list from the user terminal; and

Comprising the step of transmitting first book text data corresponding to the book selection data to the user terminal.

Audio output method.
According to claim 6,

The step of generating the first keyword cluster is,

Receiving the plurality of book text data from the database;

extracting book keywords for each of the plurality of book text data through natural language processing;

generating the first keyword cluster according to the distribution of the book keywords;

Comprising the step of storing the first keyword cluster in the database.

Audio output method.
According to claim 6,

The step of generating the second keyword cluster is,

converting the user's voice data into user text data;

extracting user keywords from the user text data through natural language processing;

Comprising the step of generating the second keyword cluster according to the distribution of the user keywords.

Audio output method.
An audio output device that outputs sound content for auditory stimulation of infants and young children;

A microphone into which the user's voice data is input;

a user terminal that edits the user's voice data;

a content management server that provides the user's voice data to the audio output device; and

A book recommendation server that recommends books related to the user's voice data using the user's voice data,

The book recommendation server is,

Converting the user's voice data into user text data in the form of text data,

Processing the user text data in natural language to extract keywords for the user text data,

Based on the distribution of the keywords, generate a keyword cluster,

Including recommending books related to the user's voice data using the keyword cluster.

Audio output system.
a first sound doll corresponding to the first sound content;

a second sound doll corresponding to second sound content different from the first sound content;

an audio output station that outputs sound content corresponding to the sound doll;

A user terminal into which the user's voice data is input; and

A sound doll recommendation server that recommends a sound doll associated with the user's voice data,

The sound doll recommendation server is,

Analyzing first text data for the first sound content to generate a first keyword cluster,

Analyzing second text data for the second sound content to generate a second keyword cluster,

By analyzing the user's voice data, a third keyword cluster is generated,

Comparing the first to third keyword clusters to determine a sound doll associated with the user's voice data.

Audio output system.