WO2023132534A1 - Electronic device and operating method thereof - Google Patents

Electronic device and operating method thereof

Info

Publication number
WO2023132534A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
sound source
tags
electronic device
tag
Prior art date
Application number
PCT/KR2022/021048
Other languages
English (en)
Korean (ko)
Inventor
박별
장정록
Original Assignee
삼성전자 주식회사 (Samsung Electronics Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 삼성전자 주식회사
Publication of WO2023132534A1

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 Querying > G06F 16/435 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/44 Browsing; Visualisation therefor
    • G06F 16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually > G06F 16/487 using geographical or spatial information, e.g. location
    • G06F 16/60 Information retrieval of audio data
    • G06F 16/63 Querying > G06F 16/635 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/64 Browsing; Visualisation therefor
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually > G06F 16/687 using geographical or spatial information, e.g. location

Definitions

  • Various disclosed embodiments relate to an electronic device and an operating method thereof, and more particularly, to an electronic device and an operating method thereof that automatically generate music based on an image displayed on a screen or a surrounding situation.
  • a user can enjoy photos, images, digital works, and the like using electronic devices.
  • a user may prefer to view an image while listening to background music suitable for a photograph or a surrounding situation, rather than viewing a photograph or image in a static environment.
  • Music that has already been created has a limitation in that it is difficult to reflect all the various situations that change every time, such as user tastes, photos currently displayed on the screen, or current surroundings. Accordingly, there is a need for a technology for generating music suitable for an image by automatically considering an image currently output from an electronic device, a surrounding situation, a user's taste, and the like, and providing the music to the user.
  • An electronic device according to an embodiment includes a memory that stores one or more instructions and a processor that executes the one or more instructions stored in the memory. By executing the one or more instructions, the processor may acquire sound source generation information including at least one of image information, surrounding situation information, and user taste information, acquire sound source generation tags mapped to the sound source generation information, and generate a sound source based on the sound source generation tags.
  • FIG. 1 is a diagram for explaining generating music based on sound source generation information and providing it to a user according to an embodiment.
  • FIG. 2 is an internal block diagram of an example of an electronic device according to an embodiment.
  • FIG. 3 is an internal block diagram of the processor of FIG. 2 according to an embodiment.
  • FIG. 4 is an internal block diagram of a sound source generation information acquisition unit of FIG. 3 according to an embodiment.
  • FIG. 5 is a diagram for explaining a method of obtaining image information by the image information obtaining unit of FIG. 4 according to an embodiment.
  • FIG. 6 is an internal block diagram of a sound generating tag acquisition unit of FIG. 3 according to an embodiment.
  • FIG. 7 is a diagram for explaining that the tag filtering unit of FIG. 6 filters tags in consideration of scores for each tag according to an embodiment.
  • FIG. 8 is a diagram illustrating a relationship between sound source creation information and a tag according to an embodiment.
  • FIG. 9 is a diagram for explaining a neural network learned to acquire a sound source from a tag according to an embodiment.
  • FIG. 10 is an internal block diagram of an electronic device according to an embodiment.
  • FIG. 11 is a flowchart illustrating a method of generating a sound source according to an embodiment.
  • FIG. 12 is a flowchart illustrating a method of filtering sound source generation tags according to an embodiment.
  • FIG. 13 is a flowchart illustrating a method of generating a sound source by filtering sound source generation tags for each sound source generation information according to an embodiment.
  • FIG. 14 is a flowchart illustrating a method of obtaining a weight for each tag according to an embodiment.
  • the processor may filter sound source generation tags having high scores among the sound source generation tags by executing the one or more instructions, and generate the sound source using the filtered sound source generation tags.
  • the processor may obtain a score for each sound source generation tag mapped to the image information based on at least one of the accuracy of an image recognition result, the degree of redundancy for each tag, and the weight for each tag, and may obtain first tags by filtering sound source generation tags having a high score from among the sound source generation tags mapped to the image information.
  • the processor may obtain a score for each sound source generation tag mapped to the surrounding situation information based on a context-based weight for each tag indicating user preference according to the situation, and may obtain second tags by filtering sound source generation tags having a high score from among the sound source generation tags mapped to the surrounding situation information.
  • the processor may generate the sound source using at least one of the first tags and the second tags by executing the one or more instructions.
  • the processor may additionally filter the filtered sound source generation tags based on at least one of the surrounding situation information and user identification information by executing the one or more instructions, and may generate the sound source using the additionally filtered tags.
  • the processor may obtain a weight for each tag representing user preference based on at least one of the user taste information and music play history information by executing the one or more instructions.
  • the processor may reproduce music according to the generated sound source and update the weight for each tag according to music reproduction information by executing the one or more instructions.
  • the music reproduction information may include information about the music reproduction frequency, total music listening level, playback stop level, fast-forward level, and skip level.
  • the electronic device further includes a display, and by executing the one or more instructions, the processor may obtain the image information based on at least one of additional information about an image output to the display, a color or style identified in the image, a type of object identified in the image, and, when the identified object is a person, the person's facial expression.
  • the electronic device further includes at least one of a camera, a sensor, and a communication module, and by executing the one or more instructions, the processor may obtain the surrounding situation information from at least one of user presence/absence information, weather information, date information, time information, season information, holiday information, anniversary information, temperature information, illuminance information, and location information obtained from at least one of the camera, the sensor, and the communication module.
  • the processor may obtain the user taste information from at least one of user profile information, user viewing history information, and preferred music information selected by the user by executing the one or more instructions.
  • An operating method of an electronic device according to an embodiment includes obtaining sound source generation information including at least one of image information, surrounding situation information, and user taste information, obtaining sound source generation tags mapped to the sound source generation information, and generating a sound source based on the sound source generation tags.
  • A computer-readable recording medium according to an embodiment may be a computer-readable recording medium on which a program is recorded for implementing an operating method of an electronic device, the method including obtaining sound source generation information including at least one of image information, surrounding situation information, and user taste information, obtaining sound source generation tags mapped to the sound source generation information, and generating a sound source based on the sound source generation tags.
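  • As an illustrative, non-limiting sketch of the operating method summarized above, the flow can be expressed as follows; every helper name (get_image_info, map_tags, filter_tags, generate_sound_source, update_tag_weights) is a hypothetical placeholder for the units described in the embodiments, not part of the claims.

```python
# Illustrative sketch of the disclosed operating method; not the claimed implementation.

def operate(device):
    # 1. Obtain sound source generation information.
    info = {
        "image": device.get_image_info(),        # colors, style, objects, expressions, metadata
        "context": device.get_context_info(),    # weather, date/time, season, presence, location
        "taste": device.get_user_taste_info(),   # profile, viewing history, preferred music
    }

    # 2. Obtain sound source generation tags mapped to the information.
    tags = device.map_tags(info)

    # 3. Filter tags having high scores (per-tag weights, redundancy, recognition accuracy).
    filtered_tags = device.filter_tags(tags, info)

    # 4. Generate a sound source from the filtered tags and play it.
    sound_source = device.generate_sound_source(filtered_tags)
    device.play(sound_source)

    # 5. Feed playback behaviour back into the per-tag weights.
    device.update_tag_weights(device.get_playback_info())
```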
  • Some embodiments of the present disclosure may be represented as functional block structures and various processing steps. Some or all of these functional blocks may be implemented as a varying number of hardware and/or software components that perform specific functions.
  • functional blocks of the present disclosure may be implemented by one or more microprocessors or circuit configurations for a predetermined function.
  • the functional blocks of this disclosure may be implemented in various programming or scripting languages.
  • Functional blocks may be implemented as an algorithm running on one or more processors.
  • the present disclosure may employ conventional techniques for electronic environment setting, signal processing, and/or data processing. Terms such as "mechanism", "element", "means", and "composition" may be used broadly and are not limited to mechanical and physical components.
  • connecting lines or connecting members between components shown in the drawings are only examples of functional connections and/or physical or circuit connections. In an actual device, connections between components may be represented by various functional connections, physical connections, or circuit connections that can be replaced or added.
  • the terms "...unit" and "module" described in the specification mean a unit that processes at least one function or operation, and may be implemented as hardware, software, or a combination of hardware and software.
  • the term “user” means a person who uses an electronic device, and may include a consumer, an evaluator, a viewer, an administrator, or an installer.
  • FIG. 1 is a diagram for explaining generating music based on sound source generation information and providing it to a user according to an embodiment.
  • the electronic device 100 may output an image on the screen.
  • the electronic device 100 may be implemented as various types of display devices including screens.
  • FIG. 1 illustrates a case where the electronic device 100 is a digital TV.
  • the electronic device 100 may output an image on a screen by executing an ambient service.
  • the ambient service may refer to a service that allows a meaningful image such as a picture, photo, or clock to be displayed instead of a black screen when a display device such as a digital TV is in an off state.
  • the electronic device 100 may display an image pre-stored inside the electronic device 100 on the screen or receive an image for executing an ambient service from an external server and display the image on the screen.
  • the electronic device 100 may perform wired/wireless communication with a peripheral device to output a photo or work image stored in the peripheral device through the screen of the electronic device 100 .
  • photos or pictures stored in a user terminal such as a USB device (not shown), a PC (not shown), a tablet (not shown), or a mobile phone (not shown) may be transmitted to the electronic device 100 and output on its screen.
  • the electronic device 100 may obtain sound source creation information.
  • the sound source generation information is information collected for generating a sound source, and may mean information affecting the generation of a sound source.
  • the information affecting the generation of a sound source may include an image output on a screen, information indicating external conditions or conditions around a user, or information about a user's preference or taste.
  • the electronic device 100 may obtain information about an image output on the screen as image information, information representing surrounding circumstances or conditions as surrounding situation information, and information representing a user's preference or taste as user taste information.
  • the electronic device 100 may obtain image information from an image output on a screen.
  • the image information may be information about unique characteristics of an image itself output on a screen.
  • the image information may include at least one of a color or style identified in an image output on the screen, a type of object identified in the image, and a human expression when the identified object is a person.
  • the image information may include additional information about the image.
  • FIG. 1 illustrates, for example, that the electronic device 100 executes the ambient service and outputs Vincent van Gogh's famous painting 'Sunflower' on the screen.
  • by analyzing the image output on the screen, the electronic device 100 may obtain at least one of the following: that the object included in the image is a sunflower, that the style of the image is a Vincent van Gogh-style masterpiece, that the dominant color is dark yellow, and additional information describing Vincent van Gogh or the 'Sunflower' work.
  • the electronic device 100 may obtain surrounding context information.
  • the surrounding situation information may refer to information indicating a situation around or outside the place where the electronic device 100 and the user are located.
  • the surrounding situation information may be acquired through a camera or sensor provided in the electronic device 100 or obtained by receiving it from an external server.
  • the surrounding situation information may include at least one of information about whether or not a user exists, weather information, date information, time information, season information, holiday information, anniversary information, temperature information, illuminance information, and location information.
  • the electronic device 100 may obtain information that the ambient temperature is 20 degrees Celsius through a temperature sensor (not shown), or information that the illuminance is 300 lux (lx) through an illuminance sensor (not shown). Alternatively, the electronic device 100 may receive, from an external server or the like through a communication module (not shown), information such as that the current time is afternoon, the surrounding weather is warm, the season is autumn, today's date is September 5, and the electronic device 100 is located in Seattle, Washington, USA.
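  • A minimal sketch of how such surrounding situation information could be assembled is shown below; the sensor and server interfaces are assumptions made for illustration, not part of the disclosure.

```python
# Illustrative sketch only: assembling surrounding situation information from local
# sensors and a remote server. All interfaces here are hypothetical.

def get_context_info(sensors, server):
    context = {
        "user_present": sensors.presence(),       # presence sensor or camera
        "temperature_c": sensors.temperature(),   # e.g. 20
        "illuminance_lx": sensors.illuminance(),  # e.g. 300
        "location": sensors.location(),
    }
    # Items more easily obtained from an external server through the communication module.
    context.update(server.fetch(["weather", "date", "time", "season", "holiday", "anniversary"]))
    return context
```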
  • the electronic device 100 may obtain user preference information.
  • User preference information may refer to information indicating a user's hobby or preferred direction.
  • user preference information may be obtained from user profile information or user viewing history information.
  • the user taste information may be acquired by directly selecting preferred music information from the user.
  • the user taste information may be acquired based on a previous music listening history when the user has a previous music listening history.
  • the electronic device 100 may obtain information such as that the user is a woman in her 30s from the user's profile information and that the user's preferred program genre is melodrama from the viewing history of the electronic device 100, and may infer the user's preferences from this information.
  • the electronic device 100 may infer the user's taste from the fact that the music the user prefers or has previously listened to is classical, quiet, or played with piano and violin.
  • the electronic device 100 may obtain sound source creation information including at least one of image information, surrounding situation information, and user taste information, and acquire sound source creation tags mapped to the sound source creation information.
  • the electronic device 100 may filter sound source creation tags having high scores among sound source creation tags.
  • the electronic device 100 may obtain a weight for each tag in order to filter sound source generation tags.
  • the electronic device 100 may obtain a weight for each tag representing user preference for each tag based on at least one of user taste information and music playback history information.
  • the electronic device 100 may obtain a score for each sound source generation tag mapped to the image information based on at least one of the accuracy of the recognition result, the degree of overlap for each tag, and the weight for each tag, and may obtain first tags by filtering sound source generation tags having a high score from among the sound source generation tags mapped to the image information.
  • the electronic device 100 may obtain a weight for each context-based tag representing user preference according to context.
  • the electronic device 100 may obtain a score for each sound source generation tag mapped to the surrounding situation information based on the context-based weight for each tag, and may obtain second tags by filtering sound source generation tags having a high score from among the sound source generation tags mapped to the surrounding situation information.
  • the electronic device 100 may generate a sound source using at least one of the first tags and the second tags.
  • the electronic device 100 may additionally filter the filtered sound source generation tags based on at least one of surrounding situation information and user identification information, and may create a sound source using the additionally filtered tags.
  • the electronic device 100 may generate a sound source from sound source generation tags using at least one neural network.
  • the neural network used by the electronic device 100 may be a neural network trained using tags and sound sources as learning data sets.
  • the electronic device 100 may play music according to the generated sound source.
  • the electronic device 100 may obtain music play information representing the degree to which the user plays music, and update the weight for each tag according to the music play information.
  • the music playback information may include information about a music playback frequency, a total listening level of music, a playback stop level, a fast-forward level, and a skip level.
  • the electronic device 100 acquires various types of sound source generation information and, based on that information, generates a sound source suited to the displayed image, the surrounding situation, the user's taste, and the like, so that music matching the image, the surrounding situation, and the user's preferences can be provided.
  • FIG. 2 is an internal block diagram of an example of an electronic device according to an embodiment.
  • the electronic device 100a of FIG. 2 may be an example of the electronic device 100 of FIG. 1 .
  • the electronic device 100a may be implemented as various types of display devices capable of outputting images through a screen.
  • the display device may be a device that visually outputs an image to a user.
  • the electronic device 100a may be a digital television, a wearable device, a smart phone, various personal computers (PCs) such as a desktop, tablet PC, or laptop computer, a personal digital assistant (PDA), a global positioning system (GPS) device, a smart mirror, an e-book reader, a navigation device, a kiosk, a digital camera, a smart watch, a home network device, a security device, a medical device, or the like. The electronic device 100a may be a fixed type or a mobile type.
  • the electronic device 100a may be in the form of a display inserted into the front of various types of home appliances such as a refrigerator or a washing machine.
  • the electronic device 100a may be implemented as an electronic device connected to a display device including a screen through a wired or wireless communication network.
  • the electronic device 100a may be implemented in the form of a media player, a set-top box, or an artificial intelligence (AI) speaker.
  • the electronic device 100a may include the aforementioned digital television, wearable device, smart phone, various personal computers (PCs) such as a desktop, tablet PC, or laptop computer, PDA (personal digital assistant), media player, micro server, global positioning system (GPS) device, smart mirror, e-reader, navigation device, kiosk, digital camera, smart watch, home network device, security device, medical device, a display inserted into the front of a refrigerator, washing machine, or other home appliance, a media player, a set-top box, or an AI speaker.
  • the electronic device 100a may include a processor 210 and a memory 220 .
  • the memory 220 may store at least one instruction.
  • the memory 220 may store at least one program executed by the processor 210 .
  • Predefined operation rules or programs may be stored in the memory 220 .
  • the memory 220 may store data input to or output from the electronic device 100a.
  • the memory 220 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, and an optical disk.
  • the memory 220 may include one or more instructions for obtaining sound source creation information.
  • the memory 220 may include one or more instructions for obtaining a weight for each tag.
  • the memory 220 may store a weight for each tag.
  • the memory 220 may include one or more instructions for updating the weight for each tag.
  • software for generating a sound source from a tag may be stored in the memory 220 .
  • At least one neural network and/or a predefined operating rule or AI model may be stored in the memory 220 .
  • at least one neural network and/or a predefined operating rule or AI model stored in the memory 220 may include one or more instructions for generating a sound source from a tag.
  • the processor 210 controls the overall operation of the electronic device 100a.
  • the processor 210 may control the electronic device 100a to function by executing one or more instructions stored in the memory 220 .
  • the processor 210 may obtain sound source generation information.
  • the sound source generation information may include at least one of image information, surrounding situation information, and user taste information.
  • the electronic device 100a may further include a display (not shown).
  • the processor 210 may obtain image information based on at least one of additional information about an image output on the display, a color or style identified in the image, a type of object identified in the image, and, when the identified object is a person, the person's facial expression.
  • the electronic device 100a may further include at least one of a camera (not shown), a sensor (not shown), and a communication module (not shown).
  • the processor 210 may obtain surrounding situation information from at least one of user presence/absence information, weather information, date information, time information, season information, holiday information, anniversary information, temperature information, illuminance information, and location information obtained from at least one of the camera, the sensor, and the communication module.
  • the processor 210 may obtain user taste information from at least one of user profile information, user viewing history information, and preferred music information selected by the user.
  • the processor 210 may filter sound source generation tags having high scores among sound source generation tags and generate a sound source using the filtered sound source generation tags.
  • the processor 210 may obtain a score for each sound source generation tag mapped to the image information based on at least one of the accuracy of the recognition result, the degree of overlap for each tag, and the weight for each tag representing user preference, and may obtain first tags by filtering sound source generation tags having a high score from among the sound source generation tags mapped to the image information.
  • the processor 210 may obtain a score for each sound source generation tag mapped to the surrounding situation information based on the context-based weight for each tag indicating user preference according to the situation, and may obtain second tags by filtering sound source generation tags having a high score from among the sound source generation tags mapped to the surrounding situation information.
  • the processor 210 may generate a sound source using at least one of the first tags and the second tags.
  • the processor 210 may additionally filter the filtered sound source creation tags based on at least one of surrounding situation information and user identification information, and generate a sound source using the additionally filtered tags.
  • the processor 210 may obtain a weight for each tag representing user preference based on at least one of user taste information and music playback history information.
  • the processor 210 may play music according to the generated sound source and update weights for each tag according to music play information.
  • the music playback information may include information about a music playback frequency, a total listening level of music, a playback stop level, a fast-forward level, and a skip level.
  • the processor 210 may obtain a score for each sound source generation tag using the updated weight for each tag when obtaining a score for each tag thereafter.
  • the processor 210 may filter tags having high scores from among the sound source generation tags and generate a sound source using the filtered sound source generation tags.
  • the processor 210 may obtain a sound source from sound source generation tags using at least one neural network.
  • the processor 210 may use artificial intelligence (AI) technology.
  • the processor 210 may store at least one AI model.
  • the processor 210 may generate output data from input data using a plurality of AI models.
  • the memory 220 rather than the processor 210 may store AI models, that is, neural networks.
  • the neural network used by the processor 210 may be a neural network trained to acquire sound sources from tags.
  • the processor 210 may obtain a sound source from sound source generation tags using a neural network.
  • the neural network may include a Star Generative Adversarial Network (StarGAN).
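  • The disclosure does not fix a particular architecture beyond naming a GAN-family model, so the following is only a simplified sketch of a tag-conditioned generator (not an actual StarGAN); the class, dimensions, and downstream decoding are assumptions made for illustration.

```python
# Illustrative sketch only: conditioning a generative model on sound source generation
# tags. The interface is hypothetical; the patent only states that a neural network
# such as a StarGAN may be trained to map tags to music content.

import torch


class TagConditionedGenerator(torch.nn.Module):
    def __init__(self, vocab_size: int, latent_dim: int = 128, out_dim: int = 512):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(vocab_size + latent_dim, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, out_dim),   # encoded music description (chords, tempo, ...)
        )

    def forward(self, tag_vector: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([tag_vector, noise], dim=-1))


def generate_sound_source(model: TagConditionedGenerator, tag_vector: torch.Tensor):
    noise = torch.randn(1, 128)                       # must match the generator's latent_dim
    encoded_music = model(tag_vector.unsqueeze(0), noise)
    return encoded_music  # decoding to MIDI/score/audio is outside this sketch
```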
  • FIG. 3 is an internal block diagram of the processor of FIG. 2 according to an embodiment.
  • the processor 210 may include a sound source creation information acquisition unit 310, a sound source generation tag acquisition unit 320, a sound source generation unit 330, and a music playback unit 340.
  • the sound source generation information acquisition unit 310 may acquire various sound source generation information to generate a sound source.
  • the sound source generation information may include at least one of image information, surrounding situation information, and user preference information. A method for obtaining the sound source information by the sound source creation information acquisition unit 310 will be described in more detail in the detailed description of FIGS. 4 and 5 .
  • the sound source generation tag acquisition unit 320 may receive sound source generation information from the sound source generation information acquisition unit 310 and derive sound source generation tags from the sound source generation information.
  • the sound source generation tag acquisition unit 320 may acquire sound source generation tags mapped to sound source generation information.
  • a tag may mean metadata such as keywords or words assigned to information. Allocating tags to information may mean that tags of various fields or properties are associated with information.
  • a plurality of pieces of information and tags assigned to each piece of information may be stored in the memory 220 or a database (not shown) of the electronic device 100a.
  • a plurality of pieces of information and tags assigned to the pieces of information may be stored in an external server instead of the electronic device 100a.
  • the sound source generation tag acquisition unit 320 may receive the sound source generation information from the sound source generation information acquisition unit 310, search for the sound source generation information among the numerous pieces of information stored in the memory 220, a database, or an external server, and search for the tags mapped to the sound source generation information.
  • tags mapped to sound source creation information will be referred to as sound source creation tags.
  • the sound source generation tag acquisition unit 320 may filter sound source generation tags. In an embodiment, the sound source generation tag acquisition unit 320 may assign a score to each sound source generation tag in order to filter the sound source generation tags. The sound source generation tag acquisition unit 320 may acquire weights for each tag in order to assign scores to the sound source generation tags. The weight for each tag may indicate user preference for each tag. In an embodiment, the sound source generation tag acquisition unit 320 may obtain a weight for each tag based on at least one of user taste information and music reproduction history. The sound source generation tag acquisition unit 320 may acquire a score for each tag in consideration of the weight of each tag or the degree of overlap of tags, and may filter the sound source generation tags according to the score for each tag.
  • the sound source generation tag acquisition unit 320 may further filter the filtered sound source generation tags according to surrounding situation information or user profile information.
  • the sound source generation tag acquisition unit 320 may transmit the filtered sound source generation tags to the sound source generation unit 330 .
  • the sound source generation unit 330 may receive sound source generation tags from the sound source generation tag acquisition unit 320 and obtain a sound source by using the sound source generation tags or obtain a sheet music for generating a sound source.
  • the sound source may mean music data in a form that can be downloaded or played through real-time streaming.
  • the sound source may be in the form of a playable music file such as mp3, midi, or wav.
  • the sound source generator 330 may obtain a sound source using a neural network.
  • the neural network used by the sound generator 330 may be a neural network trained using music content and tags suitable for the music content as a learning data set. More specifically, music content may be encoded as text information such as composition, chord, melody, beat, time signature, tempo, rhythm, genre, atmosphere, and the like. Tags suitable for each music content may be labeled and used as a learning data set.
  • the trained neural network may receive a tag and acquire music content, that is, a sound source suitable for the input tag.
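  • As an illustrative sketch of the learning data set described above, a single training record could pair tags with text-encoded musical attributes; the field names and values below are assumptions, not data from the disclosure.

```python
# Illustrative sketch only: one possible shape of a (tags -> encoded music) training record.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MusicTrainingExample:
    tags: List[str]                                   # e.g. ["sunflower", "warm", "classical"]
    encoded_music: Dict[str, str] = field(default_factory=dict)
    # encoded_music holds text-encoded attributes such as key, chords, tempo, genre, mood.


dataset = [
    MusicTrainingExample(
        tags=["rainy", "spring", "morning", "piano"],
        encoded_music={"key": "D minor", "tempo": "60 bpm", "time_signature": "3/4",
                       "genre": "ambient", "mood": "calm"},
    ),
]
```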
  • the music reproducing unit 340 may reproduce music according to the sound source generated by the sound generating unit 330 .
  • the music reproducing unit 340 may receive sheet music from the sound source generator 330 and reproduce the sound source according to the sheet music, or may receive the sound source itself from the sound source generator 330 and reproduce the sound source using a music player.
  • a user may listen to music reproduced by the music reproducing unit 340 .
  • the user may listen to the music several times because he likes the music being played, or may stop playing the music before the music ends because he does not like the music.
  • information on the user's music listening may be fed back to the sound source generation tag acquisition unit 320.
  • the sound source creation tag acquisition unit 320 may obtain music play information and update weights for each tag using the music play information.
  • the sound source creation tag acquisition unit 320 may update the weight for each tag by raising the weight for a tag related to a sound source that the user has repeatedly listened to, based on the user's music playback history, and lowering the weight for a tag related to a sound source whose playback the user has stopped.
  • the sound source generation tag acquisition unit 320 then obtains a score for each tag using the updated weight for each tag and filters the tags based on the obtained scores, so that music that better reflects the user's music playback history can be generated.
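  • A minimal sketch of such a feedback-driven weight update is shown below; the adjustment amounts and the playback-information fields are assumptions, not values given in the disclosure.

```python
# Illustrative sketch only: updating per-tag weights from music playback feedback.

def update_tag_weights(tag_weights, used_tags, playback_info):
    """tag_weights: dict tag -> float; used_tags: tags used to generate the played music."""
    listened_fraction = playback_info.get("listened_fraction", 0.0)  # 0.0 .. 1.0
    repeated = playback_info.get("repeat_count", 0)
    skipped = playback_info.get("skipped", False)

    if listened_fraction >= 0.9 or repeated > 0:
        delta = 0.1 * max(1, repeated)     # user liked it: raise related tag weights
    elif skipped or listened_fraction < 0.3:
        delta = -0.1                       # user stopped or skipped: lower related tag weights
    else:
        delta = 0.0

    for tag in used_tags:
        tag_weights[tag] = tag_weights.get(tag, 1.0) + delta
    return tag_weights
```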
  • FIG. 4 is an internal block diagram of a sound source generation information acquisition unit of FIG. 3 according to an embodiment.
  • the sound source creation information acquisition unit 310 may include an image information acquisition unit 311 , a surrounding situation information acquisition unit 313 and a user taste information acquisition unit 315 .
  • the image information acquisition unit 311 may obtain image information from an image output to the electronic device 100a.
  • the image information acquisition unit 311 may acquire image information by capturing an image output on the screen and analyzing the captured image.
  • the image information may be information about unique characteristics of an image itself output on a screen.
  • the image information may be obtained based on at least one of a color or style identified in the image, a type of object identified in the image, and a person's facial expression when the identified object is a person.
  • the image information acquisition unit 311 receives additional information about an image displayed on the screen together with the image or separately from the image, from an internal memory of the electronic device 100a, an external server, or an external user terminal, and obtains the additional information. It can also be used as image information.
  • the surrounding situation information obtaining unit 313 may receive at least one of a communication signal, a sensor signal, and a camera signal.
  • the surrounding situation information acquisition unit 313 obtains at least one of a communication signal, a sensor signal, and a camera signal at a predetermined time, at a random time interval, at a predetermined time, or whenever an event such as a sudden temperature change or date change occurs. You can get a new one.
  • the communication signal is a signal obtained from an external server or the like through a communication network and represents an external situation; it may include, for example, at least one of external weather information, date information, time information, season information, illuminance information, temperature information, location information, and holiday information.
  • the surrounding situation information acquisition unit 313 may obtain sensor signals for external conditions around the electronic device 100a using various sensors.
  • the sensor signal is a signal sensed through a sensor, and may include various types of signals according to the type of sensor.
  • the surrounding situation information acquisition unit 313 may detect ambient temperature or humidity using a temperature/humidity sensor. Alternatively, the surrounding situation information acquisition unit 313 may detect the ambient light around the electronic device 100a using the light sensor. The illuminance sensor may measure brightness according to the amount of light by measuring the amount of ambient light. Alternatively, the surrounding situation information acquisition unit 313 may detect the location of the electronic device 100a using a location sensor.
  • the surrounding situation information acquisition unit 313 may detect a distance between the electronic device 100a and the user by using a location sensor and/or a proximity sensor.
  • the surrounding situation information acquisition unit 313 may use a presence sensor to sense whether there is a person nearby, according to whether an IR signal emitted from the presence sensor is reflected or according to the time interval after which the IR signal is reflected and returned.
  • the surrounding situation information obtaining unit 313 may use a camera instead of a presence sensor to identify whether there is a user around the electronic device 100a.
  • the surrounding context information acquisition unit 313 may determine whether or not there is a user and acquire the user presence as surrounding context information.
  • the user taste information acquisition unit 315 may obtain user taste information.
  • User taste information may be obtained to infer the user's preferred music.
  • the user taste information may be inferred from at least one of user profile information, user viewing history information, and preferred music information.
  • User profile information is information for identifying a user and may be generated based on the user's account.
  • the user profile information may include anniversary information such as the user's gender, age, marital status, child status, number of family members, occupation, and birthday.
  • the electronic device 100a may match profile information input by the user when creating an account with the user account and store it in the electronic device 100a, or store it in an interworking external server that provides services to the electronic device 100a.
  • when the electronic device 100a is equipped with a camera, the electronic device 100a may recognize the user's face captured through the camera to identify the user's age group or gender and infer user taste information therefrom.
  • the user preference information acquisition unit 315 may acquire history information on the user's viewing of programs or content using the electronic device 100a and infer user taste information from this.
  • for example, from such viewing history, the user taste information acquisition unit 315 may infer that the user prefers bright and warm music.
  • the user taste information may be obtained from preferred music information selected by the user.
  • the user may directly input information about preferred music into the electronic device 100a, for example when creating an account on the electronic device 100a or when first running a program that automatically creates music on the electronic device 100a.
  • the user taste information may be obtained or updated based on a previous music listening history of the user, if there is a previous music listening history.
  • user preference information is obtained from information about mood, velocity, instrument, key, chord, melody, beat, time signature, tempo, rhythm, genre, atmosphere, etc. of music that the user prefers or previously listened to It can be.
  • the user preference information acquisition unit 315 may obtain the degree to which the user prefers a specific piece of music from, for example, the music playback time indicating whether the user played that music and whether the user listened to all or only part of it, and may infer user taste information therefrom.
  • User taste information may be updated periodically or whenever a music listening event occurs according to the user's previous music listening history. Therefore, more accurate user taste information can be obtained as the previous music listening history increases.
  • the electronic device 100a may generate a sound source using various types of sound source generation information such as image information, surrounding situation information, and user preference information. Accordingly, even when the same image is output on the screen, the electronic device 100a may generate and provide a different sound source to the user according to surrounding situation information or user preference information.
  • FIG. 5 is a diagram for explaining a method of obtaining image information by the image information obtaining unit of FIG. 4 according to an embodiment.
  • the image information acquisition unit 311 may receive an image and obtain image information from the image.
  • Image information may be information representing unique characteristics of the image itself.
  • the image information may be obtained from at least one of additional information about the image, a color or style identified in the image, a type of object identified in the image, and, when the identified object is a person, the person's facial expression.
  • the image information acquisition unit 311 may acquire image information from an image using at least one neural network.
  • At least one neural network may be a neural network based on a convolutional neural network (CNN), a deep convolutional neural network (DCNN), or CapsNet, but is not limited thereto.
  • the image information obtaining unit 311 may obtain color information from an image.
  • the color information may be RGB values of colors frequently used in the image.
  • the image information acquisition unit 311 may group the RGB values of each pixel into similar colors through a color difference algorithm.
  • the image information acquisition unit 311 may acquire RGB values corresponding to one or a plurality of dominant colors for each image by clustering dominant colors from the grouped colors.
  • the image information acquisition unit 311 may identify blue, sky blue, and white as dominant colors in the image and additionally obtain emotion information suited to these colors, such as a sense of coolness.
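  • One conventional way to realise the grouping and clustering of dominant colors described above is k-means clustering over pixel RGB values; the sketch below is illustrative and is not the disclosed color-difference algorithm itself.

```python
# Illustrative sketch only: extracting dominant colors from an image by clustering
# pixel RGB values.

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans


def dominant_colors(image_path: str, k: int = 3):
    pixels = np.asarray(Image.open(image_path).convert("RGB")).reshape(-1, 3)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # Cluster centres are the dominant RGB values; order them by cluster size.
    counts = np.bincount(km.labels_, minlength=k)
    order = np.argsort(counts)[::-1]
    return [tuple(map(int, km.cluster_centers_[i])) for i in order]

# e.g. dominant_colors("sunflowers.jpg") returns k RGB triples, largest cluster first.
```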
  • the image information obtaining unit 311 may obtain style information from an image.
  • the style information may be information indicating whether the style of the image is noir, vintage, romantic, or horror.
  • the style information may include a painting style representing a painting style.
  • Style information may indicate a drawing method or style, such as watercolor, oil painting, ink painting, pointillism, or three-dimensional painting, or may indicate tendencies and characteristics of a specific artist, such as Van Gogh style, Monet style, Manet style, or Picasso style.
  • the style information may indicate a characteristic classified by era, such as the Middle Ages, the Renaissance, the modern age, or contemporary painting, a characteristic classified by region, such as Eastern or Western painting, or a characteristic of a painting school, such as Impressionism, Abstractionism, or Realism.
  • the style information may include information on the texture, atmosphere, contrast, or gloss of an image, or on brightness, hue, and saturation, which are the three elements of color.
  • the style information may include information about a camera shooting technique.
  • the style information may include information on whether the technique used when taking a picture is a panning shot, a tilting shot, a zooming shot, a macro shot, a night view shot, or the like.
  • the style information may include, but is not limited to, composition of a subject, angle of view, degree of exposure, type of lens, degree of blurring, focal length, and the like.
  • the image information acquisition unit 311 may perform object detection on an image.
  • the image information acquisition unit 311 may detect an object from an image using image processing technology or artificial intelligence technology.
  • the image information acquisition unit 311 may perform object detection by using a deep neural network (DNN) including two or more hidden layers to recognize that an object exists in an image, classify what the object is, and locate the object.
  • the image information acquisition unit 311 may detect objects such as people, clouds, sky, and sea from an image.
  • the image information obtaining unit 311 may identify that the detected objects are people and, further, that one object is an adult and another is a child.
  • the image information acquisition unit 311 may further acquire emotion information such as happiness and pleasure from the detected object type being a family composed of children and adults, a cloud, the sky, and the sea.
  • the image information acquisition unit 311 may identify a person's facial expression when the object is a person. For example, the image information acquisition unit 311 may detect a face from an image using at least one neural network. The image information acquisition unit 311 may extract a feature from the detected face and recognize a facial expression using the extracted feature. The image information obtaining unit 311 may infer emotion from the recognized facial expression. In FIG. 5 , the image information obtaining unit 311 may identify a person's expression included in the image and infer emotions such as joy or happiness from the expression of a smile.
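  • A minimal sketch of the face-to-emotion step is given below; the detector, classifier, and expression-to-emotion mapping are hypothetical placeholders for the neural networks mentioned above, not components defined in the disclosure.

```python
# Illustrative sketch only: inferring emotion information from detected faces.

EXPRESSION_TO_EMOTION = {        # assumed mapping, for illustration
    "smile": "happiness",
    "frown": "sadness",
    "neutral": "calm",
}


def infer_emotions(image, face_detector, expression_classifier):
    emotions = []
    for face in face_detector(image):             # hypothetical detector: yields face crops
        expression = expression_classifier(face)  # hypothetical classifier: e.g. "smile"
        emotions.append(EXPRESSION_TO_EMOTION.get(expression, "neutral"))
    return emotions
```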
  • the image information acquisition unit 311 may acquire additional information about an image as image information. Additional information about the image may include the size of the image, the year the image was produced, a description of the image, and the like.
  • the description of the image includes the creation history of the image, description of the subject or atmosphere of the image, description of the type, style, color, texture, etc. of the image, description of the photographer or painter who created the image, and description of the image created. It can include a variety of information, such as the place where it was created, the award history of the image, and so on.
  • the image information acquisition unit 311 may obtain additional information about an image together with the image.
  • the image information acquisition unit 311 may acquire additional information about an image by receiving it from a server (not shown) or a user terminal around the electronic device 100a.
  • the electronic device 100a may acquire additional information about an image by receiving it from a server or a user terminal together with the image.
  • the electronic device 100a may retrieve and obtain information about an image from a server that has received the image or a server separate from the user terminal. For example, if the image is a famous work, the electronic device 100a may receive only the title of the image together with the image, and obtain detailed additional information about the image from a separate server using the title.
  • the image information acquiring unit 311 may acquire various types of image information from an image. Also, the image information obtaining unit 311 may obtain emotion information corresponding to the image information.
  • FIG. 6 is an internal block diagram of a sound generating tag acquisition unit of FIG. 3 according to an embodiment.
  • a sound source generation tag acquisition unit 320 may include a tag mapping unit 321 , a database 322 , a tag filtering unit 323 , and a weight acquisition unit 324 .
  • the sound source creation tag acquisition unit 320 may include a database (DB) 322.
  • the database 322 may store a plurality of pieces of information related to sound sources and tags mapped to the pieces of information in the form of data.
  • a tag may be an identifier that categorizes information, indicates boundaries, or indicates properties or identities of information. Tags can take the form of words, images or other identifying marks. One or more tags may be allocated and stored in the database 322 for each of a plurality of pieces of information related to a sound source. A tag is information used to effectively manage or retrieve a large amount of information, and can be assigned to each information and used to classify the information.
  • the tag mapping unit 321 may search for and use tags corresponding to information by using the characteristics of these tags.
  • the tag mapping unit 321 may receive sound source generation information from the sound source generation information acquisition unit 310 and retrieve sound source generation tags mapped to the sound source generation information from the database 322. That is, the tag mapping unit 321 may search the information stored in the database 322 for the sound source generation information received from the sound source generation information acquisition unit 310 and retrieve the sound source generation tags, which are the tags mapped to that sound source generation information.
  • the tag mapping unit 321 may transmit the searched sound source generation tags to the tag filtering unit 323 .
  • the tag mapping unit 321 may transmit sound source generation information to an external server through a communication unit (not shown), and receive and obtain sound source generation tags mapped to the sound source generation information from the external server.
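  • As an illustrative sketch, tag mapping can be thought of as a lookup from sound source generation information to stored tags; the table contents below are invented for illustration, and the lookup could equally be delegated to an external server.

```python
# Illustrative sketch only: looking up sound source generation tags mapped to
# sound source generation information in a local database-like structure.

from typing import List

TAG_DB = {
    "sunflower": ["warm", "bright", "nature"],
    "van_gogh_style": ["classical", "impressionist", "string_quartet"],
    "autumn": ["calm", "acoustic"],
    "rainy": ["lo-fi", "slow_tempo"],
}


def map_tags(sound_source_info: List[str]) -> List[str]:
    tags = []
    for item in sound_source_info:
        tags.extend(TAG_DB.get(item, []))   # unknown items could instead be sent to a server
    return tags

# map_tags(["sunflower", "autumn"]) -> ["warm", "bright", "nature", "calm", "acoustic"]
```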
  • the weight acquisition unit 324 may obtain weights for each tag.
  • the weight for each tag may be information representing a user's preference for each tag.
  • the weight acquisition unit 324 may generate weights for each tag based on user preference information.
  • the weight acquisition unit 324 may receive sound source generation information for user taste information from the sound source generation information acquisition unit 310, digitize user preference for each tag using the received sound source generation information, and generate weights for each tag.
  • the tag filtering unit 323 may additionally update the weight for each tag according to the user's music listening history information, that is, music reproduction information.
  • the weight acquisition unit 324 may use, as music playback information, the music playback frequency, the total listening level of the music, the playback stop level, the fast-forward level, and the skip level, and may update the weight for each tag based on this. For example, the weight acquisition unit 324 may update the weight for each tag by assigning a higher weight to the tags used to generate music that was played to the end or repeatedly listened to by the user, and conversely assigning a lower weight to the tags used to generate music whose playback the user stopped or skipped.
  • the weight acquisition unit 324 may generate a weight for each tag based on a situation.
  • the context-based weight for each tag may refer to a weight for each tag in consideration of a surrounding situation.
  • the weight acquisition unit 324 may generate a context-based weight for each tag by once again assigning a weight according to a surrounding situation to the weight for each tag. That is, the weight acquisition unit 324 assigns a weight according to the surrounding situation to tags used to generate the sound source with respect to music that the user prefers in a specific surrounding situation or a sound source that has been reproduced in a specific surrounding situation. By doing so, weights can be created for each tag based on the situation.
  • for example, for a specific situation such as a rainy spring morning, the weight acquisition unit 324 may generate a context-based weight for each tag by assigning a higher weight to the tags used to generate music having the characteristics the user preferred in that situation.
  • the weight acquisition unit 324 may create and store weights for each tag and/or weights for each tag based on context in the form of a table.
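  • A minimal sketch of how the weight for each tag and the context-based weight for each tag might be held as tables is shown below; the keys and values are invented for illustration.

```python
# Illustrative sketch only: per-tag weights plus context-based per-tag weights as tables.

tag_weights = {"calm": 1.3, "jazz": 0.7, "piano": 1.5}

context_tag_weights = {
    ("rainy", "morning"): {"calm": 1.6, "piano": 1.8, "jazz": 0.6},
    ("sunny", "evening"): {"upbeat": 1.4, "acoustic": 1.2},
}


def context_weight(tag: str, context: tuple) -> float:
    # Fall back to the context-free weight when no context-specific entry exists.
    return context_tag_weights.get(context, {}).get(tag, tag_weights.get(tag, 1.0))
```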
  • the weight acquisition unit 324 may continuously update the weight for each tag and/or the weight for each tag based on the situation in consideration of the degree to which the user plays music.
  • the tag filtering unit 323 may filter the sound source generation tags received from the tag mapping unit 321 to obtain only tags to be directly used for sound source generation.
  • the tag filtering unit 323 may assign scores to sound source generation tags in order to filter the tags.
  • the tag filtering unit 323 may receive weights for each tag from the weight acquisition unit 324 to assign scores to tags for generating sound sources, and generate scores for each tag in consideration of the weight.
  • the tag filtering unit 323 may generate a score for each tag in consideration of the degree of overlap when there are overlapping tags among sound source generation tags.
  • the degree of overlap may be information indicating the degree of overlap when there are overlapping tags among tags mapped to sound source generation information. For example, if there are identical tags among the tags received from the tag mapping unit 321, the tag filtering unit 323 may determine the degree of overlap for each tag in consideration of the number of identical tags.
  • the tag filtering unit 323 may generate a score for each tag using various information other than the weight or overlap for each tag.
  • information used by the tag filtering unit 323 to assign scores for each tag may vary according to sound source generation information.
  • the tag filtering unit 323 may further consider the accuracy of the image recognition result in addition to the weight and overlap for each tag.
  • Accuracy of a result of recognizing an image may mean reliability of a result of performing object detection on an image. That is, the accuracy of the image recognition result may be information indicating how accurately an object is detected in the image.
  • the accuracy of the result of recognizing the image may be determined according to the proportion of the image that an object occupies. For example, if the objects detected in the image are a person's face and a car, and the face occupies 70% of the entire image while the car occupies 10%, an accuracy weighting of 70% may be given to the face and 10% to the car.
  • the tag filtering unit 323 may filter tags with high scores by considering, for each of the sound source generation tags mapped to the image information, at least one of the weight for each tag, the degree of overlap, and the accuracy of the image recognition result, as illustrated in the sketch below.
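  • The following sketch shows one way of combining these factors into a per-tag score. The simple weighted sum and the 0.5/0.3/0.2 item weights are assumptions; the patent leaves the exact combination rule open.

```python
# Hypothetical per-tag scoring for tags mapped to image information.
# The 0.5/0.3/0.2 item weights and the linear combination are assumptions.

def score_image_tag(weight, accuracy, overlap, max_overlap):
    """weight: user-preference weight in [0, 1]
    accuracy: object-detection confidence / area ratio in [0, 1]
    overlap: how many times the tag was mapped; normalized by max_overlap."""
    overlap_norm = overlap / max_overlap if max_overlap else 0.0
    return 0.5 * weight + 0.3 * accuracy + 0.2 * overlap_norm

tags = {
    "face":  {"weight": 0.8, "accuracy": 0.7, "overlap": 3},
    "car":   {"weight": 0.4, "accuracy": 0.1, "overlap": 1},
    "beach": {"weight": 0.6, "accuracy": 0.5, "overlap": 2},
}
max_ov = max(t["overlap"] for t in tags.values())
scores = {name: score_image_tag(t["weight"], t["accuracy"], t["overlap"], max_ov)
          for name, t in tags.items()}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```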
  • the tag filtering unit 323 may assign a score to each sound source generation tag mapped to the surrounding context information using a weight for each context-based tag.
  • the tag filtering unit 323 may obtain the context-based weight for each tag from the weight acquisition unit 324 in order to assign scores to the sound source generation tags obtained based on the surrounding context information, and may generate a score for each of those tags using the context-based weights.
  • the tag filtering unit 323 may acquire a score for each of the sound source generation tags mapped to the surrounding context information based on the context-based weight for each tag, and may filter the tags with high scores.
  • the tag filtering unit 323 may select, by filtering, the sound source generation tags having high scores from among the sound source generation tags.
  • the tag filtering unit 323 may filter the sound source generation tags so that the number of tags is less than or equal to the reference value when the number of the plurality of sound source generation tags is greater than or equal to the reference value.
  • the reference value may be determined in proportion to the number of sound source generation tags received from the tag mapping unit 321 or may be a predetermined number. For example, if there are 200 sound source generation tags mapped to sound source generation information, the tag filtering unit 323 may filter only 40 tags corresponding to 20%, which is a predetermined percentage, in order of highest scores.
  • the tag filtering unit 323 may filter only a predetermined number of tags, for example 30, in descending order of score from among 100 sound source generation tags. Alternatively, the tag filtering unit 323 may filter only the tags having a score equal to or higher than a predetermined score among the plurality of sound source generation tags, or only the tags that both meet the predetermined score and fall within a predetermined number in descending order of score. A sketch of these filtering rules is given below.
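  • The filtering rules described above (a fixed percentage, a fixed count, or a score threshold, possibly combined) can be sketched as follows; the helper name and parameters are illustrative.

```python
# Hypothetical filtering of scored tags by percentage, count, or threshold.

def filter_tags(scored_tags, top_percent=None, top_n=None, min_score=None):
    """scored_tags: dict tag -> score. Returns the surviving tags, best first."""
    ranked = sorted(scored_tags, key=scored_tags.get, reverse=True)
    if min_score is not None:
        ranked = [t for t in ranked if scored_tags[t] >= min_score]
    if top_percent is not None:
        ranked = ranked[: max(1, int(len(scored_tags) * top_percent))]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked

scores = {f"tag{i}": i / 200 for i in range(1, 201)}    # 200 scored tags
print(len(filter_tags(scores, top_percent=0.2)))        # 40 tags (20%)
print(len(filter_tags(scores, top_n=30)))               # 30 tags
```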
  • the tag filtering unit 323 may transfer the filtered tags to the sound source generating unit 330 .
  • the tag filtering unit 323 may transfer, to the sound source generation unit 330, the tags obtained by filtering the sound source generation tags mapped to the image information and the tags obtained by filtering the sound source generation tags mapped to the surrounding context information.
  • the tag filtering unit 323 may further filter filtered sound source generation tags. For example, when the number of filtered sound source creation tags is still large, the tag filtering unit 323 may further filter the filtered sound source creation tags.
  • the tag filtering unit 323 may combine the tags filtered from the sound source generation tags mapped to the image information with the tags filtered from the sound source generation tags mapped to the surrounding context information, filter the combined set once more, and then deliver only the resulting tags to the sound source generation unit 330.
  • the tag filtering unit 323 may filter sound source generation tags once more in order to generate sound sources of different music styles according to surrounding situation information.
  • the tag filtering unit 323 may assign different scores to the sound source generation tags according to the surrounding situation information obtained through a sensor or a server, for example, depending on whether the current situation is a winter night or an intensely sunny summer midday, so that different sound source generation tags are filtered according to the surrounding situation.
  • the tag filtering unit 323 may transmit the sound source generation tags filtered once more according to the surrounding situation to the sound source generation unit 330, so that the sound source generation unit 330 can generate sound sources of different music styles, with an atmosphere, genre, tempo, equalizer setting, and so on better suited to each surrounding situation.
  • the tag filtering unit 323 may filter the sound source generation tags once more according to user identification information. If the electronic device 100a includes a camera, the tag filtering unit 323 may filter different sound source generation tags for different users by using user identification information recognized through the camera. For example, the tag filtering unit 323 may assign different scores to the sound source generation tags so that different sound sources are generated when the user is identified as a male in his teens and when the user is identified as a female in her 50s, according to user identification information obtained using a camera or the like.
  • the tag filtering unit 323 may obtain user profile information corresponding to the user account, and filter sound source creation tags more suitable for the user profile by using the acquired user profile information.
  • the tag filtering unit 323 may filter different sound source generation tags according to generation or gender and deliver them to the sound source generation unit 330, so that the sound source generation unit 330 can create sound sources of styles suited to different generations or genders.
  • FIG. 7 is a diagram for explaining that the tag filtering unit of FIG. 6 filters tags in consideration of scores for each tag according to an embodiment.
  • the tag filtering unit 323 may assign scores to sound source generation tags and filter the sound source generation tags according to the scores.
  • the tag filtering unit 323 may list the sound source generation tags, receive the weight for each tag from the weight acquisition unit 324, and assign the corresponding weight to each tag.
  • the tag filtering unit 323 may assign a degree of overlap to each tag when there are overlapping tags among sound source generation tags.
  • the tag filtering unit 323 may additionally use other information according to the types of sound source generation tags in filtering tags. For example, when sound source generation tags are tags corresponding to image information, the tag filtering unit 323 may further consider the accuracy of object detection in addition to weight and overlap for each tag.
  • a first table 710 illustrates how the tag filtering unit 323 assigns scores to the sound source generation tags mapped to the image information.
  • the first table 710 represents the weight, accuracy, and number of duplicates for each of the sound source generation tags, that is, Tag 1 to Tag 5.
  • Tag 1 to Tag 5 represent sound source generation tags mapped to image information.
  • the tag filtering unit 323 may calculate a score for each tag by considering the weight, accuracy, and degree of overlap of each of the sound source generation tags mapped to the image information.
  • when calculating the score for each tag, the tag filtering unit 323 may treat the weight, accuracy, and degree of overlap of each tag with equal importance, or may assign different weights to each item to calculate the final score.
  • a second table 720 illustrates how the tag filtering unit 323 assigns scores to the sound source generation tags mapped to the surrounding situation information.
  • Tag 1 to Tag 4 represent sound source generation tags mapped to surrounding situation information.
  • the tag filtering unit 323 may generate a score for each tag suitable for the situation by considering the weight for each tag based on the situation received from the weight acquisition unit 324 for the sound source generation tags mapped to the surrounding situation information.
  • For example, in response to the specific situation of a rainy spring morning, the tag filtering unit 323 may retrieve the corresponding context-based weights for each tag from the weight acquisition unit 324 and use them. That is, the tag filtering unit 323 may retrieve the context-based weights corresponding to Tag 1 to Tag 4 from the weight acquisition unit 324 and use them as the scores for Tag 1 to Tag 4.
  • the tag filtering unit 323 may filter tags to be used directly for generating sound sources based on scores for each tag.
  • FIG. 8 is a diagram illustrating a relationship between sound source creation information and a tag according to an embodiment.
  • the electronic device may use the similarity between a sound source and a tag. For example, the electronic device may convert a sound source into a feature vector using the Mel-Frequency Cepstral Coefficient (MFCC) algorithm. The electronic device may cut the waveform into short time segments, take a Fourier transform of each segment, and extract the MFCC by grouping the resulting spectra into frequency bands. The electronic device may then measure the similarity between sound sources based on the MFCC.
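  • A minimal sketch of this kind of MFCC-based similarity measure is shown below, using the librosa library; the sampling settings, the mean-pooling over frames, and the cosine-similarity comparison are assumptions rather than the patent's specified procedure.

```python
# Hypothetical MFCC-based similarity between two sound sources using librosa.
import librosa
import numpy as np

def mfcc_vector(path, sr=22050, n_mfcc=20):
    """Load an audio file and summarize it as the mean MFCC vector over time."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    return mfcc.mean(axis=1)

def similarity(path_a, path_b):
    """Cosine similarity between the summarized MFCC vectors of two sound sources."""
    a, b = mfcc_vector(path_a), mfcc_vector(path_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example (file names are placeholders):
# print(similarity("source_1.wav", "source_2.wav"))
```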
  • information that can be used for generating a sound source may be represented by dots on a graph 800 .
  • each dot may represent information related to a sound source.
  • the information related to the sound source includes characteristics extracted from the sound source, and may include, for example, information such as genre, composition, chord, melody, beat, time signature, and tempo.
  • distances between points may indicate a degree of relevance or similarity between information. That is, the closer the distance between points, the higher the relationship.
  • a graph 800 of FIG. 8 may be a graph of genres among features extracted from a sound source.
  • each dot may mean information having different genres according to a shape.
  • each of a triangle, a square, and a star shape may represent sound source-related information having different genres.
  • the X-axis and Y-axis values of the graph 800 may represent genres by classifying them according to types.
  • the genre of a sound source may change to soul, blues, jazz, and the like as you move to the right along the X-axis of the graph 800 .
  • the genre of a sound source may represent rock, hip hop, R&B, etc. as it goes upward along the Y-axis in the graph.
  • each graph may be generated for other information related to the sound source, such as composition, chord, melody, beat, time signature, tempo, and the like.
  • Tags may be mapped to information related to sound sources. Conversely, this may mean that information related to a sound source mapped to the tag can be searched using the tag.
  • the sound source generation unit 330 may search for a sound source 810 corresponding to the combination of Tag 1 and Tag 4.
  • the sound source generation unit 330 may generate a sound source having a classical genre.
  • the sound source generation unit 330 may search for sound sources having a composition, chord, melody, beat, time signature, tempo, and the like corresponding to the combination of Tag 1 and Tag 4, and may generate a sound source by combining them.
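  • The idea of finding sound-source-related information whose feature points lie close to a tag combination can be sketched with a simple nearest-neighbor search; the feature table and the tag-to-coordinate mapping below are illustrative assumptions.

```python
# Hypothetical nearest-neighbor lookup of sound-source feature points for a tag combination.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder feature points (e.g., 2-D genre coordinates as in graph 800).
feature_points = np.array([[0.1, 0.9], [0.2, 0.8], [0.7, 0.2], [0.8, 0.3]])
point_labels = ["classical_a", "classical_b", "jazz_a", "jazz_b"]

# Placeholder coordinates associated with each tag.
tag_vectors = {"Tag1": np.array([0.1, 0.85]), "Tag4": np.array([0.2, 0.9])}

def find_related(tags, k=2):
    """Average the tag coordinates and return the k closest sound-source points."""
    query = np.mean([tag_vectors[t] for t in tags], axis=0, keepdims=True)
    nn = NearestNeighbors(n_neighbors=k).fit(feature_points)
    _, idx = nn.kneighbors(query)
    return [point_labels[i] for i in idx[0]]

print(find_related(["Tag1", "Tag4"]))  # e.g., ['classical_a', 'classical_b']
```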
  • FIG. 9 is a diagram for explaining a neural network learned to acquire a sound source from a tag according to an embodiment.
  • a process of generating a sound source from a tag using artificial intelligence may consist of two processes.
  • a neural network, that is, a neural network model 912, may be trained by using a plurality of training data 911 as inputs.
  • the output data 913 that is a training result may be fed back to the neural network model 912 and used to update the weight of the neural network model 912 .
  • in response to the input of the plurality of training data, the neural network model 912 may learn and/or be trained in a method of detecting data having particular properties from the plurality of training data, and may be created based on the result of the learning and/or training.
  • the training data 911 may include a plurality of tags and a sound source highly related to the tags.
  • a plurality of tags may be tags corresponding to various sound source generation information.
  • the neural network model 912 may be trained by grouping, for each tag, a set of sound sources highly relevant to that tag. That is, the neural network model 912 may be trained to acquire characteristic information such as composition, chord, rhythm, mood, and genre representing the characteristics of the tags, and to generate a sound source from the acquired characteristic information.
  • the neural network model 912 may be a Generative Adversarial Network (GAN).
  • the neural network model 912 may learn by converting the sound source itself into an image.
  • the neural network model 912 may frequency-convert the sound source using the Mel-Frequency Cepstral Coefficient (MFCC) algorithm and obtain feature information from the sound source.
  • the MFCC algorithm may be a technique of dividing a sound source into small frames of about 20 ms to 40 ms and extracting features by analyzing the spectrum of the divided frames.
  • the neural network model 912 may measure the similarity between sound sources using the similarity of waveforms between features in the frequency domain acquired using the MFCC algorithm.
  • the neural network model 912 may also measure the similarity between tags mapped to sound sources.
  • the neural network model 912 may receive domain information to be acquired and original data, that is, filtered tags as input values.
  • GAN can learn the mapping between all possible domains through one generator. For example, the GAN can receive both tags and domain information about the genre of a sound source as training data 911 and learn a sound source of an appropriate genre according to the tags.
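  • As a rough illustration of a single generator conditioned on both tags and domain information, the PyTorch skeleton below concatenates a pooled tag embedding with a genre-domain one-hot vector. The layer sizes, the audio-feature output, and the absence of a discriminator and training loop are simplifications and assumptions, not the patent's architecture.

```python
# Hypothetical conditional generator: tags + genre domain -> sound-source features.
import torch
import torch.nn as nn

class TagConditionedGenerator(nn.Module):
    def __init__(self, n_tags=128, n_domains=8, feat_dim=256):
        super().__init__()
        self.tag_embed = nn.EmbeddingBag(n_tags, 64)        # pools a bag of tag ids
        self.net = nn.Sequential(
            nn.Linear(64 + n_domains, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),                        # e.g., a frame of audio features
        )

    def forward(self, tag_ids, domain_onehot):
        t = self.tag_embed(tag_ids)                          # (batch, 64)
        return self.net(torch.cat([t, domain_onehot], dim=1))

gen = TagConditionedGenerator()
tags = torch.tensor([[1, 5, 9], [2, 3, 7]])                  # two bags of filtered tag ids
domains = torch.eye(8)[[0, 3]]                               # genre domain one-hot vectors
print(gen(tags, domains).shape)                              # torch.Size([2, 256])
```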
  • the neural network model 912 may include a plurality of neural network layers. Each of the plurality of neural network layers has a plurality of weight values, and a neural network operation may be performed through an operation between an operation result of a previous layer and a plurality of weight values.
  • a plurality of weights possessed by a plurality of neural network layers may be optimized by a learning result of an artificial intelligence model. For example, a plurality of weights may be updated so that a loss value or a cost value obtained from an artificial intelligence model is reduced or minimized during a learning process.
  • the artificial neural network may include a deep neural network (DNN), for example, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or deep Q-networks, but is not limited to the above examples.
  • the neural network model 912 may include a mapping network.
  • the mapping network is non-linear, which can reduce biased correlations between features.
  • a mapping network may include a plurality of layers. Each layer may be represented by at least one node, and nodes between layers are connected by edges. Nodes may be fully connected to nodes included in previous and subsequent layers.
  • the neural network model 912 may obtain an intermediate vector by passing the input information through a mapping network.
  • the intermediate vector may be a weight containing tag attribute information. For example, if a feature vector extracted from attribute information relates to a feature corresponding to a genre of a sound source, the neural network model 912 may generate an intermediate vector having this feature. For example, if the feature vector extracted from the attribute information relates to a feature corresponding to an attribute related to the tempo of a sound source, the neural network model 912 may generate an intermediate vector having the feature of this tempo.
  • the neural network model 912 may synthesize output data by applying information about a sound source to each of a plurality of layers using the generated intermediate vector.
  • the neural network model 912 may receive tensors.
  • a tensor may be a data structure containing information of a deep learning model.
  • a tensor is base information on which properties of learning data are not reflected, and may be information corresponding to an average sound source.
  • a tensor may mean a layout of an additional information area having a basic sound source.
  • the neural network model 912 may include a plurality of layers starting with a tensor of 4×4×512 and ending with a tensor of 1024×1024×3. Each layer may be connected to the next layer through convolution and upsampling.
  • Weights may be input to respective layers of the neural network model 912 .
  • the neural network model 912 may be trained to express properties of each layer or characteristics of a sound source using an intermediate vector, that is, a weight.
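  • A compact sketch of this arrangement is given below: a small non-linear mapping network turns the input into an intermediate vector, which then modulates a stack of upsampling convolution layers that starts from a learned 4×4×512 tensor. All dimensions, the number of blocks, and the simple multiplicative modulation are illustrative assumptions.

```python
# Hypothetical mapping network + upsampling synthesis stack (illustrative dimensions).
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, in_dim=128, w_dim=512, n_layers=4):
        super().__init__()
        layers = []
        for i in range(n_layers):
            layers += [nn.Linear(in_dim if i == 0 else w_dim, w_dim), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)                      # intermediate vector w

class SynthesisStack(nn.Module):
    def __init__(self, w_dim=512):
        super().__init__()
        self.const = nn.Parameter(torch.randn(1, 512, 4, 4))   # learned base tensor
        self.to_scale = nn.Linear(w_dim, 512)                   # w modulates each layer
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Upsample(scale_factor=2),
                          nn.Conv2d(512, 512, 3, padding=1), nn.ReLU())
            for _ in range(3)                    # 4x4 -> 32x32 here; more blocks would reach 1024x1024
        ])
        self.to_out = nn.Conv2d(512, 3, 1)       # 3-channel output map

    def forward(self, w):
        x = self.const.expand(w.size(0), -1, -1, -1)
        scale = self.to_scale(w).unsqueeze(-1).unsqueeze(-1)
        for block in self.blocks:
            x = block(x * scale)                 # apply the w-derived weight at each layer
        return self.to_out(x)

w = MappingNetwork()(torch.randn(2, 128))
print(SynthesisStack()(w).shape)                 # torch.Size([2, 3, 32, 32])
```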
  • the neural network model 912 may obtain result data based on features acquired from a lower level to an upper level.
  • the resulting data may be a sound source or a sheet music used to generate a sound source.
  • Each training result may be derived as output data 913 from the neural network model 912 .
  • Output data 913 can be used to update the weights of neural network model 912 .
  • once training is completed, the model may be used as the trained neural network model 922.
  • the application data 921 may be input to the trained neural network model 922, and result data 923 may be obtained from the input application data 921.
  • the application data 921 may be the filtered sound source generation tags, and the result data 923 output from the trained neural network model 922 may be a sound source or a score for generating a sound source.
  • An operation of learning how to acquire a sound source from tags filtered using the neural network model 912 may be performed by the electronic device 100a.
  • this learning operation may be performed in an external computing device separate from the electronic device 100a.
  • an operation of learning how to acquire a sound source from a tag using the neural network model 912 may require a relatively large and complex amount of computation.
  • the external computing device performs a learning operation, and the electronic device 100a receives the trained neural network model 912 from the external computing device, thereby reducing the amount of calculations to be performed in the electronic device 100a.
  • the electronic device 100a may receive the neural network model 912 from an external server, store it in a memory, and acquire a sound source from a tag using the stored neural network model 912 .
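  • A brief sketch of this deployment pattern (train externally, download the trained model, run inference on device) follows; the file name, the helper function, and the tag encoding are placeholders, and the example assumes a PyTorch module saved with torch.save.

```python
# Hypothetical on-device use of an externally trained tag-to-sound-source model.
import torch

MODEL_PATH = "trained_sound_source_model.pt"   # placeholder path for the model received from a server

def generate_sound_source(filtered_tag_ids):
    """Load the stored model and run inference on filtered sound source generation tags."""
    model = torch.load(MODEL_PATH, map_location="cpu")   # assumes the whole module was saved
    model.eval()
    tags = torch.tensor([filtered_tag_ids])              # batch of one tag set
    with torch.no_grad():
        return model(tags)                               # sound source features or a score

# features = generate_sound_source([3, 17, 42])
```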
  • FIG 10 is an internal block diagram of an electronic device according to an embodiment.
  • the electronic device 1000 of FIG. 10 may be an example of the electronic device 100a of FIG. 2 .
  • the electronic device 1000 of FIG. 10 may include components of the electronic device 100a of FIG. 2 .
  • an electronic device 1000 may include a processor 210, a memory 220, a tuner unit 1010, a communication unit 1020, a sensing unit 1030, an input/output unit 1040, a video processing unit 1050, a display unit 1060, an audio processing unit 1070, an audio output unit 1080, and a user input unit 1090.
  • the tuner unit 1010 may tune and select only the frequency of the channel to be received by the electronic device 1000 from among many radio wave components, through amplification, mixing, resonance, and the like of broadcast content received by wire or wirelessly.
  • the content received through the tuner unit 1010 is decoded and separated into audio, video and/or additional information.
  • the separated audio, video and/or additional information may be stored in the memory 220 under the control of the processor 210 .
  • the communication unit 1020 may connect the electronic device 1000 to a peripheral device, an external device, a server, or a mobile terminal under the control of the processor 210 .
  • the communication unit 1020 may include at least one communication module capable of performing wireless communication.
  • the communication unit 1020 may include at least one of a wireless LAN module 1021, a Bluetooth module 1022, and a wired Ethernet 1023 corresponding to the performance and structure of the electronic device 1000.
  • the Bluetooth module 1022 may receive a Bluetooth signal transmitted from a peripheral device according to the Bluetooth communication standard.
  • the Bluetooth module 1022 may be a Bluetooth Low Energy (BLE) communication module and may receive a BLE signal.
  • the Bluetooth module 1022 may continuously or temporarily scan a BLE signal to detect whether a BLE signal is received.
  • the wireless LAN module 1021 may transmit and receive Wi-Fi signals with neighboring devices according to Wi-Fi communication standards.
  • the communication unit 1020 may use a communication module to obtain, from an external device or server, various information indicating the external situation, such as weather, time, and date information, or user profile information linked to a user account, and may transmit the obtained information to the processor 210.
  • the sensing unit 1030 detects a user's voice, a user's image, or a user's interaction, and may include a microphone 1031, a camera unit 1032, a light receiving unit 1033, and a sensing unit 1034.
  • the microphone 1031 may receive an audio signal including a user's utterance or noise, convert the received audio signal into an electrical signal, and output the converted electrical signal to the processor 210 .
  • the camera unit 1032 may include a sensor (not shown) and a lens (not shown), take and capture an image formed on the screen, and transmit it to the processor 210 .
  • the light receiving unit 1033 may receive light signals (including control signals).
  • the light receiving unit 1033 may receive an optical signal corresponding to a user input (eg, touch, pressure, touch gesture, voice, or motion) from a control device such as a remote controller or a mobile phone.
  • the sensing unit 1034 may detect a state around the electronic device 100a and transmit the sensed information to the communication unit 1020 or the processor 210 .
  • the sensing unit 1034 may include, for example, at least one of a temperature/humidity sensor, a presence sensor, an illuminance sensor, a location sensor (eg, GPS), a pressure sensor, and a proximity sensor, but is not limited thereto.
  • the input/output unit 1040 may receive video (e.g., a moving image signal or a still image signal), audio (e.g., a voice signal or a music signal), and additional information from a device external to the electronic device 1000 under the control of the processor 210.
  • the input/output unit 1040 may include one of a High-Definition Multimedia Interface (HDMI) port 1041, a component jack 1042, a PC port 1043, and a USB port 1044, or a combination of the HDMI port 1041, the component jack 1042, the PC port 1043, and the USB port 1044.
  • the video processing unit 1050 may process image data to be displayed by the display unit 1060 and may perform various image processing operations on the image data, such as decoding, rendering, scaling, noise filtering, frame rate conversion, and resolution conversion.
  • the display unit 1060 may display content received from a broadcasting station, an external server, or an external storage medium on a screen.
  • the content is a media signal and may include a video signal, an image, a text signal, and the like.
  • the display unit 1060 may be used as an input device such as a user interface in addition to an output device.
  • the display unit 1060 may include at least one of a liquid crystal display, a thin film transistor-liquid crystal display, an organic light-emitting diode display, a flexible display, a 3D display, and an electrophoretic display. Also, depending on the implementation form of the display unit 1060, two or more display units 1060 may be included.
  • the audio processing unit 1070 processes audio data.
  • the audio processing unit 1070 may perform various processes such as decoding or amplifying audio data and filtering noise.
  • under the control of the processor 210, the audio output unit 1080 may output audio included in content received through the tuner unit 1010, audio input through the communication unit 1020 or the input/output unit 1040, and audio stored in the memory 220.
  • the audio output unit 1080 may include at least one of a speaker 1081, headphones 1082, and a Sony/Philips Digital Interface (S/PDIF) 1083.
  • the audio output unit 1080 may play and output music according to the sound source generated by the processor 210 .
  • the user input unit 1090 may receive a user input for controlling the electronic device 1000 .
  • the user input unit 1090 may include various types of user input devices, such as a touch panel that detects the user's touch, a button that receives the user's push manipulation, a wheel that receives the user's rotation manipulation, a keyboard, a dome switch, a microphone for voice recognition, and a motion sensor for sensing motion, but is not limited thereto.
  • the user input unit 1090 may receive a control signal from a mobile terminal.
  • FIG. 11 is a flowchart illustrating a method of generating a sound source according to an embodiment.
  • the electronic device may acquire sound source generation information (step 1110).
  • the sound source generation information may include at least one of image information, surrounding situation information, and user taste information.
  • the electronic device may acquire sound source creation tags mapped to sound source creation information (step 1120).
  • the electronic device may search for information mapped to sound source creation information among information related to sound sources stored in an internal database or an external server, and search and obtain tags mapped to sound source creation information.
  • the electronic device may create a sound source based on sound source generation tags (step 1130).
  • the electronic device may input sound source generation tags to the neural network and acquire the sound source from the neural network by using the neural network that has learned the relationship between the tag and the sound source.
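  • Putting steps 1110 to 1130 together, a hypothetical end-to-end flow might look like the following sketch; TagDB and Model are placeholder stand-ins for the tag mapping and neural network components described above, not components defined by the patent.

```python
# Hypothetical end-to-end flow for FIG. 11 (steps 1110-1130).
# TagDB and Model are stand-ins for the tag mapping and sound source generation components.

class TagDB:
    def lookup(self, kind, value):
        # Placeholder: return tags mapped to one piece of sound source generation information.
        return {"image": ["calm", "piano"], "context": ["rain"], "taste": ["jazz"]}.get(kind, [])

class Model:
    def generate(self, tags):
        # Placeholder: a trained neural network would return a sound source or a score here.
        return f"sound source generated from {sorted(set(tags))}"

def create_sound_source(image, surroundings, taste, tag_db, model):
    info = {"image": image, "context": surroundings, "taste": taste}                 # step 1110
    tags = [t for kind, value in info.items() for t in tag_db.lookup(kind, value)]   # step 1120
    return model.generate(tags)                                                      # step 1130

print(create_sound_source("photo.jpg", "rainy morning", "likes jazz", TagDB(), Model()))
```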
  • FIG. 12 is a flowchart illustrating a method of filtering sound source generation tags according to an embodiment.
  • the electronic device may obtain a score for each of the sound source generation tags (step 1210).
  • the electronic device may obtain scores for sound source generation tags based on at least one of the degree of redundancy for each tag, the accuracy of the recognition result, the weight for each tag, and the weight for each context-based tag.
  • the electronic device may filter sound source generation tags having high scores (step 1220).
  • the electronic device may filter sound source generation tags having scores equal to or higher than a reference value among sound source generation tags, or may filter a predetermined number of sound source generation tags in order of high scores.
  • the electronic device may additionally filter the filtered sound source generation tags based on at least one of surrounding situation information and user identification information.
  • the electronic device may create a sound source using the filtered sound source creation tags (step 1230).
  • the electronic device may input the filtered tags to the neural network that has learned the relationship between tags and sound sources, and obtain sound sources corresponding to the filtered tags from the neural network.
  • FIG. 13 is a flowchart illustrating a method of generating a sound source by filtering sound source generation tags for each sound source generation information according to an embodiment.
  • the electronic device may acquire a score for each of the sound source generation tags mapped to the image information (step 1310).
  • the electronic device may obtain sound source creation tags mapped to image information acquired from an image, among sound source creation information.
  • the electronic device may acquire a score for each of the sound source generation tags mapped to the image information.
  • the electronic device may set the score for each of the sound source generation tags mapped to the image information based on at least one of the accuracy of the recognition result, the degree of overlap for each tag, and the weight for each tag.
  • the electronic device may filter the sound source generation tags having high scores from among the sound source generation tags mapped to the image information.
  • By doing so, the electronic device may obtain the first tags (step 1320).
  • the electronic device may acquire a score for each of the sound source generation tags mapped to the surrounding context information (step 1330).
  • the electronic device may acquire the sound source generation tags mapped to the surrounding context information among the sound source generation information, and obtain a score for each of those tags.
  • the electronic device may obtain a context-based weight for each tag representing the user's preference according to the situation, and may assign a score to each of the sound source generation tags mapped to the surrounding context information based on the context-based weights.
  • the electronic device may also consider the degree of overlap for each tag, in addition to the context-based weight for each tag, when acquiring the scores for the sound source generation tags mapped to the surrounding context information.
  • the electronic device may filter the sound source generation tags having high scores from among the sound source generation tags mapped to the surrounding context information.
  • By doing so, the electronic device may acquire the second tags (step 1340).
  • the electronic device may generate a sound source using at least one of the first tags and the second tags (step 1340).
  • the electronic device may merge the first tags and the second tags and filter them once more according to scores.
  • the electronic device may filter only overlapping tags among the first tags and the second tags.
  • the electronic device may obtain final tags by considering both the first tags and the second tags in various ways, and generate a sound source based on the final tags.
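  • One possible way of combining the first tags and the second tags, as described above, is sketched below; the merge strategies (a re-scored union or an intersection of overlapping tags) and the example score dictionaries are illustrative assumptions.

```python
# Hypothetical combination of image-based (first) and context-based (second) tags.

def merge_and_filter(first_scores, second_scores, top_n=5, overlap_only=False):
    """first_scores / second_scores: dict tag -> score for the two tag groups."""
    if overlap_only:
        # Keep only tags that appear in both groups.
        common = set(first_scores) & set(second_scores)
        merged = {t: first_scores[t] + second_scores[t] for t in common}
    else:
        # Merge both groups, summing scores for overlapping tags.
        merged = dict(first_scores)
        for t, s in second_scores.items():
            merged[t] = merged.get(t, 0.0) + s
    return sorted(merged, key=merged.get, reverse=True)[:top_n]

first = {"piano": 0.9, "calm": 0.7, "face": 0.4}
second = {"rain": 0.8, "calm": 0.6, "slow": 0.5}
print(merge_and_filter(first, second))                     # union, re-ranked
print(merge_and_filter(first, second, overlap_only=True))  # ['calm']
```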
  • FIG. 14 is a flowchart illustrating a method of obtaining a weight for each tag according to an embodiment.
  • the electronic device may obtain a weight for each tag (step 1410).
  • the weight for each tag may be information representing a user's preference for each tag.
  • the electronic device may generate weights for each tag based on at least one of user taste information and music playback history. For example, if the user's music playback history is not present or insufficient, the electronic device may generate weights for each tag only based on user taste information.
  • the electronic device may play music according to the sound source (step 1420).
  • the electronic device may update the weight for each tag according to music reproduction (step 1430).
  • the electronic device may update the weight for each tag by increasing the weights of the tags used to generate the specific piece of music that was played.
  • the electronic device may assign a score for each tag using the updated weight for each tag.
  • the electronic device can generate different music according to music play information by filtering tags using scores for each tag that are changed according to the updated weight for each tag.
  • Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media. Also, computer readable media may include both computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Communication media typically includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transport mechanism, and includes any information delivery media.
  • the term 'unit' used herein may refer to a hardware component such as a processor or a circuit, and/or a software component executed by such a hardware component.
  • the electronic device and its operating method may include obtaining sound source generation information including at least one of image information, surrounding situation information, and user taste information, obtaining sound source generation tags mapped to the sound source generation information, and generating a sound source based on the sound source generation tags.
  • the device-readable storage medium may be provided in the form of a non-transitory storage medium.
  • a 'non-transitory storage medium' only means that the storage medium is a tangible device and does not contain signals (e.g., electromagnetic waves); the term does not distinguish between cases where data is stored semi-permanently in the storage medium and cases where data is stored temporarily.
  • the 'non-temporary storage medium' may include a buffer in which data is temporarily stored.
  • the method according to various embodiments disclosed in this document may be included and provided in a computer program product.
  • Computer program products may be traded between sellers and buyers as commodities.
  • a computer program product may be distributed in the form of a device-readable storage medium (e.g., a compact disc read-only memory (CD-ROM)), or may be distributed (e.g., downloaded or uploaded) online through an application store or directly between two user devices (e.g., smartphones).
  • in the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be temporarily stored in, or temporarily created in, a device-readable storage medium such as a memory of a manufacturer's server, an application store server, or a relay server.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to an operating method of an electronic device, the method comprising the steps of: obtaining sound source generation information comprising image information and/or surrounding situation information and/or user taste information; obtaining sound source generation tags mapped to the sound source generation information; and generating a sound source on the basis of the sound source generation tags.
PCT/KR2022/021048 2022-01-07 2022-12-22 Dispositif électronique et son procédé de fonctionnement WO2023132534A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2022-0002953 2022-01-07
KR1020220002953A KR20230107042A (ko) 2022-01-07 2022-01-07 전자 장치 및 그 동작 방법

Publications (1)

Publication Number Publication Date
WO2023132534A1 true WO2023132534A1 (fr) 2023-07-13

Family

ID=87073727

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/021048 WO2023132534A1 (fr) 2022-01-07 2022-12-22 Dispositif électronique et son procédé de fonctionnement

Country Status (2)

Country Link
KR (1) KR20230107042A (fr)
WO (1) WO2023132534A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008186444A (ja) * 2007-01-05 2008-08-14 Yahoo Japan Corp 感性マッチング方法、装置及びコンピュータ・プログラム
KR20130103243A (ko) * 2012-03-09 2013-09-23 (주)네오위즈게임즈 음성 인식을 이용한 음악 선곡 서비스 제공 방법 및 장치
US20140074269A1 (en) * 2012-09-11 2014-03-13 Google Inc. Method for Recommending Musical Entities to a User
KR20140136592A (ko) * 2013-05-20 2014-12-01 동덕여자대학교 산학협력단 사용자 청취 습관과 태그 정보를 이용한 음악 추천 시스템 및 방법
KR20200038688A (ko) * 2018-10-04 2020-04-14 서희 음악 서비스 제공 장치 및 방법


Also Published As

Publication number Publication date
KR20230107042A (ko) 2023-07-14

Similar Documents

Publication Publication Date Title
WO2020105948A1 (fr) Appareil de traitement d'images et son procédé de commande
WO2018128362A1 (fr) Appareil électronique et son procédé de fonctionnement
WO2017007206A1 (fr) Appareil et procédé de fabrication d'une vidéo relationnelle avec le spectateur
EP3545436A1 (fr) Appareil électronique et son procédé de fonctionnement
WO2016117836A1 (fr) Appareil et procédé de correction de contenu
WO2020214006A1 (fr) Appareil et procédé de traitement d'informations d'invite
WO2021261836A1 (fr) Appareil de détection d'image et procédé de fonctionnement de celui-ci
WO2019139301A1 (fr) Dispositif électronique et procédé d'expression de sous-titres de celui-ci
WO2019124963A1 (fr) Dispositif et procédé de reconnaissance vocale
WO2020235852A1 (fr) Dispositif de capture automatique de photo ou de vidéo à propos d'un moment spécifique, et procédé de fonctionnement de celui-ci
WO2015174743A1 (fr) Appareil d'affichage, serveur, système et leurs procédés de fourniture d'informations
EP3532990A1 (fr) Appareil de construction de modèle de reconnaissance de données et procédé associé pour construire un modèle de reconnaissance de données, et appareil de reconnaissance de données et procédé associé de reconnaissance de données
WO2021029497A1 (fr) Système d'affichage immersif et procédé associé
WO2019054792A1 (fr) Procédé et terminal de fourniture de contenu
WO2020060130A1 (fr) Appareil d'affichage et procédé de commande associé
WO2021132922A1 (fr) Dispositif informatique et procédé de fonctionnement associé
WO2021085812A1 (fr) Appareil électronique et son procédé de commande
WO2020050508A1 (fr) Appareil d'affichage d'image et son procédé de fonctionnement
WO2015170799A1 (fr) Procédé et dispositif de fourniture de message
WO2021251632A1 (fr) Dispositif d'affichage pour générer un contenu multimédia et procédé de mise en fonctionnement du dispositif d'affichage
WO2019190142A1 (fr) Procédé et dispositif de traitement d'image
WO2019088627A1 (fr) Appareil électronique et procédé de commande associé
WO2021075705A1 (fr) Dispositif électronique et son procédé de commande
EP3707678A1 (fr) Procédé et dispositif de traitement d'image
WO2023132534A1 (fr) Dispositif électronique et son procédé de fonctionnement

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22919071

Country of ref document: EP

Kind code of ref document: A1