WO2022041182A1 - Music recommendation method and apparatus - Google Patents

Music recommendation method and apparatus

Info

Publication number
WO2022041182A1
WO2022041182A1 (PCT/CN2020/112414)
Authority
WO
WIPO (PCT)
Prior art keywords
attention
unit
user
duration
music
Prior art date
Application number
PCT/CN2020/112414
Other languages
English (en)
French (fr)
Inventor
方舒
张立斌
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN202080092641.8A priority Critical patent/CN114930319A/zh
Priority to EP20950851.4A priority patent/EP4198772A4/en
Priority to PCT/CN2020/112414 priority patent/WO2022041182A1/zh
Publication of WO2022041182A1 publication Critical patent/WO2022041182A1/zh
Priority to US18/175,097 priority patent/US20230206093A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying
    • G06F16/435Filtering based on additional data, e.g. user or group profiles
    • G06F16/436Filtering based on additional data, e.g. user or group profiles using biological or physiological data of a human being, e.g. blood pressure, facial expression, gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/165Management of the audio stream, e.g. setting of volume, audio stream path

Definitions

  • the present application relates to the field of artificial intelligence, and more particularly, to a music recommendation method and apparatus.
  • Personalized music recommendation technology can enhance the user's music experience.
  • the traditional method is to recommend music through data mining technology based on the user's historical music playback information.
  • This approach fails to take into account the user's current state information.
  • Some current methods can collect the user's current state information through different sensors, for example recommending related music by sensing environmental information, including location, weather, time, season, ambient sound, and ambient pictures; or by measuring the user's current state information, for example collecting brain waves to analyze the user's current psychological state, collecting the pictures the user sees, or obtaining the user's heart rate, to recommend related music.
  • in current methods, music is recommended after capturing the images seen by the user, which involves a process of matching music and images.
  • in a real scene, however, the environment may contain many objects; if music is recommended only based on the image as a whole, the matching degree of the music is reduced.
  • the present application provides a music recommendation method and device, which can determine the user's attention pattern in a complex environment through the user's viewpoint information, so as to match the music more accurately.
  • a music recommendation method, comprising: receiving visual data of a user; acquiring at least one attention unit and an attention duration of the at least one attention unit according to the visual data; determining the attention mode of the user according to the attention duration of the at least one attention unit; and determining recommended music information according to the attention mode.
  • the user's attention pattern is determined according to the user's visual information, which allows the user's attention content to be determined more accurately, so that more suitable music is recommended: the recommended music matches the things the user is really interested in and the user's real behavioral state, improving the user's experience.
  • the visual data includes viewpoint information of the user and picture information viewed by the user, and the viewpoint information includes the position of a viewpoint and the attention duration of the viewpoint.
  • acquiring at least one attention unit and an attention duration of the at least one attention unit according to the visual data includes: acquiring the at least one attention unit according to the picture information; and acquiring the sum of the attention durations of the viewpoints in the at least one attention unit as the attention duration of the at least one attention unit.
  • the initial attention units are determined according to the acquired picture information, and the duration of each attention unit is determined according to the user's viewpoint information.
  • compared with recommending music based only on the whole picture viewed by the user, the viewpoint information can accurately indicate which attention content the user is interested in, so that the recommended music better matches the user's needs.
  • acquiring at least one attention unit and an attention duration of the at least one attention unit according to the visual data further includes: judging the similarity of a first attention unit and a second attention unit in the at least one attention unit, the first attention unit and the second attention unit being attention units at different times; if the similarity is greater than or equal to a first threshold, the attention duration of the second attention unit is set equal to the sum of the attention duration of the first attention unit and the attention duration of the second attention unit.
  • the first attention unit and the second attention unit may be attention units in frame images at different times within a preset time period, or may respectively be an attention unit in a history library and a newly acquired attention unit.
  • determining the attention mode of the user according to the attention duration of the at least one attention unit includes: if the standard deviation of the attention durations of the at least one attention unit is greater than or equal to a second threshold, determining that the user's attention mode is staring; if the standard deviation is less than the second threshold, determining that the user's attention mode is scanning.
  • determining the music information according to the attention mode includes: if the attention mode is scanning, determining the music information according to the picture information; if the attention mode is staring, determining the music information according to the attention unit with the highest attention degree among the attention units.
  • the music information suitable for recommending to the user within the preset period of time can be determined according to the user's attention pattern within the preset period of time.
  • if the user's attention mode is scanning, it is considered that the user is mainly perceiving the environment during this preset period of time, and music can be recommended according to the picture information (the environment);
  • if the user's attention mode is staring, it is considered that the user is mainly perceiving an object of interest during this preset period of time, and music can be recommended according to the attention unit with the highest attention degree (the object of interest).
  • determining the music information according to the attention pattern further includes: determining, according to the attention pattern, the behavior state of the user at each moment in a first time period; determining the behavior state of the user in the first time period according to the state at each moment; and determining the music information according to the behavior state in the first time period.
  • after the attention content is determined according to the user's attention pattern within a preset period of time, the music information need not be determined immediately; instead, the behavior state of the user during the preset period can be determined first, and the overall behavior state of the user in the first time period can then be determined from the behavior states of multiple preset periods. this determines the user's actual behavior state more accurately, and recommending music according to the overall behavior state makes the recommended music better match the user's actual behavior.
  • an apparatus for music recommendation, comprising: a transceiver module, configured to receive visual data of a user; and a determination module, configured to acquire at least one attention unit and an attention duration of the at least one attention unit according to the visual data; the determination module is further configured to determine the user's attention mode according to the attention duration of the at least one attention unit, and to determine recommended music information according to the attention mode.
  • Embodiments of the present application provide a device for music recommendation, which is used to implement the method for music recommendation in the first aspect.
  • the visual data includes viewpoint information of the user and picture information viewed by the user, and the viewpoint information includes the position of a viewpoint and the attention duration of the viewpoint.
  • the determining module acquires at least one attention unit and an attention duration of the at least one attention unit according to the visual data, including: acquiring the at least one attention unit according to the picture information; and acquiring the sum of the attention durations of the viewpoints in the at least one attention unit as the attention duration of the at least one attention unit.
  • the determining module acquires at least one attention unit and an attention duration of the at least one attention unit according to the visual data, further including: judging the similarity between a first attention unit and a second attention unit in the at least one attention unit, the first attention unit and the second attention unit being attention units at different times; if the similarity is greater than or equal to the first threshold, the attention duration of the second attention unit is set equal to the sum of the attention duration of the first attention unit and the attention duration of the second attention unit.
  • the determining module determines the attention mode of the user according to the attention duration of the at least one attention unit, including: if the standard deviation of the attention durations of the at least one attention unit is greater than or equal to the second threshold, determining that the user's attention mode is staring; if the standard deviation is less than the second threshold, determining that the user's attention mode is scanning.
  • the determining module is configured to determine the music information according to the attention mode, including: if the attention mode is scanning, determining the music information according to the picture information; if the attention mode is staring, determining the music information according to the attention unit with the highest attention degree among the attention units.
  • the determining module determines the music information according to the attention pattern, further comprising: determining, according to the attention pattern, the behavior state of the user at each moment in the first time period; determining the behavior state of the user in the first time period according to the state at each moment; and determining the music information according to the behavior state in the first time period.
  • a computer-readable storage medium is provided, in which program instructions are stored; when the program instructions are executed by a processor, the method of the first aspect or any implementation of the first aspect is implemented.
  • a computer program product is provided, comprising computer program code; when the computer program code is run on a computer, the method of the first aspect or any implementation of the first aspect is implemented.
  • a fifth aspect provides a music recommendation system, the system comprising a data collection device and a terminal device; the terminal device comprises a processor and a memory, the memory stores one or more programs, and the one or more computer programs comprise instructions; the data collection device is configured to collect visual data of a user, and when the instructions are executed by the one or more processors, the terminal device is caused to perform the method of the first aspect or any implementation of the first aspect.
  • FIG. 1 is the system architecture to which the music recommendation method of an embodiment of the present application is applied;
  • FIG. 2 is a schematic block diagram of a first wearable device in a system architecture to which the music recommendation method according to an embodiment of the present application is applied;
  • FIG. 3 is a schematic block diagram of a terminal device in a system architecture to which the music recommendation method according to an embodiment of the present application is applied;
  • FIG. 4 is a schematic block diagram of a second wearable device in a system architecture to which the music recommendation method according to an embodiment of the present application is applied;
  • FIG. 5 is a schematic flowchart of a music recommendation method according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a music recommendation method according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a music recommendation apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a music recommendation device according to an embodiment of the present application.
  • references in this specification to "one embodiment” or “some embodiments” and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases "in one embodiment," "in some embodiments," "in some other embodiments," "in still other embodiments," etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise.
  • the terms "include", "comprise", "have" and their variants mean "including but not limited to" unless specifically emphasized otherwise.
  • existing image-and-music matching methods mainly include two kinds: the first extracts traditional low-level features of the two modalities, music and image, and then establishes the connection between the two through a relational model; the music recommended by this method does not match the image well. the second first collects music-image matching pair data and automatically learns a music-image matching model based on a deep neural network; this method can recommend suitable music in simple scenes.
  • in a real scene, however, the environment may contain many objects and different style elements, and the above existing methods do not consider what the user is interested in in the current environment, which reduces the music matching degree. for example, when the user pays attention to the clouds in a scene versus the animals in the scene, the matched music should be different.
  • the embodiment of the present application provides a music recommendation method, which obtains the user's attention area in a complex environment by acquiring the user's viewpoint information, so as to know the real interest of the user in the current environment and improve the music matching degree.
  • FIG. 1 shows a system architecture of an application of a music recommendation method according to an embodiment of the present application.
  • the first wearable device is a wearable device that can collect the user's visual data and record the user's head movement data, such as smart glasses, on which an advanced photo system (APS) camera, a dynamic vision sensor (DVS) camera, an eye tracker, and an inertial measurement unit (IMU) sensor are installed.
  • the second wearable device is a wearable device that can play music, such as headphones.
  • the mobile terminal device can be a mobile phone, a tablet computer, a wearable device (for example, a smart watch), a vehicle-mounted device, an augmented reality (AR) device, a virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another device.
  • the terminal device in this embodiment of the present application may include a touch screen for displaying service content to a user.
  • the specific type of the terminal device is not limited in any way in this embodiment of the present application.
  • the above is only an example of the devices in FIG. 1 of the embodiment of the present application and does not constitute a limitation on the embodiment; the devices in FIG. 1 may also be other devices that can implement the same functions.
  • the mobile terminal device sends a data collection instruction to the first wearable device.
  • the first wearable device collects full-frame data at a certain frequency, records picture change data, simultaneously records the user's viewpoint data and partial picture data as well as head rotation acceleration and angle data, and continuously sends them to the mobile terminal device.
  • the mobile terminal device determines the user's attention area and attention pattern, extracts corresponding features according to the user's attention pattern and attention area, and matches the music.
  • the mobile terminal device sends audio data to the second wearable device, and the second wearable device plays music.
  • FIG. 2 shows the modules included in the first wearable device when the music recommendation method according to the embodiment of the present application is applied.
  • the wireless module is used to establish a wireless link and communicate with other nodes, wherein the wireless communication can adopt communication methods such as wifi, bluetooth and cellular network.
  • the video frame collection module is used to drive the APS camera on the first wearable device to collect video frames describing the environment.
  • the viewpoint acquisition module is used to drive the eye tracker on the glasses to collect viewpoint data.
  • the viewpoint data includes viewpoint position, acquisition time, gaze time, and pupil diameter;
  • the head motion acquisition module is used to drive the IMU module on the glasses to collect the speed and acceleration of the head rotation.
  • the screen change capture module is used to drive the DVS camera on the glasses to collect screen change data.
  • the data receiving module is used to receive the data sent by the mobile terminal device.
  • the data sending module is used to send the collected data to the mobile terminal device.
  • FIG. 3 shows the modules included in the mobile terminal device when the music recommendation method according to the embodiment of the present application is applied.
  • the wireless module is used to establish a wireless link and communicate with other nodes, wherein the wireless communication can adopt communication methods such as wifi, bluetooth and cellular network.
  • the attention pattern discriminating module is used to calculate the attention area and attention pattern according to the data collected by the glasses.
  • Feature extraction and music matching modules are used to extract features and match music based on attention pattern categories.
  • the data receiving module is used for receiving data sent from the first wearable device.
  • the data sending module is used for sending the audio data and playing instructions of the music to the second wearable device.
  • FIG. 4 shows the modules included in the second wearable device when the music recommendation method according to the embodiment of the present application is applied.
  • the wireless module is used to establish a wireless link and communicate with other nodes, wherein the wireless communication can adopt communication methods such as wifi, bluetooth and cellular network.
  • the data receiving module is used for receiving audio data and playing instructions sent by the mobile terminal device.
  • the audio playback module is used to play music according to the audio data and playback instructions sent by the mobile terminal device.
  • FIG. 5 shows a schematic flowchart of a music recommendation method according to an embodiment of the present application, including steps 501 to 504, and these steps will be introduced in detail below.
  • the music recommendation method in FIG. 5 may be executed by the terminal device in FIG. 1 .
  • S501: Receive visual data of a user.
  • specifically, the terminal device receives the user's visual data sent by the first wearable device, which collects the user's visual data within a preset period of time (for example, 1 second). the user's visual data includes the user's viewpoint information and the picture information viewed by the user; the viewpoint information includes the position coordinates (x, y) of a viewpoint and the attention duration of the viewpoint, and the picture information includes the video frame images collected by the APS camera and the picture change data collected by the DVS camera.
  • S502 Acquire at least one attention unit and an attention duration of the at least one attention unit according to the visual data.
  • at least one attention unit is acquired according to the picture information; for example, a macroblock in a video frame image is used as an attention unit, where the macroblocks may be overlapping or non-overlapping;
  • alternatively, an algorithm that quantifies whether an object exists in a region (such as the objectness algorithm) can extract one or more object rectangles as attention units; motion rectangles at different times can also be obtained from the picture change data and used as attention units.
  • each attention unit may use the image data at the same position as the attention unit in the frame image at the most recent moment as the content of the attention unit.
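  • as a rough illustration of the macroblock option above, the following minimal Python sketch partitions a frame into attention-unit boxes; the block size and overlap values are assumptions for illustration, not values specified by the patent:

```python
def macroblocks(height, width, block=64, overlap=0):
    """Partition a frame into macroblock attention units.

    Returns (x0, y0, x1, y1) boxes clipped to the frame; block size and
    overlap are illustrative defaults, not values from the text."""
    step = block - overlap
    return [(x, y, min(x + block, width), min(y + block, height))
            for y in range(0, height, step)
            for x in range(0, width, step)]
```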
  • when the picture viewed by the user is static, the DVS camera collects no picture change data; the attention duration of each attention unit can then be obtained by letting all viewpoints vote for the attention units, that is, when a viewpoint is located within an attention unit, the attention duration of that viewpoint is accumulated into the gaze duration of that attention unit.
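  • the viewpoint-voting step can be sketched as follows; the AttentionUnit shape, field names, and millisecond units are assumptions made for illustration:

```python
import time
from dataclasses import dataclass, field

@dataclass
class AttentionUnit:
    x0: float
    y0: float
    x1: float
    y1: float                                   # bounding box of the unit
    duration: float = 0.0                       # accumulated attention, in ms
    created_at: float = field(default_factory=time.time)

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

def vote_durations(units, viewpoints):
    """Accumulate each viewpoint's attention duration into the attention
    unit it falls inside; viewpoints is an iterable of (x, y, duration_ms)."""
    for x, y, dur in viewpoints:
        for unit in units:
            if unit.contains(x, y):
                unit.duration += dur
                break                           # assumes non-overlapping units
    return units
```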
  • acquiring at least one attention unit and the attention duration of the at least one attention unit according to the visual data further includes: when the picture viewed by the user changes and the DVS camera collects picture change data, the attention units in each frame image are still voted for according to the above method, so that the attention duration of the attention units in each frame image can be obtained.
  • for the attention units in the images of any two adjacent moments, take an attention unit in the image at the later moment as an example and name it the second attention unit; in the image at the earlier moment, find the N attention units whose distance from the second attention unit is less than a preset value, where the distance between attention units is the Euclidean distance between the center coordinates of the two attention units, and N can be a manually specified value or the maximum number of attention units that satisfy the condition.
  • take one of the N attention units as an example and name it the first attention unit; the similarity of the first attention unit and the second attention unit is judged by matching their features, where the feature matching method may be any existing image feature matching method, which is not specifically limited in this embodiment of the present application.
  • if the features of the first attention unit and the second attention unit are judged to be similar, that is, their similarity is greater than or equal to the first threshold, the two attention units are considered to be the same object presented at different times; the attention duration of the second attention unit is then set equal to the sum of the attention durations of the first and second attention units, and the attention duration of the first attention unit is set to zero. if the features are judged not to be similar, that is, the similarity is less than the first threshold, the attention durations of both units are retained.
  • the attention units in any two adjacent time images are determined according to the above method.
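  • a minimal sketch of the adjacent-frame merging described above, reusing the AttentionUnit class from the previous sketch; the distance and similarity thresholds are placeholders, and `similarity` stands for any image feature matching method:

```python
import math

def center(u):
    return ((u.x0 + u.x1) / 2.0, (u.y0 + u.y1) / 2.0)

def merge_adjacent_frames(prev_units, curr_units, similarity,
                          dist_thresh=50.0, first_threshold=0.8):
    """For each unit in the later frame, check nearby units in the earlier
    frame (center distance below dist_thresh, i.e. the N candidates); a
    sufficiently similar pair is treated as the same object at different
    times, so the earlier duration is folded into the later unit and zeroed."""
    for second in curr_units:
        cx2, cy2 = center(second)
        for first in prev_units:
            cx1, cy1 = center(first)
            if math.hypot(cx2 - cx1, cy2 - cy1) >= dist_thresh:
                continue                        # not among the nearby candidates
            if similarity(first, second) >= first_threshold:
                second.duration += first.duration
                first.duration = 0.0            # avoid double counting
    return curr_units
```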
  • acquiring at least one attention unit and the attention duration of the at least one attention unit according to the visual data further includes: establishing a history library of attention units, whose size is fixed (for example, it can store only 10 attention units). the similarity between a newly acquired attention unit and the attention units in the history library is judged; for example, to judge the similarity between a newly acquired second attention unit and a first attention unit in the history library, the visual features of the two units can be extracted separately and the similarity between the visual features calculated.
  • if the features of the first attention unit and the second attention unit are judged to be similar, that is, their similarity is greater than or equal to a third threshold, the attention duration of the second attention unit is set equal to the sum of the attention durations of the first and second attention units, and the second attention unit then replaces the first attention unit in the history library; if the features are judged not to be similar, that is, the similarity is less than the third threshold, the first attention unit in the history library is retained.
  • in this way, the attention units in the history library and the attention duration of each can be obtained within a preset period of time, for example 1 second, and the user's attention pattern in this second is then determined according to the method in S503. attention units that have existed in the history library for more than 1 second with an attention duration of less than 600 milliseconds are then deleted, and newly acquired attention units are added.
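  • the history-library bookkeeping can be sketched as follows, again reusing the AttentionUnit class above; the capacity (10 units), lifetime (1 second), and minimum duration (600 ms) come from the text, while the similarity threshold is an assumption:

```python
import time

def update_history(history, new_unit, similarity, capacity=10,
                   third_threshold=0.8, max_age_s=1.0, min_duration_ms=600.0):
    """Replace the most similar stored unit with the new one (accumulating
    its duration) or fill a free slot, then evict units that have existed
    longer than max_age_s with less than min_duration_ms of attention."""
    best, best_sim = None, 0.0
    for old in history:
        sim = similarity(old, new_unit)
        if sim > best_sim:
            best, best_sim = old, sim
    if best is not None and best_sim >= third_threshold:
        new_unit.duration += best.duration      # accumulate, then replace
        history[history.index(best)] = new_unit
    elif len(history) < capacity:
        history.append(new_unit)
    now = time.time()
    history[:] = [u for u in history
                  if not (now - u.created_at > max_age_s
                          and u.duration < min_duration_ms)]
    return history
```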
  • S503 Determine the user's attention mode according to the attention duration of the at least one attention unit.
  • if the standard deviation of the attention durations of all attention units is greater than or equal to the second threshold, it is determined that the user's attention mode is staring; if the standard deviation is less than the second threshold, it is determined that the user's attention mode is scanning.
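  • the standard-deviation rule above amounts to the following small sketch; the numeric value of the second threshold is an illustrative assumption:

```python
from statistics import pstdev

def attention_mode(durations, second_threshold=300.0):
    """Classify the attention mode from the attention durations (ms) of all
    attention units in the preset period."""
    if len(durations) < 2:                      # too few units to measure spread
        return "staring"
    spread = pstdev(durations)                  # population standard deviation
    return "staring" if spread >= second_threshold else "scanning"
```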
  • S504: Determine recommended music information according to the attention mode.
  • if the user's attention mode is scanning, the frame image captured by the APS camera is directly used as the user's attention content; if the user's attention mode is staring, the attention unit with the highest attention degree among all attention units within the preset period of time is used as the user's attention content.
  • the attention degree can be determined according to the attention duration, for example taking the attention unit with the longest attention duration as the attention unit with the highest attention degree; or according to the degree of the user's pupil dilation, for example taking the attention unit viewed with the largest pupil dilation as the attention unit with the highest attention degree; or according to the number of times the user looks back at a unit, for example taking an attention unit revisited more times than a preset value as the attention unit with the highest attention degree; or by considering all three at once, for example taking the attention degree as the product of the user's pupil dilation, the attention duration, and the number of revisits.
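  • the combined estimate at the end of the previous item can be written as a one-line product; the field names (pupil_dilation, duration, revisits) are assumptions for illustration:

```python
def attention_degree(unit):
    """Combined attention degree: pupil dilation x attention duration x
    number of revisits, per the product described above."""
    return unit.pupil_dilation * unit.duration * unit.revisits

def most_attended(units):
    """The attention unit with the highest attention degree."""
    return max(units, key=attention_degree)
```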
  • the music information is then determined according to the attention content, which can use any existing method of matching music to an image; for example, the attention content (the frame image or the attention unit with the highest attention degree) is used as the input of a neural network model, and the music category with the largest probability value output by the model is used as the judgment result. for example, when the probability value is greater than 0.8, the matching degree between the image and the music is considered high enough.
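  • a hedged sketch of this matching step; `model` stands in for any trained network returning one probability per music category, and only the 0.8 cut-off is taken from the text:

```python
def match_music(attention_content, model, categories, p_thresh=0.8):
    """Feed the attention content (the frame image, or the attention unit
    with the highest attention degree) to a music-matching classifier and
    keep the top category only when its probability clears the threshold."""
    probs = model(attention_content)            # e.g. a softmax output
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > p_thresh:
        return categories[best]                 # matching degree high enough
    return None                                 # no sufficiently matching music
```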
  • alternatively, the music information need not be determined immediately; instead, the behavior state of the user within the preset period of time can be determined first.
  • the method of determining the behavior state according to the attention content can use an existing machine learning classification method; for example, the attention content is used as the input of a neural network model, and the behavior state category with the largest output probability value of the neural network model is used as the judgment result.
  • behavioral states include driving, learning, traveling, and sports.
  • in this way, the behavior states of the user at multiple preset periods within the first time period can be determined; for example, when the first time period is 10 seconds and a preset period is 1 second, 10 behavior states of the user can be determined within the 10 seconds.
  • the music is matched according to the behavior state of the user in the first time period.
  • the method of matching music according to the behavior state may be an existing method; for example, the music may be matched according to the tag information of the behavior state, which is not specifically limited in this embodiment of the present application.
  • the terminal device can send a music playing instruction to the second wearable device according to the music information, and the second wearable device plays the specified music.
  • the terminal device can also play music according to the music information.
  • the music recommendation method of the embodiment of the present application determines the user's attention pattern according to the user's visual information, which allows the user's attention content to be determined more accurately, so that more suitable music is recommended: the recommended music matches the things the user is really interested in and the user's real behavioral state, improving the user's experience.
  • the music recommendation method will be described in detail below according to a specific example.
  • the first wearable device takes smart glasses as an example
  • the second wearable device takes an earphone as an example
  • the mobile terminal device takes a mobile phone as an example.
  • FIG. 6 shows a schematic block diagram of a music recommendation method provided by an embodiment of the present application. As shown in FIG. 6 , the method includes the following steps.
  • the mobile phone sends data collection instructions to the smart glasses. After receiving the data collection instruction sent by the mobile phone, the smart glasses start to collect data, and continuously transmit the collected data to the mobile phone.
  • the collected data includes:
  • Frame data: the frame data of the entire picture that the user can see through the smart glasses, collected at a certain frequency (e.g. 30 Hz);
  • Viewpoint data: the user's viewpoint position coordinates (x, y), pupil diameter, acquisition time and fixation time;
  • Head movement data: the angle and acceleration of head rotation.
  • the above-mentioned period of time may be 1 second: an APS frame is shot at the beginning of the second, picture changes and eye movement data are recorded, and at the end of the second the data is analyzed and features are extracted to match music. if the situation changes within the second, for example the user's head turns substantially at 500 milliseconds, only the data of those 500 milliseconds may be analyzed; but if the period of time is less than 100 milliseconds, it is not long enough to generate one fixation point, and the data is discarded.
  • the attention unit can be a macroblock, an object rectangle or a motion rectangle.
  • the macroblocks can be overlapping or non-overlapping;
  • the attention unit at the initial moment can be extracted using an algorithm that quantifies whether an object exists in a region (for example, the objectness algorithm), which yields one or more object rectangles as attention units;
  • if the attention units are motion rectangles, the motion rectangles at each moment can be obtained based on the event data collected by the DVS camera.
  • the event data collected by the DVS camera is first rendered as frame data, that is, the gray value at the pixel position of each event is 255 and the gray value at the remaining pixel positions is 0; the frame data is then eroded and dilated to obtain the motion areas, and finally the smallest rectangular box that can cover each connected motion area is used as an attention unit.
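  • with OpenCV 4, the event-frame processing above can be sketched as follows; the kernel size is an illustrative assumption:

```python
import numpy as np
import cv2  # requires opencv-python (OpenCV 4)

def motion_units(events, height, width, kernel_size=3):
    """Render DVS events to a binary frame (event pixels 255, rest 0),
    erode then dilate to remove noise, and return the bounding rectangle
    of each connected motion region as an attention unit; events is an
    iterable of (x, y) pixel coordinates."""
    frame = np.zeros((height, width), dtype=np.uint8)
    for x, y in events:
        frame[y, x] = 255
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    frame = cv2.dilate(cv2.erode(frame, kernel), kernel)   # erode, then dilate
    contours, _ = cv2.findContours(frame, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]         # (x, y, w, h) boxes
```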
  • when the picture the user views changes, pursuit behavior may occur: for an attention unit A at the current moment, find the N attention units at the previous moment whose distance from attention unit A is less than a preset value, where N is a positive integer greater than or equal to 1 and the distance between two attention units is the Euclidean distance between their center coordinates.
  • each of the N attention units is matched against attention unit A. if the features of an attention unit B at the previous moment are similar to those of attention unit A, the two attention units are considered to be the same object at different times: attention unit B at the previous moment is deleted, and its attention duration is accumulated into that of attention unit A. if the features of attention unit B and attention unit A are not similar, both attention units are kept.
  • the music recommendation method of the embodiment of the present application is applied while the user's head is not moving: if the user's head is moving, no music matching is performed at that time, and the method of the embodiment of the present application is performed once the user's head stops moving.
  • the embodiments of the present application provide two methods and strategies for extracting features and matching music according to attention patterns and attention content.
  • in the first strategy, the attention content is used as the input of a deep convolutional neural network, and the category with the largest probability value in the output of the network is used as the judgment result; for example, when the probability value is greater than 0.8, the visual features are judged to match the music well, and the music is in line with the user's current perception.
  • the process of matching music based on an image may be any existing method for matching music based on an image, which is not specifically limited in this embodiment of the present application.
  • in the second strategy, the state category can be "driving", "learning", "travel", "sports" or another scenario in which users frequently listen to songs.
  • the process of judging the user's state category according to the content of the user's attention area at a certain moment can use a classification machine learning method; for example, the attention content is used as the input of a deep convolutional neural network, and the category with the largest probability value in the output of the network is used as the judgment result.
  • to associate the state category information at different times, either a time-independent majority voting method or a time-related time-weighting method can be used; for example, if within a period 8 moments are judged as learning and 2 moments are judged as sports, it can be concluded that the state of the user during this period is learning.
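  • both association strategies reduce to a few lines; the weighting scheme is left to the caller, since the text does not fix one:

```python
from collections import Counter

def aggregate_state(states, weights=None):
    """Aggregate per-moment behavior states over the first time period:
    without weights this is the time-independent majority vote, with
    weights it becomes the time-related weighted variant."""
    if weights is None:
        return Counter(states).most_common(1)[0][0]
    score = Counter()
    for state, weight in zip(states, weights):
        score[state] += weight
    return score.most_common(1)[0][0]

# e.g. aggregate_state(["learning"] * 8 + ["sports"] * 2) -> "learning"
```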
  • after receiving the audio data and playback instructions sent by the mobile phone, the earphone plays the corresponding music.
  • the embodiment of the present application further provides another method for analyzing the collected data and extracting features to match the music.
  • the other method is described below.
  • a history database of attention units is established, wherein the size of the history database is fixed. For example, it is set that the history database can store 10 attention units.
  • the history database is empty when it is first established, and the attention units generated by the user are put into the history database until the history database is full, wherein the attention duration of the attention units can be determined according to the viewpoint voting in the above method.
  • each newly generated attention unit is matched with each attention unit in the history database, wherein the attention duration of the newly generated attention unit can also be determined according to the viewpoint voting in the above method.
  • if attention unit A in the history library has the highest similarity with a newly generated attention unit B, the attention duration corresponding to attention unit A is accumulated into the attention duration corresponding to attention unit B; attention unit A is then deleted, and attention unit B is put into the history library.
  • the process of matching the similarity of different attention units is to extract visual features for each unit separately and calculate the similarity between the features of different units according to the speeded up robust features (SURF) algorithm. if an attention unit in the history library has existed for more than 1 second with an attention duration of less than 600 milliseconds, that attention unit is deleted and a newly generated attention unit is filled in at random.
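  • a sketch of a SURF-based similarity score; SURF itself is named in the text, but it lives in the opencv-contrib build (xfeatures2d), and the ratio-test score below is an illustrative stand-in for the unspecified similarity measure:

```python
import cv2  # SURF requires the opencv-contrib-python build

def surf_similarity(img_a, img_b, hessian=400, ratio=0.75):
    """Similarity of two attention-unit images: detect SURF keypoints,
    match descriptors with a ratio test, and return the fraction of
    well-matched keypoints."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian)
    kp_a, des_a = surf.detectAndCompute(img_a, None)
    kp_b, des_b = surf.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0                              # one image had no keypoints
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(len(kp_a), len(kp_b), 1)
```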
  • the two strategies for extracting features and matching music according to attention patterns and attention content are the same as those described above and are not repeated here.
  • the method for data collection and the method for playing music at the earphone end are likewise consistent with those in the previous music recommendation method, and the embodiments of the present application do not repeat them here.
  • the music recommendation method of the embodiment of the present application recommends different music according to the different content the user pays attention to, thereby providing a better music experience.
  • the music recommendation method of the embodiment of the present application determines the user's current attention mode by acquiring the user's viewpoint data, head motion data, and environmental data, and selects a full-frame image or a local attention area as a basis for matching music according to the judgment result.
  • the music recommendation method of the embodiment of the present application is described above, and the music recommendation apparatus of the embodiment of the present application is described below.
  • FIG. 7 shows a schematic block diagram of a music recommendation apparatus according to an embodiment of the present application. As shown in FIG. 7 , it includes a transceiver module 710 and a determination module 720 . The functions of the transceiver module 710 and the determination module 720 are respectively introduced below.
  • the transceiver module 710 is configured to receive visual data of the user.
  • a determination module 720 configured to acquire at least one attention unit and an attention duration of the at least one attention unit according to the visual data.
  • the determining module 720 is further configured to determine the attention mode of the user according to the attention duration of the at least one attention unit.
  • the determining module 720 is further configured to determine recommended music information according to the attention pattern.
  • the visual data includes viewpoint information of the user and picture information viewed by the user, and the viewpoint information includes a position of the viewpoint and an attention duration of the viewpoint.
  • the determining module 720 obtains at least one attention unit and an attention duration of the at least one attention unit according to the visual data, including: obtaining the at least one attention unit according to the picture information; and obtaining the sum of the attention durations of the viewpoints in the at least one attention unit as the attention duration of the at least one attention unit.
  • the determining module 720 acquires at least one attention unit and the attention duration of the at least one attention unit according to the visual data, further including: judging the similarity of a first attention unit and a second attention unit in the at least one attention unit, the first attention unit and the second attention unit being attention units at different times; if the similarity is greater than or equal to the first threshold, the attention duration of the second attention unit is set equal to the sum of the attention durations of the first attention unit and the second attention unit.
  • the determining module 720 determines the attention mode of the user according to the attention duration of the at least one attention unit, including: if the standard deviation of the attention durations of the at least one attention unit is greater than or equal to a second threshold, determining that the user's attention mode is staring; if the standard deviation of the attention durations of the at least one attention unit is less than the second threshold, determining that the user's attention mode is scanning.
  • the determining module 720 is configured to determine music information according to the attention mode, including: if the attention mode is scanning, determining music information according to the picture information; if the attention mode is staring, determining music information according to the attention unit with the highest attention degree among the attention units.
  • the determining module 720 determines the music information according to the attention mode, further including: determining the behavior state of the user at each moment in the first time period according to the attention mode; determining the behavior state of the user in the first time period according to the state at each moment; and determining the music information according to the behavior state in the first time period.
  • the transceiver module 710 in the music recommendation apparatus 700 in the embodiment of the present application may be used to execute the method of S501 in FIG. 5, and the determination module 720 may be used to execute the methods of S502 to S504 in FIG. 5.
  • for brevity, details are not repeated here.
  • FIG. 8 is a schematic block diagram of a music recommendation device 800 according to an embodiment of the present application.
  • the music recommendation device 800 can be used to perform the method for music recommendation provided in the above embodiments, and for brevity, details are not repeated here.
  • the music recommendation device 800 includes a processor 810 coupled with a memory 820; the memory 820 is used for storing computer programs or instructions, and the processor 810 is used for executing the computer programs or instructions stored in the memory 820, so that the methods in the above method embodiments are executed.
  • Embodiments of the present application further provide a computer-readable storage medium, where program instructions are stored in the computer-readable storage medium, and when the program instructions are executed by a processor, the method for music recommendation of the embodiments of the present application is implemented.
  • Embodiments of the present application also provide a computer program product, characterized in that the computer program product includes computer program codes, and when the computer program codes are run on a computer, the method for music recommendation in the embodiments of the present application is implemented.
  • An embodiment of the present application further provides a music recommendation system, characterized in that the system includes a data collection device and a terminal device, the terminal device includes a processor and a memory, and the memory stores one or more programs, and the The one or more computer programs include instructions, wherein the data acquisition device is used to collect visual data of the user; when the instructions are executed by the one or more processors, the terminal device is made to execute the present application A method of music recommendation of an embodiment.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division methods. for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling, direct coupling or communication connection may be implemented through some interfaces; the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • in essence, the technical solution of the present application, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Social Psychology (AREA)
  • Mathematical Physics (AREA)
  • Psychiatry (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Physiology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

A music recommendation method and apparatus, which determine the user's attention mode in a complex environment from the user's viewpoint information so as to match music more accurately. In a first aspect, a music recommendation method is provided, the method including: receiving visual data of a user (S501); acquiring at least one attention unit and an attention duration of the at least one attention unit according to the visual data (S502); determining the user's attention mode according to the attention duration of the at least one attention unit (S503); and determining recommended music information according to the attention mode (S504).

Description

Music recommendation method and apparatus
Technical field
The present application relates to the field of artificial intelligence, and more specifically, to a music recommendation method and apparatus.
Background
Personalized music recommendation technology can improve the user's music experience. The traditional approach recommends music through data mining techniques based on the user's historical music playback information; it cannot take the user's current state into account. Some current methods collect the user's current state information through different sensors, for example recommending related music by sensing environmental information, including location, weather, time, season, ambient sound and ambient pictures; or by measuring the user's current state information, for example collecting brain waves to analyze the user's current psychological state, collecting the pictures the user sees, or obtaining the user's heart rate, to recommend related music.
In current methods, music is recommended after capturing the images the user sees, which involves a process of matching music and images. In real scenes, however, the environment may contain many objects; if music is recommended only from the image as a whole, the matching degree of the music is reduced.
Summary
The present application provides a music recommendation method and apparatus that determine the user's attention mode in a complex environment from the user's viewpoint information, so as to match music more accurately.
In a first aspect, a music recommendation method is provided, the method including: receiving visual data of a user; acquiring at least one attention unit and an attention duration of the at least one attention unit according to the visual data; determining the user's attention mode according to the attention duration of the at least one attention unit; and determining recommended music information according to the attention mode.
The music recommendation method of the embodiments of the present application determines the user's attention mode from the user's visual information, allowing the user's attention content to be determined more accurately, so that more suitable music is recommended; the recommended music matches the things the user is really interested in and the user's real behavior state, improving the user's experience.
With reference to the first aspect, in a possible implementation of the first aspect, the visual data includes viewpoint information of the user and picture information viewed by the user, and the viewpoint information includes the position of a viewpoint and the attention duration of the viewpoint.
With reference to the first aspect, in a possible implementation of the first aspect, acquiring at least one attention unit and the attention duration of the at least one attention unit according to the visual data includes: acquiring the at least one attention unit according to the picture information; and acquiring the sum of the attention durations of the viewpoints within the at least one attention unit as the attention duration of the at least one attention unit.
In the music recommendation method of the embodiments of the present application, initial attention units are determined from the acquired picture information, and the duration of each attention unit is determined from the user's viewpoint information. Compared with the prior art, which recommends music based only on the entire picture the user views, the viewpoint information can precisely indicate which attention content the user is interested in, so that the recommended music better matches the user's needs.
With reference to the first aspect, in a possible implementation of the first aspect, acquiring at least one attention unit and the attention duration of the at least one attention unit according to the visual data further includes: judging the similarity of a first attention unit and a second attention unit in the at least one attention unit, the first attention unit and the second attention unit being attention units at different times; if the similarity is greater than or equal to a first threshold, the attention duration of the second attention unit is equal to the sum of the attention duration of the first attention unit and the attention duration of the second attention unit.
In the music recommendation method of the embodiments of the present application, the first attention unit and the second attention unit may be attention units in frame images at different moments within a preset period of time, or may respectively be an attention unit in a history library and a newly acquired attention unit.
With reference to the first aspect, in a possible implementation of the first aspect, determining the user's attention mode according to the attention duration of the at least one attention unit includes: if the standard deviation of the attention durations of the at least one attention unit is greater than or equal to a second threshold, determining that the user's attention mode is staring; if the standard deviation of the attention durations of the at least one attention unit is less than the second threshold, determining that the user's attention mode is scanning.
With reference to the first aspect, in a possible implementation of the first aspect, determining music information according to the attention mode includes: if the attention mode is scanning, determining the music information according to the picture information; if the attention mode is staring, determining the music information according to the attention unit with the highest attention degree among the attention units.
In the music recommendation method of the embodiments of the present application, once the user's attention mode has been determined, the music information suitable for recommendation to the user within the preset period of time can be determined according to the user's attention mode within that period. When the user's attention mode is scanning, the user is considered to be mainly perceiving the environment during the preset period, and music can be recommended according to the picture information (the environment); when the user's attention mode is staring, the user is considered to be mainly perceiving an object of interest during the preset period, and music can be recommended according to the attention unit with the highest attention degree (the object of interest).
With reference to the first aspect, in a possible implementation of the first aspect, determining music information according to the attention mode further includes: determining the behavior state of the user at each moment in a first time period according to the attention mode; determining the behavior state of the user in the first time period according to the state at each moment; and determining the music information according to the behavior state in the first time period.
In the music recommendation method of the embodiments of the present application, after the attention content has been determined according to the user's attention mode within a preset period of time, the music information need not be determined immediately; instead, the behavior state of the user within the preset period can be determined first, and the overall behavior state of the user in the first time period can then be determined from the behavior states of multiple preset periods. This determines the user's actual behavior state more accurately, and recommending music according to the overall behavior state makes the recommended music better match the user's actual behavior.
In a second aspect, a music recommendation apparatus is provided, the apparatus including: a transceiver module, configured to receive visual data of a user; and a determination module, configured to acquire at least one attention unit and an attention duration of the at least one attention unit according to the visual data; the determination module is further configured to determine the user's attention mode according to the attention duration of the at least one attention unit, and to determine recommended music information according to the attention mode.
An embodiment of the present application provides a music recommendation apparatus for implementing the music recommendation method of the first aspect.
With reference to the second aspect, in a possible implementation of the second aspect, the visual data includes viewpoint information of the user and picture information viewed by the user, and the viewpoint information includes the position of a viewpoint and the attention duration of the viewpoint.
With reference to the second aspect, in a possible implementation of the second aspect, the determination module acquires at least one attention unit and the attention duration of the at least one attention unit according to the visual data, including: acquiring the at least one attention unit according to the picture information; and acquiring the sum of the attention durations of the viewpoints within the at least one attention unit as the attention duration of the at least one attention unit.
With reference to the second aspect, in a possible implementation of the second aspect, the determination module acquires at least one attention unit and the attention duration of the at least one attention unit according to the visual data, further including: judging the similarity of a first attention unit and a second attention unit in the at least one attention unit, the first attention unit and the second attention unit being attention units at different times; if the similarity is greater than or equal to a first threshold, the attention duration of the second attention unit is equal to the sum of the attention duration of the first attention unit and the attention duration of the second attention unit.
With reference to the second aspect, in a possible implementation of the second aspect, the determination module determines the user's attention mode according to the attention duration of the at least one attention unit, including: if the standard deviation of the attention durations of the at least one attention unit is greater than or equal to a second threshold, determining that the user's attention mode is staring; if the standard deviation of the attention durations of the at least one attention unit is less than the second threshold, determining that the user's attention mode is scanning.
With reference to the second aspect, in a possible implementation of the second aspect, the determination module is configured to determine music information according to the attention mode, including: if the attention mode is scanning, determining the music information according to the picture information; if the attention mode is staring, determining the music information according to the attention unit with the highest attention degree among the attention units.
With reference to the second aspect, in a possible implementation of the second aspect, the determination module determines music information according to the attention mode, further including: determining the behavior state of the user at each moment in a first time period according to the attention mode; determining the behavior state of the user in the first time period according to the state at each moment; and determining the music information according to the behavior state in the first time period.
In a third aspect, a computer-readable storage medium is provided, in which program instructions are stored; when the program instructions are run by a processor, the method of the first aspect or any implementation of the first aspect is implemented.
In a fourth aspect, a computer program product is provided, the computer program product including computer program code; when the computer program code is run on a computer, the method of the first aspect or any implementation of the first aspect is implemented.
In a fifth aspect, a music recommendation system is provided, the system including a data collection device and a terminal device; the terminal device includes a processor and a memory, the memory stores one or more programs, and the one or more computer programs include instructions; the data collection device is configured to collect visual data of a user, and when the instructions are executed by the one or more processors, the terminal device is caused to perform the method of the first aspect or any implementation of the first aspect.
Brief description of the drawings
FIG. 1 is the system architecture to which the music recommendation method of an embodiment of the present application is applied;
FIG. 2 is a schematic block diagram of the first wearable device in the system architecture to which the music recommendation method of an embodiment of the present application is applied;
FIG. 3 is a schematic block diagram of the terminal device in the system architecture to which the music recommendation method of an embodiment of the present application is applied;
FIG. 4 is a schematic block diagram of the second wearable device in the system architecture to which the music recommendation method of an embodiment of the present application is applied;
FIG. 5 is a schematic flowchart of the music recommendation method of an embodiment of the present application;
FIG. 6 is a schematic block diagram of the music recommendation method of an embodiment of the present application;
FIG. 7 is a schematic block diagram of the music recommendation apparatus of an embodiment of the present application;
FIG. 8 is a schematic block diagram of the music recommendation device of an embodiment of the present application.
Detailed description of embodiments
The terms used in the following embodiments are only for the purpose of describing particular embodiments and are not intended to limit the present application. As used in the specification and the appended claims, the singular forms "a", "an", "the", "the above", "said" and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" mean one, two or more. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
References in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments" and the like in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise. The terms "include", "comprise", "have" and their variants all mean "including but not limited to" unless specifically emphasized otherwise.
The technical solutions of the present application are described below with reference to the accompanying drawings.
Existing image and music matching methods mainly fall into two categories. The first extracts traditional low-level features of the two modalities, music and image, and then links the two through a relational model; the music recommended by this method does not match the image well. The second first collects music-image matching pairs and automatically learns a music-image matching model based on a deep neural network; this method can recommend suitable music in simple scenes.
In real-world scenes, however, the environment may contain many objects and different style elements. The existing methods above do not consider where the user's interest lies in the current environment, which reduces the accuracy of the music match. For example, when the user is attending to clouds in a scene versus animals in the scene, the matched music should be different.
Therefore, the embodiments of this application provide a music recommendation method that obtains the user's viewpoint information to determine the user's attention region in a complex environment, thereby learning where the user's real interest lies in the current environment and improving the accuracy of the music match.
FIG. 1 shows the system architecture to which the music recommendation method of an embodiment of this application is applied. As shown in FIG. 1, the architecture includes a first wearable device, a second wearable device, and a mobile terminal device. The first wearable device is a wearable device, such as smart glasses, that can collect the user's visual data and record the user's head movement data; it is equipped with an advanced photo system (APS) camera, a dynamic vision sensor (DVS) camera, an eye tracker, and an inertial measurement unit (IMU) sensor. The second wearable device is a wearable device that can play music, such as earphones. The mobile terminal device may be a mobile phone, a tablet computer, a wearable device (for example, a smart watch), a vehicle-mounted device, an augmented reality (AR) device, a virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like. The terminal device in the embodiments of this application may include a touchscreen for presenting service content to the user. The specific type of the terminal device is not limited in the embodiments of this application.
It should be understood that the foregoing is merely an example of the devices in FIG. 1 of the embodiments of this application and does not constitute a limitation on the embodiments; besides the devices listed above, the devices in FIG. 1 may also be other devices that can implement the same functions.
When the music recommendation method of the embodiments of this application is applied, the mobile terminal device sends a data collection instruction to the first wearable device. After receiving the instruction, the first wearable device collects full-frame data at a certain frequency, records picture change data, simultaneously records the user's viewpoint data, local picture data, and the acceleration and angle of head rotation, and continuously sends these to the mobile terminal device. After receiving the data, the mobile terminal device determines the user's attention region and attention mode, extracts corresponding features based on the attention mode and attention region, and matches music. The mobile terminal device sends audio data to the second wearable device, and the second wearable device plays the music.
FIG. 2 shows the modules included in the first wearable device when the music recommendation method of an embodiment of this application is applied.
A wireless module, configured to establish a wireless link and communicate with other nodes, where the wireless communication may use Wi-Fi, Bluetooth, cellular networks, and other communication methods.
A video frame collection module, configured to drive the APS camera on the first wearable device to collect video frames describing the environment.
A viewpoint collection module, configured to drive the eye tracker on the glasses to collect viewpoint data, where the viewpoint data includes viewpoint position, acquisition time, fixation time, and pupil diameter.
A head movement collection module, configured to drive the IMU module on the glasses to collect the speed and acceleration of head rotation.
A picture change capture module, configured to drive the DVS camera on the glasses to collect picture change data.
A data receiving module, configured to receive data sent by the mobile terminal device.
A data sending module, configured to send the collected data to the mobile terminal device.
FIG. 3 shows the modules included in the mobile terminal device when the music recommendation method of an embodiment of this application is applied.
A wireless module, configured to establish a wireless link and communicate with other nodes, where the wireless communication may use Wi-Fi, Bluetooth, cellular networks, and other communication methods.
An attention mode determination module, configured to compute the attention region and attention mode based on the data collected by the glasses.
A feature extraction and music matching module, configured to extract features and match music based on the attention mode category.
A data receiving module, configured to receive data sent from the first wearable device.
A data sending module, configured to send music audio data and playback instructions to the second wearable device.
FIG. 4 shows the modules included in the second wearable device when the music recommendation method of an embodiment of this application is applied.
A wireless module, configured to establish a wireless link and communicate with other nodes, where the wireless communication may use Wi-Fi, Bluetooth, cellular networks, and other communication methods.
A data receiving module, configured to receive audio data and playback instructions sent by the mobile terminal device.
An audio playback module, configured to play music according to the audio data and playback instructions sent by the mobile terminal device.
FIG. 5 shows a schematic flowchart of a music recommendation method of an embodiment of this application, including steps S501 to S504, each of which is described in detail below. The music recommendation method in FIG. 5 may be performed by the terminal device in FIG. 1.
S501. Receive visual data of a user.
Specifically, the terminal device may receive the user's visual data sent by the first wearable device, which collects the user's visual data within a preset time period (for example, 1 second). The user's visual data includes the user's viewpoint information and the picture information of what the user is viewing; the viewpoint information includes the position coordinates (x, y) of a viewpoint and the attention duration of that viewpoint, and the picture information includes the video frame images collected by the APS camera and the picture change data collected by the DVS camera.
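Purely as an illustration, the visual data described in S501 might be organized as in the following sketch; the class and field names here are assumptions chosen for this example and are not defined by the embodiments.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Viewpoint:
    x: float                # viewpoint position coordinate x
    y: float                # viewpoint position coordinate y
    duration_ms: float      # attention duration of this viewpoint
    pupil_diameter: float   # pupil diameter reported by the eye tracker

@dataclass
class VisualData:
    frames: List[np.ndarray]        # video frame images from the APS camera
    event_frames: List[np.ndarray]  # picture change data from the DVS camera
    viewpoints: List[Viewpoint]     # gaze samples within the preset period
```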
S502. Obtain, based on the visual data, at least one attention unit and an attention duration of the at least one attention unit.
Specifically, the at least one attention unit is obtained based on the picture information. For example, macroblocks in a video frame image may be used as attention units, where the macroblocks may be overlapping or non-overlapping; or one or more object bounding boxes may be extracted as attention units using an algorithm that quantifies whether an object exists within a region (for example, an objectness algorithm); or motion bounding boxes at different moments may be obtained from the picture change data and used as attention units. The content of each attention unit may be the image data at the same position as that attention unit in the most recent frame image.
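For the macroblock variant, splitting a frame into attention units could look like the following sketch, assuming non-overlapping 64 x 64 blocks; the block size and the AttentionUnit structure are illustrative choices rather than values from the embodiments.

```python
from dataclasses import dataclass

@dataclass
class AttentionUnit:
    x0: int                   # left edge of the unit in pixel coordinates
    y0: int                   # top edge
    x1: int                   # right edge (exclusive)
    y1: int                   # bottom edge (exclusive)
    duration_ms: float = 0.0  # accumulated attention duration

def macroblock_units(frame_h: int, frame_w: int, block: int = 64):
    """Split a frame into non-overlapping macroblocks used as attention units."""
    units = []
    for y in range(0, frame_h, block):
        for x in range(0, frame_w, block):
            units.append(AttentionUnit(x, y,
                                       min(x + block, frame_w),
                                       min(y + block, frame_h)))
    return units
```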
When the picture viewed by the user is still, or within a single frame image, the DVS camera collects no picture change data. The attention duration of each attention unit can then be obtained by letting all viewpoints vote on the attention units: when a viewpoint falls within an attention unit, the attention duration of that viewpoint is accumulated into the fixation duration of that attention unit.
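The viewpoint voting just described reduces to a containment test plus accumulation, as in this sketch (it reuses the Viewpoint and AttentionUnit structures assumed in the earlier examples).

```python
def vote_durations(units, viewpoints):
    """Accumulate each viewpoint's attention duration into the attention unit
    that contains it."""
    for vp in viewpoints:
        for u in units:
            if u.x0 <= vp.x < u.x1 and u.y0 <= vp.y < u.y1:
                u.duration_ms += vp.duration_ms
                break  # with non-overlapping units a viewpoint votes once
    return units
```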
Optionally, obtaining the at least one attention unit and its attention duration from the visual data further includes the following. When the picture viewed by the user is changing, the DVS camera collects picture change data. Within each frame image, the attention units of that frame are still voted on as described above, yielding the attention duration of the attention units in each frame. For the attention units in images at any two adjacent moments, take one attention unit in the image at the later moment as an example and name it the second attention unit. In the image at the earlier moment, find the N attention units whose distance from the second attention unit is less than a preset value, where the distance between attention units is the Euclidean distance between the center coordinates of the two attention units, and N may be a manually specified value or the maximum number of attention units satisfying the condition. Take one of the N attention units as an example and name it the first attention unit. Determine the similarity between the first attention unit and the second attention unit, that is, match the features of the two units; the feature matching method may be any existing image feature matching method, which is not specifically limited in the embodiments of this application. If the features of the first and second attention units are determined to be similar, that is, their similarity is greater than or equal to the first threshold, the two units are considered to be the same object presented at different moments; the attention duration of the second attention unit is then set to the sum of the attention durations of the first and second attention units, and the attention duration of the first attention unit is set to zero. If the features are determined to be dissimilar, that is, the similarity is less than the first threshold, the attention durations of both units are retained. The attention units in the images at every pair of adjacent moments are judged in this way.
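A sketch of this adjacent-moment merging step might read as follows. The feature_similarity callable stands in for whichever image feature matching method is used, and the distance and similarity thresholds are placeholders for the preset value and the first threshold.

```python
import math

def center(u):
    """Center coordinates of an attention unit's bounding box."""
    return ((u.x0 + u.x1) / 2.0, (u.y0 + u.y1) / 2.0)

def merge_adjacent(prev_units, curr_units, feature_similarity,
                   dist_thresh=50.0, sim_thresh=0.8):
    """For each unit at the later moment (second unit), fold in the duration
    of any nearby, feature-similar unit at the earlier moment (first unit)."""
    for second in curr_units:
        cx, cy = center(second)
        for first in prev_units:
            px, py = center(first)
            if math.hypot(cx - px, cy - py) >= dist_thresh:
                continue  # outside the preset center distance
            if feature_similarity(first, second) >= sim_thresh:
                # same object at two moments: accumulate, then zero the earlier
                second.duration_ms += first.duration_ms
                first.duration_ms = 0.0
    return curr_units
```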
Optionally, obtaining the at least one attention unit and its attention duration from the visual data further includes establishing a history library of attention units with a fixed size, for example one that can hold only 10 attention units. The similarity between a newly obtained attention unit and the attention units in the history library is determined; for example, to determine the similarity between a newly obtained second attention unit and a first attention unit in the history library, the visual features of the two units may be extracted separately and the similarity between the visual features computed. If the features are determined to be similar, that is, the similarity is greater than or equal to a third threshold, the attention duration of the second attention unit is set to the sum of the attention durations of the first and second attention units, and the second attention unit replaces the first attention unit in the history library. If the features are determined to be dissimilar, that is, the similarity is less than the third threshold, the first attention unit in the history library is retained. In this way, within a preset time period, for example 1 second, the attention units in the history library and the attention duration of each can be obtained, and the user's attention mode within that second can then be determined by the method in S503. Attention units that have been in the history library for more than 1 second and have an attention duration of less than 600 milliseconds are then deleted, and newly obtained attention units are added.
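Maintenance of the history library might be sketched as follows; the capacity, the 1 s / 600 ms pruning rule, and the similarity comparison follow the example values above, while the created_ms timestamp field and the similarity callable are assumptions.

```python
def update_history(history, new_unit, similarity, capacity=10, sim_thresh=0.8):
    """Merge a newly obtained attention unit into a fixed-size history library."""
    if len(history) < capacity:
        history.append(new_unit)
        return history
    best = max(history, key=lambda old: similarity(old, new_unit))
    if similarity(best, new_unit) >= sim_thresh:
        # same object: accumulate the old duration, replace the old unit
        new_unit.duration_ms += best.duration_ms
        history.remove(best)
        history.append(new_unit)
    return history

def prune_history(history, now_ms):
    """Delete units present for over 1 s with under 600 ms of attention."""
    return [u for u in history
            if not (now_ms - u.created_ms > 1000 and u.duration_ms < 600)]
```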
S503. Determine the attention mode of the user based on the attention duration of the at least one attention unit.
Within a preset time period, if the standard deviation of the attention durations of all attention units is greater than or equal to the second threshold, the user's attention mode is determined to be staring; if the standard deviation is less than the second threshold, the attention mode is determined to be scanning.
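This decision reduces to a standard-deviation test; a minimal sketch follows, using the 100 ms preset value from the example further below as the second threshold.

```python
import statistics

def attention_mode(units, second_threshold_ms=100.0):
    """Classify the attention mode from the spread of attention durations."""
    durations = [u.duration_ms for u in units]
    if not durations:
        return "scanning"  # no attention unit at all
    spread = statistics.pstdev(durations)  # population standard deviation
    return "staring" if spread >= second_threshold_ms else "scanning"
```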
S504. Determine recommended music information based on the attention mode.
If the user's attention mode is scanning, the frame image collected by the APS camera is used directly as the user's attention content. If the attention mode is staring, the attention unit with the highest attention level among all attention units within the preset time period is used as the attention content. The attention level may be judged by attention duration, for example taking the attention unit with the longest attention duration as the one with the highest attention level; or by the degree of the user's pupil dilation, for example taking the attention unit with the largest pupil dilation as the one with the highest attention level; or by the number of times the user looks back, for example, if the user fixates on an attention unit and then looks back at it repeatedly, and the number of look-backs exceeds a preset value, that attention unit is taken as the one with the highest attention level; or all three may be considered together to estimate the attention level of an attention unit, for example as the product of pupil dilation, attention duration, and look-back count.
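Combining the three cues into a single attention score could be sketched as follows; the pupil_dilation and revisit_count field names are hypothetical.

```python
def most_attended(units):
    """Pick the attention unit with the highest attention level, estimated as
    the product of pupil dilation, attention duration, and look-back count."""
    return max(units, key=lambda u: u.pupil_dilation
                                    * u.duration_ms
                                    * u.revisit_count)
```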
The music information is then determined from the attention content. This may use an existing image-to-music matching method, for example feeding the attention content (the frame image or the attention unit with the highest attention level) into a neural network model and taking the music category with the largest output probability as the result; for example, when the probability exceeds 0.8, the image and the music are considered to match well enough.
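With a trained classifier in hand, the thresholded matching decision might look like this PyTorch sketch; the model, the category labels, and the preprocessing are assumptions, while the 0.8 cutoff follows the example above.

```python
import torch

def match_music(model, attention_content, categories, threshold=0.8):
    """Return the music category whose predicted probability exceeds the
    cutoff, or None when no match is confident enough."""
    with torch.no_grad():
        logits = model(attention_content.unsqueeze(0))  # add batch dimension
        probs = torch.softmax(logits, dim=1).squeeze(0)
    p, idx = probs.max(dim=0)
    return categories[idx.item()] if p.item() > threshold else None
```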
Optionally, after the attention content has been determined from the user's attention mode within a preset time period, the music information may not be determined immediately; instead, the user's behavior state within that period is determined first. The behavior state may be determined from the attention content by an existing classification machine learning method, for example feeding the attention content into a neural network model and taking the behavior state category with the largest output probability as the result. Behavior states include driving, studying, traveling, exercising, and the like. In this way, the user's behavior states for multiple preset time periods within a first time period can be determined; for example, if the first time period is 10 seconds and a preset time period is 1 second, 10 behavior states within the 10 seconds can be determined. These 10 behavior states are then voted on: for example, if 7 of the 10 are judged to be studying, 2 exercising, and 1 traveling, the user's behavior state within those 10 seconds is taken to be studying. Finally, music is matched based on the user's behavior state within the first time period; the matching may use an existing method, for example matching music based on the label information of the behavior state, which is not specifically limited in the embodiments of this application.
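The per-second behavior states can be aggregated with a simple majority vote, for example:

```python
from collections import Counter

def aggregate_states(per_second_states):
    """Majority vote over per-second behavior states."""
    return Counter(per_second_states).most_common(1)[0][0]

# The example from the text: 7 x studying, 2 x exercising, 1 x traveling
# within 10 seconds yields "studying".
assert aggregate_states(["studying"] * 7 + ["exercising"] * 2
                        + ["traveling"]) == "studying"
```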
After the music information is determined, the terminal device may send a music playback instruction to the second wearable device according to the music information, and the second wearable device plays the specified music. Alternatively, the terminal device may itself play music according to the music information.
In the music recommendation method of the embodiments of this application, the user's attention mode is determined from the user's visual information, so that the content the user is attending to can be judged more accurately and more suitable music can be recommended. The recommended music matches what the user is actually interested in and the user's actual behavior state, improving the user experience.
The music recommendation method of the embodiments of this application is described in detail below with a specific example, in which the first wearable device is smart glasses, the second wearable device is earphones, and the mobile terminal device is a mobile phone.
FIG. 6 shows a schematic block diagram of the music recommendation method provided by an embodiment of this application. As shown in FIG. 6, it includes the following steps.
1. Data collection
The phone sends a data collection instruction to the smart glasses. After receiving the instruction, the smart glasses start collecting data and continuously transmit the collected data to the phone. The collected data include:
(1) Frame data: frame data of the entire image the user can see through the smart glasses, collected at a certain frequency (for example, 30 Hz);
(2) Viewpoint data: the user's viewpoint position coordinates (x, y), pupil diameter, acquisition time, and fixation time;
(3) Head movement data: the angle and acceleration of head rotation;
(4) Picture change data: the number of events collected by the DVS camera.
2. Analyze the collected data, extract features, and match music
I. Determine one or more attention units within a time period and the attention duration corresponding to each attention unit.
Specifically, the time period may be 1 second. An APS frame is captured at the start of the second, recording of picture changes and eye movement data begins, and at the end of the period the data are analyzed and features are extracted to match music. If the situation changes within the second, for example the user's head rotates substantially at 500 milliseconds, only the data of those 500 milliseconds may be analyzed; however, if the period is shorter than 100 milliseconds, too short to produce a fixation point, the data are discarded.
The attention units may be macroblocks, object bounding boxes, or motion bounding boxes. When the attention units are macroblocks, they may be overlapping or non-overlapping. When they are object bounding boxes, the attention units at the initial moment may be extracted as one or more object boxes by an algorithm that quantifies whether an object exists within a region (for example, an objectness algorithm). When they are motion bounding boxes, the motion box at each moment may be obtained from the event data collected by the DVS camera: specifically, at each moment, the event data are first represented as frame data, where the pixel positions of events are given a grayscale value of 255 and all other pixels 0; the frame data are then eroded and dilated to obtain the motion region; finally, the smallest rectangle covering the entire connected motion region is used as the attention unit.
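The motion-bounding-box step could be sketched with OpenCV as follows; the kernel size is an illustrative choice, and this sketch returns one box covering all surviving motion pixels rather than a box per connected region.

```python
import cv2
import numpy as np

def motion_unit_from_events(event_pixels, frame_shape, kernel_size=5):
    """Render DVS events as a binary frame (event pixels -> 255, others -> 0),
    erode then dilate to obtain the motion region, and return the smallest
    rectangle covering it, or None if no motion survives."""
    frame = np.zeros(frame_shape, dtype=np.uint8)
    for x, y in event_pixels:
        frame[y, x] = 255
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    cleaned = cv2.dilate(cv2.erode(frame, kernel), kernel)
    ys, xs = np.nonzero(cleaned)
    if xs.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
```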
When the user's head is still (head rotation angle of 5 degrees or less) and the DVS camera has no local output within the second, that is, the picture seen by the user is still:
(1) When a fixation point falls within an attention unit, the attention duration of that fixation point is accumulated into the attention duration of that attention unit.
(2) Attention units with an attention duration of 0 are removed, and attention units whose areas highly overlap are removed by non-maximum suppression (NMS), as in the sketch below.
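One way to sketch the NMS step is to keep the higher-duration unit of any highly overlapping pair; the 0.5 IoU cutoff is an assumption, since the embodiments do not fix a value.

```python
def iou(a, b):
    """Intersection over union of two attention-unit boxes."""
    ix0, iy0 = max(a.x0, b.x0), max(a.y0, b.y0)
    ix1, iy1 = min(a.x1, b.x1), min(a.y1, b.y1)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms_units(units, iou_thresh=0.5):
    """Drop zero-duration units, then suppress units that heavily overlap a
    unit with a longer attention duration."""
    kept = []
    for u in sorted(units, key=lambda u: u.duration_ms, reverse=True):
        if u.duration_ms > 0 and all(iou(u, k) < iou_thresh for k in kept):
            kept.append(u)
    return kept
```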
When the user's head is still (head rotation angle of 5 degrees or less) and the DVS camera has local output within the second, that is, the picture seen by the user is changing and smooth-pursuit behavior may occur:
(1) At a given moment, when a fixation point falls within an attention unit, the attention duration of that fixation point is accumulated into the attention duration of that attention unit.
(2) Attention units with an attention duration of 0 at each moment are removed, and attention units whose areas highly overlap are removed by NMS.
(3) For two adjacent moments, for an attention unit A at the later moment, the N attention units closest to it at the earlier moment are found, where N is a positive integer greater than or equal to 1 and the distance between two attention units is the Euclidean distance between their center coordinates. Each of the N attention units is feature-matched against attention unit A. If an attention unit B at the earlier moment has features similar to those of A, the two attention units are considered the same object presented at different moments; B is deleted and its attention duration is accumulated into A. If the features of B and A are dissimilar, both attention units are retained.
The music recommendation method of the embodiments of this application applies when the user's head is still. If the user's head is moving, no music matching is performed at that time; the method of the embodiments of this application is executed once the user's head is still again.
II. Determine the attention mode and attention content.
Attention mode:
According to the determination method above:
(1) If the number of attention units is 0, the attention mode is determined to be "scanning";
(2) If the number of attention units is not 0 and the standard deviation of the attention durations of the different attention units is greater than a preset value, for example 100 ms, the attention mode is determined to be "staring"; otherwise it is "scanning".
Attention content:
(1) When the attention mode is "scanning", the user is considered to be mainly perceiving the environment, so the APS frame image is used as the attention content;
(2) When the attention mode is "staring", the user is considered to be perceiving an object of interest, and the attention unit with the highest attention level is used as the attention content.
III. Extract features and match music based on the attention mode and attention content.
The embodiments of this application provide two strategies for extracting features and matching music based on the attention mode and attention content.
(1) Short-term strategy
Directly match the visual features of the attention content in the current period against the audio features of music. For example, using a classification machine learning method, the attention content is fed into a deep convolutional neural network and the category with the largest output probability is taken as the result; for example, when the probability exceeds 0.8, the visual features and the music are judged to match well, and the music fits the user's current perception. The image-to-music matching process may use any existing image-to-music matching method and is not specifically limited in the embodiments of this application.
(2) Long-term strategy
Determine the state category to which the content of the user's attention region belongs at each moment, associate the state category information of different moments to obtain the user's state over a time period, and match music based on the label information of that state. The state categories may be high-frequency music-listening scenarios such as "driving", "studying", "traveling", and "exercising". Determining the user's state category from the content of the attention region at a given moment may use a classification machine learning method, for example feeding the attention content into a deep convolutional neural network and taking the category with the largest output probability as the result. Associating the state category information of different moments may use time-independent majority voting or time-dependent temporal weighting; for example, if a time period is divided into ten moments and the user is judged to be studying at 8 of them and exercising at 2, the user's state over that period is studying.
3. Music playback on the earphones
After receiving the audio data and playback instruction sent by the phone, the earphones play the corresponding music.
Optionally, for the music recommendation method shown in FIG. 6, an embodiment of this application further provides another way of analyzing the collected data and extracting features to match music, introduced below.
I. Determine one or more attention units within a time period and the attention duration corresponding to each attention unit.
A history library of attention units is established with a fixed size, for example one set to store 10 attention units. The library is empty when first created, and attention units produced by the user are placed into it until it is full; the attention duration of an attention unit may be determined by the viewpoint voting described above. Once the library is full, each newly produced attention unit is matched against every attention unit in the library, the attention duration of the new unit likewise determined by viewpoint voting. If attention unit A in the library has the highest similarity to the newly produced attention unit B, the attention duration of A is accumulated into that of B, A is deleted, and B is placed into the library. The similarity between different attention units is matched by extracting visual features from each unit and computing the similarity between the features using the speeded up robust features (SURF) algorithm. If the library contains an attention unit that has existed for more than 1 second with an attention duration below 600 milliseconds, that unit is deleted and a newly produced attention unit is randomly filled in.
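A descriptor-based similarity check in the spirit of the SURF matching above might look like this sketch. It substitutes ORB for SURF, since SURF ships only in OpenCV's contrib/nonfree builds, and the match-ratio score is an illustrative similarity measure rather than the one the embodiments prescribe.

```python
import cv2

def unit_similarity(patch_a, patch_b):
    """Score two attention-unit image patches by the fraction of descriptor
    matches that pass Lowe's ratio test (ORB standing in for SURF)."""
    orb = cv2.ORB_create()
    _, des_a = orb.detectAndCompute(patch_a, None)
    _, des_b = orb.detectAndCompute(patch_b, None)
    if des_a is None or des_b is None:
        return 0.0  # not enough texture to describe either patch
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des_a, des_b, k=2)
    good = [p for p in pairs
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    return len(good) / max(len(pairs), 1)
```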
II. Determine the attention mode and attention content based on the attention units and attention durations in the history library.
Every 1 second, the balance of attention allocation across the different attention units is quantified from the attention units and attention durations in the history library.
When the user's head rotation angle is greater than 90 degrees and less than 270 degrees, that is, the user's viewing angle has changed substantially, the history library of attention units is cleared; once the user's head is still again, the library is re-accumulated, and the balance of the attention units is quantified again 1 second later.
Attention mode:
According to the determination method above:
(1) If the number of attention units in the history library is 0, the attention mode is determined to be "scanning";
(2) If the number of attention units in the history library is not 0 and the standard deviation of the attention durations of the different attention units is greater than a preset value, for example 100 ms, the attention mode is determined to be "staring"; otherwise it is "scanning".
Attention content:
(1) When the attention mode is "scanning", the user is considered to be mainly perceiving the environment, so the APS frame image is used as the attention content;
(2) When the attention mode is "staring", the user is considered to be perceiving an object of interest, and the attention unit with the highest attention level is used as the attention content.
III. Extract features and match music based on the attention mode and attention content.
The embodiments of this application provide two strategies for extracting features and matching music based on the attention mode and attention content.
(1) Short-term strategy
Directly match the visual features of the attention content in the current period against the audio features of music. For example, using a classification machine learning method, the attention content is fed into a deep convolutional neural network and the category with the largest output probability is taken as the result; for example, when the probability exceeds 0.8, the visual features and the music are judged to match well, and the music fits the user's current perception. The image-to-music matching process may use any existing image-to-music matching method and is not specifically limited in the embodiments of this application.
(2) Long-term strategy
Determine the state category to which the content of the user's attention region belongs at each moment, associate the state category information of different moments to obtain the user's state over a time period, and match music based on the label information of that state. The state categories may be high-frequency music-listening scenarios such as "driving", "studying", "traveling", and "exercising". Determining the user's state category from the content of the attention region at a given moment may use a classification machine learning method, for example feeding the attention content into a deep convolutional neural network and taking the category with the largest output probability as the result. Associating the state category information of different moments may use time-independent majority voting or time-dependent temporal weighting; for example, if a time period is divided into ten moments and the user is judged to be studying at 8 of them and exercising at 2, the user's state over that period is studying.
The data collection method and the earphone-side music playback method are the same as in the preceding music recommendation method; for brevity, they are not repeated here.
The music recommendation method of the embodiments of this application recommends different music depending on the content the user is attending to, providing a better music experience. It obtains the user's viewpoint data, head movement data, and environment data to determine the user's current attention mode, and based on the result selects either the full frame image or the local attention region as the basis for matching music.
The music recommendation method of the embodiments of this application has been described above; the music recommendation apparatus of the embodiments of this application is described below.
FIG. 7 shows a schematic block diagram of the music recommendation apparatus of an embodiment of this application. As shown in FIG. 7, it includes a transceiver module 710 and a determining module 720, whose roles are described below.
The transceiver module 710 is configured to receive visual data of a user.
The determining module 720 is configured to obtain, based on the visual data, at least one attention unit and an attention duration of the at least one attention unit.
The determining module 720 is further configured to determine the user's attention mode based on the attention duration of the at least one attention unit.
The determining module 720 is further configured to determine recommended music information based on the attention mode.
Optionally, the visual data includes viewpoint information of the user and picture information of what the user is viewing, where the viewpoint information includes a position of a viewpoint and an attention duration of the viewpoint.
Optionally, the determining module 720 obtaining, based on the visual data, at least one attention unit and the attention duration of the at least one attention unit includes: obtaining the at least one attention unit based on the picture information; and obtaining the sum of the attention durations of the viewpoints falling within the at least one attention unit as the attention duration of the at least one attention unit.
Optionally, the determining module 720 obtaining, based on the visual data, at least one attention unit and the attention duration of the at least one attention unit further includes: determining a similarity between a first attention unit and a second attention unit in the at least one attention unit, where the first attention unit and the second attention unit are attention units at different moments; and if the similarity is greater than or equal to a first threshold, setting the attention duration of the second attention unit equal to the sum of the attention durations of the first and second attention units.
Optionally, the determining module 720 determining the user's attention mode based on the attention duration of the at least one attention unit includes: if the standard deviation of the attention durations of the at least one attention unit is greater than or equal to a second threshold, determining the attention mode of the user to be staring; and if the standard deviation is less than the second threshold, determining the attention mode of the user to be scanning.
Optionally, the determining module 720 determining music information based on the attention mode includes: if the attention mode is scanning, determining the music information based on the picture information; and if the attention mode is staring, determining the music information based on the attention unit with the highest attention level among the attention units.
The determining module 720 determining music information based on the attention mode further includes: determining, based on the attention mode, the user's behavior state at each moment within a first time period; determining the user's behavior state within the first time period based on the states at the moments; and determining the music information based on the behavior state within the first time period.
It should be understood that the transceiver module 710 in the music recommendation apparatus 700 of the embodiments of this application may be configured to perform the method of S501 in FIG. 5, and the determining module 720 may be configured to perform the methods of S502 to S504 in FIG. 5. For details, refer to the description of FIG. 5 above; for brevity, it is not repeated here.
FIG. 8 is a schematic block diagram of a music recommendation device 800 of an embodiment of this application. The music recommendation device 800 may be configured to perform the music recommendation method provided in the foregoing embodiments, which is not repeated here for brevity. The music recommendation device 800 includes a processor 810 coupled to a memory 820, where the memory 820 is configured to store computer programs or instructions, and the processor 810 is configured to execute the computer programs or instructions stored in the memory 820, so that the method in the foregoing method embodiments is performed.
An embodiment of this application further provides a computer-readable storage medium storing program instructions that, when executed by a processor, implement the music recommendation method of the embodiments of this application.
An embodiment of this application further provides a computer program product including computer program code that, when run on a computer, implements the music recommendation method of the embodiments of this application.
An embodiment of this application further provides a music recommendation system including a data collection device and a terminal device. The terminal device includes a processor and a memory, where the memory stores one or more programs, and the one or more computer programs include instructions. The data collection device is configured to collect visual data of a user, and when the instructions are executed by the one or more processors, the terminal device is caused to perform the music recommendation method of the embodiments of this application.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
It can be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (17)

  1. A music recommendation method, comprising:
    receiving visual data of a user;
    obtaining, based on the visual data, at least one attention unit and an attention duration of the at least one attention unit;
    determining an attention mode of the user based on the attention duration of the at least one attention unit; and
    determining recommended music information based on the attention mode.
  2. The method according to claim 1, wherein the visual data comprises viewpoint information of the user and picture information of what the user is viewing, and the viewpoint information comprises a position of a viewpoint and an attention duration of the viewpoint.
  3. The method according to claim 2, wherein the obtaining, based on the visual data, at least one attention unit and an attention duration of the at least one attention unit comprises:
    obtaining the at least one attention unit based on the picture information; and
    obtaining a sum of the attention durations of the viewpoints within the at least one attention unit as the attention duration of the at least one attention unit.
  4. The method according to claim 3, wherein the obtaining, based on the visual data, at least one attention unit and an attention duration of the at least one attention unit further comprises:
    determining a similarity between a first attention unit and a second attention unit in the at least one attention unit, wherein the first attention unit and the second attention unit are attention units at different moments; and
    if the similarity is greater than or equal to a first threshold, setting the attention duration of the second attention unit equal to a sum of the attention duration of the first attention unit and the attention duration of the second attention unit.
  5. The method according to any one of claims 1 to 4, wherein the determining the attention mode of the user based on the attention duration of the at least one attention unit comprises:
    if a standard deviation of the attention durations of the at least one attention unit is greater than or equal to a second threshold, determining that the attention mode of the user is staring; and
    if the standard deviation of the attention durations of the at least one attention unit is less than the second threshold, determining that the attention mode of the user is scanning.
  6. The method according to claim 5, wherein the determining music information based on the attention mode comprises:
    if the attention mode is scanning, determining the music information based on the picture information; and
    if the attention mode is staring, determining the music information based on an attention unit with a highest attention level among the attention units.
  7. The method according to any one of claims 1 to 5, wherein the determining music information based on the attention mode further comprises:
    determining, based on the attention mode, a behavior state of the user at each moment within a first time period;
    determining a behavior state of the user within the first time period based on the states at the moments; and
    determining the music information based on the behavior state within the first time period.
  8. A music recommendation apparatus, comprising:
    a transceiver module configured to receive visual data of a user; and
    a determining module configured to obtain, based on the visual data, at least one attention unit and an attention duration of the at least one attention unit;
    wherein the determining module is further configured to determine an attention mode of the user based on the attention duration of the at least one attention unit; and
    the determining module is further configured to determine recommended music information based on the attention mode.
  9. The apparatus according to claim 8, wherein the visual data comprises viewpoint information of the user and picture information of what the user is viewing, and the viewpoint information comprises a position of a viewpoint and an attention duration of the viewpoint.
  10. The apparatus according to claim 9, wherein the determining module obtaining, based on the visual data, at least one attention unit and the attention duration of the at least one attention unit comprises:
    obtaining the at least one attention unit based on the picture information; and
    obtaining a sum of the attention durations of the viewpoints within the at least one attention unit as the attention duration of the at least one attention unit.
  11. The apparatus according to claim 10, wherein the determining module obtaining, based on the visual data, at least one attention unit and the attention duration of the at least one attention unit further comprises:
    determining a similarity between a first attention unit and a second attention unit in the at least one attention unit, wherein the first attention unit and the second attention unit are attention units at different moments; and
    if the similarity is greater than or equal to a first threshold, setting the attention duration of the second attention unit equal to a sum of the attention duration of the first attention unit and the attention duration of the second attention unit.
  12. The apparatus according to any one of claims 8 to 11, wherein the determining module determining the attention mode of the user based on the attention duration of the at least one attention unit comprises:
    if a standard deviation of the attention durations of the at least one attention unit is greater than or equal to a second threshold, determining that the attention mode of the user is staring; and
    if the standard deviation of the attention durations of the at least one attention unit is less than the second threshold, determining that the attention mode of the user is scanning.
  13. The apparatus according to claim 12, wherein the determining module determining music information based on the attention mode comprises:
    if the attention mode is scanning, determining the music information based on the picture information; and
    if the attention mode is staring, determining the music information based on an attention unit with a highest attention level among the attention units.
  14. The apparatus according to any one of claims 8 to 12, wherein the determining module determining music information based on the attention mode further comprises:
    determining, based on the attention mode, a behavior state of the user at each moment within a first time period;
    determining a behavior state of the user within the first time period based on the states at the moments; and
    determining the music information based on the behavior state within the first time period.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores program instructions that, when executed by a processor, implement the method according to any one of claims 1 to 7.
  16. A computer program product, wherein the computer program product comprises computer program code that, when run on a computer, implements the method according to any one of claims 1 to 7.
  17. A music recommendation system, wherein the system comprises a data collection device and a terminal device, the terminal device comprises a processor and a memory, the memory stores one or more programs, and the one or more computer programs comprise instructions, wherein
    the data collection device is configured to collect visual data of a user; and
    when the instructions are executed by the one or more processors, the terminal device is caused to perform the method according to any one of claims 1 to 7.