WO2020155490A1 - Method and apparatus for managing music based on speech analysis, and computer device


Info

Publication number
WO2020155490A1
WO2020155490A1 (PCT/CN2019/089117)
Authority
WO
WIPO (PCT)
Prior art keywords
wearing
value
music file
similarity
preset
Prior art date
Application number
PCT/CN2019/089117
Other languages
English (en)
Chinese (zh)
Inventor
李影
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020155490A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval of audio data
    • G06F16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 - Retrieval using metadata automatically derived from the content
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/18 - Artificial neural networks; Connectionist approaches
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/54 - Speech or voice analysis techniques specially adapted for retrieval
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 - Details of transducers, loudspeakers or microphones
    • H04R1/10 - Earpieces; Attachments therefor; Earphones; Monophonic headphones

Definitions

  • This application relates to the field of voice analysis, and in particular to a method, device and computer equipment for managing music based on voice analysis.
  • The main purpose of this application is to provide a method, device and computer equipment for managing music based on voice analysis that automatically verify the validity of an instruction and then recognize music.
  • this application proposes a method for managing music based on voice analysis, including:
  • After the smart headset receives an instruction from the user to collect an audio clip, it acquires the wearing value collected by the wearing sensor set on the smart headset; the wearing sensor is used to detect whether the user is wearing the smart headset;
  • the present application also provides a device for managing music based on voice analysis, which is characterized in that it includes:
  • The acquiring and wearing module is used for the smart earphone to obtain the wearing value collected by the wearing sensor set on the smart earphone after receiving the user's instruction to collect the audio clip; the wearing sensor is used to detect whether the user is wearing the smart earphone;
  • the wearing judgment module is used to judge whether the wearing value is within a preset wearing value range
  • a sound collection module configured to determine that the user wears the smart headset if the wearing value is within a preset wearing value range, and control the microphone to collect sound to obtain audio information
  • An extraction module for extracting the frequency spectrum and voiceprint information in the audio information
  • the gender judgment module is used to input the voiceprint information into a preset gender judgment model to obtain the gender type of the voiceprint information;
  • a matching calculation module configured to respectively calculate the similarity between the music file with the gender type tag in the preset server and the frequency spectrum to obtain a plurality of first similarity values
  • a determining module configured to use the music file corresponding to the largest first similarity value as the target music file
  • the download module is used to download the target music file to the memory of the smart headset.
  • the present application also provides a computer device including a memory and a processor, the memory stores computer readable instructions, and the processor implements the steps of the above method when the computer readable instructions are executed.
  • the present application also provides a computer non-volatile readable storage medium, on which computer readable instructions are stored, and when the computer readable instructions are executed by a processor, the steps of the above method are implemented.
  • the method, device and computer equipment for managing music based on voice analysis of the present application automatically detect whether the smart earphone is in contact with the human body to determine whether the issued instruction is a misoperation, thereby reducing unnecessary music recognition.
  • The music file is automatically downloaded to the memory of the smart headset, which saves the user's download time; at the same time, music of the same style is automatically recommended to the user according to the music tag, giving users a better experience.
  • When recognizing music, not only is the music recognized through its frequency spectrum, but the lyrics are also checked, making the recognized music file more accurate. The downloaded music can be sent to the user's friends so that the user can share it with them.
  • FIG. 1 is a schematic flowchart of a method for managing music based on voice analysis according to an embodiment of this application;
  • FIG. 2 is a schematic block diagram of the structure of an apparatus for managing music based on voice analysis according to an embodiment of the application;
  • FIG. 3 is a schematic block diagram of the structure of an apparatus for managing music based on voice analysis according to an embodiment of the application;
  • FIG. 4 is a schematic block diagram of the structure of an apparatus for managing music based on voice analysis according to an embodiment of the application;
  • FIG. 5 is a schematic block diagram of the structure of an apparatus for managing music based on voice analysis according to an embodiment of the application;
  • FIG. 6 is a schematic block diagram of the structure of an apparatus for managing music based on voice analysis according to an embodiment of the application;
  • FIG. 7 is a schematic block diagram of the structure of an apparatus for managing music based on voice analysis according to an embodiment of the application;
  • FIG. 8 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
  • an embodiment of the present application provides a method for managing music based on voice analysis, including the steps:
  • the smart earphone obtains the wearing value collected by the wearing sensor set on the smart earphone after receiving the instruction to collect the audio clip from the user, and the wearing sensor is used to detect whether the user is wearing the smart earphone;
  • the smart headset is based on a normal headset, and is also loaded with smart hardware such as a memory, a communication module, a processor, and a microphone.
  • the smart headset is equipped with input devices such as buttons and sensors.
  • The smart headset acquires the wearing value collected by the wearing sensor set on the smart headset.
  • The wearing sensor is a sensor that detects whether the smart headset is being touched and worn by the user. When the user is wearing the smart headset and when the user is not, the smart headset is in different position states; depending on the sensor set on the smart headset, values corresponding to these different states can be detected, and from them it can be determined whether the user is wearing the smart headset.
  • the wearing sensor collects the degree of contact with the user to obtain the wearing value, and then feeds the obtained wearing value back to the smart headset.
  • After the smart headset receives the wearing value collected by the wearing sensor, it compares the value with the preset wearing value range to determine whether it falls within that range. If so, it is determined that the headset is worn by the user, and therefore that the command to collect the audio clip received by the smart headset was issued by the user and is not an accidental misoperation.
  • the wearing value range is set by the user according to his physical fitness, the specific type of wearing sensor, and the specific position of the wearing sensor on the smart headset.
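As an illustrative sketch (not part of the application itself), the wearing-value check described above amounts to a range-membership test. The numeric bounds below are hypothetical defaults standing in for the user-configured range:

```python
def is_worn(wearing_value, value_range=(35.5, 37.5)):
    """Return True if the wearing value lies within the preset range.

    The default bounds are a hypothetical skin-temperature window in
    degrees Celsius; a real headset would load the range the user set
    (or a sensor-specific default) from its memory.
    """
    low, high = value_range
    return low <= wearing_value <= high
```

If `is_worn` returns True, the collect-audio instruction is treated as a deliberate user operation rather than a misoperation.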
  • The smart headset confirms that the instruction to collect audio clips is a user operation rather than a misoperation, and then controls the microphone to start collecting surrounding sounds, forming audio information from the sounds the microphone collects.
  • the smart headset controls the microphone to continuously collect sound for 10 seconds to obtain audio information with a duration of 10 seconds.
  • The smart headset first preprocesses the audio information: it windows the audio information to obtain an audio fragment sequence, then performs a fast Fourier transform on each frame in the audio fragment sequence to obtain a time-series spectrum set. Pitch extraction is then performed on this spectrum set to obtain a fundamental frequency sequence, which is in effect a function of how the pitch of the audio information changes over time. The fundamental frequency sequence is then converted into note names to obtain a note-name sequence. Finally, the note-name sequence is segmented into notes, yielding the frequency spectrum of the audio information.
  • the frequency spectrum extraction methods described in other publications can also be used to extract the frequency spectrum of the audio information in this application.
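The preprocessing pipeline above (windowing, per-frame fast Fourier transform, fundamental-frequency extraction) can be sketched as follows. This is a simplified illustration only: the frame length, hop size, and strongest-peak pitch estimate are assumptions, not the application's exact method.

```python
import numpy as np

def extract_pitch_sequence(audio, sample_rate=16000, frame_len=1024, hop=512):
    """Window the signal, FFT each frame, and estimate a fundamental
    frequency per frame by picking the strongest spectral peak.
    A crude stand-in for the pitch-extraction step described above."""
    window = np.hanning(frame_len)
    pitches = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        frame = audio[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        peak_bin = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
        pitches.append(peak_bin * sample_rate / frame_len)
    return pitches
```

The resulting fundamental-frequency sequence would then be mapped to note names and segmented into notes, as the text describes.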
  • Voiceprint is a sound wave spectrum that carries verbal information displayed by electroacoustic instruments. Modern scientific research shows that voiceprint is not only specific, but also relatively stable.
  • the audio information includes the sound made by people singing.
  • everybody's voice is different, and the corresponding voiceprint is also different.
  • the smart headset divides the audio signal into frames, extracts the acoustic features of the voice, and processes the acoustic features of the voice to calculate the voiceprint information.
  • the voiceprint information of the singer is included in the audio information.
  • The voiceprint information is input into the gender judgment model, and the gender judgment model outputs the gender type of the voiceprint information, since there is an obvious difference between male and female voices.
  • The gender types of the gender judgment model include male, female, and neutral, since the gender of some singing voices is difficult to distinguish.
  • The music files carrying the gender type label of the voiceprint information are filtered out, and the similarity calculation is performed only on those filtered music files, which reduces the number of calculation targets and improves calculation speed.
  • The smart earphone obtains a first similarity value after calculating the similarity between the frequency spectrum in the audio information and a music file carrying the above gender tag; repeating the calculation yields the first similarity values corresponding to multiple music files.
  • the preset server is a server preset by the staff for storing music files.
  • the music file corresponding to the highest first similarity value is determined as the target music file.
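A minimal sketch of the first-similarity selection step, assuming spectra are represented as feature vectors and compared with cosine similarity (the application does not fix a particular similarity metric, so that choice is an assumption):

```python
import numpy as np

def pick_target(query_spectrum, tagged_library):
    """Compute a first similarity value (cosine similarity here, as an
    illustrative metric) between the query spectrum and each music file
    already filtered by gender tag, then return the best match."""
    def cosine(a, b):
        a = np.asarray(a, dtype=float)
        b = np.asarray(b, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    scores = {name: cosine(query_spectrum, spectrum)
              for name, spectrum in tagged_library.items()}
    best = max(scores, key=scores.get)  # largest first similarity value
    return best, scores[best]
```

The file with the largest first similarity value becomes the target music file, matching the determination step above.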
  • The smart earphone accesses the cloud through the communication module and compares the frequency spectrum of the audio information with the frequency spectra of the music files in the cloud server to check whether the audio information's spectrum is the same as, or part of, a music file's spectrum. If so, that music file is determined to be the target music file, the target music file's information is obtained, and the target music file is played.
  • A dialog box asking whether to download the target music file is generated for the user to choose.
  • The smart headset downloads the target music file from the cloud and stores it in the smart headset's memory, making it convenient for the user to keep this music.
  • the audio information includes the first lyrics
  • the target music file includes the second lyrics text
  • Before the step of downloading the target music file to the memory of the smart headset, the method includes:
  • S801 Parse out the first lyrics text corresponding to the first lyrics in the audio information, and obtain the second lyrics text of the target music file;
  • the audio information contains the first lyrics, that is, in the audio information, a person is singing and uttering text.
  • the smart headset obtains the audio information, performs semantic analysis on the audio information, and recognizes the text therein, that is, the first lyrics text.
  • the music file includes various music-related information such as music audio, artist, album name, song title, and lyrics text.
  • the smart earphone reads the second lyric text in the target music file, and calculates the similarity between the first lyric text and the second lyric text
  • The matching calculation method is: calculate the similarity between the first lyric text and the second lyric text to obtain the second similarity value. The specific calculation uses the degree of coincidence between the characters in the first lyric text and the characters in the second lyric text; if all the characters in the first lyric text completely coincide with some or all of the characters in the second lyric text, the similarity between the two is 100%.
  • the similarity threshold is a critical value preset by the staff for determining whether the first lyric text belongs to the second lyric text.
  • The size of the similarity threshold is set based on the success rate of converting audio information into text with the semantic analysis technology. It is determined whether the second similarity value is higher than the preset similarity threshold. If it is, the first lyric text is either exactly the same as the second lyric text or mostly the same as it; the first lyric text is then determined to match the second lyric text, and it is further determined that the frequency spectrum in the audio information is the music in the target music file. An instruction to download the target music file is thus generated.
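The character-coincidence similarity and threshold check described above might be sketched as follows; the 0.8 threshold and the in-order matching rule are illustrative assumptions, not values fixed by the application:

```python
def lyrics_similarity(first_text, second_text):
    """Second similarity value as character coincidence: the fraction of
    characters in the recognized first lyric text that also appear, in
    order, within the second lyric text (a simple reading of the
    'degree of coincidence' described above)."""
    matched, pos = 0, 0
    for ch in first_text:
        idx = second_text.find(ch, pos)
        if idx >= 0:
            matched += 1
            pos = idx + 1
    return matched / len(first_text) if first_text else 0.0

def should_download(first_text, second_text, threshold=0.8):
    """Generate-download decision: True when the second similarity
    value reaches the preset similarity threshold."""
    return lyrics_similarity(first_text, second_text) >= threshold
```

A sung excerpt whose recognized text is wholly contained in the stored lyric text scores 1.0, matching the "100% coincidence" case in the text.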
  • the above-mentioned wearing sensor is a contact temperature sensor that is arranged on the smart earphone to contact a person's ear, the wearing value is a temperature value, and the wearing value range is a temperature value range;
  • the steps for whether the wearing value is within the preset wearing value range include:
  • The contact temperature sensor is used to detect the temperature of an object in direct contact with it. It is installed at the speaker of the smart headset. When the user wears the smart headset, the contact temperature sensor touches the user's ear or head and collects the temperature value of the contacted position; when the user is not wearing the smart headset, the contact temperature sensor does not touch the human body but is in contact with the air or other objects, and collects an erroneous signal or the temperature value of another object.
  • the temperature value range is set by the user according to his physical fitness and the normal temperature value of the location in contact with the user. When the user does not set it, the smart headset automatically accesses the corresponding server through the above-mentioned communication module to obtain the normal human body temperature value range.
  • the wearing sensor may also be a pressure sensor, a distance sensor, etc., which are arranged at the smart earphone in contact with the human body.
  • The collection source, that is, the specific type of the wearing sensor, can be obtained.
  • the wearing value is a temperature value
  • The collection source is a temperature sensor, that is, a contact temperature sensor set at the part of the earphone in contact with the human ear.
  • the temperature value range of the contact temperature sensor is called from the memory, and the temperature value range is used as the wearing value range.
  • compare the aforementioned wearing value with the wearing value range to see whether the wearing value is within the wearing value range, and if so, it is determined that the user is wearing the smart headset.
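The source-dependent range lookup in the steps above can be sketched as follows; the sensor names and numeric ranges are hypothetical placeholders for values a real device would store in its memory:

```python
# Hypothetical preset ranges per collection source; a real headset
# would read these from user settings or firmware memory.
PRESET_RANGES = {
    "contact_temperature": (35.5, 37.5),  # degrees Celsius
    "pressure": (0.2, 5.0),               # illustrative units
}

def wearing_value_in_range(wearing_value, collection_source):
    """Judge the collection source of the wearing value, call up the
    matching preset range, and test whether the value falls inside it,
    mirroring the steps described above."""
    value_range = PRESET_RANGES.get(collection_source)
    if value_range is None:
        return False  # unknown sensor type: treat as not worn
    low, high = value_range
    return low <= wearing_value <= high
```

Dispatching on the collection source lets the same judgment logic serve temperature, pressure, or distance sensors, as the text suggests.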
  • Before the step of inputting the voiceprint information into a preset gender judgment model to obtain the gender type of the voiceprint information, the method includes:
  • S501 Input multiple sample voiceprint information and genders corresponding to the sample voiceprint information into a neural network model, and perform training to obtain the gender judgment model.
  • When the gender judgment model is trained, a neural network is used as the basic model: a plurality of pre-collected sample voiceprint information is input into the neural network, with the gender of each sample voiceprint as the output result.
  • The sample voiceprint information of all men can first be input into the neural network, with the output results all being men.
  • Then the sample voiceprint information of all women is input into the neural network, with the output results all being women.
  • From the male and female sample voiceprint information, the neural network model derives parameters for men and for women respectively, thereby yielding the gender judgment model based on the neural network model.
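As a hedged illustration of the training step, the sketch below substitutes a one-layer logistic model for the neural network (the application does not specify the network architecture, and it also allows a neutral type, omitted here for simplicity). Sample voiceprints are assumed to already be numeric feature vectors:

```python
import numpy as np

def train_gender_model(samples, labels, lr=0.1, epochs=500):
    """Train a minimal one-layer model (logistic regression, standing in
    for the neural network described above) on sample voiceprint feature
    vectors with gender labels (0 = male, 1 = female)."""
    X = np.asarray(samples, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid activation
        grad = p - y                             # gradient of log loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_gender(w, b, voiceprint):
    """Output the gender type for a new voiceprint feature vector."""
    p = 1.0 / (1.0 + np.exp(-(np.asarray(voiceprint, dtype=float) @ w + b)))
    return "female" if p >= 0.5 else "male"
```

Training on all-male samples followed by all-female samples, as the text describes, fits the model's parameters so that new voiceprints can be classified.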
  • the method includes:
  • the contact information is searched in the memory.
  • the contact information is the contact phone number or mailbox of the user's close friend.
  • The smart headset sends the download link to the aforementioned mailbox or mobile phone through the communication module, so that users can share their favorite music with others.
  • the method includes:
  • When the user likes a piece of music, the user often also wants to listen to other music associated with it.
  • each piece of music will be in an album, and the information in each music file also includes album information.
  • The smart headset reads the album information in the target music file, accesses the cloud to find other music files with that album information, and downloads them to the smart headset's memory, directly downloading music the user likes and giving the user a better service experience.
  • the above step of playing the target music file includes:
  • The specific information of the target music file is obtained from the server, including the pulse code modulation information (i.e., PCM) of the target music file. The PCM is then processed to get the decibel value of the target music file.
  • The specific calculation process is: first filter and amplify the PCM, then divide it into frames, sample each frame, accumulate the value of each sampling point to obtain a total value, and divide the total value by the number of samples to obtain the average energy value of the sound. The average energy value is then quantized at a ratio of 100 to 32767 to obtain a quantized value between 1 and 100, which is the decibel value output for the target music file.
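The decibel-value computation above can be sketched as follows. Filtering and amplification are omitted, and the use of absolute sample values is an assumption about how the "value of each sampling point" is accumulated:

```python
import numpy as np

def decibel_value(pcm, frame_len=1024):
    """Frame the PCM samples, accumulate the absolute sample values
    into a total, divide by the number of samples to get an average
    energy value, then scale by 100/32767 into a 1-100 quantized value
    (the file's 'decibel value' in the scheme above)."""
    pcm = np.abs(np.asarray(pcm, dtype=float))
    n_frames = max(1, len(pcm) // frame_len)
    usable = pcm[:n_frames * frame_len]          # whole frames only
    average_energy = usable.sum() / usable.size
    quantized = average_energy * 100.0 / 32767.0
    return int(np.clip(np.rint(quantized), 1, 100))
```

A full-scale 16-bit signal maps to 100 and silence maps to the floor of 1, spanning the 1-100 range described in the text.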
  • The decibel threshold preset by the user is called up; it is the sound level the user finds most comfortable and most accustomed to, set according to the user's own preferences.
  • The smart headset also obtains the rated power of its speaker, multiplies the decibel threshold by the rated power, and divides by the decibel value to obtain the output power of the smart headset's speaker; playing the target music file at this output power makes the loudspeaker's output reach the decibel threshold. The speaker is then controlled to play the target music file at this output power. Because the decibel value of each music file differs, the loudness at which files play can vary greatly.
  • For example, suppose the decibel value of the second music file is twice that of the first, playback switches from the first file to the second, and the smart headset's power remains unchanged: the user would suddenly receive very loud music, which could damage the user's ears. Therefore, before playing the target music file, its decibel value is calculated first, and the output power of the smart headset is adjusted according to that decibel value and the user-set decibel threshold, so that the loudness at which the target music file plays meets the user's needs and gives the user a good experience.
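The power-adjustment rule described above (decibel threshold times rated power, divided by the file's decibel value) is a single formula, sketched here with illustrative units:

```python
def output_power(decibel_threshold, rated_power_watts, decibel_value):
    """Output power = decibel threshold x rated power / decibel value,
    so that playing at this power brings the played loudness of the
    file to the user's decibel threshold, per the rule above."""
    if decibel_value <= 0:
        raise ValueError("decibel value must be positive")
    return decibel_threshold * rated_power_watts / decibel_value
```

With a decibel threshold of 50 and a 2 W rated speaker, a file with decibel value 100 plays at 1 W while a file with decibel value 50 plays at 2 W, so both reach the same loudness and the sudden-volume jump in the example above is avoided.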
  • the method for managing music based on voice analysis of the present application automatically detects whether the smart headset is in contact with the human body to determine whether the issued instruction is a misoperation, thereby reducing unnecessary music recognition.
  • The music file is automatically downloaded to the memory of the smart headset, which saves the user's download time; at the same time, music of the same style is automatically recommended to the user according to the music tag, giving users a better experience.
  • When recognizing music, not only is the music recognized through its frequency spectrum, but the lyrics are also checked, making the recognized music file more accurate. The downloaded music can be sent to the user's friends so that the user can share it with them.
  • an embodiment of the present application also provides an apparatus for managing music based on voice analysis, including:
  • The acquiring and wearing module 1 is used for the smart earphone to obtain the wearing value collected by the wearing sensor set on the smart earphone after receiving the user's instruction to collect the audio clip; the wearing sensor is used to detect whether the user is wearing the smart earphone.
  • the judgment wearing module 2 is used to judge whether the wearing value is within a preset wearing value range
  • the sound collection module 3 is configured to determine that the user wears the smart headset if the wearing value is within the preset wearing value range, and collect sound to obtain audio information;
  • the extraction module 4 is used to extract the frequency spectrum and voiceprint information in the audio information
  • the gender judgment module 5 is used to input the voiceprint information into a preset gender judgment model to obtain the gender type of the voiceprint information;
  • the matching calculation module 6 is configured to respectively calculate the similarity between the music file with the gender type tag in the preset server and the frequency spectrum to obtain multiple first similarity values;
  • the determining module 7 is configured to use the music file corresponding to the largest first similarity value as the target music file, and play the target music file;
  • the download module 8 is configured to receive a download instruction sent by the user to download the target music file.
  • the above-mentioned apparatus for managing music based on voice analysis further includes:
  • Lyrics parsing module 801 configured to parse the first lyric text corresponding to the first lyric in the audio information, and obtain the second lyric text of the target music file;
  • the similarity calculation module 802 is configured to calculate the similarity between the first lyric text and the second lyric text to obtain a second similarity value
  • the similarity determining module 803 is configured to determine whether the second similarity value is higher than a preset similarity threshold
  • the generating instruction module 804 is configured to generate an instruction to download the target music file if the second similarity value is higher than a preset similarity threshold.
  • the above-mentioned wearing sensor is a contact temperature sensor arranged on the smart earphone to contact a person's ear, the wearing value is a temperature value, and the wearing value range is a temperature value range;
  • The wearing judgment module 2 includes:
  • the first judging unit is used to judge the collection source of the wearing value
  • a determining unit configured to, if it is determined that the collection source is the contact temperature sensor, call a preset temperature value range from a memory, and use the temperature value range as the wearing value range;
  • the second judgment unit is used to judge whether the wearing value is within the temperature value range
  • the determining unit is configured to determine that the user wears the smart headset if the wearing value is within the temperature value range.
  • the aforementioned apparatus for managing music based on voice analysis further includes:
  • the training module 501 is configured to input multiple sample voiceprint information and the gender corresponding to the sample voiceprint information into the neural network model for training to obtain the gender judgment model.
  • the foregoing apparatus for managing music based on voice analysis further includes:
  • the sending module 9 is used to send the download link of the target music file to a designated contact person.
  • the above-mentioned apparatus for managing music based on voice analysis further includes:
  • the storage module 10 is configured to download other music files in the album where the target music file is located to the memory.
  • the above determination module 7 includes:
  • a calculation unit configured to calculate the pulse code modulation information to obtain the decibel value of the target music file
  • An adjustment unit configured to adjust the output power of the smart headset according to the decibel value and a preset decibel threshold
  • the playing unit is used to play the target music file with the output power.
  • the foregoing calculation unit includes:
  • the framing subunit is used for filtering and amplifying the pulse code modulation information, and then framing;
  • the sampling subunit is used to sample each frame and accumulate the values obtained from each sample to obtain the total value of the pulse code modulation information
  • a calculation subunit configured to divide the total value by the number of frames to obtain the average energy value of the sound corresponding to the pulse code modulation information
  • the quantization subunit is used to quantize the average energy value to obtain the decibel value output by the target music file.
  • the device for managing music based on voice analysis of the present application automatically detects whether the smart headset is in contact with the human body to determine whether the issued instruction is a misoperation, thereby reducing unnecessary music recognition.
  • The music file is automatically downloaded to the memory of the smart headset, which saves the user's download time; at the same time, music of the same style is automatically recommended to the user according to the music tag, giving users a better experience.
  • When recognizing music, not only is the music recognized through its frequency spectrum, but the lyrics are also checked, making the recognized music file more accurate. The downloaded music can be sent to the user's friends so that the user can share it with them.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 8.
  • The computer equipment includes a processor, a memory, a network interface and a database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the computer equipment database is used to store audio information, music files and other data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a method of managing music based on voice analysis.
  • The above-mentioned processor executes the steps of the above-mentioned method for managing music based on voice analysis: after the smart earphone receives an instruction from the user to collect an audio clip, it acquires the wearing value collected by the wearing sensor set on the smart earphone, the wearing sensor being used to detect whether the user wears the smart headset; it determines whether the wearing value is within the preset wearing value range; if so, it determines that the user wears the smart headset, and collects sound to obtain audio information; it extracts the frequency spectrum and voiceprint information in the audio information; it inputs the voiceprint information into a preset gender judgment model to obtain the gender type of the voiceprint information; it respectively calculates the similarity between the music files with the gender type tag in the preset server and the frequency spectrum to obtain multiple first similarity values; it uses the music file corresponding to the largest first similarity value as the target music file and plays the target music file; and it receives the download instruction sent by the user to download the target music file.
  • the audio information includes the first lyrics
  • the target music file includes the second lyric text
  • the method includes: parsing the first lyric text corresponding to the first lyrics in the audio information, and obtaining the second lyric text of the target music file; calculating the similarity between the first lyric text and the second lyric text to obtain a second similarity value; determining whether the second similarity value is higher than a preset similarity threshold; and, if so, generating an instruction to download the target music file.
  • the above-mentioned wearing sensor is a contact temperature sensor arranged on the smart earphone so as to contact the wearer's ear, the above-mentioned wearing value is a temperature value, and the above-mentioned wearing value range is a temperature value range;
  • the step of determining whether the wearing value is within the preset wearing value range includes: determining the collection source of the wearing value; if the collection source is determined to be the contact temperature sensor, retrieving the preset temperature value range from the memory and using the temperature value range as the wearing value range; determining whether the wearing value is within the temperature value range; and, if the wearing value is within the temperature value range, determining that the user is wearing the smart earphone.
  • before the processor executes the step of inputting the voiceprint information into a preset gender judgment model to obtain the gender type of the voiceprint information, the method includes: inputting a plurality of pieces of sample voiceprint information, together with the gender corresponding to each piece of sample voiceprint information, into a neural network model for training to obtain the gender judgment model.
  • after the above-mentioned processor executes the above-mentioned step of receiving the download instruction sent by the user to download the target music file, the method includes: sending the download link of the target music file to a designated contact.
  • after the above-mentioned processor executes the above-mentioned step of receiving the download instruction sent by the user to download the target music file, the method includes: downloading the other music files in the album where the target music file is located into the memory.
  • the above-mentioned processor executing the above-mentioned step of playing the target music file includes: obtaining the pulse code modulation information of the target music file; calculating the decibel value of the target music file from the pulse code modulation information; adjusting the output power of the smart earphone according to the decibel value and a preset decibel threshold; and playing the target music file at that output power.
  • the computer device of the present application automatically detects whether the smart earphone is in contact with the human body to determine whether an issued instruction is a misoperation, thereby reducing unnecessary music recognition.
  • the music file is automatically downloaded to the memory of the smart earphone, which saves the user's download time; at the same time, music of the same style as the music tag is automatically recommended to the user according to that tag, giving users a better experience.
  • when recognizing music, not only is the music identified through the frequency spectrum, but the lyrics of the music are also checked, making the recognized music file more accurate; the downloaded music can be sent to the user's friends so that the user can share it with them.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 8.
  • the computer device includes a processor, a memory, a network interface and a database connected through a system bus, where the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and the computer readable instructions stored in the non-volatile storage medium.
  • the database of the computer device is used to store data such as audio information and music files.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer readable instructions, when executed by the processor, implement the processes of the above-mentioned method embodiments.
  • FIG. 8 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • An embodiment of the present application further provides a computer non-volatile readable storage medium, on which computer readable instructions are stored, and when the computer readable instructions are executed, the processes as in the foregoing method embodiments are executed.
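The wearing check with a contact temperature sensor, as described above, can be sketched as follows. This is an illustrative assumption, not the disclosed implementation: the function names, the source-keyed dispatch table, and the 35-40 °C range approximating ear-contact skin temperature are all hypothetical.

```python
# Hypothetical sketch of the wearing-value check: the collection source
# selects which preset range applies, then the wearing value is tested
# against that range. The 35-40 degC range is an illustrative assumption.
PRESET_RANGES = {"contact_temperature": (35.0, 40.0)}

def is_wearing(wearing_value, source):
    """Return True when the wearing value falls inside the preset range."""
    low, high = PRESET_RANGES.get(source, (float("inf"), float("-inf")))
    return low <= wearing_value <= high

print(is_wearing(36.5, "contact_temperature"))  # body-temperature reading: worn
print(is_wearing(22.0, "contact_temperature"))  # room-temperature reading: not worn
```

A reading near body temperature is treated as the earphone being worn; a room-temperature reading is treated as a likely misoperation, so no audio is collected.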
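The gender judgment model above is described as a neural network trained on pairs of sample voiceprint information and the corresponding gender. Purely for illustration, the sketch below substitutes a nearest-centroid classifier over two toy voiceprint features; the feature choice, the training data, and the lower-pitch-feature-for-male convention are all assumptions not taken from the disclosure.

```python
# Toy stand-in for the gender judgment model: average the feature vectors
# per gender during "training", then classify by the nearest centroid.

def train_gender_model(samples):
    """samples: list of (feature_vector, gender) pairs -> centroid per gender."""
    sums, counts = {}, {}
    for features, gender in samples:
        acc = sums.setdefault(gender, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[gender] = counts.get(gender, 0) + 1
    return {g: [v / counts[g] for v in acc] for g, acc in sums.items()}

def predict(model, features):
    """Return the gender whose centroid is closest to the feature vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda g: sq_dist(model[g]))

# Assumed convention: a lower fundamental-frequency feature for male voices.
model = train_gender_model([([120.0, 0.2], "male"), ([210.0, 0.6], "female"),
                            ([110.0, 0.3], "male"), ([220.0, 0.5], "female")])
print(predict(model, [125.0, 0.25]))  # closest to the male centroid
```

In the disclosed method the trained model's output (the gender type) is then used to narrow the set of candidate music files on the server to those carrying the same gender tag.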
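Steps S6-S7 rank the candidate music files by the similarity between their spectra and the collected frequency spectrum, then take the file with the largest first similarity value as the target. A minimal sketch, assuming cosine similarity over toy spectra (the disclosure does not name a specific similarity measure, and all names here are hypothetical):

```python
# Hypothetical sketch of steps S6-S7: score each candidate music file
# (already filtered by the gender tag) against the collected spectrum,
# then pick the largest first similarity value.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_target(query_spectrum, candidates):
    """Return (first_similarity_value, music_file_name) with the largest score."""
    scored = [(cosine_similarity(query_spectrum, spec), name)
              for name, spec in candidates.items()]
    return max(scored)

candidates = {
    "song_a": [0.9, 0.1, 0.0, 0.2],
    "song_b": [0.1, 0.8, 0.3, 0.0],
}
best_score, target = pick_target([0.88, 0.15, 0.05, 0.18], candidates)
# song_a's spectrum is closest to the query here, so it becomes the target.
```

The chosen target music file is then played, and downloaded only after the user sends a download instruction.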
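The lyric-verification embodiment compares the first lyric text recognized from the audio information with the second lyric text of the target music file, and only generates a download instruction when the second similarity value exceeds the preset similarity threshold. A hedged sketch, using difflib's ratio as a stand-in text-similarity measure (the disclosure does not specify one) and an assumed threshold of 0.8:

```python
# Illustrative lyric check: download only when the similarity between the
# first and second lyric texts clears the preset threshold.
from difflib import SequenceMatcher

def should_download(first_lyric_text, second_lyric_text, threshold=0.8):
    """Return True when the second similarity value is above the threshold."""
    second_similarity = SequenceMatcher(None, first_lyric_text,
                                        second_lyric_text).ratio()
    return second_similarity > threshold

print(should_download("hello from the other side",
                      "hello from the other side"))   # matching lyrics: True
print(should_download("hello from the other side",
                      "completely different words"))  # mismatch: False
```

This second check is what makes the recognized music file more accurate: a spectrum match alone is not trusted unless the lyrics also agree.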
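For the playback step, the decibel value is calculated from the pulse code modulation information and the output power of the smart earphone is adjusted against a preset decibel threshold. The sketch below assumes an RMS-relative-to-full-scale (dBFS) reading over 16-bit samples and a simple power-scaling rule; neither formula is specified in the disclosure.

```python
# Assumed decibel model: RMS of the PCM samples relative to 16-bit full
# scale, then a power factor that pulls playback toward a preset threshold.
import math

def pcm_decibels(samples, full_scale=32768.0):
    """dBFS of a block of signed 16-bit PCM samples."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / full_scale) if rms else float("-inf")

def output_power(samples, base_power=1.0, threshold_db=-20.0):
    db = pcm_decibels(samples)
    # Recordings louder than the threshold are attenuated, quiet ones boosted.
    return base_power * 10 ** ((threshold_db - db) / 20)

loud = [16000, -16000] * 100   # about -6 dBFS
quiet = [1600, -1600] * 100    # about -26 dBFS
print(output_power(loud) < 1.0 < output_power(quiet))  # prints True
```

The effect is that the target music file is played at a power matched to how loud its PCM content is, rather than at a fixed volume.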

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention concerns a method and apparatus for managing music on the basis of speech analysis, as well as a computer device. The method comprises the following steps: after receiving an instruction sent by a user to collect an audio clip, a smart earphone acquires a wearing value collected by a wearing sensor arranged on the smart earphone (S1); determining whether the wearing value is within a preset wearing value range (S2); if so, determining that the user is wearing the smart earphone, and collecting sound to obtain audio information (S3); extracting a frequency spectrum and voiceprint information from the audio information (S4); inputting the voiceprint information into a gender determination model to obtain the gender type of the voiceprint information (S5); respectively calculating the similarity between each music file carrying a gender type tag in a server and the frequency spectrum to obtain multiple first similarity values (S6); taking the music file corresponding to the largest first similarity value as a target music file and playing the target music file (S7); and receiving a download instruction sent by the user to download the target music file (S8).
PCT/CN2019/089117 2019-01-31 2019-05-29 Method and apparatus for managing music on the basis of speech analysis, and computer device WO2020155490A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910100398.9A CN109785859B (zh) 2019-01-31 2019-01-31 Method, apparatus and computer device for managing music based on speech analysis
CN201910100398.9 2019-01-31

Publications (1)

Publication Number Publication Date
WO2020155490A1 true WO2020155490A1 (fr) 2020-08-06

Family

ID=66503021

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089117 WO2020155490A1 (fr) 2019-01-31 2019-05-29 Method and apparatus for managing music on the basis of speech analysis, and computer device

Country Status (2)

Country Link
CN (1) CN109785859B (fr)
WO (1) WO2020155490A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291767A (zh) * 2020-10-28 2021-01-29 广东美她实业投资有限公司 Takeout ordering method and device based on a smart Bluetooth earphone, and readable storage medium

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785859B (zh) * 2019-01-31 2024-02-02 平安科技(深圳)有限公司 Method, apparatus and computer device for managing music based on speech analysis
CN112102848B (zh) * 2019-06-17 2024-04-26 华为技术有限公司 Method, chip and terminal for recognizing music
CN110246505A (zh) * 2019-06-24 2019-09-17 付金龙 Method, system and electronic device for controlling the flashing of a light strip by sound waves
CN110362711A (zh) * 2019-06-28 2019-10-22 北京小米智能科技有限公司 Song recommendation method and apparatus
CN111064846A (zh) * 2019-12-13 2020-04-24 歌尔科技有限公司 Head-mounted device and method and apparatus for setting a voice secretary
CN111125432B (zh) * 2019-12-25 2023-07-11 重庆能投渝新能源有限公司石壕煤矿 Video matching method and fast training-matching system based thereon
CN111368136A (zh) * 2020-03-31 2020-07-03 北京达佳互联信息技术有限公司 Song recognition method and apparatus, electronic device and storage medium
CN111488485B (zh) * 2020-04-16 2023-11-17 北京雷石天地电子技术有限公司 Music recommendation method based on a convolutional neural network, storage medium and electronic apparatus
CN111768782A (zh) * 2020-06-30 2020-10-13 广州酷狗计算机科技有限公司 Audio recognition method, apparatus, terminal and storage medium
CN113518202A (zh) * 2021-04-07 2021-10-19 华北电力大学扬中智能电气研究中心 Security monitoring method and apparatus, electronic device and storage medium
CN113380249A (zh) * 2021-06-11 2021-09-10 北京声智科技有限公司 Voice control method, apparatus, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1785891A1 (fr) * 2005-11-09 2007-05-16 Sony Deutschland GmbH Music information retrieval using a 3D search algorithm
CN104657438A (zh) * 2015-02-02 2015-05-27 联想(北京)有限公司 Information processing method and electronic device
CN105338447A (zh) * 2015-10-19 2016-02-17 京东方科技集团股份有限公司 Earphone control circuit and method, earphone, and audio output apparatus and method
CN108737872A (zh) * 2018-06-08 2018-11-02 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN109145148A (zh) * 2017-06-28 2019-01-04 百度在线网络技术(北京)有限公司 Information processing method and apparatus
CN109785859A (zh) * 2019-01-31 2019-05-21 平安科技(深圳)有限公司 Method, apparatus and computer device for managing music based on speech analysis

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3300386A4 (fr) * 2015-05-19 2018-12-05 Oh, Young Gwun Bluetooth earphone and watch comprising same
CN108391206A (zh) * 2018-03-30 2018-08-10 广东欧珀移动通信有限公司 Signal processing method and apparatus, terminal, earphone and readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1785891A1 (fr) * 2005-11-09 2007-05-16 Sony Deutschland GmbH Music information retrieval using a 3D search algorithm
CN104657438A (zh) * 2015-02-02 2015-05-27 联想(北京)有限公司 Information processing method and electronic device
CN105338447A (zh) * 2015-10-19 2016-02-17 京东方科技集团股份有限公司 Earphone control circuit and method, earphone, and audio output apparatus and method
CN109145148A (zh) * 2017-06-28 2019-01-04 百度在线网络技术(北京)有限公司 Information processing method and apparatus
CN108737872A (zh) * 2018-06-08 2018-11-02 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN109785859A (zh) * 2019-01-31 2019-05-21 平安科技(深圳)有限公司 Method, apparatus and computer device for managing music based on speech analysis

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291767A (zh) * 2020-10-28 2021-01-29 广东美她实业投资有限公司 Takeout ordering method and device based on a smart Bluetooth earphone, and readable storage medium

Also Published As

Publication number Publication date
CN109785859A (zh) 2019-05-21
CN109785859B (zh) 2024-02-02

Similar Documents

Publication Publication Date Title
WO2020155490A1 (fr) Method and apparatus for managing music on the basis of speech analysis, and computer device
JP6113302B2 (ja) Method and apparatus for transmitting voice data
CN108847215B (zh) Method and apparatus for speech synthesis based on the user's timbre
AU2016277548A1 (en) A smart home control method based on emotion recognition and the system thereof
CN102404278A (zh) Song-requesting system based on voiceprint recognition and application method thereof
KR20090108643A (ko) Feature extraction in a networked portable device
WO2017084327A1 (fr) Method for adding an account, terminal, server, and computer storage medium
CN112992109B (zh) Singing assistance system, singing assistance method, and non-transitory computer-readable recording medium therefor
CN110675886A (zh) Audio signal processing method and apparatus, electronic device and storage medium
CN111105796A (zh) Wireless earphone control apparatus and control method, and voice control setting method and system
WO2019233361A1 (fr) Method and device for adjusting music volume
US20160034247A1 (en) Extending Content Sources
WO2014173325A1 (fr) Gutturophony recognition method and device
CN109346057A (zh) Voice processing system for an intelligent children's toy
CN112216294A (zh) Audio processing method and apparatus, electronic device and storage medium
JP5598516B2 (ja) Speech synthesis system for karaoke, and parameter extraction device
CN110889008B (zh) Music recommendation method and apparatus, computing device and storage medium
JP5428458B2 (ja) Evaluation device
CN112908302B (zh) Audio processing method, apparatus and device, and readable storage medium
WO2020043110A1 (fr) Voice processing method, information device, and computer program product
CN108682423A (zh) Speech recognition method and apparatus
JP2019101148A (ja) Online karaoke system
CN114664303A (zh) Rapid recognition and control system for continuous voice commands
WO2022041177A1 (fr) Communication message processing method and device, and instant messaging client
CN111986657A (zh) Audio recognition method and apparatus, recording terminal, server, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19912346

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19912346

Country of ref document: EP

Kind code of ref document: A1