CN118379978A - Online Karaoke method, system and storage medium based on intelligent sound equipment - Google Patents


Info

Publication number
CN118379978A
Authority
CN
China
Prior art keywords
song
audio
target
sound
accompaniment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410635017.8A
Other languages
Chinese (zh)
Inventor
刘伟 (Liu Wei)
张倩 (Zhang Qian)
李本江 (Li Benjiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Taide Zhilian Technology Co ltd
Original Assignee
Guangdong Taide Zhilian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Taide Zhilian Technology Co ltd filed Critical Guangdong Taide Zhilian Technology Co ltd
Priority to CN202410635017.8A
Publication of CN118379978A
Legal status: Pending


Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The invention relates to the field of intelligent sound, and specifically provides an online karaoke method, system and storage medium based on intelligent sound. The method comprises the steps of: receiving song requesting information of a user through a microphone, the song requesting information comprising voice song requesting information and trigger song requesting information; inputting the song requesting information into an intelligent sound box to generate a song playing and reading instruction; calling the target song in a song storage module through the song playing and reading instruction, and playing the target song accompaniment through the intelligent sound box; acquiring K song data of the user through the microphone, and performing audio beautification on the K song data through a K song processing module to generate an initial sound; and synthesizing the initial sound with the target song accompaniment to generate a beautified sound, which is played and stored through the intelligent sound box. With the invention, users enjoy KTV service without venue restrictions, and the ensemble accuracy of the accompaniment and the user's voice can be optimized.

Description

Online Karaoke method, system and storage medium based on intelligent sound equipment
Technical Field
The invention relates to the technical field of intelligent sound boxes, in particular to an online karaoke method, an online karaoke system and a storage medium based on intelligent sound.
Background
The traditional KTV entertainment model usually requires people to travel to a physical KTV venue, sing on site using equipment such as microphones, and have staff record the performance and upload it to a network platform for others to enjoy. This model has many inconveniences, such as limited venues, high cost and a limited number of participants. With the development of technology, the application of intelligent audio equipment has improved this entertainment model: through online karaoke on an intelligent sound box, users can sing anytime and anywhere.
Patent CN201811576311.7, "Wireless chorus method, storage medium, control device and karaoke device for preventing howling", proposes judging the current optimized sound information: if it exceeds a preset sound intensity reference value, the sound intensity is reduced until it falls below that reference value. This keeps the effective gain of the amplification loop below 1 (approaching 1 during amplification), so that runaway cyclic amplification is avoided and a sound beautifying effect is achieved.
However, that method can only optimize sound intensity; it cannot improve the ensemble accuracy of the accompaniment and the user's voice, nor beautify the tone quality, so its beautifying effect is limited.
Disclosure of Invention
The invention provides an online karaoke method, system and storage medium based on intelligent sound, which are used for solving the following problems:
(1) Singing on site with equipment such as microphones, with staff recording and uploading the performance to a network platform for others to enjoy, has many inconveniences, such as limited venues, high cost and a limited number of participants.
(2) Patent CN201811576311.7 can only optimize sound intensity; it cannot beautify the accompaniment match or the tone quality, so its beautifying effect is poor.
In a first aspect, the present invention provides an online karaoke method based on intelligent sound, including an intelligent sound box, a microphone, a song storage module and a karaoke processing module, the method comprising:
receiving song requesting information of a user through a microphone; the song requesting information comprises voice song requesting information and trigger song requesting information;
inputting song requesting information into an intelligent sound box to generate a song playing and reading instruction;
calling a target song in the song storage module through the song playing and reading instruction, and playing target song accompaniment through the intelligent sound box;
acquiring K song data of a user through a microphone, and carrying out audio beautification on the K song data through a K song processing module to generate initial sound;
and synthesizing the initial sound and the target song accompaniment to generate a beautifying sound, and playing and storing the beautifying sound through the intelligent sound box.
With reference to the first aspect, the audio beautification includes:
loudness beautification based on background noise removal;
pitch beautification based on baseline fluctuation tuning;
timbre beautification based on harmonic equalization;
frequency optimization based on frequency response smoothing.
With reference to the first aspect, the receiving, by the microphone, song requesting information of the user includes:
Presetting a first trigger response and a second trigger response; wherein,
The first trigger response is a key trigger response, and the second trigger response is a voice trigger response;
generating a song requesting interface on a preset terminal according to the key triggering response, converting the song storage module into a song list, generating a triggering mechanism of the song list, and acquiring song information through the triggering mechanism;
Determining corresponding song content information in the voice according to the voice trigger response;
performing user intention recognition processing on song content information through a user intention recognition model to acquire song keywords;
and according to the song keywords, performing song matching in a song storage module, and determining song requesting information.
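The keyword matching step above can be sketched as follows. This is a minimal illustration, assuming a simple overlap-count scoring rule and a made-up song library; the patent does not specify the matching algorithm:

```python
def match_song(keywords, song_library):
    """Return the library title sharing the most keywords, or None."""
    best_title, best_score = None, 0
    for title in song_library:
        # Score a title by how many extracted keywords it contains.
        score = sum(1 for kw in keywords if kw in title)
        if score > best_score:
            best_title, best_score = title, score
    return best_title

library = ["月亮代表我的心", "小苹果", "青花瓷"]
matched = match_song(["青花", "瓷"], library)   # → "青花瓷"
```

In practice the song keywords would come from the user intention recognition model described above; here they are supplied directly.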
With reference to the first aspect, calling the target song in the song storage module through the song playing and reading instruction includes the following steps:
Receiving a song playing and reading instruction, and determining a calling script for calling songs in a song storage module;
The song storage module internally contains a song database and an index mapping library, where the index mapping library is an address calling database for the song database;
character splitting is performed on the song information by the calling script to obtain key characters;
Identifying a song type corresponding to the key character, and determining a target storage area of the song in a song database based on the song type;
In the target storage area, inquiring target characters corresponding to the key characters, and calling target songs corresponding to the target characters, wherein the calling of the target songs corresponding to the target characters comprises the following steps:
inquiring target characters corresponding to the key characters in an index mapping library;
if the target characters exist in the index mapping library, determining the calling address of the target song, and calling the target song;
If the index mapping library does not contain the target character, a no-target-song prompt is issued through the intelligent sound box.
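The index-mapping lookup and fallback prompt described above can be sketched as follows; the mapping entries and address format are illustrative assumptions, not details from the patent:

```python
# The index mapping library maps key characters to storage addresses in the
# song database; both entries below are made-up examples.
index_map = {
    "青花瓷": "/songs/pop/0042.mp3",
    "小苹果": "/songs/pop/0108.mp3",
}

def call_target_song(key_chars):
    """Look up the calling address; fall back to the no-song prompt."""
    address = index_map.get(key_chars)
    if address is None:
        return "no-target-song prompt"   # spoken through the sound box
    return f"playing {address}"
```

A dictionary lookup mirrors the design intent: the index mapping library avoids scanning the song database itself when resolving a call address.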
With reference to the first aspect, the playing, by the intelligent sound box, of the target song accompaniment includes:
configuring an audio cutting algorithm in the intelligent sound box, and acquiring original audio of a target song through the intelligent sound box;
according to the original audio, voiceprint analysis is carried out, and accompaniment audio and voice audio are determined;
and performing multipath audio splitting on the original audio through the audio cutting algorithm to generate separate accompaniment audio and human voice audio, which are respectively imported into the intelligent sound box.
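The patent does not specify the audio cutting algorithm. One classic stand-in for separating a centred vocal from a stereo accompaniment is mid/side decomposition, sketched below purely as an illustration of the splitting step:

```python
def split_mid_side(left, right):
    """Decompose stereo samples into mid (vocal-heavy) and side signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]    # centred vocal
    side = [(l - r) / 2 for l, r in zip(left, right)]   # stereo accompaniment
    return mid, side

left = [0.5, 0.6, 0.1]
right = [0.5, 0.2, -0.1]
vocal, accompaniment = split_mid_side(left, right)
```

Real vocal/accompaniment separation (e.g. the voiceprint analysis the patent mentions) is far more involved; this only shows the two-channel output shape the rest of the method consumes.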
With reference to the first aspect, the audio beautification further includes:
acquiring K song data of a user and taking the K song data as K song audio to be processed;
Performing first comparison on the K song audio and the human voice audio through a preset audio comparison model to obtain first audio difference data, and performing second comparison on the K song audio and the accompaniment audio to obtain second audio difference data; wherein,
The first audio difference data is beautification difference data;
the second audio difference data is human voice difference data;
content extraction is carried out on the first audio difference data and the second audio difference data through a preset content encoder, and beautification audio points are determined;
And carrying out beautification processing on beautified audio points through a preset timbre encoder, a loudness encoder, a tone encoder and a frequency optimizer to generate initial sound.
With reference to the first aspect, the audio beautification further includes:
according to the K song data, a synchronous detection mechanism based on the voice audio and the accompaniment audio at each moment is established;
Judging the sound level of each moment according to the synchronous detection mechanism, and recording the sound level through an audio frame window;
determining, from the recorded sound levels, the number of frames by which the human voice audio in the K song data is delayed relative to the target song;
and synchronously compensating the voice audio of the user according to the voice delay frame number.
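The delay-frame estimation and synchronous compensation steps can be sketched as follows. The overlap-score delay estimate and the toy frame values are assumptions; the patent only states that the delayed frame count is measured and the human voice audio is shifted to compensate:

```python
def estimate_delay_frames(reference, recorded, max_lag=10):
    """Choose the lag whose overlap with the reference scores highest."""
    def score(lag):
        return sum(a * b for a, b in zip(reference, recorded[lag:]))
    return max(range(max_lag + 1), key=score)

def compensate(recorded, delay_frames):
    """Drop the leading delayed frames so the vocal lines up again."""
    return recorded[delay_frames:]

reference = [0, 0, 1, 2, 3, 2, 1, 0]        # vocal track of the target song
recorded = [0, 0, 0, 0, 1, 2, 3, 2, 1, 0]   # user's vocal, two frames late
lag = estimate_delay_frames(reference, recorded)
aligned = compensate(recorded, lag)          # lag == 2
```

The overlap score is a crude cross-correlation; a production system would correlate per audio-frame windows rather than raw samples.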
With reference to the first aspect, the synthesizing the initial sound and the target song accompaniment to generate the beautification sound includes:
respectively constructing a first audio sequence based on initial sound and a second audio sequence of target song accompaniment on a time axis;
determining a target synthesis order of the beautified sound according to the first audio sequence;
and synthesizing the first audio sequence and the second audio sequence according to the target synthesis sequence to generate beautification sounds.
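The final synthesis step, mixing the two aligned audio sequences, can be sketched as follows; the equal 0.5 mixing gains are an assumption, not specified in the patent:

```python
def synthesize(initial_sound, accompaniment, vocal_gain=0.5, accomp_gain=0.5):
    """Mix two equal-rate audio sequences sample by sample."""
    length = min(len(initial_sound), len(accompaniment))
    return [vocal_gain * initial_sound[i] + accomp_gain * accompaniment[i]
            for i in range(length)]

beautified = synthesize([0.2, 0.4, -0.2], [0.1, -0.1, 0.3])
```

Truncating to the shorter sequence keeps the two tracks on a common time axis, matching the target synthesis order described above.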
In a second aspect, the present invention provides an online karaoke system based on intelligent audio, including:
the microphone is used for receiving song requesting information of a user; the song requesting information comprises voice song requesting information and trigger song requesting information;
The intelligent sound box is internally provided with a song storage module and a K song processing module: the song storage module is used for acquiring a song playing and reading instruction of the intelligent sound box, calling the song and playing target song accompaniment through the intelligent sound box;
And the K song processing module is used for acquiring K song data of a user through the microphone, carrying out audio beautification on the K song data through the K song processing module, generating initial sound, synthesizing the initial sound and target song accompaniment, generating beautification sound, and playing and storing through the intelligent sound box.
In a third aspect, a non-transitory storage medium stores a computer program comprising program code means adapted to perform the method described above when the program is run on a data processing device.
The invention has the beneficial effects that:
Users can enjoy professional KTV service at home without going out or queuing, which is convenient when they are busy with work or the weather is bad. Meanwhile, the application of intelligent sound equipment reduces the cost of traditional KTV equipment, making the KTV service cheaper. The online karaoke method based on intelligent sound does not require the user to purchase extra hardware.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of a method for online karaoke based on intelligent sound in an embodiment of the invention;
FIG. 2 is a flowchart for collecting and identifying song information according to an embodiment of the present invention;
fig. 3 is a system execution diagram of an online K song system based on intelligent sound according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The invention provides an online karaoke method based on intelligent sound, which comprises an intelligent sound box, a microphone, a song storage module and a karaoke processing module, wherein the method comprises the following steps:
Receiving song requesting information of a user through a microphone; wherein,
The song requesting information comprises voice song requesting information and trigger type song requesting information;
inputting song requesting information into an intelligent sound box to generate a song playing and reading instruction;
Calling the target song in the song storage module through the song playing and reading instruction, and playing the target song accompaniment through the intelligent sound box;
acquiring K song data of a user through a microphone, and carrying out audio beautification on the K song data through a K song processing module to generate initial sound;
and synthesizing the initial sound and the target song accompaniment to generate a beautifying sound, and playing and storing the beautifying sound through the intelligent sound box.
The principle of the technical scheme is as follows:
As shown in figure 1, the invention aims to overcome the defects of the existing intelligent sound online karaoke method and provides a novel intelligent sound online karaoke method.
In a specific implementation, the song requesting information of the user is first received through the microphone. The song requesting information comprises voice song requesting information and trigger type song requesting information: for a voice request, the user speaks the name or lyrics of the desired song into the microphone; for a trigger type request, the user presses a button on the intelligent sound equipment to select the song. The song requesting information is then input into the intelligent sound box, which generates a corresponding song playing and reading instruction. This instruction calls the corresponding song in the song storage module so that the intelligent sound equipment can play the target song: the target song accompaniment is called from the song storage module and played through the intelligent sound box. Because the intelligent sound box and the microphone are integrally designed, the user can hear the song accompaniment while inputting his or her own singing voice through the microphone. After the K song data of the user is input through the microphone, the user's sound data is determined and an audio spectrum for audio beautification is generated. The K song processing module then performs audio beautification on the K song data to generate the initial sound; audio beautification processes the user's voice data to make it more pleasant to listen to. Finally, the initial sound is synthesized with the target song accompaniment to generate the beautified sound, which is played and stored through the intelligent sound box.
The method aims at fusing the voice data of the user with the accompaniment of the target song to form a final K song effect. The user can hear a perfect combination of his own voice and accompaniment. Meanwhile, the intelligent sound box can store and play the beautified sound in real time.
The beneficial effects of the technical scheme are that:
Users can enjoy professional KTV service at home without going out or queuing, which is convenient when they are busy with work or the weather is bad. Meanwhile, the application of intelligent sound equipment reduces the cost of traditional KTV equipment, making the KTV service cheaper. The online karaoke method based on intelligent sound does not require the user to purchase extra hardware.
As an embodiment of the present invention, the audio beautification includes:
loudness beautification based on background noise removal;
pitch beautification based on baseline fluctuation tuning;
timbre beautification based on harmonic equalization;
frequency optimization based on frequency response smoothing.
The principle of the technical scheme is as follows:
In the present invention: loudness beautification optimizes audio quality by adjusting the loudness of the sound signal. When a user sings, the original sound signal captured by the microphone may contain substantial noise that affects the overall sound quality. Loudness beautification removes excessive noise from the original signal, reducing background noise and improving audio quality; loudness beautification methods include, but are not limited to, mean adjustment, segmentation processing and noise cancellation. Pitch beautification optimizes audio quality by tuning the pitch of the sound signal. When the original sound signal captured by the microphone exhibits baseline fluctuation, the overall pitch harmony is affected. Through pitch beautification, the pitch of the sound becomes stable, unnecessary vibration is reduced and audio quality is improved; pitch beautification methods include, but are not limited to, linear prediction models, frequency domain filtering and vocal tract tracking. Timbre beautification optimizes audio quality by tuning the timbre of the sound signal. When the captured signal contains harmonic distortion, the overall timbre is affected. Through timbre beautification, the timbre of the sound becomes fuller, its expressiveness is enhanced and audio quality is improved; timbre beautification methods include, but are not limited to: harmonic equalization, resonance extraction and signal gain control. Frequency optimization improves audio quality by optimizing the frequency response of the sound signal. When the captured signal contains high distortion, the overall audio quality is affected.
Through frequency optimization, the frequency response of the sound becomes smooth, distortion is reduced and audio quality is improved. Frequency optimization methods include, but are not limited to: low-pass filters, high-pass filters and adaptive filters.
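As one concrete example of the frequency response smoothing mentioned above, a moving-average low-pass filter can be sketched as follows (the window length of 3 is an illustrative choice, not from the patent):

```python
def low_pass(samples, window=3):
    """Smooth each sample by averaging it with its immediate neighbours."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        # Clamp the averaging window at the sequence edges.
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed

noisy = [0.0, 1.0, 0.0, 1.0, 0.0]   # alternating high-frequency ripple
smoothed = low_pass(noisy)          # ripple is flattened toward the mean
```

A moving average is the simplest low-pass filter; the high-pass and adaptive filters the text lists would follow the same input/output shape with different kernels.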
As an embodiment of the present invention, the receiving, by the microphone, the song requesting information of the user includes:
Presetting a first trigger response and a second trigger response; wherein,
The first trigger response is a key trigger response, and the second trigger response is a voice trigger response;
generating a song requesting interface on a preset terminal according to the key triggering response, converting the song storage module into a song list, generating a triggering mechanism of the song list, and acquiring song information through the triggering mechanism;
Determining corresponding song content information in the voice according to the voice trigger response;
performing user intention recognition processing on song content information through a user intention recognition model to acquire song keywords;
and according to the song keywords, performing song matching in a song storage module, and determining song requesting information.
The principle of the technical scheme is as follows:
As shown in fig. 2, in actual implementation, the present invention presets a first trigger response and a second trigger response, where the first trigger response and the second trigger response are different trigger responses preset in the intelligent sound system, and the different trigger responses meet different types of user requirements. The first trigger response is a key trigger response, triggered by a side key of the smart sound or other operable element, such as a touch screen or other sensing device. The second trigger response is a voice trigger response, and is triggered by the self-contained voice recognition and semantic understanding function of the intelligent sound system, such as the user speaking a specific keyword or phrase. And generating a song requesting interface on the preset terminal according to the key triggering response. For example, for a user with a key trigger response, a special song-requesting page is displayed on the front-end interface, and the user can select his favorite song on the page. For a user to trigger a response by voice, voice recognition and understanding in the intelligent sound system is required, and then a corresponding song is selected on the system front-end interface. And then, converting the song storage module into a song list to generate a triggering mechanism of the song list, and arranging and organizing song information in the song storage module by the triggering mechanism of the song list to generate a song list which can be selected by a user. In the music library, all songs will be presented in a list form, and the user can select according to his own preference. 
According to the trigger mechanism, song information is acquired to determine the original singing, style and the like of the selected song in detail; the method comprises the steps of determining corresponding song content information in voice according to voice trigger response, and determining specific content information of songs wanted by a user according to voice trigger response of the user. For example, if the user says "i want to sing the song", the system will determine the specific name or number of the song based on the voice content of the user. And carrying out user intention recognition processing on the song content information according to the user intention recognition model to acquire song keywords, wherein the processing is carried out on the song content information provided by the user according to the user intention recognition model, and the key information of the target song of the user is screened out, so that the target song is rapidly determined. According to the song keywords, song matching is carried out in the song storage module, song requesting information is determined, and according to the song keywords extracted by the user, song matching is carried out in the song storage module, song information which is most in line with the user target is found out and returned to the user as song requesting information.
As an embodiment of the present invention, calling the target song in the song storage module through the song playing and reading instruction includes the following steps:
Receiving a song playing and reading instruction, and determining a calling script for calling songs in a song storage module;
The song storage module internally contains a song database and an index mapping library, where the index mapping library is an address calling database for the song database;
character splitting is performed on the song information by the calling script to obtain key characters;
Identifying a song type corresponding to the key character, and determining a target storage area of the song in a song database based on the song type;
In the target storage area, inquiring target characters corresponding to the key characters, and calling target songs corresponding to the target characters, wherein the calling of the target songs corresponding to the target characters comprises the following steps:
inquiring target characters corresponding to the key characters in an index mapping library;
if the target characters exist in the index mapping library, determining the calling address of the target song, and calling the target song;
If the index mapping library does not contain the target character, a no-target-song prompt is issued through the intelligent sound box.
The principle of the technical scheme is as follows:
In a specific implementation, the invention determines, from the song playing and reading instruction, the calling script used for song reading and calling; the calling script performs the song call in the song database. In addition, the song playing and reading instruction also determines a song play instruction and song play parameters. This requires parsing the song playing and reading instruction and splitting it into two parts: the song play instruction, which controls real-time operations of the intelligent sound equipment such as play, pause and stop; and the song play parameters, which specify attributes of playback such as volume and playback speed. After the calling script is determined, the key characters of the target song are obtained by splitting the song requesting information. The key characters include the song's keywords and the song type, where song types include but are not limited to ballad, pop and rock. The target storage area of the target song (a storage area in the song database) is then determined from the song type, the corresponding target character is determined within the target storage area, and the target song corresponding to that character is determined. In the target song calling stage, the calling address of the target song is determined through the index mapping library, and the target song is called. If no target character exists for the song, the song is not stored locally; it must then be queried and downloaded over the network, or a no-target-song prompt is issued directly. For example, when the system needs to search for a song by a specific key field, it can look up the corresponding song name directly in the index mapping library and thereby determine the data storage location of the song.
If the corresponding song information is not found in the index mapping library, the system issues a no-target-song prompt through the intelligent sound box.
As an embodiment of the present invention, the playing of the target song accompaniment through the smart speaker includes:
configuring an audio cutting algorithm in the intelligent sound box, and acquiring original audio of a target song through the intelligent sound box;
according to the original audio, voiceprint analysis is carried out, and accompaniment audio and voice audio are determined;
and carrying out multipath audio splitting on the accompaniment audio and the human voice audio through an audio cutting algorithm to generate the accompaniment audio and the human voice audio, and respectively guiding the accompaniment audio and the human voice audio into the intelligent sound box.
The principle of the technical scheme is as follows:
According to the invention, a plurality of audio cutting parameters are preset in the intelligent sound box to adapt to different music styles and individual timbres. Through these parameters, the intelligent sound box adapts better to different users' singing styles and tone-quality characteristics, providing a more personalized audio cutting function; after the audio is cut, a better-adapted accompaniment is provided, and human voice frequency optimization is performed so that the two combine more harmoniously.
In a specific implementation process, the accompaniment audio and the voice audio are dynamically analyzed in real time through the intelligent sound box, then the accompaniment and the voice of the target song are divided into the voice audio and the accompaniment audio through the audio cutting algorithm, and the accompaniment obtained after the original audio is cut is the accompaniment without noise because different audio cutting parameters exist for different users in the audio cutting algorithm, so that the accompaniment can adapt to the real-time singing state of the user in an optimal state; the voice audio is the audio of the user, can be optimized independently, and is played through the intelligent sound box after being combined with accompaniment.
For example, when a user starts singing, the intelligent sound can automatically match the optimal accompaniment volume and tone of the current song by analyzing the audio signal of the user in real time, and adjust the optimal human voice domain of the human voice of the user so that the song performance of the user achieves the optimal effect.
The real-time processing of accompaniment audio and voice audio is carried out in the intelligent sound box, so that the immersion and the presence of K songs of a user can be enhanced; for example, when a user sings, the intelligent sound can simulate the sound effect of a real KTV room by increasing the reverberation effect of voice audio or reducing accompaniment volume and the like, so that the immersion and the presence of the user are enhanced.
In an actual implementation, the intelligent sound box integrates a plurality of sound effect processing modules to enhance the user's K song effect; for example, while the user sings, the intelligent sound box can apply echo, tremolo, chorus and other sound effects. The generated accompaniment audio and voice audio are also intelligently optimized inside the intelligent sound box to meet different user requirements; for example, the intelligent sound box can optimize both tracks according to the user's pitch, timbre and other characteristics, so the result better matches the user's personal style and emotional expression.
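The patent does not disclose the audio cutting algorithm itself. As a minimal, hypothetical sketch, a classic karaoke-style center-channel split can separate a vocal estimate from an accompaniment estimate in a stereo mix; the function name and NumPy implementation are illustrative assumptions, not the patented method:

```python
import numpy as np

def split_vocal_accompaniment(stereo):
    """Naive 'audio cutting' sketch: lead vocals are usually mixed to the
    center (identical in both channels), so the mid signal (L + R)/2 is
    vocal-heavy, while the side signal (L - R)/2 cancels the center vocal
    and keeps panned accompaniment.  `stereo` is an (N, 2) float array."""
    left, right = stereo[:, 0], stereo[:, 1]
    vocal_estimate = (left + right) / 2.0           # mid: center-panned vocal
    accompaniment_estimate = (left - right) / 2.0   # side: vocal cancelled
    return vocal_estimate, accompaniment_estimate
```

In an idealized mix where the vocal v is identical in both channels and an instrument i is panned anti-phase (left = v + i, right = v - i), this split recovers v and i exactly; real mixes would need the per-user cutting parameters the patent describes.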
As an embodiment of the present invention, the audio beautification includes:
acquiring K song data of a user and taking the K song data as K song audio to be processed;
Performing first comparison on the K song audio and the human voice audio through a preset audio comparison model to obtain first audio difference data, and performing second comparison on the K song audio and the accompaniment audio to obtain second audio difference data; wherein,
The first audio difference data is beautification difference data;
the second audio difference data is human voice difference data;
content extraction is carried out on the first audio difference data and the second audio difference data through a preset content encoder, and beautification audio points are determined;
And carrying out beautification processing on beautified audio points through a preset timbre encoder, a loudness encoder, a tone encoder and a frequency optimizer to generate initial sound.
The principle of the technical scheme is as follows:
After the K song audio is acquired, the preset audio comparison model is adaptively adjusted to suit songs of different types and styles, so the intelligent sound box can accurately locate the optimal audio difference data and process it accordingly for each type of song. Repeated iterative processing of the first and second audio difference data further improves the accuracy and real-time performance of audio beautification; for example, the beautification effect can be fine-tuned by boosting or attenuating specific frequency components in the audio difference data. Optimizing the audio beautification algorithm inside the intelligent sound box improves its operating efficiency and saves computing resources; for example, efficient approximation methods can be adopted, such as an audio beautification algorithm based on local binary pattern (LBP) decomposition, or a deep learning model. On top of the processing result, the final audio output combines the user's singing style with the emotional color of the song; for example, emotion recognition can be performed on the user's K song data and combined with the song's emotional theme to generate output with more individuality and emotional resonance.
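The comparison step above can be approximated, as an illustrative sketch only, by frame-wise loudness differencing against a reference track, flagging frames whose deviation exceeds a threshold as candidate beautification points; the frame length, threshold, and function names are assumptions, not values from the patent:

```python
import numpy as np

def frame_rms(signal, frame_len=512):
    """Per-frame RMS loudness of a mono signal (trailing samples dropped)."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))

def beautification_points(user_audio, reference_audio, frame_len=512, threshold=0.1):
    """Return indices of frames whose loudness deviates from the reference
    by more than `threshold` -- candidate 'beautification audio points'."""
    diff = np.abs(frame_rms(user_audio, frame_len) -
                  frame_rms(reference_audio, frame_len))
    return np.flatnonzero(diff > threshold)
```

A real system would compare richer features (timbre, pitch, spectral content), but the shape of the operation is the same: difference data in, a set of frames to beautify out.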
As one embodiment of the invention, the audio beautification further comprises:
According to the K song data, a synchronous detection mechanism based on the voice audio and the accompaniment audio at each moment is established;
Judging the sound level of each moment according to the synchronous detection mechanism, and recording the sound level through an audio frame window;
determining, from the sound level record, the number of delayed frames between the voice audio in the K song data and the target song;
and synchronously compensating the voice audio of the user according to the voice delay frame number.
The principle of the technical scheme is as follows:
According to the invention, the audio characteristics of the voice audio and the target song are determined from the K song data; an audio frame window is set based on the synchronous detection mechanism, and its size is dynamically adjusted during detection to improve the accuracy and real-time performance of sound detection and compensation. For example, the window size is adjusted automatically as the song enters its high-pitched sections, capturing the user's pitch changes. Then, by analyzing the user's voice audio against the target song audio via the sound level record, the number of potentially delayed frames is identified, time-frame compensation is applied according to that number, and the sound level of the voice audio is corrected, achieving audio beautification.
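One common way to estimate such a delay in whole frames is the peak of the cross-correlation between the vocal and the reference; the sketch below (frame length and function names are illustrative assumptions) estimates the lag and compensates by shifting the vocal earlier:

```python
import numpy as np

def estimate_delay_frames(vocal, reference, frame_len=256):
    """Estimate how many whole frames the vocal lags the reference,
    using the peak of the full cross-correlation."""
    corr = np.correlate(vocal, reference, mode="full")
    lag_samples = corr.argmax() - (len(reference) - 1)
    return int(round(lag_samples / frame_len))

def compensate(vocal, delay_frames, frame_len=256):
    """Shift the vocal earlier by `delay_frames` frames, zero-padding the
    tail so the output keeps the original length."""
    shift = delay_frames * frame_len
    if shift <= 0:
        return vocal
    return np.concatenate([vocal[shift:], np.zeros(shift)])
```

With `mode="full"`, a lag of zero sits at index `len(reference) - 1`, so subtracting that offset yields a signed sample delay; rounding to frames mirrors the patent's frame-level compensation.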
As an embodiment of the present invention, the synthesizing the initial sound and the target song accompaniment includes:
respectively constructing, on a time axis, a first audio sequence based on the initial sound and a second audio sequence based on the target song accompaniment;
determining a target synthesis order of the beautified sound according to the first audio sequence;
and synthesizing the first audio sequence and the second audio sequence according to the target synthesis sequence to generate beautification sounds.
The principle of the technical scheme is as follows:
According to the invention, the audio sequence of the initial sound and that of the target song accompaniment are built on a time axis, so the sequences can be arranged along it and their extraction and division points clearly identified. For example, musical analysis techniques such as pitch analysis can be used to determine the time-axis positions of the two sequences. The target synthesis order of the beautified sound is then determined: the first audio sequence leads, since it is based on the user's voice and determined in real time, which makes the synthesis better fitted to the user's voice. Generating the beautified sound also requires choosing the processing method and parameter settings for the audio data; for example, appropriate audio processing tools such as a mixer or volume equalizer, and appropriate parameters such as sampling rate and quantization accuracy, can be selected to obtain the best effect.
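The synthesis step above reduces, in its simplest form, to aligning the two sequences on a common time axis and mixing with per-track gains. The gains and function name below are illustrative assumptions, not values disclosed in the patent:

```python
import numpy as np

def synthesize(initial_sound, accompaniment, vocal_gain=0.8, acc_gain=0.6):
    """Place both sequences on a common time axis (trim to the shorter),
    mix with per-track gains -- the vocal sequence leads, mirroring the
    'first audio sequence' synthesis order -- and clip to the valid
    [-1, 1] sample range."""
    n = min(len(initial_sound), len(accompaniment))
    mixed = vocal_gain * initial_sound[:n] + acc_gain * accompaniment[:n]
    return np.clip(mixed, -1.0, 1.0)
```

A production mixer would resample, equalize, and apply make-up gain instead of hard clipping, but the core operation is this weighted sum of the two aligned sequences.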
An online karaoke system based on intelligent sound, comprising:
The microphone is used for receiving song requesting information of a user; wherein,
The song requesting information comprises voice song requesting information and trigger type song requesting information;
The intelligent sound box is internally provided with a song storage module and a K song processing module: wherein,
The song storage module acquires a song playing and reading instruction of the intelligent sound box, performs song calling and plays target song accompaniment through the intelligent sound box;
The K song processing module is used for obtaining K song data of a user through the microphone, carrying out audio beautification on the K song data through the K song processing module, generating initial sound, synthesizing the initial sound and target song accompaniment, generating beautification sound, and playing and storing through the intelligent sound box.
As shown in fig. 2, the invention aims to overcome the shortcomings of online K song on existing intelligent sound equipment and provides a novel online K song system based on intelligent sound equipment.
In a specific implementation, song requesting information of the user is first received through the microphone. The song requesting information comprises voice song requesting information and trigger type song requesting information: with voice song requesting, the user speaks the name or lyrics of the desired song into the microphone; with trigger song requesting, the user presses a button on the intelligent sound equipment to select the song. The song requesting information is then input into the intelligent sound box, which, upon receiving it, generates a corresponding song playing and reading instruction. This instruction calls the corresponding song in the song storage module so that the intelligent sound equipment can play the target song: the target song accompaniment is called from the song storage module and played through the intelligent sound box. Because the microphone and the intelligent sound box are integrally designed, the user hears the song accompaniment while singing into the microphone. After the user's K song data is input through the microphone, the user's sound data is determined and an audio spectrum for beautification is generated. The K song processing module then performs audio beautification on the K song data to generate the initial sound; audio beautification processes the user's voice data to make it more pleasant to listen to. Finally, the initial sound and the target song accompaniment are synthesized to generate the beautified sound, which is played and stored through the intelligent sound box.
The method aims at fusing the user's voice data with the target song accompaniment to form the final K song effect: the user hears a seamless combination of his or her own voice and the accompaniment, while the intelligent sound box stores and plays the beautified sound in real time.
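The end-to-end flow just described can be summarized in a short orchestration sketch. All class and function names here are illustrative stand-ins for the modules the patent names (song storage module, K song processing module), not a disclosed implementation:

```python
import numpy as np

class SongStore:
    """Stand-in for the song storage module: maps titles to accompaniment
    tracks and keeps the stored beautified recordings."""
    def __init__(self, songs):
        self.songs = songs
        self.recordings = {}
    def fetch(self, title):          # song playing and reading instruction
        return self.songs[title]
    def save(self, title, audio):    # store the beautified sound
        self.recordings[title] = audio

def beautify(take):
    """Stand-in for the K song processing module: peak-normalize the take
    to produce the 'initial sound'."""
    peak = np.max(np.abs(take))
    return take / peak if peak > 0 else take

def karaoke_session(title, store, take, vocal_gain=0.8, acc_gain=0.6):
    """Fetch accompaniment, beautify the user's take, synthesize the
    beautified sound, and store it -- mirroring the described flow."""
    accompaniment = store.fetch(title)
    polished = beautify(take)
    n = min(len(polished), len(accompaniment))
    final = np.clip(vocal_gain * polished[:n] + acc_gain * accompaniment[:n],
                    -1.0, 1.0)
    store.save(title, final)
    return final
```

In a real device, `mic`/`speaker` I/O and the real-time cutting and compensation stages would sit between these steps; the sketch only shows the data flow between the named modules.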
The beneficial effects of the technical scheme are that:
The user can enjoy professional KTV service at home without going out or waiting in line, which is convenient when the user is busy with work or the weather is bad. Meanwhile, the intelligent sound equipment reduces the cost of traditional KTV equipment, making the KTV service cheaper; the online K song method based on intelligent sound equipment requires no extra hardware purchase by the user.
A non-transitory storage medium storing a computer program comprising program code means adapted to perform the method described above when said program is run on a data processing device.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. An online karaoke method based on intelligent sound equipment, applied to a system comprising an intelligent sound box, a microphone, a song storage module and a K song processing module, characterized in that the method comprises the following steps:
receiving song requesting information of a user through a microphone; the song requesting information comprises voice song requesting information and trigger song requesting information;
inputting song requesting information into an intelligent sound box to generate a song playing and reading instruction;
calling a target song in the song storage module through the song playing and reading instruction, and playing target song accompaniment through the intelligent sound box;
acquiring K song data of a user through a microphone, and carrying out audio beautification on the K song data through a K song processing module to generate initial sound;
and synthesizing the initial sound and the target song accompaniment to generate a beautifying sound, and playing and storing the beautifying sound through the intelligent sound box.
2. The online K song method based on intelligent sound equipment as claimed in claim 1, wherein the audio beautification comprises:
beautification of loudness based on background noise removal;
Tone beautification based on baseline fluctuation tuning;
Tone beautification based on harmonic equalization;
frequency optimization based on frequency response smoothing.
3. The method for online singing based on intelligent sound equipment as claimed in claim 1, wherein the receiving the song requesting information of the user through the microphone comprises:
Presetting a first trigger response and a second trigger response; wherein,
The first trigger response is a key trigger response, and the second trigger response is a voice trigger response;
generating a song requesting interface on a preset terminal according to the key triggering response, converting the song storage module into a song list, generating a triggering mechanism of the song list, and acquiring song information through the triggering mechanism;
Determining corresponding song content information in the voice according to the voice trigger response;
performing user intention recognition processing on song content information through a user intention recognition model to acquire song keywords;
and according to the song keywords, performing song matching in a song storage module, and determining song requesting information.
4. The online K song method based on intelligent sound equipment according to claim 1, wherein the calling of the target song in the song storage module through the song playing and reading instruction comprises the following steps:
Receiving a song playing and reading instruction, and determining a calling script for calling songs in a song storage module;
The song storage module is internally provided with a song database and an index mapping library, and the index mapping library is an address calling database of the song database;
character splitting is carried out on the song information by calling a script, and key characters are obtained;
Identifying a song type corresponding to the key character, and determining a target storage area of the song in a song database based on the song type;
In the target storage area, inquiring target characters corresponding to the key characters, and calling target songs corresponding to the target characters, wherein the calling of the target songs corresponding to the target characters comprises the following steps:
inquiring target characters corresponding to the key characters in an index mapping library;
if the target characters exist in the index mapping library, determining the calling address of the target song, and calling the target song;
If the index mapping library does not have the target character, sending out non-target song prompt information through the intelligent sound box.
5. The on-line K song method according to claim 1, wherein playing the target song accompaniment through the smart sound box comprises:
configuring an audio cutting algorithm in the intelligent sound box, and acquiring original audio of a target song through the intelligent sound box;
according to the original audio, voiceprint analysis is carried out, and accompaniment audio and voice audio are determined;
and carrying out multipath audio splitting on the accompaniment audio and the human voice audio through an audio cutting algorithm to generate the accompaniment audio and the human voice audio, and respectively guiding the accompaniment audio and the human voice audio into the intelligent sound box.
6. The online K song method based on intelligent sound equipment as claimed in claim 1, wherein the audio beautification further comprises:
acquiring K song data of a user and taking the K song data as K song audio to be processed;
Performing first comparison on the K song audio and the human voice audio through a preset audio comparison model to obtain first audio difference data, and performing second comparison on the K song audio and the accompaniment audio to obtain second audio difference data; wherein,
The first audio difference data is beautification difference data;
the second audio difference data is human voice difference data;
content extraction is carried out on the first audio difference data and the second audio difference data through a preset content encoder, and beautification audio points are determined;
And carrying out beautification processing on beautified audio points through a preset timbre encoder, a loudness encoder, a tone encoder and a frequency optimizer to generate initial sound.
7. The online K song method based on intelligent sound equipment as claimed in claim 1, wherein the audio beautification further comprises:
according to the K song data, a synchronous detection mechanism based on the voice audio and the accompaniment audio at each moment is established;
Judging the sound level of each moment according to the synchronous detection mechanism, and recording the sound level through an audio frame window;
Judging the voice audio frequency in the K song data and the voice delay frame number of the target song according to the voice record;
and synchronously compensating the voice audio of the user according to the voice delay frame number.
8. The intelligent sound based on-line K song method of claim 1 wherein said synthesizing the initial sound and the target song accompaniment to generate the beautification sound comprises:
respectively constructing a first audio sequence based on initial sound and a second audio sequence of target song accompaniment on a time axis;
determining a target synthesis order of the beautified sound according to the first audio sequence;
and synthesizing the first audio sequence and the second audio sequence according to the target synthesis sequence to generate beautification sounds.
9. Online K sings system based on intelligent stereo set, its characterized in that includes:
the microphone is used for receiving song requesting information of a user; the song requesting information comprises voice song requesting information and trigger song requesting information;
The intelligent sound box is internally provided with a song storage module and a K song processing module: the song storage module is used for acquiring a song playing and reading instruction of the intelligent sound box, calling the song and playing target song accompaniment through the intelligent sound box;
And the K song processing module is used for acquiring K song data of a user through the microphone, carrying out audio beautification on the K song data through the K song processing module, generating initial sound, synthesizing the initial sound and target song accompaniment, generating beautification sound, and playing and storing through the intelligent sound box.
10. A non-transitory storage medium storing a computer program comprising program code means adapted to perform the method of any one of claims 1 to 8 when the program is run on a data processing device.
CN202410635017.8A 2024-05-22 2024-05-22 Online Karaoke method, system and storage medium based on intelligent sound equipment Pending CN118379978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410635017.8A CN118379978A (en) 2024-05-22 2024-05-22 Online Karaoke method, system and storage medium based on intelligent sound equipment

Publications (1)

Publication Number Publication Date
CN118379978A (en) 2024-07-23



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination