CN116506650A - Live broadcast-based virtual resource configuration method, computer equipment and storage medium


Info

Publication number: CN116506650A
Application number: CN202310344912.XA
Authority: CN (China)
Prior art keywords: anchor, voiceprint, preset, feature, library
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 许函, 王海平, 陈超
Current and original assignee: Guangzhou Cubesili Information Technology Co Ltd
Application filed by Guangzhou Cubesili Information Technology Co Ltd
Priority: CN202310344912.XA; publication: CN116506650A

Classifications

    • H04N 21/2187 — Live feed (under H04N 21/00 selective content distribution, e.g. interactive television or video on demand; 21/20 servers specifically adapted for the distribution of content; 21/218 source of audio or video content)
    • G10L 17/18 — Speaker identification or verification using artificial neural networks; connectionist approaches (under G10L 17/00 speaker identification or verification techniques)
    • H04N 21/25875 — Management of end-user data involving end-user authentication (under 21/25 management operations performed by the server; 21/258 client or end-user data management)
    • H04N 21/4312 — Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations (under 21/40 client devices; 21/431 generation of visual interfaces)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a live broadcast-based virtual resource configuration method, a computer device, and a storage medium. The method includes the following steps: the anchor terminal uploads an audio file to a server; the server extracts the anchor voiceprint features from the audio file; the server matches the anchor voiceprint features against the voiceprint features in a preset voiceprint library to determine whether the anchor voiceprint features already exist in the library; if they do not, the server issues virtual resources to the anchor terminal according to the anchor's identity identifier; and the anchor terminal, in response to the anchor's selection of a virtual resource, generates a live video combining the virtual resource with the anchor's features. In this way, the platform can more conveniently manage how anchors obtain virtual resources.

Description

Live broadcast-based virtual resource configuration method, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of live broadcast technologies, and in particular, to a live broadcast-based virtual resource allocation method, a computer device, and a storage medium.
Background
With the development of internet and communication technology, society has entered an era of intelligent interconnection, and interacting, being entertained, and working over the internet have become increasingly common. Live streaming is among the most widespread of these activities: people can watch or host live broadcasts anytime and anywhere through smart devices, which has greatly enriched daily life and broadened people's horizons.
With the rapid development of the live broadcast industry, forms of live interaction have diversified. During a broadcast, the anchor may perform by singing, dancing, and so on, and the audience may interact with the anchor while watching, for example by giving virtual gifts, posting comments, or praising the anchor. At present, however, when an anchor applies for platform virtual resources such as a two-dimensional (anime-style) avatar, the platform's recruitment cost is very high, and many manual audits are required to review the anchor's acquisition of virtual resources.
Disclosure of Invention
The technical problem mainly addressed by the application is to provide a live broadcast-based virtual resource allocation method, a computer device, and a storage medium that make it convenient to manage anchors' acquisition of virtual resources.
To solve the above technical problem, one technical scheme adopted by the application is to provide a live broadcast-based virtual resource allocation method comprising the following steps: the anchor terminal uploads an audio file to a server; the server extracts the anchor voiceprint features from the audio file; the server matches the anchor voiceprint features against the voiceprint features in a preset voiceprint library to determine whether the anchor voiceprint features exist in the preset voiceprint library; if the anchor voiceprint features are determined not to exist in the preset voiceprint library, the server issues virtual resources to the anchor terminal according to the anchor's identity identifier; and the anchor terminal, in response to the anchor's selection of a virtual resource, generates a live video combining the virtual resource with the anchor's features.
To solve the above technical problem, another technical scheme adopted by the application is to provide a live broadcast-based virtual resource allocation method comprising the following steps: extracting the anchor voiceprint features from an audio file uploaded by the anchor terminal; matching the anchor voiceprint features against the voiceprint features in a preset voiceprint library to determine whether the anchor voiceprint features exist in the preset voiceprint library; and, if the anchor voiceprint features are determined not to exist in the preset voiceprint library, issuing virtual resources to the anchor terminal according to the anchor's identity identifier, so that the anchor terminal can respond to the anchor's selection of a virtual resource by generating a live video combining the virtual resource with the anchor's features.
In order to solve the technical problems, another technical scheme adopted by the application is as follows: providing a computer device comprising a processor, a memory, and a communication circuit; the communication circuit and the memory are coupled with the processor; the memory stores a computer program for execution by the processor to implement the live-based virtual resource allocation method as provided in the present application.
In order to solve the technical problems, another technical scheme adopted by the application is as follows: there is provided a computer readable storage medium storing a computer program for execution by a processor to implement a live-based virtual resource allocation method as provided herein above.
The beneficial effects of the application are as follows. Unlike the prior art, the server extracts the anchor voiceprint features from the audio file uploaded by the anchor terminal and matches them against the voiceprint features in a preset voiceprint library to determine whether the anchor voiceprint features already exist there. If they do not, the server can issue virtual resources to the anchor terminal according to the anchor's identity identifier, and the anchor terminal can generate a live video combining the virtual resource and the anchor's features in response to the anchor's selection of the virtual resource. The anchor's identity is thus recognized by machine during recruitment, which reduces the difficulty of identifying anchors when opening virtual resources and allows identification without the anchor showing a real face. On one hand, this improves the efficiency of anchor recruitment, reduces its cost, and reduces manual involvement; on the other hand, it improves the accuracy of anchor recruitment, which helps improve the quality of live broadcasts and the management of anchors.
Drawings
FIG. 1 is a schematic diagram of the system components of an embodiment of the live system of the present application;
FIG. 2 is a flow chart of a first embodiment of a live-based virtual resource allocation method of the present application;
FIG. 3 is a schematic flow chart of a second embodiment of a live-based virtual resource allocation method of the present application;
FIG. 4 is a timing diagram of a second embodiment of a live-based virtual resource allocation method of the present application;
FIG. 5 is an interface schematic diagram of a server in a second embodiment of a live-based virtual resource allocation method of the present application;
FIG. 6 is a schematic flow chart of a preset deep learning model in a second embodiment of a live-based virtual resource allocation method of the present application;
FIG. 7 is an interface schematic diagram of an anchor terminal in a second embodiment of a live-based virtual resource allocation method of the present application;
FIG. 8 is a schematic diagram of virtual resources in a second embodiment of a live-based virtual resource allocation method of the present application;
FIG. 9 is a schematic circuit diagram of an embodiment of a computer device of the present application;
FIG. 10 is a schematic circuit diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
With the rapid development of the live broadcast industry, forms of live interaction have diversified. In the live broadcast process, an anchor user performs at an anchor terminal, and users can watch the performance at audience terminals or interact with the anchor through them. While watching the anchor's broadcast through an audience terminal, an audience user may present a virtual gift to the anchor user to interact with them.
Through long-term research, the inventors found that in existing live broadcasts most anchors broadcast with their real faces, so the forms of live interaction are limited. Some broadcasts are instead conducted through virtual resources, such as two-dimensional (anime-style) avatars. When recruiting such anchors, because the anchor does not broadcast with a real face, recruitment screening is difficult: the anchor's identity cannot be accurately recognized, and anchors who broadcast through virtual resources are inconvenient to manage. To address these technical problems, the application proposes the following embodiments.
As shown in fig. 1, the live broadcast system 1 described in the embodiments of the present application may include a server 10, an anchor terminal 20, and an audience terminal 30. The anchor terminal 20 and the audience terminal 30 are electronic devices in which the respective client programs are installed, i.e., client terminals. The electronic device may be a mobile terminal, a computer, a server, or another terminal; the mobile terminal may be a mobile phone, a notebook computer, a tablet computer, an intelligent wearable device, or the like, and the computer may be a desktop computer or the like.
The server 10 may pull the live data stream from the anchor terminal 20, process it accordingly, and push it to the audience terminal 30. After acquiring the live data stream, the audience terminal 30 may present the live broadcast of the anchor or a guest. Mixing of live data streams may occur at at least one of the server 10, the anchor terminal 20, and the audience terminal 30. Video or voice communication can be performed between anchor terminals 20, and between an anchor terminal 20 and an audience terminal 30. During live broadcasting, the anchor terminal 20 may push a live data stream including a video stream to the server 10, which further pushes the corresponding live data to each audience terminal 30 in the live room of that anchor terminal 20; the anchor terminal 20 and the audience terminals 30 then display the corresponding live pictures. In particular, the server 10 may be, for example, a server cluster, used not only to collect and push live data streams but also to process service requests and related matters, for example storing and processing service data generated during the live broadcast, such as virtual gift giving, virtual coin recharging and consumption, public screen messaging, authentication, linking, and automatic auditing of sensitive words and pictures.
Of course, the roles of anchor terminal 20 and audience terminal 30 are relative: a terminal that is broadcasting is an anchor terminal 20, and a terminal that is watching is an audience terminal 30.
As shown in fig. 2, a first embodiment of the live-based virtual resource allocation method of the present application may include the following steps. M100: the anchor terminal uploads an audio file to the server. M200: the server extracts the anchor voiceprint features from the audio file. M300: the server matches the anchor voiceprint features against the voiceprint features in the preset voiceprint library to determine whether the anchor voiceprint features exist in the preset voiceprint library. M400: if the anchor voiceprint features are determined not to exist in the preset voiceprint library, the server issues virtual resources to the anchor terminal according to the anchor's identity identifier. M500: the anchor terminal, in response to the anchor's selection of a virtual resource, generates a live video combining the virtual resource with the anchor's features.
In the live-based virtual resource configuration process, the server 10 may interact with the anchor terminal 20 to complete the configuration. Specifically, the anchor terminal 20 may upload an audio file to the server 10; after receiving it, the server 10 may extract the anchor voiceprint features and match them against the voiceprint features in the preset voiceprint library to determine whether the anchor voiceprint features already exist there. If not, the server 10 may issue virtual resources to the anchor terminal 20 according to the anchor's identity identifier, and the anchor terminal 20 may generate a live video combining the virtual resource and the anchor's features in response to the anchor's selection of the virtual resource. The anchor's identity is thus recognized by machine during recruitment, which reduces the difficulty of identifying anchors when opening virtual resources and allows identification without the anchor showing a real face. On one hand, this improves the efficiency of anchor recruitment, reduces its cost, and reduces manual involvement; on the other hand, it improves the accuracy of anchor recruitment, which helps improve the quality of live broadcasts and the management of anchors.
As shown in fig. 3, the second embodiment of the live-based virtual resource allocation method of the present application may use the server 10 as the execution subject. The embodiment may include the following steps. S100: extract the anchor voiceprint features from the audio file uploaded by the anchor terminal. S200: match the anchor voiceprint features against the voiceprint features in the preset voiceprint library to determine whether the anchor voiceprint features exist in the preset voiceprint library. S300: if the anchor voiceprint features are determined not to exist in the preset voiceprint library, issue virtual resources to the anchor terminal according to the anchor's identity identifier, so that the anchor terminal can respond to the anchor's selection of a virtual resource by generating a live video combining the virtual resource with the anchor's features.
In the live broadcast process, the anchor voiceprint features are extracted from the audio file uploaded by the anchor terminal 20 and matched against the voiceprint features in the preset voiceprint library to determine whether the anchor voiceprint features already exist there. If not, virtual resources are issued to the anchor terminal 20 according to the anchor's identity identifier, so that the anchor terminal 20 can respond to the anchor's selection of a virtual resource by generating a live video combining the virtual resource with the anchor's features. The anchor's identity is thus recognized by machine during recruitment, which reduces the difficulty of identifying anchors when opening virtual resources and allows identification without the anchor showing a real face. This improves the efficiency and accuracy of anchor recruitment, reduces its cost and manual involvement, and helps improve the quality of live broadcasts and the management of anchors.
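To make the server-side flow concrete, a minimal orchestration sketch follows; it is an illustration, not part of the patent, and the helper names extract_voiceprint, is_in_voiceprint_library, and issue_virtual_resources are hypothetical stand-ins for S100, S200, and S300 (the sketches later in this description flesh out the first two).
```python
def configure_virtual_resources(audio_file: bytes, anchor_id: str,
                                library, server) -> bool:
    """Hypothetical end-to-end sketch of the second embodiment (S100-S300)."""
    # S100: extract the anchor voiceprint features from the uploaded audio file
    anchor_feat = server.extract_voiceprint(audio_file)
    # S200: feature-match against the preset voiceprint library
    if server.is_in_voiceprint_library(anchor_feat, library):
        return False  # repeat anchor: the virtual resource application is refused
    # S300: issue virtual resources to the anchor terminal by its identity identifier
    server.issue_virtual_resources(anchor_id)
    # S280: update the preset voiceprint library with the new feature
    library.add(anchor_id, anchor_feat)
    return True
```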
The method described in this embodiment may be applied to a scenario in which an anchor broadcasts using virtual resources, as shown in fig. 4. The embodiment is described in detail below with the server 10 as the execution subject.
S100: and extracting the characteristics of the anchor voiceprint in the audio file uploaded by the anchor terminal.
The audio file may be obtained by the anchor terminal 20 through a recording device communicatively coupled to the anchor terminal 20. During the live broadcast, the anchor terminal 20 may upload the audio file to the server 10, so that the audio file is transmitted to all the audience terminals 30 corresponding to all the audiences currently in the live broadcast room through the server 10, so that the audience terminals 30 may play the audio file while playing the live broadcast picture.
Voiceprint features are a generic term for speech features contained in speech that characterize and identify a speaker, and speech models built based on these features. Since voiceprint recognition is a process of recognizing a speaker corresponding to the section of speech according to voiceprint features of the speech to be recognized, in order to recognize the identity of the anchor based on the voiceprint features, the anchor voiceprint features need to be extracted first.
In one implementation, before extracting the anchor voiceprint features from the audio file uploaded by the anchor terminal, the following steps may be performed:
S110: Acquire the audio file recorded by the anchor terminal as the anchor reads the preset text during the trial broadcast.
The trial broadcast may be a stage before the formal broadcast in which the anchor's identity is verified and the audio file is obtained.
The preset text may be text configured in advance in the server 10 for obtaining an audio file with which to identify the anchor. Setting a preset text conveniently constrains the content and duration of the reading during the trial broadcast, which improves the efficiency of extracting the anchor voiceprint features.
During the trial broadcast, the anchor may read the preset text aloud; the anchor terminal 20 records the reading to form an audio file and sends it to the server 10. The server 10 thus acquires the audio file produced during the trial broadcast, from which the anchor voiceprint features can be obtained.
In one implementation, S110 may include the following steps for obtaining the audio file recorded while the anchor reads the preset text during the trial broadcast:
S111: Obtain the playback link to the audio file that is generated when the trial broadcast ends.
S112: Retrieve the audio file via the playback link.
The playback link may be a link, formed from the audio of the trial broadcast, that points to the audio file; specifically, the anchor terminal 20 may automatically generate the corresponding playback link when the trial broadcast ends. The server 10 may obtain the playback link and retrieve the audio file through it, and the audio file may enter a configuration flow preset in the server 10 in the form of a playback stream. As shown in fig. 5, which depicts the interface of the server 10 when running the configuration flow, the anchor voiceprint features can be extracted from the audio file through this flow, and whether virtual resources can be configured for the anchor is determined on that basis.
In one implementation, S100 may include the following steps for extracting the anchor voiceprint features from the audio file uploaded by the anchor terminal:
S120: Perform noise reduction on the audio file and extract the speech segment carrying the anchor's voice.
Since the audio file may contain background sound other than the anchor's voice, such as background music and environmental sounds, which would interfere with voiceprint feature extraction, the audio file must be denoised and the audio segment carrying the anchor's voice extracted from it. This improves the accuracy of extracting the anchor voiceprint features and, in turn, the accuracy of identifying the anchor.
In one implementation, S120 may include the following steps for denoising the audio file and extracting the speech segment carrying the anchor's voice:
S121: Remove the background sound part of the audio file and denoise it to obtain the foreground sound part.
The background sound part of the audio file is removed because it interferes with the anchor's voice. Specifically, a deep-learning-based sound event algorithm may be used to remove the background sound part.
To eliminate noise interference, the audio file may be denoised after the background sound part is removed, yielding the foreground sound part. Specifically, a deep-learning-based noise reduction algorithm may be used.
For example, if the audio file contains both the anchor's voice and background music, extraction of the anchor's voice characteristics during voiceprint extraction would be affected, and with it the accuracy of voiceprint identification. The background music is therefore removed with a deep-learning-based sound event algorithm, noise interference is then eliminated with a deep-learning-based noise reduction algorithm, and the foreground sound part carrying the anchor's voice is finally obtained.
S122: and eliminating non-voice fragments which do not carry the host voice in the foreground voice part to obtain voice fragments.
The non-speech segments may include speech segments that are not presented with the preset text by the presenter during the pilot, such as ventilation sounds, breathing sounds, or cough sounds emitted by the presenter during the pilot.
Because the non-voice fragments do not carry voice information for extraction and recognition, after the background sound part in the audio file is removed and the audio file is subjected to noise reduction treatment to obtain a foreground sound part, the non-voice fragments which do not carry the anchor voice in the foreground sound part can be removed to obtain the voice fragments. Specifically, in the process of eliminating the non-voice fragments which do not carry the voice of the anchor in the foreground sound part, a voice activity detection algorithm based on deep learning can be used for removing the non-voice fragments which do not carry the voice of the anchor in the foreground sound part, so that the accuracy of the characteristics of the anchor voice print and the recognition of the voice print is improved.
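As an illustrative sketch of the S121-S122 pipeline (not part of the patent), the three deep-learning stages could be composed as below; sound_event_model, denoise_model, and vad_model are hypothetical placeholders for models of the kinds the description names, and the per-sample boolean mask returned by the detector is likewise an assumption.
```python
import numpy as np

def preprocess_audio(waveform: np.ndarray, sample_rate: int,
                     sound_event_model, denoise_model, vad_model) -> np.ndarray:
    """Hypothetical sketch of S121-S122; the three models are placeholders."""
    # S121: remove the background sound part (e.g. background music)
    foreground = sound_event_model.remove_background(waveform, sample_rate)
    # S121: denoise to obtain the foreground sound part
    foreground = denoise_model.denoise(foreground, sample_rate)
    # S122: keep only the samples the voice activity detector marks as speech,
    # discarding breaths, coughs and other non-speech fragments
    speech_mask = vad_model.is_speech(foreground, sample_rate)  # boolean mask
    return foreground[speech_mask]
```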
S130: the anchor voiceprint features are extracted from the speech segments.
After culling and denoising yield a speech segment containing only the anchor's voice, the anchor voiceprint features may be extracted from it so that feature matching can be performed.
In one implementation, reference may be made to the following steps for how to extract the anchor voiceprint features from a speech segment:
s131: the speech segments are divided into speech blocks.
To facilitate processing, the speech segment may be divided into several speech blocks according to a division criterion preset in the server 10, for example a preset block size or duration. If the total duration of the speech segment is 30 s and the preset block duration is 5 s, the segment is divided into 6 blocks.
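For illustration, a minimal sketch of the duration-based division follows, using the 30 s / 5 s figures from the example above; the 5 s default is taken from that example, not fixed by the patent.
```python
import numpy as np

def split_into_blocks(speech: np.ndarray, sample_rate: int,
                      block_seconds: float = 5.0) -> list[np.ndarray]:
    """Divide a speech segment into fixed-duration blocks (S131).

    With the figures from the example, a 30 s segment and
    5 s blocks yield 6 blocks.
    """
    block_len = int(block_seconds * sample_rate)
    return [speech[i:i + block_len] for i in range(0, len(speech), block_len)]
```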
S132: a determination is received as to whether each speech block is a chairman's voice.
After dividing the speech segment into a number of speech blocks, it may be determined whether each speech block is a chairman's voice, and the determination result is received. Specifically, in the process of judging whether each voice block is a sound of a host, it is possible to confirm whether each voice block is a sound of a host who is performing trial by manually performing, thereby obtaining a judgment result. Each voice block can be recognized by using a recognition model of deep learning to judge whether each voice block is a chairman sound or not, and the judgment result is output.
S133: and if the judgment result is yes, extracting voiceprint characteristics of the voice block.
If the determined result is that the voice block is the anchor voice, the voice print feature of the voice block can be extracted as the anchor voice print feature.
If the determined result is that the voice block is not the anchor voice, the voice block can be discarded without extracting voiceprint features from the voice block.
In one implementation, S100 may also include the following step for extracting the anchor voiceprint features from the audio file uploaded by the anchor terminal:
S140: Perform feature extraction on the time-domain signal of the audio file through a preset deep learning model to obtain a feature vector serving as the anchor voiceprint features.
After the speech blocks carrying the anchor's voice are acquired, the voiceprint features can be extracted through a preset deep learning model. Specifically, feature extraction is performed on the time-domain signal of the audio file by the model, and the resulting feature vector serves as the anchor voiceprint features.
In one implementation, S140 may include the following steps for obtaining the feature vector from the time-domain signal through the preset deep learning model:
S141: an audio time domain signal of an audio file is input to a first convolution layer.
S142: and inputting the convolution result output by the first convolution layer into a first residual neural network.
S143: and respectively inputting the processing results of the first residual neural network into a second residual neural network and a second convolution layer.
S144: and respectively inputting the processing result of the second residual neural network into a third residual neural network and a second convolution layer.
S145: the convolution result of the second convolution layer is input to the max pooling layer.
S146: and inputting the pooling result of the maximum pooling layer into the full-connection layer, and outputting the feature vector through the full-connection layer.
As shown in fig. 6, the preset deep learning model may include a first convolution layer, a first residual neural network, a second residual neural network, a third residual neural network, a second convolution layer, a max pooling layer, and a full connection layer connected in sequence. Specifically, the first convolution layer and the second convolution layer are one-dimensional convolution layers.
In performing feature extraction on the time-domain signal of the audio file through the preset deep learning model, the audio time-domain signal is input into the first convolution layer; the convolution result output by the first convolution layer is input into the first residual neural network; the processing result of the first residual neural network is input into the second residual neural network and the second convolution layer, respectively; the processing result of the second residual neural network is input into the third residual neural network and the second convolution layer, respectively; and the convolution result of the second convolution layer is input into the max pooling layer. Finally, the pooling result of the max pooling layer is input into the fully connected layer, which outputs a one-dimensional feature vector taken as the anchor voiceprint features. Combining a time-delay neural network with a residual network yields an effective voiceprint-extraction network — the preset deep learning model — that is small yet has strong learning capacity, improving the accuracy of anchor voiceprint feature extraction.
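The patent fixes only the layer order of fig. 6. As a rough illustration of how that topology could be realized, the PyTorch sketch below treats the internal structure of each residual block, the channel sizes, and the channel-wise concatenation feeding the second convolution layer as assumptions.
```python
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    """Minimal 1-D residual block; the real block structure is not
    specified in the patent, so this is an assumption."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.conv2(self.act(self.conv1(x))))

class VoiceprintNet(nn.Module):
    """Sketch of the fig. 6 topology (S141-S146)."""
    def __init__(self, channels: int = 64, embed_dim: int = 192):
        super().__init__()
        self.conv1 = nn.Conv1d(1, channels, 5, padding=2)   # first (1-D) conv layer
        self.res1 = ResBlock1d(channels)
        self.res2 = ResBlock1d(channels)
        self.res3 = ResBlock1d(channels)
        # second (1-D) conv layer; assumed to take the concatenated
        # outputs of the residual blocks
        self.conv2 = nn.Conv1d(3 * channels, channels, 3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)                 # max pooling layer
        self.fc = nn.Linear(channels, embed_dim)            # fully connected layer

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) raw audio time-domain signal
        x = self.conv1(waveform.unsqueeze(1))   # S141
        r1 = self.res1(x)                       # S142
        r2 = self.res2(r1)                      # S143
        r3 = self.res3(r2)                      # S144
        merged = torch.cat([r1, r2, r3], dim=1)
        pooled = self.pool(self.conv2(merged))  # S145
        return self.fc(pooled.squeeze(-1))      # S146: one-dimensional feature vector
```
Pooling over time makes the embedding length independent of the audio duration, consistent with a fixed-size voiceprint feature vector.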
After extracting the anchor voiceprint features, the following steps may be performed:
s200: and performing feature matching on the anchor voiceprint features and the voiceprint features in the preset voiceprint library to judge whether the anchor voiceprint features exist in the preset voiceprint library.
The preset voiceprint library may store the voiceprint features of all anchors who currently broadcast on the live platform using virtual resources.
When recruiting anchors for virtual resource broadcasting, the anchor must be identified to determine whether they have already been granted the virtual-resource live permission, so that the permission is not opened repeatedly for the same anchor. Specifically, since the preset voiceprint library stores the voiceprint features of all anchors currently broadcasting with virtual resources, whether the anchor has already opened the permission can be judged by matching the anchor voiceprint features against the library.
In one implementation, S200 may include the following steps for matching the anchor voiceprint features against the voiceprint features in the preset voiceprint library:
S210: Calculate the cosine similarity between the anchor voiceprint features and each voiceprint feature in the preset voiceprint library.
S220: Judge whether the cosine similarity is greater than a preset threshold.
S230: If it is smaller than the preset threshold, determine that the anchor voiceprint features do not exist in the preset voiceprint library.
In matching the anchor voiceprint features against the preset voiceprint library, the cosine similarity between the anchor voiceprint features and each stored voiceprint feature is calculated and compared with the preset threshold. If a cosine similarity is below the threshold, the two voiceprint features can be judged not to come from the same speaker; if no stored feature exceeds the threshold, the anchor voiceprint features do not match the library, and it is determined that they are not in the preset voiceprint library.
If the cosine similarity is greater than the preset threshold, the two voiceprint features can be judged to come from the same speaker, so the anchor voiceprint features match the library and are determined to exist in the preset voiceprint library; the virtual-resource live permission is then not opened again for this repeat anchor, i.e., the anchor's virtual resource application is refused.
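For illustration, the threshold comparison of S210-S230 can be sketched as follows; the 0.75 threshold and the list representation of the library are assumptions, not values given in the patent.
```python
import numpy as np

def is_in_voiceprint_library(anchor_feat: np.ndarray,
                             library: list[np.ndarray],
                             threshold: float = 0.75) -> bool:
    """S210-S230: the anchor voiceprint feature exists in the preset
    voiceprint library if its cosine similarity with any stored feature
    exceeds the threshold (0.75 is an illustrative assumption)."""
    a = anchor_feat / np.linalg.norm(anchor_feat)
    for feat in library:
        b = feat / np.linalg.norm(feat)
        if float(np.dot(a, b)) > threshold:
            return True   # same speaker: feature already in the library
    return False          # no match: feature not in the library
```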
In one implementation, S200 may also include the following steps for matching the anchor voiceprint features against the preset voiceprint library:
S240: Judge whether the anchor voiceprint features match the voiceprint features in the preset voiceprint library.
S250: If they do not match, judge whether the anchor identity identifier corresponding to the anchor voiceprint features is the same as an anchor identity identifier corresponding to a voiceprint feature in the preset voiceprint library.
S260: If it is different, determine that the anchor voiceprint features do not exist in the preset voiceprint library.
The anchor identity identifier may be an identifier used to distinguish anchor identities in virtual resource broadcasting. Specifically, it may be stored in the preset voiceprint library in association with the anchor voiceprint features, making it convenient to determine the corresponding anchor from the identifier. The identifier may be carried with the anchor voiceprint features — i.e., the anchor may already have one before the voiceprint match is made — or it may be assigned to the anchor by the server 10.
For judging whether the anchor voiceprint features match the voiceprint features in the preset voiceprint library, refer to the cosine-similarity method above, which is not repeated here. If the cosine similarity is smaller than the preset threshold, the anchor voiceprint features can be judged not to match, and it can then be judged whether the anchor identity identifier corresponding to the anchor voiceprint features is the same as an identifier corresponding to a voiceprint feature in the preset voiceprint library.
When the anchor voiceprint features are extracted, if the anchor already has an anchor identity identifier, it can be obtained at the same time; then, when the anchor voiceprint does not match the voiceprint features in the preset voiceprint library, it can be judged whether that identifier is the same as any identifier corresponding to a voiceprint feature in the library. If it differs from all of them, it can be determined that the anchor voiceprint features do not exist in the preset voiceprint library.
In one implementation, after determining whether the anchor voiceprint feature matches a voiceprint feature in a preset voiceprint library, the method may include the steps of:
s251: if not, judging whether the anchor voiceprint features have corresponding anchor identity identification.
S252: if the corresponding anchor identity identification is not determined, the anchor identity identification is allocated to the anchor, the anchor identity identification is bound with the anchor voiceprint feature, and then whether the anchor identity identification corresponding to the anchor voiceprint feature is the same as the anchor identity identification corresponding to the voiceprint feature in the preset voiceprint library is determined.
If the anchor voiceprint features are not matched with the voiceprint features in the preset voiceprint library, whether the anchor voiceprint features have corresponding anchor identification marks can be judged. If the server 10 does not have the corresponding anchor identity, the server may allocate an anchor identity to the anchor, bind the anchor identity to the anchor voiceprint feature, and further perform the judgment to determine whether the anchor identity corresponding to the anchor voiceprint feature is the same as the anchor identity corresponding to the voiceprint feature in the preset voiceprint library.
In one implementation, after determining whether the anchor identification corresponding to the anchor voiceprint feature is the same as the anchor identification corresponding to the voiceprint feature in the preset voiceprint library, the method may include the following steps:
s270: if the features are the same, judging that the features of the anchor voiceprint exist in a preset voiceprint library.
If the anchor identity identifier corresponding to the anchor voiceprint feature is the same as the anchor identity identifier corresponding to the voiceprint feature in the preset voiceprint library, that is, the anchor identity identifier already stores other corresponding voiceprint features in the preset voiceprint library, it can be determined that the anchor voiceprint feature exists in the preset voiceprint library.
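As a hedged illustration of the identifier logic in S240-S270 (including the assignment step of S251-S252), the sketch below assumes the library is a list of (identifier, feature) pairs, uses a UUID as the assigned identifier, and reuses the similarity test from the previous sketch; all three choices are assumptions.
```python
import uuid

def anchor_feature_exists(anchor_feat, anchor_id, library, matches):
    """Sketch of S240-S270; `library` is assumed to be a list of
    (anchor_id, feature) pairs and `matches` the similarity test above."""
    # S240: feature-level match against every stored voiceprint
    if any(matches(anchor_feat, feat) for _, feat in library):
        return True, anchor_id
    # S251/S252: ensure the anchor carries an identity identifier,
    # assigning and binding one if necessary
    if anchor_id is None:
        anchor_id = str(uuid.uuid4())  # server-assigned identifier
    # S250/S260/S270: compare identifiers with those already stored
    stored_ids = {aid for aid, _ in library}
    return anchor_id in stored_ids, anchor_id
```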
In one implementation, after determining that the anchor voiceprint feature is not present in the preset voiceprint library, the following steps may be performed:
s280: if the host voice print characteristics are judged not to exist in the preset voice print library, adding the host voice print characteristics into the preset voice print library to update the preset voice print library.
Since it is determined that the voiceprint feature of the anchor is not in the preset voiceprint library, that is, the anchor does not use the virtual resource to play the broadcast history before the anchor, the anchor meets the condition of the anchor recruitment of the virtual resource to play the virtual resource, the anchor can be opened with the virtual resource live broadcast authority, so that the anchor can utilize the virtual resource to play the live broadcast. Therefore, the hosting voiceprint feature of the hosting may be added to the preset voiceprint library to update the preset voiceprint library.
In one implementation, S280 may include the following steps for adding the anchor voiceprint features to the preset voiceprint library:
S281: Feature-encode the anchor voiceprint features to obtain an index code.
S282: Add the anchor voiceprint features and their associated index code to the preset voiceprint library.
The index code may be a code used to identify the anchor voiceprint features for ease of distinction; for example, it may be a unique code. Specifically, after the anchor voiceprint features are extracted, they may be encoded to obtain the index code.
In adding the anchor voiceprint features to the preset voiceprint library, the features are feature-encoded to obtain the index code, and the features and their index code are then added to the library in association.
Furthermore, the anchor voiceprint features, the anchor identity identifier, and the index code can be stored in association, so that the corresponding anchor or voiceprint features can be located from either the identifier or the index code, improving the management efficiency of the preset voiceprint library.
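A minimal sketch of the associated storage described above follows; hashing the feature bytes to derive the index code is only one illustrative way to obtain a unique code, since the patent does not specify the feature-encoding scheme, and the dict representation is likewise an assumption.
```python
import hashlib
import numpy as np

def add_to_library(anchor_feat: np.ndarray, anchor_id: str,
                   library: dict) -> str:
    """S281-S282: derive an index code from the feature and store the
    feature, anchor identity identifier and index code in association."""
    index_code = hashlib.sha256(anchor_feat.tobytes()).hexdigest()[:16]
    library[index_code] = {"anchor_id": anchor_id, "feature": anchor_feat}
    return index_code
```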
In one implementation, after determining that the anchor voiceprint feature exists in the preset voiceprint library, the following steps may be performed:
s290: if the host voice print characteristics are judged to exist in the preset voice print library, the host voice print characteristics are not added into the preset voice print library.
If the host voiceprint features are judged to exist in the preset voiceprint library, that is, if the host identity identification corresponding to the host voiceprint features is the same as the host identity identification corresponding to the voiceprint features in the preset voiceprint library, the host voiceprint features are not added into the preset voiceprint library. It is still possible to open the virtual resource live-broadcast right for the anchor and issue the virtual resource to the anchor terminal 20 so that the anchor can live with the virtual resource at the anchor terminal 20.
For example, in the virtual resource playing process, the same public meeting may use the same UID for different anchors, if the UID obtained by the anchor B is used by the anchor a, and the anchor a uses the UID to complete virtual resource configuration, that is, the voiceprint feature of the anchor a is already stored in the preset voiceprint library, then in the matching process of the anchor B with the UID corresponding to the voiceprint feature in the preset voiceprint library, the voiceprint feature of the anchor a is matched. At this time, it may be determined that the anchor voiceprint feature exists in the preset voiceprint library, and the anchor voiceprint feature of anchor B may not be added to the preset voiceprint library. But may still open the virtual resource live broadcast right for the anchor B and issue the virtual resource to the anchor terminal 20 corresponding to the anchor B.
In one implementation, after feature matching is performed on the anchor voiceprint feature and the voiceprint feature in the preset voiceprint library to determine whether the anchor voiceprint feature exists in the preset voiceprint library, the following steps may be further performed:
s300: if the voiceprint features of the anchor are judged not to exist in the preset voiceprint library, virtual resources are issued to the anchor terminal according to the anchor identity identification mark of the anchor, so that the anchor terminal can respond to the selection operation of the anchor on the virtual resources to generate live video combining the virtual resources and the anchor features.
The virtual resource may be a resource preset in the server 10 for configuring the avatar. The virtual resources are transmitted to the anchor terminal 20 so that the anchor can perform a selection operation on the virtual resources, thereby generating a live video combining the virtual resources and the anchor feature at the anchor terminal 20.
If it is determined that the anchor voiceprint feature does not exist in the preset voiceprint library, the virtual resource may be issued to the anchor terminal 20 according to the anchor identity identification feature of the anchor. After receiving the virtual resource, the anchor terminal can respond to the selection operation of the anchor on the virtual resource to generate a live video combining the virtual resource and the anchor feature.
For example, as shown in fig. 7 and 8, after the virtual resource is issued to the anchor terminal 20, the anchor terminal 20 may display an avatar configuration interface based on the virtual resource, the anchor may select a background, expression, action, etc. configuration of the avatar, and the anchor feature may be an anchor voiceprint feature, so that a live video combining the avatar and the anchor voiceprint feature may be generated at the anchor terminal 20 and transmitted to each of the audience terminals 30 through the server 10, so that the audience user may watch at the audience terminal 30.
As shown in fig. 9, the computer device 100 described in the embodiment of the computer device of the present application may be the server 10 described above. Computer device 100 may include a processor 110, a memory 120, and a communication circuit 130.
The memory 120 is used to store a computer program and may be a ROM (read-only memory), a RAM (random access memory), or another type of storage device. In particular, the memory may include one or more computer-readable storage media, which may be non-transitory; it may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk or flash storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory is used to store at least one piece of program code.
The processor 110 is used to control the operation of the computer device 100 and may also be referred to as a CPU (central processing unit). The processor 110 may be an integrated circuit chip with signal processing capability. It may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The general-purpose processor may be a microprocessor, or the processor 110 may be any conventional processor.
The processor 110 is configured to execute a computer program stored in the memory 120 to implement the live-based virtual resource allocation method described in the first embodiment of the live-based virtual resource allocation method and the second embodiment of the live-based virtual resource allocation method of the present application.
The computer device 100 may also include a communication circuit 130, the communication circuit 130 being a device or circuit by which the computer device 100 communicates with an external device to enable the processor 110 to interact with external devices via the communication circuit 130.
For detailed descriptions of functions and execution procedures of each functional module or component in the embodiment of the computer device of the present application, reference may be made to the descriptions in the first embodiment of the live-based virtual resource allocation method and the second embodiment of the live-based virtual resource allocation method in the present application, which are not described herein again.
In several embodiments provided herein, it should be understood that the disclosed computer device 100 and live-based virtual resource allocation method may be implemented in other ways. For example, the various embodiments of computer device 100 described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Referring to fig. 10, the integrated units described above, if implemented as software functional units and sold or used as independent products, may be stored in the computer-readable storage medium 200. Based on this understanding, the technical solution of the present application — in essence, the part contributing beyond the prior art, or all or part of the technical solution — may be embodied as a software product stored in a storage medium and including several instructions/computer programs that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the methods of the embodiments. The storage medium includes media such as a USB flash disk, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk, and electronic terminals carrying such media, such as computers, mobile phones, notebook computers, tablet computers, and cameras.
For the execution process of the program data in the computer-readable storage medium, reference may be made to the first and second embodiments of the live-based virtual resource allocation method of the present application, which are not repeated here.
The foregoing description is only exemplary embodiments of the present application and is not intended to limit the scope of the present application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the present application.

Claims (15)

1. A live broadcast-based virtual resource allocation method, characterized by comprising the following steps:
an anchor terminal uploads an audio file to a server;
the server extracts an anchor voiceprint feature from the audio file;
the server performs feature matching between the anchor voiceprint feature and voiceprint features in a preset voiceprint library to judge whether the anchor voiceprint feature exists in the preset voiceprint library;
if it is judged that the anchor voiceprint feature does not exist in the preset voiceprint library, the server issues a virtual resource to the anchor terminal according to an anchor identity identifier of the anchor;
and the anchor terminal, in response to a selection operation of the anchor on the virtual resource, generates a live video combining the virtual resource and an anchor feature.
2. A live broadcast-based virtual resource allocation method, characterized by comprising the following steps:
extracting an anchor voiceprint feature from an audio file uploaded by an anchor terminal;
performing feature matching between the anchor voiceprint feature and voiceprint features in a preset voiceprint library to judge whether the anchor voiceprint feature exists in the preset voiceprint library;
and if it is judged that the anchor voiceprint feature does not exist in the preset voiceprint library, issuing a virtual resource to the anchor terminal according to an anchor identity identifier of the anchor, so that the anchor terminal, in response to a selection operation of the anchor on the virtual resource, generates a live video combining the virtual resource and an anchor feature.
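For illustration only, the Python sketch below outlines the server-side flow of claim 2. None of it comes from the disclosure: the helper names, the dictionary-based voiceprint library, and the cosine-similarity threshold are all assumptions standing in for the unspecified matching mechanism.

```python
# Hypothetical sketch of the claim 2 server flow; all names and the
# similarity threshold are illustrative assumptions, not the disclosure.
import numpy as np

MATCH_THRESHOLD = 0.75  # assumed similarity cutoff for "feature exists"

def extract_voiceprint(audio: np.ndarray) -> np.ndarray:
    # Placeholder: a real system would run a deep model (cf. claim 4).
    head = audio[:256]
    return head / (np.linalg.norm(head) + 1e-9)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def feature_in_library(feature: np.ndarray, library: dict) -> bool:
    # The anchor voiceprint "exists" if it matches any stored voiceprint.
    return any(cosine_similarity(feature, stored) >= MATCH_THRESHOLD
               for stored in library.values())

def issue_virtual_resource(anchor_id: str) -> None:
    print(f"virtual resource issued to anchor {anchor_id}")

def handle_upload(audio: np.ndarray, anchor_id: str, library: dict) -> None:
    feature = extract_voiceprint(audio)           # extract anchor voiceprint
    if not feature_in_library(feature, library):  # voiceprint not in library
        issue_virtual_resource(anchor_id)         # issue by anchor identifier
        library[anchor_id] = feature              # optional update (cf. claim 7)

handle_upload(np.random.randn(16000), "anchor-000123", {})
```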
3. The method according to claim 2, characterized in that:
the extracting an anchor voiceprint feature from the audio file uploaded by the anchor terminal comprises:
performing feature extraction processing on a time-domain signal of the audio file through a preset deep learning model to obtain a feature vector serving as the anchor voiceprint feature.
4. The method according to claim 3, characterized in that:
the preset deep learning model comprises a first convolution layer, a first residual neural network, a second residual neural network, a third residual neural network, a second convolution layer, a maximum pooling layer, and a fully connected layer which are connected in sequence;
the performing feature extraction processing on the time-domain signal of the audio file through the preset deep learning model to obtain the feature vector serving as the anchor voiceprint feature comprises:
inputting an audio time-domain signal of the audio file into the first convolution layer;
inputting the convolution result output by the first convolution layer into the first residual neural network;
inputting the processing result of the first residual neural network into the second residual neural network and the second convolution layer respectively;
inputting the processing result of the second residual neural network into the third residual neural network and the second convolution layer respectively;
inputting the convolution result of the second convolution layer into the maximum pooling layer;
and inputting the pooling result of the maximum pooling layer into the fully connected layer, and outputting the feature vector through the fully connected layer.
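Claim 4 fixes the layer types and their wiring but not their sizes. The PyTorch sketch below is one minimal reading: channel counts, kernel widths, the residual-block design, and the embedding dimension are all assumptions, and since the claim does not state where the third residual network's output goes, it is assumed here to be concatenated with the other residual outputs before the second convolution layer.

```python
# A minimal sketch of the claim 4 layout; all layer sizes are assumptions.
import torch
import torch.nn as nn


class ResidualBlock1d(nn.Module):
    """Two 1-D convolutions with a skip connection (assumed block design)."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
        )
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.body(x) + x)


class VoiceprintNet(nn.Module):
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 64, kernel_size=7, stride=2, padding=3)
        self.rn1 = ResidualBlock1d(64)
        self.rn2 = ResidualBlock1d(64)
        self.rn3 = ResidualBlock1d(64)
        # Second convolution layer fuses the residual outputs (3 * 64 channels).
        self.conv2 = nn.Conv1d(64 * 3, 128, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)   # maximum pooling layer
        self.fc = nn.Linear(128, embed_dim)   # fully connected layer

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) raw time-domain signal
        x = self.conv1(waveform.unsqueeze(1))
        r1 = self.rn1(x)
        r2 = self.rn2(r1)
        r3 = self.rn3(r2)
        fused = self.conv2(torch.cat([r1, r2, r3], dim=1))
        pooled = self.pool(fused).squeeze(-1)
        return self.fc(pooled)  # feature vector used as the anchor voiceprint

# Example: a 1-second clip at 16 kHz yields one fixed-length embedding.
print(VoiceprintNet()(torch.randn(1, 16000)).shape)  # torch.Size([1, 256])
```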
5. The method according to claim 2, characterized in that:
before the extracting the anchor voiceprint feature from the audio file uploaded by the anchor terminal, the method comprises the following steps:
acquiring a playback link, generated at the anchor terminal when the anchor ends a trial broadcast, for linking to the audio file;
and acquiring the audio file through the playback link.
6. The method according to claim 5, wherein:
the performing feature matching between the anchor voiceprint feature and the voiceprint features in the preset voiceprint library to judge whether the anchor voiceprint feature exists in the preset voiceprint library comprises:
judging whether the anchor voiceprint feature matches a voiceprint feature in the preset voiceprint library;
if not matched, judging whether the anchor identity identifier corresponding to the anchor voiceprint feature is the same as the anchor identity identifier corresponding to the voiceprint feature in the preset voiceprint library;
and if they are different, judging that the anchor voiceprint feature does not exist in the preset voiceprint library.
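Sketching the decision logic of claim 6 (together with the branch of claim 9) in Python: the matching test is passed in as a callback because the claims leave it unspecified, and the (feature, identifier) pair layout of the library is an assumption.

```python
# Hypothetical decision logic for claims 6 and 9. `matches` stands in for
# the unspecified feature-matching test; the library is assumed to hold
# (voiceprint_feature, anchor_id) pairs.
from typing import Callable, List, Tuple

def voiceprint_exists(
    feature: object,
    anchor_id: str,
    library: List[Tuple[object, str]],
    matches: Callable[[object, object], bool],
) -> bool:
    # A direct feature match is taken to mean the voiceprint exists
    # (implied, though not spelled out, by claim 6).
    if any(matches(feature, stored) for stored, _ in library):
        return True
    # Not matched: the voiceprint "exists" only if some stored entry
    # carries the same anchor identity identifier (claims 6 and 9).
    return any(anchor_id == stored_id for _, stored_id in library)

lib = [([0.1, 0.9], "anchor-01")]
same = lambda a, b: a == b
print(voiceprint_exists([0.2, 0.8], "anchor-01", lib, same))  # True: same ID
print(voiceprint_exists([0.2, 0.8], "anchor-02", lib, same))  # False
```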
7. The method according to claim 6, wherein:
after the performing feature matching between the anchor voiceprint feature and the voiceprint features in the preset voiceprint library to judge whether the anchor voiceprint feature exists in the preset voiceprint library, the method comprises:
if it is judged that the anchor voiceprint feature does not exist in the preset voiceprint library, adding the anchor voiceprint feature into the preset voiceprint library so as to update the preset voiceprint library.
8. The method according to claim 7, characterized in that:
the adding the anchor voiceprint feature into the preset voiceprint library comprises:
performing feature coding on the anchor voiceprint feature to obtain an index code;
and adding the anchor voiceprint feature, in association with the index code, into the preset voiceprint library.
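Claim 8 does not specify the feature-coding scheme. One minimal sketch (an assumption, not the disclosed coding) hashes a quantized copy of the feature vector to obtain a stable index code and stores the feature under it:

```python
# Illustrative feature coding: quantize the embedding, hash the bytes, and
# use the digest as the index code. The scheme is an assumption; the claim
# fixes only that a code is derived and stored in association with the feature.
import hashlib
import numpy as np

def index_code(feature: np.ndarray, decimals: int = 3) -> str:
    quantized = np.round(feature.astype(np.float64), decimals)
    return hashlib.sha256(quantized.tobytes()).hexdigest()[:16]

def add_to_library(feature: np.ndarray, library: dict) -> str:
    code = index_code(feature)
    library[code] = feature  # feature stored in association with its code
    return code

lib: dict = {}
print(add_to_library(np.random.rand(256), lib))  # e.g. 'a3f9...'
```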
9. The method according to claim 6, characterized in that:
after the judging whether the anchor identity identifier corresponding to the anchor voiceprint feature is the same as the anchor identity identifier corresponding to the voiceprint feature in the preset voiceprint library, the method comprises:
if they are the same, judging that the anchor voiceprint feature exists in the preset voiceprint library;
after the performing feature matching between the anchor voiceprint feature and the voiceprint features in the preset voiceprint library to judge whether the anchor voiceprint feature exists in the preset voiceprint library, the method comprises:
and if it is judged that the anchor voiceprint feature exists in the preset voiceprint library, not adding the anchor voiceprint feature into the preset voiceprint library.
10. The method according to claim 6, characterized in that:
after the judging whether the anchor voiceprint feature matches a voiceprint feature in the preset voiceprint library, the method comprises:
if not matched, judging whether the anchor voiceprint feature has a corresponding anchor identity identifier;
and if there is no corresponding anchor identity identifier, allocating an anchor identity identifier to the anchor, binding the anchor identity identifier with the anchor voiceprint feature, and then performing the step of judging whether the anchor identity identifier corresponding to the anchor voiceprint feature is the same as the anchor identity identifier corresponding to the voiceprint feature in the preset voiceprint library.
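A small sketch of the claim 10 branch, assuming a bindings map from voiceprint index codes to anchor identity identifiers; the allocate_id helper and the identifier format are hypothetical.

```python
# Hypothetical sketch of claim 10: allocate and bind an anchor identity
# identifier when the voiceprint has none, then fall through to the
# identifier comparison of claim 6.
import itertools

_id_counter = itertools.count(1)

def allocate_id() -> str:
    return f"anchor-{next(_id_counter):06d}"  # assumed identifier format

def resolve_anchor_id(feature_code: str, bindings: dict) -> str:
    if feature_code not in bindings:            # no bound identifier yet
        bindings[feature_code] = allocate_id()  # allocate and bind
    return bindings[feature_code]               # used for the claim 6 check

bindings: dict = {}
print(resolve_anchor_id("a3f9", bindings))  # anchor-000001
```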
11. The method according to claim 2, characterized in that:
the extracting the anchor voiceprint feature from the audio file uploaded by the anchor terminal comprises:
performing noise reduction processing on the audio file and extracting a voice segment carrying the anchor's voice from the audio file;
and extracting the anchor voiceprint feature from the voice segment.
12. The method according to claim 11, characterized in that:
the performing noise reduction processing on the audio file and extracting the voice segment carrying the anchor's voice from the audio file comprises:
removing a background sound part from the audio file and performing noise reduction processing on the audio file to obtain a foreground sound part;
and eliminating, from the foreground sound part, non-voice fragments that do not carry the anchor's voice, to obtain the voice segment.
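The claims leave the noise-reduction and voice-detection methods open. The sketch below uses a plain short-time energy gate as a stand-in voice activity detector; real systems would typically use spectral noise suppression and a trained VAD model, and every threshold here is an assumption.

```python
# Minimal energy-based stand-in for the claim 12 pipeline: pre-emphasize the
# signal as a crude low-frequency background remover, then keep only frames
# whose energy clears a threshold (a toy voice-activity detector).
import numpy as np

def highpass(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    # First-order pre-emphasis filter as a crude rumble/background remover.
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def voice_frames(signal: np.ndarray, sr: int = 16000,
                 frame_ms: int = 30, threshold: float = 0.01) -> np.ndarray:
    frame = sr * frame_ms // 1000
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    energy = (frames ** 2).mean(axis=1)
    return frames[energy > threshold].reshape(-1)  # concatenated voice segment

quiet = highpass(np.random.randn(16000) * 0.005)   # quiet noise: mostly dropped
print(len(voice_frames(quiet)))
```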
13. The method according to claim 11, characterized in that:
the extracting the anchor voiceprint feature from the voice segment comprises:
dividing the voice segment into a plurality of voice blocks;
receiving, for each voice block, a determination result of whether the voice block is the anchor's voice;
and if the determination result is yes, extracting the voiceprint feature from the voice block.
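A sketch of claim 13's block-wise confirmation, with the per-block decision supplied as a callback; in practice the decision might come from a human reviewer or a speaker-verification model, and both the block length and the callback are assumptions.

```python
# Hypothetical sketch of claim 13: split the voice segment into fixed-length
# blocks, ask an external judge whether each block is the anchor's voice, and
# keep only the confirmed blocks for voiceprint extraction.
from typing import Callable, List
import numpy as np

def confirmed_blocks(segment: np.ndarray,
                     is_anchor: Callable[[np.ndarray], bool],
                     block_len: int = 16000) -> List[np.ndarray]:
    blocks = [segment[i:i + block_len]
              for i in range(0, len(segment), block_len)]
    return [b for b in blocks if is_anchor(b)]  # keep only confirmed blocks

segment = np.random.randn(48000)
kept = confirmed_blocks(segment, is_anchor=lambda b: b.std() > 0.5)
print(len(kept))  # voiceprint features would be extracted from these blocks
```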
14. A computer device, comprising a processor, a memory, and a communication circuit, wherein the communication circuit and the memory are coupled to the processor, and the memory stores a computer program which is executed by the processor to implement the live broadcast-based virtual resource allocation method of any one of claims 1 to 13.
15. A computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement the live broadcast-based virtual resource allocation method of any one of claims 1 to 13.
CN202310344912.XA (filed 2023-03-31, priority 2023-03-31): Live broadcast-based virtual resource configuration method, computer equipment and storage medium. Status: Pending. Publication: CN116506650A (en).

Priority Applications (1)

Application Number: CN202310344912.XA; Priority Date: 2023-03-31; Filing Date: 2023-03-31; Title: Live broadcast-based virtual resource configuration method, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number: CN202310344912.XA; Priority Date: 2023-03-31; Filing Date: 2023-03-31; Title: Live broadcast-based virtual resource configuration method, computer equipment and storage medium

Publications (1)

Publication Number: CN116506650A; Publication Date: 2023-07-28

Family ID: 87319338

Family Applications (1)

Application Number: CN202310344912.XA; Status: Pending; Publication: CN116506650A (en)

Country Status (1)

Country: CN; Link: CN116506650A (en)

Legal Events

Code: PB01; Title: Publication
Code: SE01; Title: Entry into force of request for substantive examination