CN111314788A

CN111314788A - Voice password returning method and presenting method, device and equipment for voice gift

Info

Publication number: CN111314788A
Application number: CN202010177339.4A
Authority: CN
Inventors: 岑焕成; 陈杰; 杨克敏; 林可恩
Original assignee: Guangzhou Huaduo Network Technology Co Ltd
Current assignee: Guangzhou Cubesili Information Technology Co Ltd
Priority date: 2020-03-13
Filing date: 2020-03-13
Publication date: 2020-06-19

Abstract

The embodiments of the present application provide a voice password returning method and a presenting method for a voice gift, together with corresponding apparatuses and equipment. The returning method acquires a voice password acquisition instruction, issued by a server, that corresponds to the voice gift, and extracts audio information from each effective sound acquisition channel; matches the audio information of each effective sound acquisition channel against locally pre-stored standard sound files and selects a voice password acquisition channel from the effective sound acquisition channels according to the matching degree; and generates voice password return information from the audio information of the voice password acquisition channel and uploads it to the server for identification. Because only the audio information of the best channel among the multiple effective sound acquisition channels is transmitted to the server, the amount of transmitted data is reduced, transmission efficiency is improved, and the voice processing time of the voice gift is shortened.

Description

Voice password returning method and presenting method, device and equipment for voice gift

Technical Field

The present application relates to the field of live broadcast technologies, and in particular, to a method and an apparatus for returning a voice password of a voice gift, a method and an apparatus for presenting a voice gift, a computer device, and a storage medium.

Background

With the development of network technology, real-time video communication such as live webcasts and video chat rooms has become an increasingly popular form of entertainment. In real-time video communication, interactivity among users can be increased by giving virtual gifts that display special effects.

However, in the related live broadcast technology, when an anchor broadcasts live from a PC (Personal Computer), voice information collected by multiple sound devices such as microphones often needs to be uploaded to a server for voice processing. Because the PC side has a plurality of sound collection channels, each of them can pick up the anchor's audio. In some live broadcast scenes, uploading the voice information collected by every sound collection channel to the server for recognition easily causes information redundancy: the volume of audio uploaded to the server is large, transmission efficiency is low, and voice processing takes a long time, all of which degrades the live broadcast effect.

Disclosure of Invention

The purpose of the present application is to solve at least one of the above technical defects, in particular the problem that long voice processing time degrades the live broadcast effect.

In a first aspect, an embodiment of the present application provides a voice password returning method for a voice gift, including the following steps:

acquiring a voice password acquisition instruction corresponding to a voice gift issued by a server, and extracting audio information from each effective voice acquisition channel according to the voice password acquisition instruction;

respectively matching the audio information of the effective sound acquisition channels with locally pre-stored standard sound files, and selecting a voice password acquisition channel from the effective sound acquisition channels according to the matching degree;

and generating voice password return information according to the audio information corresponding to the voice password acquisition channel, and uploading the voice password return information to the server for identification.

In an embodiment, the step of respectively matching the audio information of the effective sound collection channels with a locally pre-stored standard sound file includes:

acquiring a voice password recording time length corresponding to the voice gift, and acquiring audio information corresponding to each time slice of each effective sound acquisition channel according to a preset time slice in the recording time length;

and respectively matching the audio information in the time slice corresponding to each effective sound acquisition channel with a locally pre-stored standard sound file.

In an embodiment, before the step of extracting audio information from each valid sound collection channel according to the voice password collection instruction, the method further includes:

monitoring the input state of each audio input channel;

and determining effective sound collection channels according to the current input state of each audio input channel.

In an embodiment, the step of respectively matching the audio information of the effective sound acquisition channels with a locally pre-stored standard sound file, and selecting a voice password acquisition channel from the effective sound acquisition channels according to the matching degree includes:

when a voice password is collected for the first time, matching the audio information collected by each effective sound collection channel with a reference silent file in sequence at set time intervals to obtain a first error value corresponding to each set time interval, and determining a sound file part from the current audio information according to the first error value;

and performing matching error calculation on the audio information of the sound file part and the corresponding standard sound file to obtain a second error value corresponding to each effective sound acquisition channel, and determining a voice password acquisition channel according to the second error value.

In one embodiment, the step of determining the voice password collecting channel according to the second error value comprises:

and comparing second error values corresponding to the effective sound acquisition channels, and determining the effective sound acquisition channel with the minimum second error value as a voice password recording channel.
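As an illustration only, the first-time selection described above can be sketched in Python. The mean absolute difference used as the error measure, the all-zero reference silent signal, the `min_first_error` cut-off, and all function names are assumptions made for this sketch; the text does not specify how the error values are computed.

```python
from typing import Dict, List, Sequence, Tuple


def mean_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference over the overlapping samples (assumed metric)."""
    n = min(len(a), len(b))
    if n == 0:
        return float("inf")  # nothing to compare: treat as the worst possible match
    return sum(abs(x - y) for x, y in zip(a, b)) / n


def sound_part(samples: Sequence[float], interval: int,
               silence_level: float = 0.0,
               min_first_error: float = 0.05) -> List[float]:
    """Keep the intervals whose first error value against silence is large enough."""
    kept: List[float] = []
    for start in range(0, len(samples), interval):
        chunk = samples[start:start + interval]
        silent_ref = [silence_level] * len(chunk)  # hypothetical reference silent file
        first_error = mean_abs_error(chunk, silent_ref)
        if first_error >= min_first_error:
            kept.extend(chunk)
    return kept


def select_channel(channels: Dict[str, Sequence[float]],
                   standard: Sequence[float],
                   interval: int) -> Tuple[str, Dict[str, float]]:
    """Return the channel with the minimum second error value, plus all errors."""
    second_errors = {
        name: mean_abs_error(sound_part(samples, interval), standard)
        for name, samples in channels.items()
    }
    best = min(second_errors, key=second_errors.get)
    return best, second_errors
```

In this sketch a channel that contains only silence yields an empty sound part and therefore an infinite second error, so it is selected only if every channel is silent.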

In an embodiment, the step of respectively matching the audio information of the effective sound acquisition channels with a locally pre-stored standard sound file, and selecting a voice password acquisition channel from each effective sound acquisition channel according to the matching degree includes:

when the voice password is not collected for the first time, carrying out error calculation on the audio information collected by the voice password collecting channel selected by the last voice password collection and the corresponding standard voiced file to obtain a corresponding error value of this time;

and if the difference between the error value of the current time and the error value corresponding to the voice password acquisition channel selected by the last voice password acquisition is smaller than a first preset threshold, determining the voice password acquisition channel selected by the last voice password acquisition as the current voice password acquisition channel.
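The reuse rule for non-first collections described above can be sketched as follows. Taking the absolute value of the difference and returning `None` to signal that a fresh selection is needed are assumptions of this sketch; the text only states what happens when the difference is below the first preset threshold.

```python
from typing import Optional


def choose_current_channel(previous_channel: str,
                           previous_error: float,
                           current_error: float,
                           first_threshold: float) -> Optional[str]:
    """Reuse the previously selected channel while its error stays stable.

    Returns the previous channel when the change in error is below the first
    preset threshold; otherwise returns None (assumed fallback) so the caller
    can re-run the full selection over all effective channels.
    """
    if abs(current_error - previous_error) < first_threshold:
        return previous_channel
    return None
```

Reusing the previous channel in the stable case avoids repeating the matching over every effective channel on each collection.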

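In an embodiment, the audio information is extracted from each effective sound acquisition channel by an acquisition process, and the matching step includes: starting an analysis process while the acquisition process is extracting audio information from each effective sound acquisition channel;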

and respectively matching the audio information of the effective sound acquisition channel with a locally pre-stored standard sound file through the analysis process.

In a second aspect, an embodiment of the present application further provides a method for giving a voice gift, including the following steps:

receiving voice gifts given by audiences and voice password acquisition instructions sent by a server, and extracting audio information from each effective sound acquisition channel according to the voice password acquisition instructions;

respectively matching the audio information of the effective sound acquisition channels with locally pre-stored standard sound files, and selecting a voice password acquisition channel from each sound acquisition channel according to the matching degree;

generating voice password return information according to the audio information corresponding to the voice password acquisition channel, and uploading the voice password return information to the server; the server matches the voice password return information with the text content of the voice gift;

and receiving a voice password matching result returned by the server, and determining presentation confirmation information of the voice gift according to the voice password matching result.

In an embodiment, the step of receiving the matching result returned by the server and displaying the gift-giving confirmation information according to the matching result includes:

and receiving the matching value returned by the server, and displaying gift presentation confirmation information when the matching value is determined to reach a second preset threshold value.

In a third aspect, an embodiment of the present application further provides a voice password returning device for a voice gift, including:

the audio information extraction module is used for acquiring a voice password acquisition instruction corresponding to the voice gift sent by the server and extracting audio information from each effective sound acquisition channel according to the voice password acquisition instruction;

the acquisition channel selection module is used for respectively matching the audio information of the effective sound acquisition channels with locally pre-stored standard sound files and selecting voice password acquisition channels from the effective sound acquisition channels according to the matching degree;

and the returned information uploading module is used for generating voice password returned information according to the audio information corresponding to the voice password acquisition channel and uploading the voice password returned information to the server for identification.

In a fourth aspect, an embodiment of the present application further provides a presenting device for a voice gift, including:

the audio information receiving module is used for receiving voice gifts given by audiences and voice password acquisition instructions sent by the server and extracting audio information from each effective sound acquisition channel according to the voice password acquisition instructions;

the acquisition channel selection module is used for respectively matching the audio information of the effective sound acquisition channels with locally pre-stored standard sound files and selecting voice password acquisition channels from the sound acquisition channels according to the matching degree;

the returned information uploading module is used for generating voice password returned information according to the audio information corresponding to the voice password acquisition channel and uploading the voice password returned information to the server; the server matches the voice password return information with the text content of the voice gift;

and the confirmation information determining module is used for receiving the voice password matching result returned by the server and determining the presentation confirmation information of the voice gift according to the voice password matching result.

In a fifth aspect, the present application further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the voice password returning method for a voice gift according to any one of the embodiments of the first aspect or the gifting method for a voice gift according to any one of the embodiments of the second aspect when executing the program.

In a sixth aspect, embodiments of the present application further provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the steps of the voice password returning method for a voice gift according to any one of the embodiments of the first aspect or the gifting method for a voice gift according to any one of the embodiments of the second aspect.

The voice password returning method, the presenting method, the apparatus, the device and the storage medium for the voice gift provided by the above embodiments extract audio information from each effective sound acquisition channel by acquiring a voice password acquisition instruction corresponding to the voice gift issued by the server; respectively matching the audio information of the effective sound acquisition channels with locally pre-stored standard sound files, and selecting a voice password acquisition channel from the effective sound acquisition channels according to the matching degree; and voice password return information is generated according to the audio information corresponding to the voice password acquisition channel, and the voice password return information is uploaded to the server for identification, so that the optimal audio information of one channel is selected from the multiple effective sound acquisition channels and transmitted to the server, the transmission quantity of data is reduced, the transmission efficiency is improved, and the voice processing time of the voice gifts is shortened.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic frame diagram of a webcast system according to an embodiment;

FIG. 2 is a diagram of a presentation window of a voice password provided by an embodiment;

FIG. 3 is a flowchart of a voice password returning method for a voice gift provided by an embodiment;

FIG. 4 is another diagram of a presentation window for a voice password provided by an embodiment;

FIG. 5 is a flow chart of a method for presenting a voice gift provided by an embodiment;

FIG. 6 is a schematic structural diagram of a voice password returning device for a voice gift provided by an embodiment;

FIG. 7 is a schematic structural diagram of a presentation apparatus for voice gifts according to an embodiment.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.

It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Before describing the method provided by the embodiments of the present application, an application scenario is described first. Referring to FIG. 1, FIG. 1 is a schematic diagram of the framework of a live webcast system provided by an embodiment. The system framework may include a server side and a client side. The client side includes one or more anchor clients (i.e., the anchor side, the same below) 10 and a plurality of viewer clients (i.e., the viewer side, the same below) 20. A live platform located on the server side may include a plurality of virtual live broadcast rooms and a server 30. Each anchor client 10 and each viewer client 20 establishes a communication connection with the server 30 through a wired or wireless network.

Generally speaking, each virtual live broadcast room plays different live content. The anchor broadcasts live through the anchor client 10, and a viewer enters a chosen virtual live broadcast room through the viewer client 20 to watch the anchor's live broadcast. The viewer client 20 and the anchor client 10 may enter the live platform through a live application (APP) installed on the terminal device.

The anchor client 10 and the viewer client 20 are terminal devices, and in some embodiments, the anchor client may be a desktop computer or a notebook computer, which is not limited thereto. In this embodiment, the anchor client 10 may be connected to one or more external devices, such as an external camera and a microphone, the viewer client 20 may be a smart phone, a tablet computer, a desktop computer, or a notebook computer, and the server 30 is a background server for providing a background service for the terminal device, and may be implemented by an independent server or a server cluster formed by multiple servers. In one embodiment, the server 30 may be a live web platform.

In one example of the technical scheme, when a viewer chooses to present a voice gift to a target anchor, the anchor receives the voice password task corresponding to the voice gift and, by successfully completing the task within a preset time length, obtains the gift reward corresponding to the voice gift. The voice gift is a novel way of presenting a virtual gift: during the live broadcast, a viewer gives a voice gift that carries voice password content, and the anchor receives the voice gift by reading aloud or singing according to that content. The voice gift strengthens interaction between viewers and the anchor and makes the live broadcast room more active.

Generally speaking, different voice gifts correspond to different voice passwords. The anchor records audio according to the voice password; if the matching degree between the recorded audio information and the content of the voice password reaches a preset threshold, the voice password task is successfully completed and the anchor obtains the reward corresponding to the voice gift. Optionally, when the anchor terminal receives a plurality of voice gifts, it may buffer them in a queue, and when the anchor selects one or more of them, the anchor terminal receives the voice gift task corresponding to each selected voice gift.

In an embodiment, the virtual gift box at the audience end displays a voice gift. The audience user sends out a voice gift giving request to the target anchor through the server after selecting the voice gift. The server responds to the presentation request of the voice gift to generate a voice password task corresponding to the voice gift, and sends the voice password corresponding to the voice password task to the anchor terminal. In an embodiment, the voice password is presented in text form on the display interface at the anchor side for easy anchor viewing.

Each voice gift is configured with a unique gift identifier, through which the corresponding voice password task and its text content can be looked up. The anchor terminal receives the voice password task issued by the server and finds the corresponding voice password through the gift identifier. In one embodiment, the voice password is presented as text information, and the text information corresponding to the voice password task is displayed on the display interface of the anchor terminal to prompt the anchor to act according to the voice password.

Optionally, in an embodiment, the anchor terminal receives a voice password task issued by the server, pops up the display window when the pop-up option is in the enabled state, and displays the content of the voice password in the display window.

When receiving the voice password task issued by the server, the anchor terminal triggers the display window. FIG. 2 is a schematic view of a display window of a voice password according to an embodiment. As shown in FIG. 2, the display window 201 displays a recording button 101, text information 102 corresponding to the voice password, a guidance phrase 103, a countdown progress bar 104, an automatic pop-up check item 105, and the like. When the automatic pop-up check item 105 is checked, the text information corresponding to the voice password is automatically displayed on the display window 201. The guidance phrase 103 guides the anchor through the voice gift task. The anchor can click the recording button 101 to record, and recording stops when the preset time length is reached. The countdown progress bar 104 shows the anchor the time remaining to complete the voice gift task, and recording stops automatically when the preset duration is reached.

Optionally, in an embodiment, when the anchor clicks "rule" in the guidance phrase, the execution rules of the voice gift task are presented as a web page. In the initial state, automatic pop-up is checked by default. When automatic pop-up is checked, the display window pops up automatically the next time the anchor receives a voice gift; when it is not checked, the next voice gift received is shown with the display window collapsed by default.

FIG. 3 is a flowchart of a voice password returning method for a voice gift according to an embodiment. As shown in FIG. 3, the method can be executed in a voice password returning device, such as an anchor client (i.e., the anchor side, the same below).

Specifically, the voice password returning method of the voice gift may include the following steps:

S110, acquiring a voice password acquisition instruction corresponding to the voice gift issued by the server, and extracting audio information from each effective sound acquisition channel according to the voice password acquisition instruction.

In this embodiment, when the anchor terminal receives the voice password acquisition instruction, a display panel can pop up on the display interface with a prompt for the recording button. As shown in FIG. 2, the display panel also displays the text information of the voice password, such as 'kayi-me like you'.

In an embodiment, after the anchor terminal obtains the voice password acquisition instruction corresponding to the voice gift sent by the server, the anchor needs to trigger the recording button within a preset time so that the anchor terminal starts extracting audio information from each effective sound acquisition channel. In another embodiment, after obtaining the voice password acquisition instruction corresponding to the voice gift sent by the server, the anchor terminal starts extracting audio information from each effective sound acquisition channel on its own initiative.

For example, when the anchor terminal is a PC, it can have multiple microphone channels, such as the microphone channel of the PC's camera, the microphone channel of a peripheral device connected to the PC, a general microphone channel connected to the MIC-IN socket of the sound card, and other audio input channels.

In an embodiment, before extracting audio information from each valid sound collection channel according to the voice password collection instruction in step S110, the method further includes the following steps:

S100, monitoring the input state of each audio input channel; and determining effective sound collection channels according to the current input state of each audio input channel.

In an embodiment, the effective sound collection channels may be selected by detecting all audio devices of the anchor terminal and checking the input state of the audio input channel of each audio device, where the input state may include fault, normal, unconnected, and the like. Optionally, all audio devices of the anchor terminal are obtained through DirectSound. DirectSound is a low-level component of DirectX Audio that provides a rich set of interface functions and implements playback control of waveform sound data in WAV format. Unlike the sound playback functions provided by the general Windows API, DirectSound can mix and play multiple sounds. DirectSound can make full use of the memory resources of the sound card and also provides a 3D sound effect algorithm to simulate realistic 3D stereo sound.

Furthermore, a separate monitoring process, distinct from the voice password acquisition process, can be established to monitor the input state of each audio input channel in real time, so that voice password acquisition and audio input channel monitoring do not affect each other. The effective sound collection channels are re-determined when a device corresponding to an audio input channel is hot-plugged.
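A minimal Python sketch of step S100 follows. It assumes that only channels in the normal state count as effective, and it uses a hypothetical `probe` callable as a stand-in for real device enumeration (for example through DirectSound); neither the class nor the function names come from the application.

```python
from enum import Enum
from typing import Callable, Dict, List


class InputState(Enum):
    NORMAL = "normal"
    FAULT = "fault"
    UNCONNECTED = "unconnected"


def effective_channels(states: Dict[str, InputState]) -> List[str]:
    """Channels whose current input state is normal are treated as effective."""
    return sorted(name for name, state in states.items()
                  if state is InputState.NORMAL)


class ChannelMonitor:
    """Re-determines the effective channels whenever the probed states change."""

    def __init__(self, probe: Callable[[], Dict[str, InputState]]) -> None:
        self._probe = probe  # hypothetical stand-in for real device enumeration
        self.effective = effective_channels(probe())

    def poll(self) -> bool:
        """Probe once; return True if the effective set changed (e.g. hot-plug)."""
        latest = effective_channels(self._probe())
        changed = latest != self.effective
        self.effective = latest
        return changed
```

A monitoring loop running separately from acquisition could call `poll()` periodically and hand the updated `effective` list to the acquisition side only when it returns `True`.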

S120, respectively matching the audio information of the effective sound acquisition channels with locally pre-stored standard sound files, and selecting a voice password acquisition channel from the effective sound acquisition channels according to the matching degree.

In an embodiment, the voice password may take different forms: for example, a sentence read aloud, a passage of lyrics sung, a passage of lyrics sung to accompaniment, a piece of music played, or a clapping sound. Correspondingly, the locally pre-stored standard sound files include standard voiced files of types such as pure human voice, pure music, and mixed human voice and music, and may also include various types of reference silent files.

The audio information collected by each effective sound acquisition channel is matched against the corresponding locally pre-stored standard sound file. If the voice password is a sentence to be read aloud, the corresponding standard sound file includes a pure human voice file; if the voice password is a passage of lyrics sung to accompaniment, the corresponding standard sound file includes a file of human voice mixed with music; and so on.

The following explanation takes as an example a voice password that is a sentence to be read aloud, such as 'kayi-me like you', where the effective sound acquisition channels include a camera sound acquisition channel, a general microphone channel, and an external device sound acquisition channel.

When the anchor reads out the voice password during the live broadcast, the corresponding audio information can be collected by the camera sound collection channel, the general microphone channel, and the external device sound collection channel. However, the quality of the audio collected by each effective sound collection channel differs because of environmental influences, such as the distance from where the anchor is speaking, surrounding noise, and the collection performance of the device, and this affects the server's subsequent identification of the voice password return information.

In this embodiment, the camera sound collection channel, the general microphone channel, and the external device sound collection channel each collect audio information, which is matched locally against the pre-stored standard sound file. Generally speaking, the higher the quality of the collected audio information, the higher the matching degree. Optionally, the effective sound acquisition channel with the highest matching degree is selected as the voice password acquisition channel; for example, if the general microphone channel of a microphone worn on the anchor's collar has the highest matching degree, that channel is selected as the voice password acquisition channel.

In an embodiment, the audio information may be extracted from each effective sound collection channel by an acquisition process, and an analysis process is started while the acquisition process is extracting the audio information. The analysis process matches the audio information of each effective sound acquisition channel against the locally pre-stored standard sound file. Because the acquisition process and the analysis process execute their tasks in parallel, audio acquisition and audio matching do not interfere with each other, which improves the processing efficiency of the audio information and shortens the processing time of the voice gift.
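The split between acquisition and analysis can be illustrated with a producer and consumer pair. This sketch uses Python threads and a queue as a lightweight stand-in for the two processes named in the text; the mean absolute difference used for matching and all names are assumptions of the sketch.

```python
import queue
import threading
from typing import Dict, List, Sequence


def acquire(channels: Dict[str, List[Sequence[float]]], out_q: queue.Queue) -> None:
    """Acquisition side: hand each captured chunk to the analysis side."""
    for name, chunks in channels.items():
        for chunk in chunks:
            out_q.put((name, chunk))
    out_q.put(None)  # sentinel: no more audio


def analyse(in_q: queue.Queue, standard: Sequence[float],
            errors: Dict[str, List[float]]) -> None:
    """Analysis side: match each chunk against the standard sound data."""
    while True:
        item = in_q.get()
        if item is None:
            break
        name, chunk = item
        n = min(len(chunk), len(standard))
        if n == 0:
            continue  # nothing to compare for this chunk
        err = sum(abs(x - y) for x, y in zip(chunk, standard)) / n
        errors.setdefault(name, []).append(err)


def run_parallel(channels: Dict[str, List[Sequence[float]]],
                 standard: Sequence[float]) -> Dict[str, List[float]]:
    """Run analysis concurrently with acquisition and return per-channel errors."""
    q: queue.Queue = queue.Queue()
    errors: Dict[str, List[float]] = {}
    worker = threading.Thread(target=analyse, args=(q, standard, errors))
    worker.start()          # analysis starts before acquisition finishes
    acquire(channels, q)
    worker.join()
    return errors
```

Because the analysis side consumes chunks as soon as they are queued, matching results are available shortly after the last chunk is captured rather than only after a separate analysis pass.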

S130, generating voice password return information according to the audio information corresponding to the voice password acquisition channel, and uploading the voice password return information to the server for identification.

The audio information corresponding to the voice password acquisition channel is compressed and packaged to generate the voice password return information, which is uploaded from the anchor terminal to the server for identification.

FIG. 4 is another schematic diagram of the display window of the voice password according to an embodiment. As shown in FIG. 4, the display window of the anchor terminal displays an identification frame 106, which shows text such as "identifying" to remind the anchor that the server is identifying the voice password return information. Optionally, the display window may also show the identification result, such as the matching value and the nickname of the viewer who presented the voice gift.

Optionally, the voice password return information may be divided into a plurality of time slices and transmitted in real time over multiple transmissions. For example, if the audio information corresponding to the voice password acquisition channel lasts 30 seconds, it is divided during recording into 3 time slices at intervals of 10 seconds. Each time the timer reaches a set time interval, the audio information recorded in that interval is packaged into voice password return information and transmitted to the server: when the timer reaches 10 seconds, first voice password return information corresponding to the first audio information, recorded from the 1st to the 10th second, is transmitted to the server for identification; when the timer reaches 20 seconds, second voice password return information corresponding to the second audio information, recorded from the 11th to the 20th second, is transmitted; and so on, until at 30 seconds third voice password return information corresponding to the third audio information, recorded from the 21st to the 30th second, is transmitted to the server for identification.
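The time-slice transmission in the 30-second example can be sketched as below. The dictionary layout of each return message and the `send` callback are hypothetical; real return information would also be compressed and packaged as described earlier.

```python
from typing import Callable, Dict, List, Sequence


def split_into_slices(samples: Sequence[float], sample_rate: int,
                      slice_seconds: int) -> List[Sequence[float]]:
    """Cut the recording into consecutive slices of slice_seconds each."""
    step = sample_rate * slice_seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def upload_slices(samples: Sequence[float], sample_rate: int, slice_seconds: int,
                  send: Callable[[Dict[str, object]], None]) -> int:
    """Send one return message per slice; returns the number of messages sent."""
    slices = split_into_slices(samples, sample_rate, slice_seconds)
    for index, chunk in enumerate(slices, start=1):
        # Hypothetical message layout: slice number plus the raw audio samples.
        send({"slice": index, "audio": chunk})
    return len(slices)
```

With a 30-second recording and `slice_seconds=10`, `upload_slices` calls `send` three times, once per 10-second slice, matching the example in the text.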

After the recording equipment is started, the anchor records the audio information according to the voice password presented on the display interface of the client. Furthermore, the anchor terminal transmits the audio information recorded by the recording equipment to the server in real time, so that the server performs voice recognition on the audio information in real time. Transmission therefore does not have to wait until recording of the preset duration is finished, which improves the processing efficiency of the voice password return information.

In the voice password returning method for the voice gift provided by this embodiment, audio information is extracted from each effective sound acquisition channel by acquiring a voice password acquisition instruction corresponding to the voice gift issued by a server; respectively matching the audio information of the effective sound acquisition channels with locally pre-stored standard sound files, and selecting a voice password acquisition channel from the effective sound acquisition channels according to the matching degree; and voice password return information is generated according to the audio information corresponding to the voice password acquisition channel, and the voice password return information is uploaded to the server for identification, so that the optimal audio information of one channel is selected from the multiple effective sound acquisition channels and transmitted to the server, the transmission quantity of data is reduced, the transmission efficiency is improved, and the voice processing time of the voice gifts is shortened.

In order to make the technical solution clearer and easier to understand, specific implementation processes and modes of a plurality of steps in the technical solution are described in detail below.

In an embodiment, the step S120 of respectively matching the audio information of the effective sound collecting channels with a locally pre-stored standard sound file may include the following steps:

S1201, acquiring a voice password recording duration corresponding to the voice gift, and acquiring audio information corresponding to each time slice of each effective sound acquisition channel according to preset time slices in the recording duration.

In this embodiment, the voice password needs to be recorded within a preset recording duration. Timing of the recording duration may start when the anchor clicks the recording button, or may start automatically when the anchor terminal receives the voice password.

In order to improve the efficiency and accuracy of identifying the voice password, in the embodiment, a plurality of time slices are divided according to the recording duration of the voice password, each effective sound acquisition channel carries out segmentation processing and acquisition on the audio signal in the recording duration according to the time slice, and a main buffer area and a secondary buffer area are dynamically created for reading and storing. Optionally, the audio information in each time slice is stored in an array form to form a time slice array.

For example, if the recording duration of the voice password is 15 seconds, it is divided into 3 time slices of 5 seconds each: 0-4 seconds is the first time slice, 5-9 seconds is the second time slice, and 10-14 seconds is the third time slice. The audio information of each effective sound acquisition channel, such as a camera sound acquisition channel, a general microphone channel and an external playing device sound acquisition channel, is collected in each time slice and stored in the corresponding time slice array.

S1202, respectively matching the audio information in the time slice corresponding to each effective sound acquisition channel with a locally pre-stored standard sound file.

In time-slice form, the audio information in the time slice corresponding to each effective sound acquisition channel is respectively matched with one or more locally pre-stored standard sound files. Optionally, the timbre, quality, clarity and the like of the audio information in the time slice corresponding to each effective sound acquisition channel can be matched and calculated.

Optionally, in an embodiment, the step S120 of respectively matching the audio information of the effective sound acquisition channels with a locally pre-stored standard sound file, and selecting a voice password acquisition channel from the effective sound acquisition channels according to the matching degree may include the following steps:

S201, when a voice password is collected for the first time, matching the audio information collected by each effective sound acquisition channel with a reference silent file in sequence at set time intervals to obtain a first error value corresponding to each set time interval, and determining a sound file part from the current audio information according to the first error value.

In an embodiment, the set time interval may be measured in seconds. For example, the audio information is matched with the reference silent file every 1 second, or every 2 seconds, and so on.

When a voice gift is received for the first time after the anchor starts broadcasting and a voice password is collected and recorded, the audio information collected by every effective sound acquisition channel, such as the camera sound acquisition channel, the general microphone channel and the sound acquisition channel of the external playing device, is matched in sequence with the reference silent file from the moment timing of the recording duration starts, and a first error value is calculated for each second. If the first error value is within a threshold, it is considered that there is no sound input, and error detection continues on the audio information of the next second; otherwise, it is determined that there is sound input at the current time, and processing continues with step S202.
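The second-by-second comparison against the reference silent file can be sketched as below. The embodiment does not specify how the first error value is computed, so the mean absolute difference used here is an assumption, as are the names `mean_abs_error` and `find_sound_start`, the toy sample rate, and the threshold value.

```python
from typing import List, Optional


def mean_abs_error(chunk: List[float], reference: List[float]) -> float:
    """Assumed error metric: mean absolute difference between two sample lists."""
    return sum(abs(x - y) for x, y in zip(chunk, reference)) / len(chunk)


def find_sound_start(
    samples: List[float],
    sample_rate: int,
    silence_ref: List[float],
    threshold: float,
) -> Optional[int]:
    """Return the index of the first one-second interval judged to contain sound.

    silence_ref is assumed to hold at least one second of reference silence.
    Returns None when every interval stays within the threshold.
    """
    for second, start in enumerate(range(0, len(samples), sample_rate)):
        chunk = samples[start:start + sample_rate]
        first_error = mean_abs_error(chunk, silence_ref[:len(chunk)])
        if first_error > threshold:
            return second  # sound input starts here; continue with step S202
    return None


# Hypothetical data: two near-silent seconds followed by three loud seconds.
rate = 4
audio = [0, 1, 0, -1] * 2 + [50, -60, 55, -40] * 3
print(find_sound_start(audio, rate, silence_ref=[0] * rate, threshold=5))
```

Only the part of the recording from the returned second onward would then need the more expensive comparison against the standard sound file.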

S202, matching error calculation is carried out on the audio information of the sound file part and the corresponding standard sound file to obtain second error values corresponding to the effective sound acquisition channels, and a voice password acquisition channel is determined according to the second error values.

When the audio information corresponding to the current time is determined to belong to the sound file part, the audio information read in the next second is matched with the corresponding standard sound file to calculate an error value. For example, if the voice password is a sentence to be read aloud, the read audio information is matched with the corresponding standard pure-human-voice file. Optionally, feature points of the audio information may be extracted and matched against the feature point information of the standard pure-human-voice file, and the second error value between the audio information within that second and the standard pure-human-voice file is calculated. The smaller the second error value, the better the audio information matches the standard sound file.

Further, in an embodiment, the determining the voice password collecting channel according to the second error value in step S202 may include the following steps:

S2021, comparing second error values corresponding to the effective sound acquisition channels, and determining the effective sound acquisition channel with the minimum second error value as a voice password recording channel.

Further, the plurality of second error values obtained for the sound file part of each effective sound acquisition channel within the recording duration are averaged to obtain a second error average value corresponding to each effective sound acquisition channel. Optionally, the effective sound acquisition channel with the smallest second error average value may be determined as the voice password acquisition channel.
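Selecting the channel with the smallest second error average value can be illustrated with a short sketch. The channel names and error values are invented for the example, and `select_channel` is a hypothetical helper rather than a name used in the embodiment.

```python
from statistics import mean
from typing import Dict, List


def select_channel(second_errors: Dict[str, List[float]]) -> str:
    """Return the channel whose average second error value is smallest.

    Channels with no recorded error values are ignored.
    """
    averages = {
        channel: mean(errors)
        for channel, errors in second_errors.items()
        if errors
    }
    return min(averages, key=averages.get)


# Hypothetical per-second second error values for three channels.
errors = {
    "camera_mic": [0.30, 0.34, 0.32],
    "general_mic": [0.12, 0.15, 0.09],
    "external_device": [0.50, 0.45, 0.55],
}
print(select_channel(errors))  # general_mic
```

Averaging over the whole sound file part, rather than using a single second, makes the choice less sensitive to a momentary noise burst on one channel.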

It should be noted that there may be a preparation time at the beginning when the anchor records the voice password. The anchor often makes no sound during the preparation time, which produces a section of silent audio information. In the error calculation process, error calculation against the standard sound file is less efficient than error calculation against the reference silent file. If the whole section of audio information were directly subjected to error calculation against the standard sound file, the data processing efficiency would be reduced, and the more effective sound acquisition channels there are, the further it would be reduced.

In this embodiment, the audio information acquired by each effective sound acquisition channel is first matched with the reference silent file second by second. When a large error is detected, it indicates that sound input starts at that moment of the audio information. The sound file part of the audio information from that moment is then matched with the standard sound file, and the voice password recording channel is determined accordingly. This improves the data processing efficiency of the audio information and helps to select a suitable voice password recording channel quickly.

According to the service characteristics of live broadcasting, the anchor usually broadcasts from the same position throughout the process, for example sitting in the same place, and does not frequently change position or switch sound channels. In an embodiment, the step S120 of respectively matching the audio information of the effective sound acquisition channels with a locally pre-stored standard sound file, and selecting a voice password acquisition channel from the effective sound acquisition channels according to the matching degree may include the following steps:

S203, when the voice password collection is not the first one, performing error calculation on the audio information collected by the voice password acquisition channel selected in the last voice password collection and the corresponding standard sound file, to obtain the current error value.

In an embodiment, the anchor system may automatically record the recognition result of the last voice password collection, such as the selected voice password collection channel and the second error value.

When the voice password collection is not the first one, error calculation is preferentially performed on the audio information collected by the voice password acquisition channel selected in the last voice password collection and the corresponding standard sound file, to obtain the current error value. For example, if the camera microphone channel was selected as the voice password acquisition channel in the last voice password collection, error calculation is performed on the audio information acquired by the camera microphone channel and the corresponding standard sound file when the voice password is collected this time.

S204, if the difference between the error value of the current time and the error value corresponding to the voice password collecting channel selected in the last voice password collecting is smaller than a first preset threshold value, determining the voice password collecting channel selected in the last voice password collecting as the voice password collecting channel of the current time.

Continuing with the previous example, if the difference between the current error value, obtained by error calculation on the audio information acquired by the camera microphone channel and the corresponding standard sound file, and the error value of the camera microphone channel in the last voice password collection is smaller than the first preset threshold, it indicates that the anchor has not changed the effective sound acquisition channel. The camera microphone channel is then used directly as the current voice password acquisition channel, without matching against the reference silent file or calculating and analyzing the other effective sound acquisition channels.

If the difference between the current error value and the error value corresponding to the voice password acquisition channel selected in the last voice password collection is greater than the first preset threshold, it indicates that the anchor may have switched the effective sound acquisition channel, or that the distance between the anchor and the effective sound acquisition channel has changed, so that the current voice password acquisition channel differs from the one selected last time. Step S201 is then executed again to select the current voice password acquisition channel from the other effective sound acquisition channels.
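The reuse logic of steps S203 and S204 can be sketched as follows. The sketch assumes that the "difference" is an absolute difference, which the embodiment does not state explicitly; the function name `choose_channel`, the callback `reselect` (standing in for the full procedure starting at step S201), and all numeric values are hypothetical.

```python
from typing import Callable, Optional, Tuple

Selection = Tuple[str, float]  # (channel name, error value)


def choose_channel(
    last: Optional[Selection],
    current_error: float,
    first_threshold: float,
    reselect: Callable[[], Selection],
) -> Selection:
    """Reuse the previously selected channel when its error has barely changed.

    last is the (channel, error) pair recorded at the last collection, or None
    for the first collection. current_error is the error of that same channel
    for the current collection. reselect runs the full selection (step S201).
    """
    if last is not None:
        last_channel, last_error = last
        # Assumption: compare the absolute difference with the threshold.
        if abs(current_error - last_error) < first_threshold:
            return last_channel, current_error
    return reselect()


def full_selection() -> Selection:
    """Placeholder for the silence matching and per-channel comparison."""
    return "general_mic", 0.10


print(choose_channel(("camera_mic", 0.20), 0.22, 0.05, full_selection))
print(choose_channel(("camera_mic", 0.20), 0.40, 0.05, full_selection))
```

The first call keeps the camera microphone channel because the error changed by only about 0.02; the second falls back to the full selection because the error changed by about 0.20.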

In the related art, the PC anchor uploads the audio signals collected by all the microphone channels to the server to identify the content of the audio signals through the server, so that the data size is large and the identification time is long. In the scheme, firstly, the change of all microphone input channels in the PC anchor terminal is monitored in real time, audio signals in the microphone input channels are matched with corresponding standard sound files, a voice password recording channel and corresponding audio information are selected and uploaded to a server for content identification, and the data transmission quantity and the time for identifying the content are reduced.

In order to make the technical solutions provided by the embodiments of the present application clearer, the following describes a voice password returning method of a voice gift with reference to the following examples.

S401, after receiving a voice password acquisition instruction sent by a server, a PC anchor terminal starts a service Process A and starts an acquisition Process B for acquiring audio signals by using an IPC (Inter-Process Communication) channel mode.

And the business process A transmits the recording time allTime of the voice password and the divided time slice array timeSlice to the acquisition process B.

S402, the acquisition process B acquires audio signals of all effective sound acquisition channels in real time and transmits the audio signals to the analysis process C.

The acquisition process B obtains all audio devices of the system through DirectSound, screens out all effective microphone input channels, and establishes a monitoring thread for each effective channel; when the device corresponding to a channel is hot-plugged, the number of effective microphone input channels is recalibrated. The PCM data of each channel is acquired and segmented according to timeSlice, a main buffer area and an auxiliary buffer area are dynamically created for reading and storing, and the data is copied to the analysis process C.

S403, the analysis process C performs matching analysis calculation on the audio information collected by each effective sound acquisition channel against a preset standard sound file, and transmits the analysis calculation result back to the acquisition process B.

The analysis process C receives the audio information of the plurality of effective sound acquisition channels sent by the acquisition process B. The format of the audio information may be PCM (Pulse Code Modulation), WAV, MP3 (Moving Picture Experts Group Audio Layer III), AAC (Advanced Audio Coding), and the like. AI (Artificial Intelligence) analysis is carried out on each effective sound acquisition channel, and the audio information of each effective sound acquisition channel is read one by one. First, the audio information is matched with the reference silent file and an error is calculated; if the error is within a threshold, it is considered that no sound is input and detection continues; otherwise, it is preliminarily determined that there is sound input and the next step is carried out. Next, feature point information of the sound part data of each effective sound acquisition channel is matched with a standard sound file (pure human voice, pure music or mixed sound), and the channel with the minimum error relative to the standard sound file is taken as the voice password recording channel. The identification result is then notified to the acquisition process B, and the current voice password recording channel and its error value are recorded. At the next collection, the error between that channel and the reference voice data is calculated first and compared with the last error; when the difference is smaller than a specific threshold Y, it can be concluded that the anchor has not switched microphone channels, and the last channel is notified directly to the acquisition process B without silence matching or further calculation and analysis of the other channels, thereby improving the selection efficiency of the voice password recording channel.

S404, the acquisition process B uploads the audio information of the voice password recording channel to a server for content identification.

S405, when the business process A receives another voice password acquisition instruction, the steps S401 to S404 are repeatedly executed.

Based on this scheme, different processes handle different services, which reduces interference. Because AI analysis identifies a single voice password recording channel, the uploaded data is reduced from multiple microphone channels to a single microphone channel, which greatly reduces transmission consumption and the total time for identifying the content.

Fig. 5 is a flowchart of a method for presenting a voice gift, which can be performed by a device for presenting a voice gift, such as an anchor terminal.

Specifically, as shown in fig. 5, the method for presenting a voice gift may include the following steps:

S510, receiving voice gifts given by audiences and voice password acquisition instructions sent by the server, and extracting audio information from each effective sound acquisition channel according to the voice password acquisition instructions.

It should be noted that, when the anchor terminal is a PC, it typically has multiple microphone channels, such as the camera microphone channel of the PC itself, a microphone channel of a peripheral device connected to the PC, a general microphone channel connected to the MIC-IN socket of a sound card, and other audio input channels.

S520, respectively matching the audio information of the effective sound acquisition channels with a locally pre-stored standard sound file, and selecting a voice password acquisition channel from each sound acquisition channel according to the matching degree.

When the anchor reads the voice password aloud during live broadcasting, the corresponding audio information can be collected by the camera sound acquisition channel, the general microphone channel and the sound acquisition channel of the external playing device. However, the quality of the audio information collected by the different effective sound acquisition channels differs because of environmental influences, such as the distance from the anchor's sound source, surrounding noise, and the acquisition performance of the device, which affects the subsequent identification of the voice password return information by the server.

S530, generating voice password return information according to the audio information corresponding to the voice password acquisition channel, and uploading the voice password return information to the server; and the server matches the voice password return information with the text content of the voice gift.

After the recording equipment is started, the anchor records the audio information according to the voice password presented on the display interface of the client. Furthermore, the anchor terminal transmits the audio information recorded by the recording equipment to the server in real time, so that the server performs voice recognition on the audio information in real time. Transmission therefore does not have to wait until recording of the preset duration is finished, which improves the processing efficiency of the voice password.

In an embodiment, after receiving the voice password return information, the server converts the voice password return information into corresponding text information, and matches the text information with the text content preset by the voice gift. Optionally, the text information and the text content may be presented in the form of pinyin letters, if the pinyin letters are the same, the corresponding text of the pinyin letters is considered to be the same, and then the matching degree between the voice password return information and the text content is calculated according to the ratio of the number of the same text to the total number corresponding to the text content.
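The ratio-based matching degree described above can be sketched as follows. The sketch assumes that both the recognized text and the preset text content have already been converted into lists of pinyin syllables, and that syllables are compared position by position; neither detail is specified in the embodiment, and the function name `matching_degree` is hypothetical.

```python
from typing import Sequence


def matching_degree(recognized: Sequence[str], expected: Sequence[str]) -> float:
    """Return the share of expected pinyin syllables matched at the same position.

    The denominator is the total number of syllables in the preset text
    content, as described in the embodiment.
    """
    if not expected:
        return 0.0
    same = sum(1 for r, e in zip(recognized, expected) if r == e)
    return same / len(expected)


# Hypothetical example: one of four syllables is recognized incorrectly.
preset = ["zhu", "bo", "zui", "bang"]
heard = ["zhu", "bo", "zui", "pang"]
print(matching_degree(heard, preset))  # 0.75
```

Comparing pinyin rather than characters means that homophones produced by the speech recognizer still count as a match.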

In an embodiment, the server receives the voice password return information corresponding to each time slice, identifies the voice password return information of each time slice in real time, matches the voice password return information with the preset text content of the voice gift, and further counts the matching degree of the voice password return information of all the time slices corresponding to the voice gift with the preset text content.

S540, receiving a voice password matching result returned by the server, and determining presentation confirmation information of the voice gift according to the voice password matching result.

In this embodiment, after the server calculates the matching result, the server returns the matching result to the anchor terminal to display the matching result through the display interface of the anchor terminal. Alternatively, the matching result may be represented by a numerical value of the matching degree.

In an embodiment, the anchor terminal receives the matching value returned by the server, and displays the gift-giving confirmation information when the matching value is determined to reach a second preset threshold value.

The second preset threshold may be set according to the actual situation. The higher the matching degree, the closer the audio information recorded by the anchor is to the content of the voice password. In an embodiment, prompt information of successful matching, such as the returned matching degree value or a picture indicating a successful match, may be displayed through a display window of the anchor display interface.

If the matching degree reaches a second preset threshold, the content recorded by the anchor broadcast is in accordance with the voice password, at this time, the voice gift is successfully received, and a confirmation message of successful gift presentation is displayed, wherein the confirmation message of successful gift presentation may include a notification that the voice password is successfully recorded, a virtual currency price value corresponding to the voice gift, a gift special effect corresponding to the voice gift, and the like.

In one embodiment, after the presentation window presents the information that the execution of the voice password is successful, the presentation window presents an automatic message after a set time, such as 3 seconds. In another embodiment, the special effects of the voice gift may also be presented in a live room.

Optionally, in an embodiment, when multiple voice gifts are triggered simultaneously in the same live broadcast room, the voice passwords are shown in the display window according to queuing logic. While the voice password task corresponding to the current voice gift has not finished executing, the display window of the next triggered voice gift is queued and is not displayed at the anchor terminal.
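The queuing behaviour for simultaneously triggered voice gifts can be sketched with a first-in, first-out queue. The class name `PasswordQueue`, its methods, and the gift identifiers are hypothetical; the embodiment only states that later voice passwords wait until the current task has finished.

```python
from collections import deque
from typing import Deque, Optional


class PasswordQueue:
    """Shows one voice password at a time and queues the rest in arrival order."""

    def __init__(self) -> None:
        self.current: Optional[str] = None  # gift whose password is displayed
        self.waiting: Deque[str] = deque()  # gifts whose display is deferred

    def trigger(self, gift_id: str) -> bool:
        """Register a new voice gift; return True if it is displayed at once."""
        if self.current is None:
            self.current = gift_id
            return True
        self.waiting.append(gift_id)
        return False

    def finish_current(self) -> Optional[str]:
        """Mark the current task as finished and display the next one, if any."""
        self.current = self.waiting.popleft() if self.waiting else None
        return self.current


queue = PasswordQueue()
print(queue.trigger("gift-1"))   # True: displayed immediately
print(queue.trigger("gift-2"))   # False: queued behind gift-1
print(queue.finish_current())    # gift-2 is displayed next
```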

After the voice password task is successfully executed, when there are a plurality of gift special effects to be displayed, the gift special effects can be cached in a special effect display queue. The audience user who gave a voice gift preferentially sees the gift special effect corresponding to that voice gift when it is successfully given, while the remaining audience users and the anchor see the special effects displayed in queue order.

If the matching degree is lower than the second preset threshold, it indicates that the content recorded by the anchor does not conform to the voice password, the execution of the voice password task fails, and a gift presentation failure confirmation message is displayed, wherein the gift presentation failure confirmation message may include a notification of the execution failure of the voice password task.

In one embodiment, after the presentation window presents the failure information of the execution of the voice password task, the presentation window presents an automatic message after a set time, such as 1 second. The second preset threshold may be set according to the actual situation; optionally, the second preset threshold is 80%.

Further, after the execution condition of the voice password task is fed back to the anchor terminal, a gift presentation confirmation notification can be fed back to the audience terminal presenting the voice gift, for example, a notification of the failure of the voice gift presentation is sent to the audience terminal, and the audience user is asked whether to send the voice gift again; and when the voice gift presentation is successful, displaying the voice gift presentation information on a public screen of the live broadcast room.

In the method for presenting a voice gift provided by this embodiment, audio information is extracted from each effective sound acquisition channel by receiving a voice gift presented by a viewer and a voice password acquisition instruction issued by a server; respectively matching the audio information of the effective sound acquisition channels with locally pre-stored standard sound files, and selecting a voice password acquisition channel from each sound acquisition channel according to the matching degree; generating voice password return information according to the audio information corresponding to the voice password acquisition channel, and uploading the voice password return information to the server; the server matches the voice password returned information with the text content of the voice gift; and receiving a voice password matching result returned by the server, and determining presentation confirmation information of the voice gift according to the voice password matching result, so that the audio information of the optimal channel selected from the multiple effective sound acquisition channels is transmitted to the server, the transmission quantity of data is reduced, the transmission efficiency is improved, and the processing efficiency of the voice gift in the presentation process is improved.

The following describes a related embodiment of the voice password returning device of the voice gift in detail.

Fig. 6 is a schematic structural diagram of an exemplary voice password returning device for a voice gift, which can be disposed at the anchor terminal. As shown in Fig. 6, the voice password returning device 100 for a voice gift can include: the audio information extraction module 110, the collection channel selection module 120, and the return information uploading module 130.

The voice password returning device provided by the embodiment can realize that the audio information of the optimal channel is selected from the multiple effective sound collecting channels and transmitted to the server, so that the data transmission amount is reduced, the transmission efficiency of the related information of the voice gift is improved, and the voice processing time of the voice gift is shortened.

In one embodiment, the acquisition channel selection module 120 includes a time slice information acquisition unit and a time slice information matching unit. The time slice information acquisition unit is used for acquiring a voice password recording duration corresponding to the voice gift, and acquiring audio information in each time slice of each effective sound acquisition channel according to preset time slices in the recording duration. The time slice information matching unit is used for respectively matching the audio information in the time slices corresponding to the effective sound acquisition channels with a locally pre-stored standard sound file.

In one embodiment, the voice password returning apparatus 100 further comprises: the monitoring module is used for monitoring the input state of each audio input channel; and determining effective sound collection channels according to the current input state of each audio input channel.

In one embodiment, the acquisition channel selection module 120 includes a first matching module and a second matching module. The first matching module is used for, when a voice password is collected for the first time, matching the audio information acquired by each effective sound acquisition channel with a reference silent file in sequence at set time intervals to obtain a first error value corresponding to each set time interval, and determining a sound file part from the current audio information according to the first error value. The second matching module is used for performing matching error calculation on the audio information of the sound file part and the corresponding standard sound file to obtain a second error value corresponding to each effective sound acquisition channel, and determining a voice password acquisition channel according to the second error value.

In an embodiment, the second matching module is configured to compare second error values corresponding to the valid sound acquisition channels, and determine the valid sound acquisition channel with the smallest second error value as the voice password recording channel.

In one embodiment, the acquisition channel selection module 120 includes: a third matching module and a determining module; the third matching module is used for carrying out error calculation on the audio information acquired by the voice password acquisition channel selected by the last voice password acquisition and the corresponding standard voiced file to obtain a corresponding current error value when the voice password is not acquired for the first time; and the determining module is used for determining the voice password acquisition channel selected by the last voice password acquisition as the current voice password acquisition channel if the difference between the error value of the current time and the error value corresponding to the voice password acquisition channel selected by the last voice password acquisition is smaller than a first preset threshold value.

In one embodiment, the acquisition channel selection module 120 includes an analysis process starting unit and an analysis process matching unit. The analysis process starting unit is used for starting an analysis process while the acquisition process extracts audio information from each effective sound acquisition channel. The analysis process matching unit is used for respectively matching the audio information of the effective sound acquisition channels with a locally pre-stored standard sound file through the analysis process.

When the voice password returning device for the voice gift provided by the above-mentioned embodiment executes the voice password returning method for the voice gift provided by any of the above-mentioned embodiments, it has corresponding functions and beneficial effects.

The following describes in detail embodiments of the device for presenting a voice gift.

Fig. 7 is a schematic structural diagram of an embodiment of a device for presenting a voice gift, which may be provided at a host end. As shown in Fig. 7, the device 500 for presenting a voice gift may include: an audio information receiving module 510, an acquisition channel selection module 520, a return information uploading module 530 and a confirmation information determining module 540.

The audio information receiving module 510 is configured to receive a voice gift given by a viewer and a voice password acquisition instruction sent by a server, and extract audio information from each effective sound acquisition channel according to the voice password acquisition instruction; the acquisition channel selection module 520 is configured to respectively match the audio information of the effective sound acquisition channels with locally pre-stored standard sound files, and select a voice password acquisition channel from the effective sound acquisition channels according to the matching degree; the return information uploading module 530 is configured to generate voice password return information according to the audio information corresponding to the voice password acquisition channel, and upload the voice password return information to the server, where the server matches the voice password return information with the text content of the voice gift; and the confirmation information determining module 540 is configured to receive the voice password matching result returned by the server, and determine the presentation confirmation information of the voice gift according to the voice password matching result.
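The cooperation of modules 520 to 540 can be sketched as a single function in which the channel selector and the server upload are injected as callables. This is a hypothetical illustration: the `PassbackInfo` structure, the callable signatures and the confirmation strings are all assumptions and are not defined by the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PassbackInfo:
    """Hypothetical container for the voice password return information."""
    gift_id: str
    channel: str
    samples: List[float]


def present_voice_gift(gift_id: str,
                       channels: Dict[str, List[float]],
                       select_channel: Callable[[Dict[str, List[float]]], str],
                       upload: Callable[[PassbackInfo], bool]) -> str:
    """Select one channel, upload only its audio, and derive a confirmation."""
    # Role of module 520: choose the voice password acquisition channel.
    channel = select_channel(channels)
    # Role of module 530: build return information from that channel only.
    info = PassbackInfo(gift_id, channel, channels[channel])
    matched = upload(info)  # the server side performs the actual matching
    # Role of module 540: turn the matching result into confirmation text.
    return "gift confirmed" if matched else "gift not confirmed"
```

Only the audio of the selected channel is placed in the return information, which reflects the reduction in transmitted data that the abstract attributes to the scheme.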

The device for presenting a voice gift provided above has corresponding functions and benefits when it executes the method for presenting a voice gift provided in any of the above embodiments.

An embodiment of the present invention also provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the processor executes the program, the voice password returning method or the presenting method of the voice gift in any of the above embodiments is implemented.

Optionally, the computer device may be a tablet computer, a server, or the like. When the computer device provided above executes the voice password returning method or the presenting method of the voice gift provided by any of the above embodiments, the computer device has the corresponding functions and advantages.

Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a voice password passback method for a voice gift, including:

Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for presenting a voice gift, including:

Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the operation of the voice password returning method or the presenting method of the voice gift described above, and may also perform the relevant operations in the voice password returning method or the presenting method of the voice gift provided by any embodiments of the present invention, and has corresponding functions and advantages.

From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software together with the necessary general-purpose hardware, and certainly can also be implemented by hardware, although the former is the better implementation in many cases. Based on such understanding, the technical solution of the present invention, or the part thereof that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the voice password returning method or the presenting method of the voice gift according to any embodiment of the present invention.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims

1. A voice password returning method of a voice gift is characterized by comprising the following steps:

2. The method according to claim 1, wherein the step of matching the audio information of the valid sound collection channel with the locally pre-stored standard sound file comprises:

3. The method of claim 1, further comprising, before the step of extracting audio information from each valid sound capture channel according to the voice password capture command:

monitoring the input state of each audio input channel;

4. The method according to claim 1, wherein the step of matching the audio information of the valid sound collecting channels with the standard sound files stored locally and selecting the sound password collecting channel from the valid sound collecting channels according to the matching degree comprises:

5. The method of claim 4, wherein the step of determining the voice password collecting channel according to the second error value comprises:

6. The method according to claim 1, wherein the step of matching the audio information of the valid sound capturing channels with a standard sound file stored locally comprises the steps of:

7. The method according to claim 1, wherein the step of matching the audio information of the valid sound collection channel with the locally pre-stored standard sound file comprises:

8. A presenting method of a voice gift is characterized by comprising the following steps:

9. The presenting method of a voice gift according to claim 8, wherein the step of receiving the matching result returned from the server and presenting the gift-giving confirmation information according to the matching result comprises:

10. A voice password passback device for voice gifts, comprising:

11. A presentation device of a voice gift, comprising:

12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the voice password passback method of a voice gift of any one of claims 1 to 7 or the gifting method of a voice gift of any one of claims 8 to 9.

13. A storage medium containing computer-executable instructions, which when executed by a computer processor, are for performing the steps of the voice password passback method of the voice gift of any one of claims 1 to 7 or the gifting method of the voice gift of any one of claims 8 to 9.