CN113596381B - Audio data acquisition method and device

Info

Publication number: CN113596381B
Application number: CN202110747749.2A
Authority: CN
Inventors: 乔岩; 艾清波; 卢燕青; 杨春晖
Original assignee: Hainan Shilian Communication Technology Co ltd
Current assignee: Hainan Shilian Communication Technology Co ltd
Priority date: 2021-07-01
Filing date: 2021-07-01
Publication date: 2024-08-27
Anticipated expiration: 2041-07-01
Also published as: CN113596381A

Abstract

The embodiment of the invention provides a method and a device for acquiring audio data, which are applied to a conference terminal that establishes a communication connection with at least one user terminal. The method comprises the following steps: receiving an audio data transmission request sent by at least one user terminal; and determining the audio data transmission request with the highest association degree as a target audio data transmission request. After receiving at least one audio data transmission request, the conference terminal can determine the target audio data transmission request with the highest association degree according to the association degree between each audio data transmission request and the video conference, thereby quickly and accurately determining the request that conforms to the current conference process, requirement or content. The conference terminal then generates an audio data acquisition instruction so that the audio data is acquired through the target user terminal corresponding to the target audio data transmission request, completing the video conference.

Description

Audio data acquisition method and device

Technical Field

The invention relates to the technical field of video networking, in particular to a method and a device for acquiring audio data.

Background

In video networking, a conference terminal is application equipment that carries video networking services. It can support the full range of video services, including video conferencing, surveillance viewing, video calls, live broadcast and on-demand playback, remote medical services and remote training.

At present, when a video service is carried out with a video networking terminal, the audio information of a speaking user can be collected and played through a wired or wireless microphone attached to the video networking terminal. For example, during a video networking conference, if a user needs to speak, the conference host must pass the wireless microphone to that user or ask the user to come to a podium equipped with the wired microphone. Alternatively, a terminal corresponding to the speaking user can establish a communication connection with the video networking terminal, so that a speaking request can be sent to the video networking terminal through that terminal. After the video networking terminal approves the speaking request, the corresponding terminal collects the audio information and sends it to the video networking terminal for playing.

However, in the current scheme, the video networking terminal can be in communication connection with a plurality of terminals equipped with microphones. When the video networking terminal receives a plurality of speaking requests, the chairman user corresponding to the video networking terminal cannot quickly screen out the speaking request that meets the current conference process and requirement, so the continuity of the conference is affected.

Disclosure of Invention

In view of the foregoing, embodiments of the present invention are directed to providing a method for capturing audio data and a corresponding apparatus for capturing audio data, which overcome or at least partially solve the foregoing problems.

In order to solve the above problems, an embodiment of the present invention discloses a method for acquiring audio data, which is applied to a conference terminal that establishes communication connection with at least one user terminal, wherein the user terminal has an audio data acquisition device for acquiring audio data, and the method includes:

receiving an audio data transmission request sent by at least one user terminal in the video conference process;

determining the association degree of the audio data transmission request and the video conference, determining the audio data transmission request with the highest association degree as a target audio data transmission request, and determining a target user terminal corresponding to the target audio data transmission request;

sending an audio data acquisition instruction to the target user terminal so that the target user terminal can acquire audio data in response to the audio data acquisition instruction;

and receiving the audio data returned by the target user terminal.

The embodiment of the invention also discloses a device for collecting audio data, which is applied to a conference terminal which establishes communication connection with at least one user terminal, wherein the user terminal is provided with a device for collecting audio data, and the device comprises:

the first receiving module is used for receiving an audio data transmission request sent by the user terminal in the video conference process;

the first determining module is used for determining the association degree of the audio data transmission request and the video conference, determining the audio data transmission request with the highest association degree as a target audio data transmission request, and determining a target user terminal corresponding to the target audio data transmission request;

the sending module is used for sending an audio data acquisition instruction to the target user terminal so that the target user terminal can acquire audio data in response to the audio data acquisition instruction;

and the second receiving module is used for receiving the audio data returned by the target user terminal.

The embodiment of the invention also provides a device, which comprises: one or more processors; and one or more machine readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform a method of acquisition of audio data as provided by the present invention.

In addition, the embodiment of the invention also provides a computer readable storage medium storing a computer program that causes a processor to execute the audio data acquisition method.

The embodiment of the invention has the following advantages. Because the conference terminal that implements the video conference is in communication connection with at least one user terminal, a participant user can send an audio data transmission request to the conference terminal by using the user terminal. After receiving at least one audio data transmission request, the conference terminal can determine the target audio data transmission request with the highest association degree according to the association degree between each audio data transmission request and the video conference. It thereby quickly and accurately determines the target audio data transmission request that meets the current conference process, requirement or content, and determines the target user terminal corresponding to the speaking user whose audio data needs to be acquired. The conference terminal then generates an audio data acquisition instruction so that the audio data is acquired through the target user terminal, completing the acquisition process of the audio data in the video conference.

Drawings

FIG. 1 is an audio data acquisition system of the present invention;

FIG. 2 is a flow chart of the steps of a method of collecting audio data according to the present invention;

FIG. 3 is a schematic diagram of the generation and display of an audio data transmission request of the present invention;

FIG. 4 is a flow chart of steps of another method of collecting audio data according to the present invention;

FIG. 5 is a schematic diagram of another generation and display of an audio data transmission request of the present invention;

FIG. 6 is a block diagram of an audio data acquisition device according to the present invention.

Detailed Description

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

In the embodiment of the invention, the audio data acquisition method can be applied to a conference terminal that establishes a communication connection with at least one user terminal. The conference terminal can be a terminal for implementing a video conference, and the user terminal can be provided with an audio data acquisition device for acquiring audio data. The user terminal can be any of various mobile terminals held by a participant user in the video conference, or any of various fixed terminals corresponding to a participant user in the video conference. The conference terminal can be any conference terminal in a video conference based on video networking, or a participant terminal in another kind of video conference.

Referring to fig. 1, there is shown an audio data collection system of the present invention, which may be applied to a video service based on video networking. The video service may be a video conference held across a plurality of conference sites. The system may include a plurality of conference sites that are communicatively connected through a video networking server 40. Each conference site includes a video networking terminal 10, which is further connected with a loudspeaker 30 and a microphone.

Specifically, referring to fig. 1, the video service based on video networking is a video conference between conference site 1 and conference site 2. A participant user in conference site 1 may speak through a wireless/wired microphone connected with the video networking terminal 10. After the microphone collects the audio data of the participant user, the audio data may be played through the loudspeaker 30 connected with the video networking terminal 10. Meanwhile, the video networking terminal 10 may also send the audio data to the video networking terminal in conference site 2 through the video networking server 40, so that the video networking terminal in conference site 2 can play the audio data through its own connected loudspeaker, thereby implementing collection and playing of the audio data of the participant user between conference site 1 and conference site 2.

However, in a video conference with many participant users, microphone resources are relatively limited. If the video networking terminal 10 in conference site 1 is connected to an external wired microphone, which is generally placed on a podium, the speaking user needs to move to the podium so that the video networking terminal 10 can collect the audio data of the speaking user through the wired microphone. If the video networking terminal 10 is connected to an external wireless microphone, the conference host needs to pass the wireless microphone to the speaking user so that the video networking terminal 10 can collect the audio data of the speaking user through the wireless microphone. Either way of speaking breaks the continuity of the conference process and degrades the user experience.

Accordingly, referring to fig. 1, an audio data collection system according to the present invention may further include at least one mobile terminal 20 communicatively connected to the video networking terminal 10, and the mobile terminal 20 may have an audio data collection device for collecting audio data.

Specifically, the mobile terminal 20 may be a mobile terminal held by a participant user in a conference site, such as a mobile phone or a tablet computer. Each participant user can therefore use his or her own mobile terminal when speaking: the microphone built into the mobile terminal collects the speaking content, and the mobile terminal sends the corresponding audio data to the video networking terminal 10. The video networking terminal 10 then plays the audio data through the external loudspeaker 30, or sends the audio data to the video networking terminals in other conference sites through the video networking server 40 so that those terminals can play it through their own external loudspeakers. In this way, collection and playing of the audio data of the speaking participant user between conference site 1 and conference site 2 in the video conference are realized.

Referring to fig. 2, a flowchart of the steps of an audio data collection method of the present invention is shown. The method is applied to a conference terminal that establishes a communication connection with at least one user terminal, where the conference terminal is used to implement a video conference and the user terminal has an audio data collection device for collecting audio data. The method may specifically include the following steps:

Step 101, in the video conference process, receiving an audio data transmission request sent by at least one user terminal.

In this step, during the video conference, the conference terminal may first receive an audio data transmission request transmitted by at least one user terminal communicatively connected thereto.

The audio data transmission request can be a request generated when a participant user corresponding to the user terminal needs to speak in the video conference. That is, the participant user who sends the audio data transmission request is a prospective speaker in the video conference, and the user terminal is a terminal with an audio data acquisition device that is held by that prospective speaker.

For example, the video conference includes the participant users user 1, user 2, user 3 and user 4, who hold user terminal 1, user terminal 2, user terminal 3 and user terminal 4 respectively, and all four user terminals are in communication connection with the conference terminal in the video conference. At a certain moment in the video conference, if user 1 and user 2 want to speak, they may use user terminal 1 and user terminal 2 respectively to send audio data transmission requests to the conference terminal.
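The example above can be illustrated with a minimal sketch in which each audio data transmission request is a small record that the conference terminal stores as it arrives. The class and field names (`AudioTransmissionRequest`, `ConferenceTerminal`, `terminal_id`, `user_name`, `pending`) are hypothetical and are not defined by the invention.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AudioTransmissionRequest:
    """Hypothetical shape of an audio data transmission request."""
    terminal_id: str  # user terminal that sent the request
    user_name: str    # participant user who wants to speak


@dataclass
class ConferenceTerminal:
    """Hypothetical conference terminal that collects pending requests."""
    pending: List[AudioTransmissionRequest] = field(default_factory=list)

    def receive_request(self, request: AudioTransmissionRequest) -> None:
        # Step 101: store each request received during the video conference.
        self.pending.append(request)


def demo_step_101() -> List[str]:
    """Replay the example: only user 1 and user 2 ask to speak."""
    terminal = ConferenceTerminal()
    terminal.receive_request(AudioTransmissionRequest("user terminal 1", "user 1"))
    terminal.receive_request(AudioTransmissionRequest("user terminal 2", "user 2"))
    # User terminals 3 and 4 are connected but send nothing.
    return [request.terminal_id for request in terminal.pending]
```

Calling `demo_step_101()` returns the identifiers of the two terminals that sent requests, in arrival order.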

In the embodiment of the invention, the user terminals of all participant users in the video conference can be connected with the conference terminal in advance, so that an audio data transmission request can be sent to the conference terminal quickly when a participant user needs to speak. Alternatively, when only some of the participant users need to speak, their user terminals are controlled to establish communication connections with the conference terminal in real time and then send audio data transmission requests to the conference terminal. In that case, the user terminals of users who do not need to speak need not establish communication connections with the conference terminal, which avoids occupying excessive resources.

Step 102, determining the association degree between the audio data transmission request and the video conference, determining the audio data transmission request with the highest association degree as a target audio data transmission request, and determining a target user terminal corresponding to the target audio data transmission request.

In this step, the conference terminal may determine a target audio data transmission request in the received at least one audio data transmission request, and determine a target user terminal corresponding to the target audio data transmission request from the at least one user terminal, so as to determine a participant user corresponding to the target user terminal as a speaking user, and further may collect audio data of the speaking user, thereby completing collection and playing of audio data of the speaking user.

Specifically, the user terminal may be a terminal held by a participant user in the video conference, and the conference terminal may be the terminal that runs and implements the video conference. Therefore, after the conference terminal receives the audio data transmission request sent by the at least one user terminal, the conference terminal can determine, according to the received requests, the at least one participant user currently requesting to speak, and select one of them as the speaking user. The audio data transmission request corresponding to the speaking user is the target audio data transmission request, the user terminal that generated and sent the target audio data transmission request is the target user terminal, and the user corresponding to the target user terminal is the speaking user.

The speaking content of the finally determined speaking user, that is, the audio data collected by the target user terminal and transmitted to the conference terminal, should conform to the current conference process, requirement or content. To that end, when the conference terminal determines the target audio data transmission request from the received at least one audio data transmission request, it may first determine the association degree between each audio data transmission request and the current video conference, and then determine the audio data transmission request with the highest association degree as the target audio data transmission request.

Referring to fig. 3, a schematic diagram of generation and display of an audio data transmission request according to the present invention is shown. In an embodiment of the present invention, the user terminals 70 communicatively connected to a conference terminal 60 include the user terminal 1 and the user terminal 2; that is, the participant users who request to speak include the user 1 corresponding to the user terminal 1 and the user 2 corresponding to the user terminal 2. The user 1 may generate an audio data transmission request 1 by using the user terminal 1, and the audio data transmission request 1 may include a speaking summary 1 input by the user 1. After inputting the speaking summary 1, the user 1 may click a send request button to send the generated audio data transmission request 1 to the conference terminal 60. Similarly, the user 2 may generate an audio data transmission request 2 by using the user terminal 2, and the audio data transmission request 2 may include a speaking summary 2 input by the user 2. After inputting the speaking summary 2, the user 2 may click a send request button to send the generated audio data transmission request 2 to the conference terminal 60.

Further, after receiving the audio data transmission request 1 and the audio data transmission request 2, the conference terminal may combine the speaking summary 1 included in the audio data transmission request 1 and the speaking summary 2 included in the audio data transmission request 2 with the process or content of the current video conference to determine the association degree between each of the two requests and the current video conference, and determine the audio data transmission request with the higher association degree as the target audio data transmission request.

In the embodiment of the present invention, after receiving the audio data transmission request 1 and the audio data transmission request 2, the conference terminal 60 may display the speaking summary 1 included in the audio data transmission request 1 and the speaking summary 2 included in the audio data transmission request 2 in the display device 50 of the conference terminal 60. The conference host user corresponding to the conference terminal 60 may then select one target speaking summary from the speaking summary 1 and the speaking summary 2 in the display device 50; that is, the conference host user permits the participant user whose user terminal generated the target speaking summary to speak.
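The automatic selection in Step 102 can be sketched as picking the request whose association degree is highest. This is only an illustration: the function name `select_target_request` and the use of string request ids are hypothetical, and the association degrees are assumed to have been computed already by whatever scoring the conference terminal uses.

```python
from typing import Dict, Tuple


def select_target_request(association: Dict[str, float]) -> Tuple[str, float]:
    """Return the (request id, association degree) pair with the highest degree.

    `association` maps a hypothetical request id to an association degree
    that is assumed to have been computed beforehand.
    """
    if not association:
        raise ValueError("no audio data transmission request has been received")
    # max() returns the first maximal key, so on a tie the request that was
    # inserted (received) earlier is chosen in this sketch.
    target_id = max(association, key=association.get)
    return target_id, association[target_id]
```

For instance, with `{"request 1": 0.4, "request 2": 0.9}` the sketch selects `"request 2"`. The tie-breaking rule is a design choice of this sketch only; the source does not say how ties are resolved.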

Step 103, sending an audio data acquisition instruction to the target user terminal so that the target user terminal can acquire audio data in response to the audio data acquisition instruction.

In this step, after determining the target user terminal corresponding to the speaking user, the conference terminal may send an audio data acquisition instruction to the target user terminal. After receiving the audio data acquisition instruction, the target user terminal acquires the audio data of the speaking user through its audio data acquisition device and sends the acquired audio data to the conference terminal as a response to the audio data acquisition instruction.

Step 104, receiving the audio data returned by the target user terminal.

In this step, the conference terminal can receive the audio data returned by the target user terminal. The audio data is the speaking content that the speaking user corresponding to the target user terminal inputs through the audio data acquisition device of the target user terminal after the target user terminal receives the audio data acquisition instruction. After acquiring the audio data, the target user terminal sends it to the conference terminal.
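Steps 103 and 104 form a simple instruction and response exchange, which the following in-process sketch imitates. The names `UserTerminal`, `handle_instruction`, `collect_audio` and the instruction string `"ACQUIRE_AUDIO"` are hypothetical, and the `record` callable merely stands in for the audio data acquisition device; a real system would send the instruction over the communication connection.

```python
from dataclasses import dataclass
from typing import Callable

ACQUIRE_AUDIO = "ACQUIRE_AUDIO"  # hypothetical audio data acquisition instruction


@dataclass
class UserTerminal:
    terminal_id: str
    record: Callable[[], bytes]  # stand-in for the audio data acquisition device

    def handle_instruction(self, instruction: str) -> bytes:
        # The target user terminal acquires audio only in response to the
        # acquisition instruction, then returns it to the conference terminal.
        if instruction != ACQUIRE_AUDIO:
            raise ValueError(f"unsupported instruction: {instruction!r}")
        return self.record()


def collect_audio(target: UserTerminal) -> bytes:
    """Conference terminal side of Steps 103 and 104."""
    # Step 103: send the audio data acquisition instruction.
    # Step 104: receive the audio data returned by the target user terminal.
    return target.handle_instruction(ACQUIRE_AUDIO)
```

The returned bytes are what the conference terminal would then play locally or forward to other conference sites.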

After receiving the audio data returned by the target user terminal, the conference terminal can play the audio data by using an external audio data playing device such as a loudspeaker. In this way, the speaking content of the speaking user is acquired by the terminal that the speaking user holds and amplified by the audio data playing device connected to the conference terminal, so that other participant users in the conference site can hear the speaking content of the speaking user.

In addition, after receiving the audio data returned by the target user terminal, the conference terminal can also send the received audio data to conference terminals in other conference sites, and those conference terminals can play the audio data by using their own external audio data playing devices such as loudspeakers. In this way, a speaking user in one conference site of the video conference can have the speaking content acquired by the terminal that the user holds and played in the other conference sites of the video conference, so that participant users in the other conference sites can hear the speaking content of the speaking user, and the video conference is realized.

In the embodiment of the invention, because the conference terminal that implements the video conference is in communication connection with at least one user terminal, a participant user can send an audio data transmission request to the conference terminal by using the user terminal. After receiving at least one audio data transmission request, the conference terminal can determine the target audio data transmission request with the highest association degree according to the association degree between each audio data transmission request and the video conference. It thereby quickly and accurately determines the target audio data transmission request that conforms to the current conference process, requirement or content, and determines the target user terminal corresponding to the speaking user whose audio data needs to be acquired. The conference terminal then generates an audio data acquisition instruction so that the audio data is acquired through the target user terminal, completing the acquisition process of the audio data in the video conference.

Referring to fig. 4, a flowchart of the steps of another audio data acquisition method of the present invention is shown. The method may specifically include the following steps:

Step 201, establishing communication connection with the user terminal.

In this step, a communication connection between the conference terminal and each user terminal may be established in advance. This ensures that, when a participant user corresponding to a user terminal needs to speak, the participant user can send an audio data transmission request to the conference terminal through the user terminal that he or she holds, so as to request permission to speak.

For example, a conference terminal may turn on a wireless communication (Wi-Fi) hotspot such that a participant in a video conference may establish a communication connection with the conference terminal using the respective user terminal based on the Wi-Fi hotspot of the conference terminal.

Specifically, the conference terminal may start a Wi-Fi hotspot, locally start a lightweight World Wide Web (Web) service, and generate a two-dimensional code according to the Web address corresponding to the Web service. The two-dimensional code is displayed in the display device of the conference terminal, and a participant user in the video conference may scan it with the user terminal that he or she holds, so that the user terminal connects to the Wi-Fi hotspot of the conference terminal and a communication connection is established between the user terminal and the conference terminal.
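A rough sketch of the local Web service part of this step is given below, using only the Python standard library. It assumes that requests arrive as JSON over HTTP POST, which the source does not specify; the names `start_request_service`, `web_address` and `send_request` and the `/request` path are hypothetical. Starting the Wi-Fi hotspot and rendering the two-dimensional code are outside the sketch, which only produces the Web address that such a code would encode.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import List


def start_request_service(received: List[dict], host: str = "127.0.0.1") -> ThreadingHTTPServer:
    """Start a small local Web service that stores posted requests in `received`."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            # The sketch accepts a JSON body on any path and stores it.
            length = int(self.headers.get("Content-Length", 0))
            received.append(json.loads(self.rfile.read(length)))
            self.send_response(204)  # accepted, no response body
            self.end_headers()

        def log_message(self, *args) -> None:
            pass  # keep the sketch quiet

    server = ThreadingHTTPServer((host, 0), Handler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def web_address(server: ThreadingHTTPServer) -> str:
    """Web address that would be encoded into the two-dimensional code."""
    host, port = server.server_address[:2]
    return f"http://{host}:{port}/request"


def send_request(host: str, port: int, payload: dict) -> int:
    """What a user terminal might do after scanning the code: post a request."""
    connection = HTTPConnection(host, port, timeout=5)
    try:
        body = json.dumps(payload).encode("utf-8")
        connection.request("POST", "/request", body=body,
                           headers={"Content-Type": "application/json"})
        return connection.getresponse().status
    finally:
        connection.close()
```

On an actual conference terminal the service would listen on the hotspot interface rather than on the loopback address used here for illustration.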

Step 202, receiving an audio data transmission request sent by the user terminal through the communication connection.

In this step, since the conference terminal establishes a communication connection with the user terminal, the conference terminal can receive an audio data transmission request transmitted from the user terminal through the established communication connection.

Step 203, determining a first association degree between the content of the audio data corresponding to the user terminal and the video conference according to the content identification information.

In this step, the audio data transmission request may include content identification information of the audio data corresponding to the user terminal, the content identification information being used to characterize the content of the audio data.

Alternatively, the content identification information may include summary information and/or keywords of the content of the audio data.

In the embodiment of the invention, the participant user corresponding to the user terminal can input in advance the summary information and/or keywords of the audio data (speaking content) to be acquired, as the content identification information of the audio data corresponding to the user terminal, so as to represent the content of the audio data. Alternatively, the user terminal can perform semantic analysis on the audio data after the audio data is collected, so as to obtain the summary information and/or keywords of the audio data.

Further, the summary information and/or the keywords of the audio data may be combined with the current process or content of the video conference to determine a first association degree between the content of the audio data and the video conference. Because the content identification information characterizes the content of the audio data, the first association degree determined according to the content identification information in the audio data transmission request may be used to characterize the association degree between the content of the audio data corresponding to the user terminal and the video conference. The higher the first association degree, the better the content of the audio data matches the current process or content of the video conference.

For example, suppose the content of the current video conference is the protection of endangered animals. If the summary information of the audio data covers the current situation of endangered animals and the measures for protecting them, or the keywords of the audio data are species extinction, endangered animals, golden monkeys and the like, the association degree between the content of the audio data and the current video conference can be determined to be relatively high. The audio data transmission request can then be determined as the target audio data transmission request, and the audio data can be acquired from the corresponding target user terminal.
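The source does not say how the first association degree is computed. One simple possibility, shown here purely as an assumption, is the fraction of a request's keywords that also appear among keywords describing the current conference topic. The function name `first_association_degree` and the exact-match comparison are hypothetical; a real system might use semantic matching instead.

```python
from typing import Iterable, Set


def _normalize(words: Iterable[str]) -> Set[str]:
    """Lower-case and trim keywords, dropping empty entries."""
    return {word.strip().lower() for word in words if word.strip()}


def first_association_degree(request_keywords: Iterable[str],
                             conference_keywords: Iterable[str]) -> float:
    """Fraction of the request's keywords that match the conference keywords.

    Assumed scoring rule for illustration only; returns a value in [0, 1].
    """
    request_set = _normalize(request_keywords)
    conference_set = _normalize(conference_keywords)
    if not request_set:
        return 0.0  # a request without keywords cannot be matched
    return len(request_set & conference_set) / len(request_set)
```

Under this rule, a request whose keywords are all on the conference topic scores 1.0, while a request with only half of its keywords on topic scores 0.5.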

Step 204, determining a second association degree between the user corresponding to the user terminal and the video conference according to the identity information.

In this step, the audio data transmission request may include identity information corresponding to the user terminal, where the identity information is used to characterize the identity of the user corresponding to the user terminal.

In the embodiment of the present invention, the identity information may be input in advance by the participant user corresponding to the user terminal, as the identity information corresponding to the user terminal. Fig. 5 shows a schematic diagram of generation and display of another audio data transmission request according to the present invention. As shown in fig. 5, the user 1 may generate the audio data transmission request 1 by using the user terminal 1, where the audio data transmission request 1 may include the identity information input by the user 1, such as name, gender, department and job grade. After inputting the identity information, the user 1 may click a send request button to send the generated audio data transmission request 1 to the conference terminal 60.

Further, the conference terminal 60 may combine the identity information corresponding to the user terminal with the current process or content of the video conference to determine a second association degree between the user corresponding to the user terminal and the current video conference. Because the identity information characterizes the identity of the user corresponding to the user terminal, the second association degree determined according to the identity information in the audio data transmission request may be used to characterize the association degree between the participant user corresponding to the user terminal and the video conference. The higher the second association degree, the better the speaking content of that participant user is expected to match the current process or content of the video conference.

For example, if the content of the current video conference is the protection of women's rights and interests, the gender in the identity information corresponding to the user terminal is female, and the department is a women's association, it may be determined that the association degree between the identity information of the user corresponding to the user terminal and the current video conference is relatively high. The audio data transmission request may then be determined as the target audio data transmission request, and the audio data may be acquired from the user terminal.
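As with the first association degree, the source leaves the computation of the second association degree open. The sketch below assumes, for illustration only, that the conference is described by a set of accepted values for some identity fields and that the score is the fraction of those fields the user matches. The function name `second_association_degree` and the field names are hypothetical.

```python
from typing import Dict, Set


def second_association_degree(identity: Dict[str, str],
                              preferred: Dict[str, Set[str]]) -> float:
    """Fraction of the conference's preferred identity fields that the user matches.

    `identity` holds the identity information from the request; `preferred`
    maps a field name to the values considered relevant to the conference.
    Assumed scoring rule for illustration only; returns a value in [0, 1].
    """
    if not preferred:
        return 0.0  # nothing to compare against
    matched = sum(
        1 for field_name, accepted in preferred.items()
        if identity.get(field_name) in accepted
    )
    return matched / len(preferred)
```

For example, a user whose department matches but whose job grade does not would score 0.5 against a conference profile that lists both fields.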

Step 205, determining the association degree between the audio data transmission request and the video conference according to the first association degree and/or the second association degree, and determining the audio data transmission request with the highest association degree as a target audio data transmission request.

In this step, the association degree between the audio data transmission request and the video conference may be determined according to the first association degree and/or the second association degree, and the audio data transmission request with the highest association degree may be determined as the target audio data transmission request.

Specifically, the association degree between the audio data transmission request and the video conference may be determined according to the first association degree alone, for example by directly taking the first association degree as the association degree between the audio data transmission request and the video conference. It may also be determined according to the second association degree alone, for example by directly taking the second association degree as that association degree. It may also be determined from both, for example by assigning different weights to the first association degree and the second association degree and combining them to obtain the association degree between the audio data transmission request and the video conference.
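The three options of step 205 can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the class `TransmissionRequest`, the function names, and the weight values 0.6 and 0.4 are assumptions, since the specification only says that different proportions may be set for the two association degrees.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TransmissionRequest:
    terminal_id: str
    first_degree: Optional[float] = None   # content vs. conference (may be absent)
    second_degree: Optional[float] = None  # identity vs. conference (may be absent)


def association_degree(req: TransmissionRequest,
                       w_first: float = 0.6,
                       w_second: float = 0.4) -> float:
    """Combine the two degrees; the weights are hypothetical example values."""
    if req.first_degree is not None and req.second_degree is not None:
        return w_first * req.first_degree + w_second * req.second_degree
    if req.first_degree is not None:
        return req.first_degree    # first degree used directly
    if req.second_degree is not None:
        return req.second_degree   # second degree used directly
    raise ValueError("request carries neither association degree")


def select_target(requests: List[TransmissionRequest]) -> TransmissionRequest:
    """Pick the request with the highest association degree."""
    return max(requests, key=association_degree)


requests = [
    TransmissionRequest("terminal 1", first_degree=0.9, second_degree=0.2),
    TransmissionRequest("terminal 2", first_degree=0.5, second_degree=0.9),
    TransmissionRequest("terminal 3", first_degree=0.3),
]
target = select_target(requests)
```

With these example values, terminal 2 is selected because its weighted score is the largest; changing the weights changes which request wins, which is the point of making the proportions configurable.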

Step 206, displaying an identity information filtering list in the display device of the conference terminal, wherein the identity information filtering list comprises at least one preset identity information category.

Optionally, after step 202, step 206 may be performed.

In this step, after the audio data transmission request sent by at least one user terminal is received, the target audio data transmission request may be determined according to a selection operation performed on the conference terminal by the conference host user corresponding to the conference terminal.

Specifically, referring to fig. 5, an identity information filtering list may be displayed in the display device 50 of the conference terminal 60, where the identity information filtering list may include at least one preset identity information category, for example: name, gender, department, job level, etc.

Step 207, determining a target identity information category according to a selection operation of the user corresponding to the conference terminal in the identity information filtering list.

In this step, the conference host user corresponding to the conference terminal may determine the target identity information category in the identity information filtering list of the display device 50 through a selection operation.

Specifically, referring to fig. 5, if the video conference has reached its closing stage and a participant user at the general manager level is required to give a summary speech, the conference host user corresponding to the conference terminal 60 may select, within the identity information category of job level, the general manager level as the target identity information category. The audio data transmission request whose identity information indicates the general manager job level is then taken as the target audio data transmission request, that is, the corresponding target user terminal is the terminal held by the participant user at the general manager level. That participant user uses the terminal to capture the summary speech and obtain audio data, the audio data is transmitted to the conference terminal 60, and the conference terminal 60 plays the audio data through the sound equipment 30, thereby completing the conference speech of the participant user at the general manager level.

Step 208, screening the audio data transmission requests to be selected, which meet the target identity information category, from at least one audio data transmission request according to the identity information in the audio data transmission requests.

In this step, after the target identity information category is determined, a candidate audio data transmission request conforming to the target identity information category may be screened from at least one audio data transmission request according to the identity information in the audio data transmission request.

For example, referring to fig. 5, if the current content of the video conference is the company's annual financial report, the conference host user corresponding to the conference terminal 60 may select, within the identity information category of department, the finance department as the target identity information category, so that the audio data transmission requests whose identity information lists the finance department as the department are taken as the candidate audio data transmission requests.
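The screening of step 208 amounts to a filter over the received requests. The following is a minimal sketch under assumed data shapes: each request is represented as a dictionary with a hypothetical `identity` field, and `screen_requests` is an illustrative name.

```python
from typing import Dict, List


def screen_requests(requests: List[Dict], category: str, value: str) -> List[Dict]:
    """Keep the requests whose identity information has `value` under `category`."""
    return [r for r in requests
            if r.get("identity", {}).get(category) == value]


# Hypothetical requests received by the conference terminal.
received = [
    {"terminal": "terminal 1", "identity": {"department": "finance department"}},
    {"terminal": "terminal 2", "identity": {"department": "sales department"}},
    {"terminal": "terminal 3", "identity": {"department": "finance department"}},
]
candidates = screen_requests(received, "department", "finance department")
```

Requests that omit the selected category are simply excluded, so an incomplete identity record never becomes a candidate by accident.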

Step 209, displaying content identification information of the audio data contained in the audio data transmission request to be selected in a display device of the conference terminal.

In this step, after the audio data transmission request to be selected is determined based on the identity information, the content identification information of the audio data contained in the audio data transmission request to be selected may be displayed in the display device of the conference terminal.

For example, referring to fig. 5, if two candidate audio data transmission requests are determined according to the identity information, and the content identification information included in them is talk profile 1 and talk profile 2 respectively, then talk profile 1 and talk profile 2 may be displayed in the display device 50 of the conference terminal 60 as the content identification information of the two candidate audio data transmission requests.

Step 210, receiving a selection operation for at least one content identification information in the display device, and determining the target audio data transmission request from at least one candidate audio data transmission request according to the selection operation.

In this step, a selection operation for the at least one content identification information may be further received in the display device, so that a target audio data transmission request is determined from the at least one candidate audio data transmission request according to the selection operation.

In the embodiment of the present invention, the conference terminal 60 may display the content identification information of the two candidate audio data transmission requests, namely talk profile 1 and talk profile 2, on the display device 50. The conference host user corresponding to the conference terminal 60 may select one target talk profile from talk profile 1 and talk profile 2 on the display device 50, so that the candidate audio data transmission request corresponding to the target talk profile is determined as the target audio data transmission request. That is, the conference host user allows the participant user corresponding to the target user terminal that generated the target talk profile to speak.
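Steps 209 and 210 can be pictured as listing the content identification information of the candidates and then mapping the host's choice back to a request. The sketch below assumes each candidate is a dictionary with a hypothetical `content_id` field; the function names are illustrative only.

```python
from typing import Dict, List, Optional


def content_entries(candidates: List[Dict]) -> List[str]:
    """Return the content identification information to show on the display device."""
    return [c["content_id"] for c in candidates]


def resolve_selection(candidates: List[Dict], selected: str) -> Optional[Dict]:
    """Map the entry chosen by the host back to its candidate request."""
    for candidate in candidates:
        if candidate["content_id"] == selected:
            return candidate
    return None  # the selection does not correspond to any candidate


# Hypothetical candidates left after screening.
candidates = [
    {"terminal": "terminal 1", "content_id": "talk profile 1"},
    {"terminal": "terminal 3", "content_id": "talk profile 2"},
]
shown = content_entries(candidates)
target = resolve_selection(candidates, "talk profile 2")
```

Returning `None` for an unknown selection leaves it to the caller to decide whether to prompt the host again.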

Step 211, determining a target user terminal corresponding to the target audio data transmission request.

After determining the target audio data transmission request in step 205 or 210, the conference terminal further determines a target user terminal corresponding to the target audio data transmission request.

Step 212, establishing socket network connection with the target user terminal according to the network address of the target user terminal contained in the target audio data transmission request.

In this step, the audio data transmission request may further include a network address of the user terminal, so that the conference terminal, after determining the target audio data transmission request, may determine the network address of the target user terminal from it, for example an Internet Protocol (IP) address, and establish a socket network connection between the conference terminal and the target user terminal according to the IP address of the conference terminal and the IP address of the target user terminal.

A socket is an abstraction of an endpoint for bidirectional communication between application processes on different hosts in a network. Socket programming is insensitive to the type of physical network medium. When communication takes place within a single device system, the socket connection does not need to pass through the network, so socket communication within the system does not depend on how the device accesses the Internet. Between different devices, a socket connection is generally based on IPv4/IPv6. A major function of IP is to shield the differing characteristics of the layers below the network layer (including the link layer and the physical layer), so that an application uses the same method on any network medium when communicating with IP packets. Because the socket connection sits above IP, factors such as whether the physical network is wired or wireless need not be considered. That is, since a socket is only an application programming interface (Application Programming Interface, API) to a Transmission Control Protocol/Internet Protocol (TCP/IP) network, communication can be completed as long as the port number and IP address of the endpoint are known, irrespective of the Internet access mode.

Therefore, by establishing a socket connection between the conference terminal and the target user terminal that needs to transmit audio data, the flexibility and efficiency of the audio data transmission process can be improved.
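Establishing the connection of step 212 can be sketched with Python's standard `socket` module. The request layout (a `network_address` entry holding `ip` and `port`) and the function name `connect_to_target` are assumptions; the specification only says the request carries a network address such as an IP address, so including the port in the request is a hypothetical choice.

```python
import socket
from typing import Dict


def connect_to_target(request: Dict, timeout: float = 5.0) -> socket.socket:
    """Open a TCP socket to the address carried in the target request.

    Assumes (hypothetically) that the request holds both the IP address and
    the port on which the target user terminal is listening.
    """
    address = request["network_address"]
    return socket.create_connection((address["ip"], address["port"]),
                                    timeout=timeout)
```

A caller would pass the target audio data transmission request, for example `connect_to_target({"network_address": {"ip": "192.0.2.10", "port": 5000}})`, where the address shown is a documentation placeholder. Only the target user terminal is contacted, which matches the resource-saving point made in the text.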

In the embodiment of the invention, the communication connection between the conference terminal and each user terminal can be established in advance, so that when a participant user corresponding to a user terminal needs to speak, the participant user can send an audio data transmission request from the user terminal he or she holds to the conference terminal to request permission to speak. After the conference terminal determines the target user terminal from the user terminals corresponding to the at least one participant user requesting to speak, that is, determines the target user terminal corresponding to the target user who may speak in the video conference, the conference terminal establishes a socket network connection with the target user terminal and acquires the audio data containing the speaking content of the target user from the target user terminal through that connection. Socket network connections with user terminals other than the target user terminal need not be established, which saves network resources.

Step 213, transmitting the audio data acquisition instruction to the target user terminal through the socket network connection.

In this step, after the conference terminal establishes a socket connection with the target user terminal, the audio data acquisition instruction may be transmitted to the target user terminal through the established socket connection. After receiving the audio data acquisition instruction, the target user terminal captures the audio data of the speaking user through the audio data acquisition device in the target user terminal, and sends the captured audio data to the conference terminal as a response to the audio data acquisition instruction.

Step 214, receiving the audio data returned by the target user terminal through the socket network connection.

In the step, the conference terminal can acquire the audio data returned by the target user terminal through the established socket connection.
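The exchange of steps 213 and 214 can be sketched as one request and one response over the same socket. The specification does not define a wire format, so the 4-byte length prefix, the `COLLECT_AUDIO` instruction bytes, and all function names below are assumptions chosen only to make the sketch concrete.

```python
import socket
import struct

INSTRUCTION = b"COLLECT_AUDIO"  # hypothetical encoding of the acquisition instruction


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return fewer."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one message prefixed with its length (assumed framing)."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def request_audio(sock: socket.socket) -> bytes:
    """Conference-terminal side: send the instruction, return the audio data."""
    send_frame(sock, INSTRUCTION)
    return recv_frame(sock)


def serve_one_instruction(sock: socket.socket, captured_audio: bytes) -> None:
    """User-terminal side: answer an acquisition instruction with captured audio."""
    if recv_frame(sock) == INSTRUCTION:
        send_frame(sock, captured_audio)
```

Length-prefixed framing is used here because TCP delivers a byte stream without message boundaries; a real deployment would more likely stream audio in many frames rather than return a single buffer.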

Step 215, sending the audio data to other conference terminals so that the other conference terminals play the audio data through an audio data playing device connected with the other conference terminals.

In this step, after receiving the audio data returned by the target user terminal, the conference terminal may send the received audio data to other conference terminals in other conference sites, so that the other conference terminals play the audio data through audio data playing devices, such as sound equipment externally connected to those conference terminals. In this way, a speaking user in one conference site of the video conference can capture the speaking content with the terminal he or she holds, and the speaking content is played in the other conference sites of the video conference, so that the participant users in the other conference sites can hear the speaking user and the video conference is realized.

If the video conference is a video conference based on the video networking, referring to fig. 1, the video networking terminals 10 in the conference sites can implement the video conference through the video networking server 40. After receiving the audio data, the video networking terminal 10 serving as the conference terminal can send the audio data directly to the video networking server 40, and the video networking server 40 can forward the audio data to the video networking terminals in the other conference sites of the video conference. After receiving the audio data, those video networking terminals can play it through audio data playing devices connected to them, such as external sound boxes, so that the speaking user completes the speech and the users in the other conference sites can receive the speaking content.

In addition, after receiving the audio data returned by the target user terminal, the conference terminal can also play the audio data through the audio data playing device connected with the conference terminal, for example, external sound equipment. In this way, the speaking user can capture the speaking content with the terminal he or she holds, and the audio data playing device connected with the conference terminal amplifies it so that the other participant users in the same conference site can hear the speaking content.
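The distribution described above, local playback plus forwarding to the other conference sites, can be sketched as a simple fan-out. The callable-based interface and the name `distribute_audio` are assumptions; in a video networking deployment the forwarding would go through the video networking server rather than directly to each terminal.

```python
from typing import Callable, Iterable

Sink = Callable[[bytes], None]  # anything that accepts audio bytes (assumed interface)


def distribute_audio(audio: bytes,
                     local_player: Sink,
                     remote_senders: Iterable[Sink]) -> int:
    """Play the audio locally, forward it to each remote sink, return the forward count."""
    local_player(audio)
    forwarded = 0
    for send in remote_senders:
        send(audio)
        forwarded += 1
    return forwarded


# Lists stand in for a local sound device and two remote conference terminals.
local_site, site_b, site_c = [], [], []
count = distribute_audio(b"pcm", local_site.append, [site_b.append, site_c.append])
```

Passing callables keeps the sketch independent of any particular playback device or transport.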

In addition, in the embodiment of the invention, before the conference terminal sends the audio data acquisition instruction to the target user terminal and receives the audio data returned by the target user terminal in response to the audio data acquisition instruction, a socket connection can be established between the conference terminal and the target user terminal, so that the audio data acquisition instruction and the audio data can be transmitted through the established socket connection, and the flexibility and the efficiency of the audio data transmission process can be improved.

It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.

Referring to fig. 6, there is shown a block diagram of an audio data acquisition device of the present invention, which is applied to a conference terminal that establishes a communication connection with at least one user terminal, and the user terminal has an audio data acquisition device for acquiring audio data, and may specifically include the following modules:

A first receiving module 301, configured to receive an audio data transmission request sent by the user terminal during a video conference;

A first determining module 302, configured to determine a degree of association between the audio data transmission request and the video conference, determine an audio data transmission request with the highest degree of association as a target audio data transmission request, and determine a target user terminal corresponding to the target audio data transmission request;

A sending module 303, configured to send an audio data acquisition instruction to the target user terminal, so that the target user terminal acquires audio data in response to the audio data acquisition instruction;

And the second receiving module 304 is configured to receive the audio data returned by the target user terminal.

In a preferred embodiment of the present invention, the audio data transmission request includes content identification information of audio data corresponding to the user terminal, and identity information corresponding to the user terminal, where the content identification information is used to characterize content of the audio data, and the identity information is used to characterize identity information of a user corresponding to the user terminal;

The first determining module includes:

a first determining sub-module, configured to determine, according to the content identification information, a first degree of association between the content of the audio data corresponding to the user terminal and the video conference;

A second determining sub-module, configured to determine, according to the identity information, a second association degree between a user corresponding to the user terminal and the video conference;

and the third determining submodule is used for determining the association degree of the audio data transmission request and the video conference according to the first association degree and/or the second association degree.

In a preferred embodiment of the invention, the content identification information comprises summary information and/or keywords of the content of the audio data.

In a preferred embodiment of the invention, the device further comprises:

The first display module is used for displaying an identity information screening list in the display equipment of the conference terminal, wherein the identity information screening list comprises at least one preset identity information category;

the second determining module is used for determining a target identity information category according to the selection operation of the user corresponding to the conference terminal in the identity information screening list;

the screening module is used for screening the audio data transmission requests to be selected, which accord with the target identity information category, from at least one audio data transmission request according to the identity information in the audio data transmission requests;

A second display module, configured to display content identification information of audio data included in the audio data transmission request to be selected in a display device of the conference terminal;

And a third determining module, configured to receive a selection operation for at least one content identification information in the display device, and determine the target audio data transmission request from at least one audio data transmission request to be selected according to the selection operation.

In a preferred embodiment of the present invention, the audio data transmission request includes a network address of the user terminal; the sending module comprises:

A constructing sub-module, configured to establish a socket network connection with the target user terminal according to the network address of the target user terminal included in the target audio data transmission request;

the sending sub-module is used for sending the audio data acquisition instruction to the target user terminal through the socket network connection;

the second receiving module includes:

and the first receiving sub-module is used for receiving the audio data returned by the target user terminal through the socket network connection.

In a preferred embodiment of the invention, the device further comprises:

The construction module is used for establishing communication connection with the user terminal;

the first receiving module includes:

And the second receiving sub-module is used for receiving the audio data transmission request sent by the user terminal through the communication connection.

In a preferred embodiment of the invention, the device further comprises:

And the playing module is used for sending the audio data to other conference terminals so that the other conference terminals can play the audio data through an audio data playing device connected with the other conference terminals.

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

In this specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.

It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal device that comprises the element.

The method and device for acquiring audio data provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the above description of the examples is intended only to help in understanding the method of the present invention and its core ideas. Meanwhile, those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present invention.

Claims

1. A method for collecting audio data, which is applied to a conference terminal that establishes a communication connection with at least one user terminal, wherein the user terminal is provided with an audio data collecting device for collecting audio data, and the method comprises the following steps:

receiving audio data returned by the target user terminal;

The audio data transmission request comprises content identification information of audio data corresponding to the user terminal and identity identification information corresponding to the user terminal, wherein the content identification information is used for representing the content of the audio data, and the identity identification information is used for representing the identity information of a user corresponding to the user terminal;

The step of determining the association degree of the audio data transmission request and the video conference comprises the following steps:

determining a first association degree between the content of the audio data corresponding to the user terminal and the video conference according to the content identification information;

determining a second association degree between the user corresponding to the user terminal and the video conference according to the identity identification information;

and determining the association degree of the audio data transmission request and the video conference according to the first association degree and/or the second association degree.

2. The method according to claim 1, wherein the content identification information comprises summary information and/or keywords of the content of the audio data.

3. The method according to claim 1, wherein the method further comprises:

Displaying an identity information screening list in a display device of the conference terminal, wherein the identity information screening list comprises at least one preset identity information category;

determining a target identity information category according to the selection operation of the user corresponding to the conference terminal in the identity information screening list;

screening out a to-be-selected audio data transmission request conforming to the target identity information category from at least one audio data transmission request according to the identity information in the audio data transmission request;

Displaying content identification information of the audio data contained in the audio data transmission request to be selected in a display device of the conference terminal;

receiving, in the display device, a selection operation for at least one piece of the content identification information, and determining the target audio data transmission request from at least one of the audio data transmission requests to be selected according to the selection operation.

4. The method according to claim 1, wherein the audio data transmission request comprises a network address of the user terminal;

The step of sending an audio data acquisition instruction to the target user terminal comprises the following steps:

establishing socket network connection with the target user terminal according to the network address of the target user terminal contained in the target audio data transmission request;

sending the audio data acquisition instruction to the target user terminal through the socket network connection;

The step of receiving the audio data returned by the target user terminal comprises the following steps:

and receiving the audio data returned by the target user terminal through the socket network connection.

5. The method according to claim 1, wherein prior to the step of receiving an audio data transmission request sent by at least one of the user terminals, the method further comprises:

Establishing communication connection with the user terminal;

The step of receiving an audio data transmission request sent by at least one user terminal includes:

and receiving an audio data transmission request sent by the user terminal through the communication connection.

6. The method of claim 1, wherein after the step of receiving the audio data returned by the target user terminal, the method further comprises:

And sending the audio data to other conference terminals so that the other conference terminals play the audio data through an audio data playing device connected with the other conference terminals.

7. An audio data acquisition device for use in a conference terminal in which a communication connection is established with at least one user terminal, the user terminal having an audio data acquisition device for acquiring audio data, the device comprising:

the second receiving module is used for receiving the audio data returned by the target user terminal;

The first determining module includes:

8. An apparatus, comprising:

One or more processors; and

One or more machine readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform the method of collecting audio data of any of claims 1 to 6.

9. A computer-readable storage medium, characterized in that a computer program stored therein causes a processor to execute the audio data acquisition method according to any one of claims 1 to 6.