CN113242358A - Audio data processing method, device and system, electronic equipment and storage medium - Google Patents

Audio data processing method, device and system, electronic equipment and storage medium

Info

Publication number
CN113242358A
CN113242358A (application CN202110450433.7A)
Authority
CN
China
Prior art keywords
audio
audio data
equipment
identifier
conference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110450433.7A
Other languages
Chinese (zh)
Inventor
黄伟琦
江鹏
夏帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110450433.7A priority Critical patent/CN113242358A/en
Publication of CN113242358A publication Critical patent/CN113242358A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/561Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities by multiplexing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants

Abstract

The present disclosure provides an audio data processing method, apparatus, system, electronic device and storage medium, relating to artificial intelligence fields such as intelligent speech and natural language processing. The method may include: determining audio equipment accessing terminal equipment; acquiring audio data when the audio equipment generates the audio data; and sending the audio data and the equipment identification of the audio equipment to a server associated with the terminal equipment. By applying the scheme of the disclosure, implementation complexity, implementation cost, and the like can be reduced.

Description

Audio data processing method, device and system, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an audio data processing method, apparatus, system, electronic device, and storage medium in the fields of intelligent speech and natural language processing.
Background
When multiple users take part in a voice conference (web conference) at the same time, each terminal device, such as a computer, usually corresponds to only one speaker (user). Even if a terminal device corresponds to several speakers, the different speakers cannot be effectively distinguished; distinguishing them would require docking and joint development with a third-party manufacturer, which increases implementation complexity and implementation cost.
Disclosure of Invention
The disclosure provides an audio data processing method, an apparatus, a system, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided an audio data processing method including:
determining audio equipment accessing terminal equipment;
when the audio equipment generates audio data, acquiring the audio data;
and sending the audio data and the equipment identification of the audio equipment to a server associated with the terminal equipment.
According to an aspect of the present disclosure, there is provided an audio data processing method including:
acquiring audio data from a client and an equipment identifier of the audio equipment that generated the audio data, wherein the audio equipment is audio equipment accessing the terminal equipment where the client is located;
and correspondingly storing the audio data and the equipment identifier.
According to an aspect of the present disclosure, there is provided an audio data processing method including:
acquiring an equipment identifier from a server, wherein the equipment identifier, acquired by the server, identifies the audio equipment that generated audio data on a client;
and displaying the equipment identification.
According to an aspect of the present disclosure, there is provided an audio data processing apparatus including: the device comprises a determining module, a first obtaining module and a sending module;
the determining module is used for determining the audio equipment accessing the terminal equipment;
the first obtaining module is configured to obtain audio data when the audio device generates the audio data;
and the sending module is used for sending the audio data and the equipment identifier of the audio equipment to a server associated with the terminal equipment.
According to an aspect of the present disclosure, there is provided an audio data processing apparatus including: a second acquisition module and a storage module;
the second obtaining module is used for obtaining audio data from a client and a device identifier of the audio device that generated the audio data, wherein the audio device is an audio device accessing the terminal device where the client is located;
and the storage module is used for correspondingly storing the audio data and the equipment identifier.
According to an aspect of the present disclosure, there is provided an audio data processing apparatus including: a third acquisition module and a display module;
the third obtaining module is configured to obtain an equipment identifier from the server, where the equipment identifier, obtained by the server, identifies the audio equipment that generated audio data on the client;
and the display module is used for displaying the equipment identifier.
According to an aspect of the present disclosure, there is provided an audio data processing system including: the first and second apparatuses described above.
According to an aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
According to an aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described above.
According to an aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described above.
An embodiment of the above disclosure has the following advantages or benefits: one or more audio devices can access the terminal device according to actual needs, and the device identifier of an audio device can be uploaded together with the audio data it generates, so that the different sources of the audio data, namely different speakers, can be effectively distinguished, and implementation complexity, implementation cost, and the like are reduced.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
Fig. 1 is a flowchart of a first embodiment of an audio data processing method according to the present disclosure;
Fig. 2 is a flowchart of a second embodiment of an audio data processing method according to the present disclosure;
Fig. 3 is a flowchart of a third embodiment of an audio data processing method according to the present disclosure;
Fig. 4 is a schematic diagram of an overall implementation process of the audio data processing method according to the present disclosure;
Fig. 5 is a schematic structural diagram of a first embodiment 500 of an audio data processing apparatus according to the present disclosure;
Fig. 6 is a schematic structural diagram of a second embodiment 600 of an audio data processing apparatus according to the present disclosure;
Fig. 7 is a schematic structural diagram of a third embodiment 700 of an audio data processing apparatus according to the present disclosure;
Fig. 8 is a schematic structural diagram of a first embodiment 800 of an audio data processing system according to the present disclosure;
Fig. 9 is a schematic structural diagram of a second embodiment 900 of an audio data processing system according to the present disclosure;
Fig. 10 is a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
Fig. 1 is a flowchart of a first embodiment of an audio data processing method according to the present disclosure. As shown in fig. 1, the following detailed implementation is included.
In step 101, an audio device accessing a terminal device is determined.
In step 102, audio data is acquired when the audio device generates audio data.
In step 103, the audio data and the device identification (id) of the audio device are sent to a server associated with the terminal device.
In the scheme of the above method embodiment, one or more audio devices can access the terminal device according to actual needs, and the device identifier of an audio device can be uploaded together with the audio data it generates, so that the different sources of the audio data can be effectively distinguished, and implementation complexity, implementation cost, and the like are reduced.
The scheme of this method embodiment may be executed by a client located on the terminal device. The specific type of the terminal device may be determined according to actual needs; for example, it may be a computer.
Whether an audio device accessing the terminal device exists can be detected periodically at a preset time interval; that is, polling can be performed periodically to determine the audio devices accessing the terminal device. The specific duration of the interval can likewise be determined according to actual needs. The number of audio devices accessing the terminal device may change over time: for example, audio device 1 may access the terminal device first and audio device 2 may access it later; periodic polling allows a newly accessed audio device to be discovered, and take effect, in time.
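A minimal sketch of such a polling loop is given below; the `list_connected_audio_devices()` helper and the interval value are assumptions made for illustration, not details taken from the disclosure.

```python
import time

POLL_INTERVAL_SECONDS = 5  # preset time interval; the value is illustrative


def list_connected_audio_devices():
    """Hypothetical helper: return the device identifiers of the audio
    devices currently attached to this terminal (e.g. via a hub)."""
    raise NotImplementedError


def poll_audio_devices(on_device_added, on_device_removed):
    """Periodically detect newly attached or detached audio devices so that
    a device plugged in mid-conference takes effect in time."""
    known = set()
    while True:
        current = set(list_connected_audio_devices())
        for device_id in current - known:
            on_device_added(device_id)    # e.g. start an audio receiving thread
        for device_id in known - current:
            on_device_removed(device_id)  # e.g. stop the corresponding thread
        known = current
        time.sleep(POLL_INTERVAL_SECONDS)
```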
In addition, each audio device can access the terminal device through a hub, which is simple and convenient and makes it possible to add or remove audio devices at any time according to actual needs. The specific kind of audio device may likewise depend on actual needs; for example, it may be a microphone.
Preferably, an audio device accessing the terminal device may be an audio device participating in a voice conference. In theory, different audio devices accessing the same terminal device may participate in the same voice conference or in different voice conferences, but in general they participate in the same voice conference.
An audio receiving thread can be started for each audio device. For any audio device, when the voice conference in which it participates is in progress and the audio device generates audio data, the corresponding audio data can be acquired and sent to the server associated with the terminal device; the device identifier of the audio device and the conference identifier of the corresponding voice conference can be sent to the server together with the audio data, so that the sources of the audio data can be distinguished.
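A rough sketch of such a per-device audio receiving thread follows; `read_audio_chunk`, the upload endpoint, and the field names are assumptions made for illustration rather than details taken from the disclosure.

```python
import base64
import threading
import time

import requests  # third-party HTTP client, used here only for illustration

SERVER_UPLOAD_URL = "https://example-server/audio"  # placeholder endpoint


def read_audio_chunk(device_id):
    """Hypothetical helper: block until the device produces audio data and
    return it as raw bytes (None when the device stops producing data)."""
    raise NotImplementedError


def audio_receiving_thread(device_id, conference_id, conference_is_ongoing):
    """Upload each audio chunk together with the device identifier and the
    conference identifier so that the server can tell the sources apart."""
    while True:
        if not conference_is_ongoing():
            time.sleep(0.1)  # skip capture while the conference is not in progress
            continue
        chunk = read_audio_chunk(device_id)
        if chunk is None:
            break
        requests.post(SERVER_UPLOAD_URL, json={
            "conference_id": conference_id,
            "device_id": device_id,
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


def start_receiving(device_id, conference_id, conference_is_ongoing):
    thread = threading.Thread(
        target=audio_receiving_thread,
        args=(device_id, conference_id, conference_is_ongoing),
        daemon=True,
    )
    thread.start()
    return thread
```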
The device identifier may be obtained from the terminal device, and the conference identifier may be input by the user; for example, the user may enter the identifier of the conference to be joined through a browser interface on the terminal device.
That is to say, one or more audio devices can access the terminal device according to actual needs, so that conference participants can be added as required, and the different sources of the audio data, i.e. different speakers, can be effectively distinguished through the device identifier of the corresponding audio device and the conference identifier of the corresponding voice conference, thereby reducing implementation complexity, implementation cost, and the like.
In practical applications, a long link can be established with the server, and the conference state of the voice conference can be acquired from the server through the established long link. The conference state may include a not-started state, an in-progress state, an ended state, and so on. The established long link may be a WebSocket long link or the like.
Accordingly, for any audio device, when the voice conference in which the audio device participates is in an ongoing state and the audio device generates audio data, the corresponding audio data can be acquired.
By the above processing, unnecessary audio data transmission can be avoided, thereby saving transmission resources and the like.
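One possible way to hold such a long link is a WebSocket subscription; the sketch below relies on the third-party `websockets` package and an assumed JSON message shape, both of which are illustrative choices rather than requirements of the disclosure.

```python
import asyncio
import json

import websockets  # third-party package: pip install websockets

# conference_id -> "not_started" | "in_progress" | "ended"
CONFERENCE_STATE = {}


async def sync_conference_state(server_ws_url):
    """Hold a long link to the server and update the local conference state
    whenever the server pushes a change (the message shape is an assumption)."""
    async with websockets.connect(server_ws_url) as ws:
        async for message in ws:
            update = json.loads(message)
            CONFERENCE_STATE[update["conference_id"]] = update["state"]


def conference_is_ongoing(conference_id):
    """Used by the audio receiving threads to decide whether to upload."""
    return CONFERENCE_STATE.get(conference_id) == "in_progress"


# Example: asyncio.run(sync_conference_state("wss://example-server/conference-state"))
```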
In practical applications, a supervise thread can be started to monitor the state of the client, and the client can be restarted when it exits due to an abnormal condition, thereby ensuring the availability, stability, and the like of the audio capture function.
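The supervise thread amounts to a watchdog loop; in the sketch below the client is assumed to run as a child process started with an illustrative command line.

```python
import subprocess
import time

CLIENT_COMMAND = ["python", "audio_client.py"]  # illustrative command line


def supervise_client():
    """Restart the client whenever it exits abnormally, so that the audio
    capture function stays available and stable."""
    while True:
        process = subprocess.Popen(CLIENT_COMMAND)
        return_code = process.wait()
        if return_code == 0:
            break  # normal exit, no restart needed
        time.sleep(1)  # brief back-off before restarting after an abnormal exit
```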
The above description mainly describes the scheme of the present disclosure from the client side, and the following description further describes the scheme of the present disclosure from the server side.
Fig. 2 is a flowchart of a second embodiment of the audio data processing method according to the present disclosure. As shown in fig. 2, the following detailed implementation is included.
In step 201, audio data from the client and a device identifier of an audio device that generates the audio data are obtained, where the audio device is an audio device that accesses a terminal device where the client is located.
In step 202, the obtained audio data and the device identifier are stored correspondingly.
In the scheme of the above method embodiment, the audio data acquired from the client and the device identifier of the corresponding audio device can be stored in correspondence with each other, so that the different sources of the audio data can be effectively distinguished, and implementation complexity, implementation cost, and the like are reduced.
The scheme of this method embodiment may be executed by a server, and the audio device accessing the terminal device may be an audio device participating in a voice conference.
Accordingly, the acquired audio data may be audio data generated by the audio device while the voice conference in which it participates is in progress. In addition, the conference identifier of the voice conference can be obtained, and the audio data can be stored in a corresponding audio file whose file identifier is the combination of the conference identifier and the device identifier.
That is, after audio data sent by different clients is acquired, it can be written into different audio files along the dimensions of conference identifier and device identifier: for any audio file, its file identifier may be composed of the corresponding conference identifier and device identifier, i.e. file identifier = conference identifier + device identifier.
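On the server side, writing each chunk into an audio file keyed by the combination of conference identifier and device identifier might look like the following sketch; the storage location, the raw-PCM format, and the function names are assumptions.

```python
import os

AUDIO_ROOT = "/data/conference_audio"  # illustrative storage location


def file_identifier(conference_id, device_id):
    """File identifier = conference identifier + device identifier."""
    return f"{conference_id}_{device_id}"


def save_audio_chunk(conference_id, device_id, chunk: bytes):
    """Append the chunk to the audio file of this (conference, device) pair,
    so that audio from different sources ends up in different files."""
    os.makedirs(AUDIO_ROOT, exist_ok=True)
    path = os.path.join(AUDIO_ROOT, file_identifier(conference_id, device_id) + ".pcm")
    with open(path, "ab") as f:  # raw PCM is assumed so that appending is safe
        f.write(chunk)
```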
In addition, a long link can be established with the client, and the conference state of the voice conference is synchronized to the client through the established long link.
The acquired audio data can also be converted into text information (no limitation is imposed on how the conversion is done), and the converted text information together with the corresponding device identifier can be sent to the terminal devices participating in the voice conference for display. For example, the information may be presented on a browser interface of each terminal device participating in the voice conference.
Through this processing, each user participating in the voice conference can conveniently view the speech content of the other users and clearly see which user each piece of content belongs to.
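A sketch of this conversion-and-display path is shown below; `speech_to_text` stands in for whatever speech recognition service is used (the disclosure does not restrict the choice), and `push_to_terminal` is an assumed delivery helper.

```python
def speech_to_text(chunk: bytes) -> str:
    """Hypothetical wrapper around a speech recognition service; the
    disclosure places no limitation on how the conversion is done."""
    raise NotImplementedError


def push_to_terminal(terminal, payload: dict):
    """Hypothetical helper that delivers a message to one participating
    terminal, e.g. over the long link it holds to the server."""
    raise NotImplementedError


def broadcast_transcript(conference_id, device_id, chunk, participating_terminals):
    """Convert the audio into text and send it, together with the device
    identifier, to every terminal taking part in the conference."""
    text = speech_to_text(chunk)
    for terminal in participating_terminals(conference_id):
        push_to_terminal(terminal, {
            "device_id": device_id,  # lets viewers tell the speakers apart
            "text": text,
        })
```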
After the voice conference ends, the audio files corresponding to the audio devices participating in the voice conference can be merged according to their file identifiers, thereby obtaining the conference audio file of the voice conference.
Taking a voice conference x as an example: since the file identifier of each audio file consists of a conference identifier and a device identifier, after conference x ends, every audio file belonging to conference x, i.e. the audio file corresponding to each audio device that participated in it, can be found conveniently and accurately through the file identifiers. These audio files can then be merged, for example by calling a command-line tool, to obtain the required conference audio file of conference x, which makes it convenient to listen back to the conference content later.
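A sketch of this post-conference merge under the storage layout assumed above; the disclosure only says that a command-line tool may be called, so the use of ffmpeg's amix filter and the raw-PCM parameters here are illustrative assumptions.

```python
import glob
import subprocess

AUDIO_ROOT = "/data/conference_audio"  # same illustrative location as above
PCM_INPUT_ARGS = ["-f", "s16le", "-ar", "16000", "-ac", "1"]  # assumed raw-PCM parameters


def merge_conference_audio(conference_id, output_path):
    """Find every audio file whose file identifier starts with this conference
    identifier and mix them into a single conference audio file with ffmpeg."""
    inputs = sorted(glob.glob(f"{AUDIO_ROOT}/{conference_id}_*.pcm"))
    if not inputs:
        return
    command = ["ffmpeg", "-y"]
    for path in inputs:
        command += PCM_INPUT_ARGS + ["-i", path]
    command += ["-filter_complex", f"amix=inputs={len(inputs)}", output_path]
    subprocess.run(command, check=True)


# Example: merge_conference_audio("conference_x", "/data/conference_x.wav")
```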
Fig. 3 is a flowchart of a third embodiment of the audio data processing method according to the present disclosure. As shown in fig. 3, the following detailed implementation is included.
In step 301, a device identifier is obtained from a server, where the device identifier, obtained by the server, identifies the audio device that generated audio data on a client.
In step 302, the acquired device identifier is displayed.
In the solution of the above method embodiment, the device identifier of the audio device that generates the audio data may be displayed, so that the user may effectively distinguish different sources of the audio data based on the displayed content.
Optionally, text information from the server and the device identifier corresponding to the text information may be acquired, where the text information is obtained by the server by converting the acquired audio data and the device identifier identifies the audio device that generated the audio data; accordingly, the acquired text information and device identifier may be displayed.
Based on the above introduction, Fig. 4 is a schematic diagram of the overall implementation process of the audio data processing method according to the present disclosure.
As shown in Fig. 4, computer A, computer B, and computer C are three computers in different network environments: computer A and computer B are in network environment a, while computer C is in network environment b. Each of computer A, computer B, and computer C is connected to two microphones through a hub.
Assume that the users corresponding to these microphones are participating in the same voice conference x. Taking microphone 1 shown in Fig. 4 as an example, client A located on computer A may perform the following processing for microphone 1: when voice conference x is determined to be in progress, the audio data corresponding to microphone 1 is acquired, and the acquired audio data, the device identifier of microphone 1, and the conference identifier of voice conference x are sent to the server together.
Correspondingly, still taking microphone 1 shown in Fig. 4 as an example, the server may store the acquired audio data in an audio file whose file identifier is the combination of the conference identifier of voice conference x and the device identifier of microphone 1, convert the audio data into text information, and send the converted text information to computer A, computer B, and computer C for display and the like.
Microphone 1 shown in Fig. 4 is used as an example above; the other microphones can be processed in the same manner.
After the voice conference x is finished, the server can also combine the audio files corresponding to the microphones, so as to obtain the conference audio file of the voice conference x.
It is noted that, while the foregoing method embodiments are described as a series of acts for simplicity of explanation, those skilled in the art will appreciate that the present disclosure is not limited by the order of the acts, as some steps may, in accordance with the present disclosure, occur in other orders or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the disclosure. In addition, for parts not described in detail in one embodiment, reference may be made to the relevant descriptions in other embodiments.
The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below. The audio data processing device provided by the present disclosure is used for executing any one of the audio data processing methods described above.
Fig. 5 is a schematic structural diagram of a first embodiment 500 of an audio data processing apparatus according to the present disclosure. As shown in fig. 5, includes: a determining module 501, a first obtaining module 502 and a sending module 503.
A determining module 501, configured to determine an audio device accessing a terminal device.
The first obtaining module 502 is configured to obtain the audio data when the audio device generates the audio data.
A sending module 503, configured to send the obtained audio data and the device identifier of the audio device to a server associated with the terminal device.
In the scheme of the above apparatus embodiment, one or more audio devices can access the terminal device according to actual needs, and the device identifier of an audio device can be uploaded together with the audio data it generates, so that the different sources of the audio data can be effectively distinguished, and implementation complexity, implementation cost, and the like are reduced.
The determining module 501 may periodically detect, at a preset time interval, whether an audio device accessing the terminal device exists, that is, perform polling periodically to determine the audio devices accessing the terminal device. The number of audio devices accessing the terminal device may change over time; for example, audio device 1 may access the terminal device first and audio device 2 may access it later, and periodic polling allows the newly accessed audio device to be discovered, and take effect, in time.
In addition, the audio devices can access the terminal device through a hub, which is simple and convenient.
For any audio device, the first obtaining module 502 may obtain audio data corresponding to the audio device and send the audio data to the server when the voice conference in which the audio device participates is in an ongoing state and the audio device generates audio data.
The first obtaining module 502 may further obtain the conference identifier of the voice conference, and send the device identifier of the audio device corresponding to the audio data and the conference identifier of the corresponding voice conference to the server together with the audio data.
In addition, the first obtaining module 502 may further establish a long link with the server, and obtain the conference state of the voice conference from the server through the long link.
Fig. 6 is a schematic structural diagram of a second embodiment 600 of an audio data processing apparatus according to the present disclosure. As shown in fig. 6, includes: a second obtaining module 601 and a saving module 602.
The second obtaining module 601 is configured to obtain audio data from the client and a device identifier of an audio device that generates the audio data, where the audio device is an audio device that accesses a terminal device where the client is located.
A saving module 602, configured to correspondingly save the acquired audio data and the device identifier.
In the scheme of the above apparatus embodiment, the audio data acquired from the client and the device identifier of the corresponding audio device can be stored in correspondence with each other, so that the different sources of the audio data can be effectively distinguished, and implementation complexity, implementation cost, and the like are reduced.
The audio data acquired by the second acquiring module 601 may be: audio data generated by the audio device when a voice conference in which the audio device is participating is in progress.
Accordingly, the second obtaining module 601 may also obtain a conference identifier of the voice conference. The saving module 602 may save the acquired audio data in a corresponding audio file, where the audio file may use a combination of the conference identifier and the device identifier as a file identifier.
In addition, the second obtaining module 601 may further establish a long link with the client, and synchronize the conference state of the voice conference to the client through the long link.
The storage module 602 may further convert the acquired audio data into text information, and may send the text information obtained by conversion and the corresponding device identifier to the terminal device participating in the voice conference for display.
Further, after the voice conference is finished, the saving module 602 may further merge the audio files corresponding to the audio devices participating in the voice conference according to the file identifier, so as to obtain a conference audio file of the voice conference.
Fig. 7 is a schematic block diagram of an audio data processing apparatus 700 according to a third embodiment of the disclosure. As shown in fig. 7, includes: a third obtaining module 701 and a display module 702.
A third obtaining module 701, configured to obtain an apparatus identifier from the server, where the apparatus identifier is an apparatus identifier of an audio apparatus that generates audio data on the client and is obtained by the server.
A displaying module 702, configured to display the obtained device identifier.
In the solution of the embodiment of the apparatus, the device identifier of the audio device that generates the audio data may be displayed, so that the user may effectively distinguish different sources of the audio data based on the displayed content.
Optionally, the third obtaining module 701 may obtain text information from the server and the device identifier corresponding to the text information, where the text information is obtained by the server by converting the acquired audio data and the device identifier identifies the audio device that generated the audio data; accordingly, the displaying module 702 may display the obtained text information and device identifier.
The present disclosure also discloses an audio data processing system. Fig. 8 is a schematic block diagram of an audio data processing system 800 according to a first embodiment of the disclosure. As shown in fig. 8, includes: a first audio data processing apparatus 500 and a second audio data processing apparatus 600.
The first audio data processing device 500 may be the audio data processing device in the embodiment shown in fig. 5, and the second audio data processing device 600 may be the audio data processing device in the embodiment shown in fig. 6.
By applying the scheme of the above system embodiment, one or more audio devices can access the terminal device according to actual needs, the device identifier of an audio device can be uploaded together with the audio data it generates, and the audio data and the corresponding device identifier can be stored in correspondence with each other, so that the different sources of the audio data can be effectively distinguished, and implementation complexity, implementation cost, and the like are reduced.
Fig. 9 is a schematic diagram illustrating a second embodiment 900 of an audio data processing system according to the present disclosure. As shown in fig. 9, includes: a first audio data processing device 500, a second audio data processing device 600 and a third audio data processing device 700.
The first audio data processing device 500 may be the audio data processing device in the embodiment shown in fig. 5, the second audio data processing device 600 may be the audio data processing device in the embodiment shown in fig. 6, and the third audio data processing device 700 may be the audio data processing device in the embodiment shown in fig. 7.
Compared with the embodiment shown in fig. 8, the audio data processing system shown in fig. 9 further includes a third audio data processing apparatus 700, and through the third audio data processing apparatus 700, the device identifier of the audio device that generates the audio data can be displayed, so that the user can effectively distinguish different sources of the audio data based on the displayed content.
For the specific work flow of the above device and system embodiments, please refer to the related description in the foregoing method embodiments, and further description is omitted.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 10, the device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a random access memory (RAM) 1003. The RAM 1003 can also store various programs and data necessary for the operation of the device 1000. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other through a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
Computing unit 1001 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1001 performs the various methods and processes described above, such as the methods described in this disclosure. For example, in some embodiments, the methods described in this disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1000 via ROM 1002 and/or communications unit 1009. When the computer program is loaded into RAM1003 and executed by computing unit 1001, one or more steps of the methods described in the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured by any other suitable means (e.g., by means of firmware) to perform the methods described by the present disclosure.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that addresses the drawbacks of high management difficulty and weak service scalability in traditional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain. Cloud computing refers to accessing an elastically scalable shared pool of physical or virtual resources through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and can be deployed and managed on demand in a self-service manner; cloud computing technology can provide efficient and powerful data processing capabilities for artificial intelligence, blockchain, and other technical applications and model training.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
It should be noted that the scheme disclosed by the present disclosure can be applied to the field of artificial intelligence, in particular to the fields of intelligent speech, natural language processing, and the like.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware and software technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.

Claims (31)

1. An audio data processing method, comprising:
determining audio equipment accessing terminal equipment;
when the audio equipment generates audio data, acquiring the audio data;
and sending the audio data and the equipment identification of the audio equipment to a server associated with the terminal equipment.
2. The method of claim 1, wherein the obtaining audio data when the audio device generates the audio data comprises:
acquiring the audio data when the voice conference in which the audio equipment participates is in progress and the audio equipment generates the audio data.
3. The method of claim 2, wherein the sending the audio data and the device identification of the audio device to the server associated with the terminal device comprises:
acquiring a conference identifier of the voice conference;
and sending the audio data, the equipment identifier and the conference identifier to the server.
4. The method of claim 1, further comprising:
and periodically detecting, at a preset time interval, whether audio equipment accessing the terminal equipment exists.
5. The method of claim 1, wherein,
the audio device includes: audio equipment that accesses the terminal equipment through a hub.
6. The method of claim 2, further comprising:
and establishing a long link with the server, and acquiring the conference state of the voice conference from the server through the long link.
7. An audio data processing method, comprising:
acquiring audio data from a client and an equipment identifier of the audio equipment that generated the audio data, wherein the audio equipment is audio equipment accessing the terminal equipment where the client is located;
and correspondingly storing the audio data and the equipment identifier.
8. The method of claim 7, wherein the audio data comprises:
audio data generated by the audio device when a voice conference in which the audio device is participating is in progress.
9. The method of claim 8, further comprising:
acquiring a conference identifier of the voice conference;
and storing the audio data into a corresponding audio file, wherein the audio file takes the combination of the conference identifier and the equipment identifier as a file identifier.
10. The method of claim 8, further comprising:
and establishing a long link with the client, and synchronizing the conference state of the voice conference to the client through the long link.
11. The method of claim 8, further comprising:
and converting the audio data into text information, and sending the text information and the equipment identification to the terminal equipment participating in the voice conference for display.
12. The method of claim 9, further comprising:
and after the voice conference is finished, combining the audio files corresponding to the audio devices participating in the voice conference according to the file identification to obtain the conference audio file of the voice conference.
13. An audio data processing method, comprising:
acquiring an equipment identifier from a server, wherein the equipment identifier, acquired by the server, identifies the audio equipment that generated audio data on a client;
and displaying the equipment identification.
14. An audio data processing apparatus comprising: the device comprises a determining module, a first obtaining module and a sending module;
the determining module is used for determining the audio equipment accessing the terminal equipment;
the first obtaining module is configured to obtain audio data when the audio device generates the audio data;
and the sending module is used for sending the audio data and the equipment identifier of the audio equipment to a server associated with the terminal equipment.
15. The apparatus of claim 14, wherein,
the first obtaining module obtains the audio data when the voice conference in which the audio equipment participates is in progress and the audio equipment generates the audio data.
16. The apparatus of claim 15, wherein,
the first obtaining module is further configured to obtain a conference identifier of the voice conference;
the sending module is further configured to send the audio data, the device identifier, and the conference identifier to the server.
17. The apparatus of claim 14, wherein,
the determining module is further configured to periodically detect whether an audio device accessing the terminal device exists according to a preset time interval.
18. The apparatus of claim 14, wherein,
the audio device includes: audio equipment that accesses the terminal equipment through a hub.
19. The apparatus of claim 15, wherein,
the first obtaining module is further configured to establish a long link with the server, and obtain a conference state of the voice conference from the server through the long link.
20. An audio data processing apparatus comprising: a second acquisition module and a storage module;
the second obtaining module is used for obtaining audio data from a client and an equipment identifier of the audio equipment that generated the audio data, wherein the audio equipment is audio equipment accessing the terminal equipment where the client is located;
and the storage module is used for correspondingly storing the audio data and the equipment identifier.
21. The apparatus of claim 20, wherein,
the audio data includes: audio data generated by the audio device when a voice conference in which the audio device is participating is in progress.
22. The apparatus of claim 21, wherein,
the second obtaining module is further configured to obtain a conference identifier of the voice conference;
the storage module is further configured to store the audio data in a corresponding audio file, where the audio file uses a combination of the conference identifier and the device identifier as a file identifier.
23. The apparatus of claim 21, wherein,
the second obtaining module is further configured to establish a long link with the client, and synchronize a conference state of the voice conference with the client through the long link.
24. The apparatus of claim 21, wherein,
the storage module is further used for converting the audio data into text information and sending the text information and the equipment identification to the terminal equipment participating in the voice conference for display.
25. The apparatus of claim 22, wherein,
and the storage module is further used for merging the audio files corresponding to the audio devices participating in the voice conference according to the file identification after the voice conference is finished to obtain the conference audio file of the voice conference.
26. An audio data processing apparatus comprising: a third acquisition module and a display module;
the third obtaining module is configured to obtain an equipment identifier from the server, where the equipment identifier, obtained by the server, identifies the audio equipment that generated audio data on the client;
and the display module is used for displaying the equipment identifier.
27. An audio data processing system comprising:
apparatus as claimed in any one of claims 14 to 19 and apparatus as claimed in any one of claims 20 to 25.
28. The system of claim 27, further comprising: the apparatus of claim 26.
29. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
30. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-13.
31. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-13.
CN202110450433.7A 2021-04-25 2021-04-25 Audio data processing method, device and system, electronic equipment and storage medium Pending CN113242358A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110450433.7A CN113242358A (en) 2021-04-25 2021-04-25 Audio data processing method, device and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110450433.7A CN113242358A (en) 2021-04-25 2021-04-25 Audio data processing method, device and system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113242358A true CN113242358A (en) 2021-08-10

Family

ID=77129173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110450433.7A Pending CN113242358A (en) 2021-04-25 2021-04-25 Audio data processing method, device and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113242358A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8842854B1 (en) * 2005-07-14 2014-09-23 Zaxcom, Inc. Systems and methods for remotely controlling local audio devices in a virtual wireless multitrack recording system
CN106714008A (en) * 2015-11-13 2017-05-24 深圳兆日科技股份有限公司 Audio communication method, intelligent terminal and audio equipment
CN105825855A (en) * 2016-04-13 2016-08-03 联想(北京)有限公司 Information processing method and main terminal equipment
CN111462742A (en) * 2020-03-05 2020-07-28 北京声智科技有限公司 Text display method and device based on voice, electronic equipment and storage medium
CN112311491A (en) * 2020-03-23 2021-02-02 尼尔森网联媒介数据服务有限公司 Multimedia data acquisition method and device, storage medium and electronic equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643685A (en) * 2021-08-18 2021-11-12 百度在线网络技术(北京)有限公司 Data processing method and device, electronic equipment and computer storage medium
CN113766146A (en) * 2021-09-07 2021-12-07 北京百度网讯科技有限公司 Audio and video processing method and device, electronic equipment, storage medium and program product
US11863842B2 (en) 2021-09-07 2024-01-02 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for processing audio and video, electronic device and storage medium
CN113838477A (en) * 2021-09-13 2021-12-24 阿波罗智联(北京)科技有限公司 Packet loss recovery method and device for audio data packet, electronic equipment and storage medium
CN115002134A (en) * 2022-05-18 2022-09-02 北京百度网讯科技有限公司 Conference data synchronization method, system, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113242358A (en) Audio data processing method, device and system, electronic equipment and storage medium
CN114201278A (en) Task processing method, task processing device, electronic device, and storage medium
EP4050867A2 (en) Method and apparatus of synchronizing data, electronic device and storage medium
WO2022262253A1 (en) Task allocation method and apparatus, electronic device, and computer readable medium
CN113012695B (en) Intelligent control method and device, electronic equipment and computer readable storage medium
CN113360266B (en) Task processing method and device
CN113037489B (en) Data processing method, device, equipment and storage medium
CN112382292A (en) Voice-based control method and device
CN114374703B (en) Cloud mobile phone information acquisition method, device, equipment and storage medium
CN114363704B (en) Video playing method, device, equipment and storage medium
CN113742581B (en) Method and device for generating list, electronic equipment and readable storage medium
CN115599438A (en) Method, device, equipment and medium for constructing application program publishing package
CN114139605A (en) Distributed model training method, system, device and storage medium
CN115391204A (en) Test method and device for automatic driving service, electronic equipment and storage medium
CN114091909A (en) Collaborative development method, system, device and electronic equipment
CN114915516A (en) Communication method and device
CN113905040A (en) File transmission method, device, system, equipment and storage medium
CN115312042A (en) Method, apparatus, device and storage medium for processing audio
CN112631843A (en) Equipment testing method and device, electronic equipment, readable medium and product
CN113691937A (en) Method for determining position information, cloud mobile phone and terminal equipment
CN112925623A (en) Task processing method and device, electronic equipment and medium
CN111986682A (en) Voice interaction method, device, equipment and storage medium
CN112835759A (en) Test data processing method and device, electronic equipment and storage medium
CN113032040B (en) Method, apparatus, device, medium, and article for processing tasks
CN116662276B (en) Data processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210810