CN113643685A - Data processing method and device, electronic equipment and computer storage medium - Google Patents


Info

Publication number
CN113643685A
CN113643685A
Authority
CN
China
Prior art keywords
audio data
earphone
data
uplink
downlink
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110951832.1A
Other languages
Chinese (zh)
Inventor
蒋习旺
邢仁泰
王婧雅
王婷
张明昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd filed Critical Baidu Online Network Technology Beijing Co Ltd
Priority application: CN202110951832.1A
Publication: CN113643685A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)

Abstract

The disclosure provides a data processing method, a data processing device, an electronic device and a computer storage medium, and relates to the fields of artificial intelligence, voice technology and the like. The specific implementation scheme is as follows: acquiring uplink audio data and downlink audio data of a target earphone end; generating merged data with a set format according to the uplink audio data and the downlink audio data; and sending the merged data to a designated receiving end. The disclosed technology solves the problem of sending audio data from the target earphone end, so that the audio data can be conveniently converted into text or otherwise processed, which helps enrich the additional functions available during a call and improves user satisfaction with call applications.

Description

Data processing method and device, electronic equipment and computer storage medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly to the fields of artificial intelligence, speech technology, and the like.
Background
With the development of computer technology, branch technologies in the computer field such as speech recognition have also developed rapidly. Speech recognition is a technology that takes speech as its research object and enables a computer to automatically recognize human language through a series of speech signal processing and pattern recognition steps. It requires parsing the audio and then recognizing the parsed content as words.
Nowadays, the electronic products people use in daily life offer increasingly rich functions. Earphones, for example, were originally used only to receive and transmit audio, but with the development of smart earphones they have gradually acquired more functions, and users' requirements for earphones keep rising.
For example, during a call, a user may want the call content to be automatically recognized as text and recorded for later reference. Earphone products therefore need to be improved to meet users' growing demands.
Disclosure of Invention
The disclosure provides a data processing method, a data processing device, an electronic device and a computer storage medium.
According to an aspect of the present disclosure, there is provided a data processing method including:
acquiring uplink audio data and downlink audio data of a target earphone end;
generating combined data with a set format according to the uplink audio data and the downlink audio data;
and sending the merged data to a designated receiving end.
According to another aspect of the present disclosure, there is provided a data processing method including:
receiving merged data, wherein the merged data is generated by any one embodiment of the disclosure;
decomposing the merged data into uplink audio data and downlink audio data according to a set format;
and sending the uplink audio data and the downlink audio data to a cloud audio processing module.
According to another aspect of the present disclosure, there is provided a data processing apparatus including:
the acquisition module is used for acquiring uplink audio data and downlink audio data of a target earphone end;
the merging module is used for generating merged data with a set format according to the uplink audio data and the downlink audio data;
and the sending module is used for sending the merged data to the appointed receiving end.
According to another aspect of the present disclosure, there is provided a data processing apparatus including:
a receiving module, configured to receive merged data, where the merged data is generated by any one of the embodiments of the present disclosure;
the decomposition module is used for decomposing the combined data into uplink audio data and downlink audio data according to a set format;
and the forwarding module is used for sending the uplink audio data and the downlink audio data to the cloud audio processing module.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
The disclosed technology solves the problem of sending audio data from the target earphone end, so that the audio data can be conveniently converted into text or otherwise processed, which helps enrich the additional functions available during a call and improves user satisfaction with call applications.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a data processing method according to another embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a data processing method according to yet another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a data processing method according to an example of the present disclosure;
FIG. 5 is a schematic diagram of a data processing method according to another example of the present disclosure;
FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a data processing apparatus according to another embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a data processing apparatus according to yet another embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a data processing apparatus according to yet another embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a data processing apparatus according to yet another embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a data processing apparatus according to yet another embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a data processing apparatus according to yet another embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a data processing apparatus according to yet another embodiment of the present disclosure;
fig. 14 is a block diagram of an electronic device for implementing a data processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An embodiment of the present disclosure first provides a data processing method, as shown in fig. 1, including:
step S11: acquiring uplink audio data and downlink audio data of a target earphone end;
step S12: generating combined data with a set format according to the uplink audio data and the downlink audio data;
step S13: and sending the merged data to a designated receiving end.
In this embodiment, steps S11-S13 may be executed at the earphone end; specifically, they may be executed cyclically during a call according to a set period.
In this embodiment, the target earphone end may be the data processing end of either of a user's left and right earphones; a data processing end specially configured for a dual-earphone set; the data processing end of a single earphone produced and sold separately; the data processing end of either of a pair of earphones that are sold together and can be wirelessly separated; the data processing end of a headphone or a wired dual-earphone set; or the data processing end of any electronic product having a microphone and a speaker.
In this embodiment, the uplink audio data may be audio data received by a microphone for transmission.
Further, the uplink audio data may be standalone audio data, audio data in a video call, or audio data in a video file.
In this embodiment, the downlink audio data may be audio data received or played through a speaker for the user at the target earphone end to listen to.
Further, the downlink audio data may likewise be standalone audio data, audio data in a video call, or audio data in a video file.
In this embodiment, the obtaining of the uplink audio data and the downlink audio data of the target earphone end may be obtaining the uplink audio data and the downlink audio data through the target earphone end, or obtaining the audio data transmitted through the target earphone end.
In a specific implementation manner, null data may exist in the uplink audio data or the downlink audio data. That is, in a call scenario where only one of the calling party and the called party transmits voice for a long time, the corresponding downlink or uplink audio data may be null for a long time. For example, in a scenario of playing an audio or video file, only downlink audio data exists and the uplink audio data is empty.
In another specific implementation manner, the acquiring of the uplink audio data and the downlink audio data of the target earphone end may be acquiring at least one of the uplink audio data and the downlink audio data of the target earphone end, for example, in a case of acquiring only the uplink audio data, it may be considered that the step of acquiring the uplink audio data and the downlink audio data of the target earphone end is completed. In the case where only the downstream audio data is acquired, it is also considered that the step of acquiring the upstream audio data and the downstream audio data of the target earphone end is completed.
In another case, when acquiring the uplink audio data and the downlink audio data of the target earphone end, if only the uplink audio data or only the downlink audio data is obtained within a limited acquisition window because one party of the call speaks for an excessively long time, the step of acquiring the uplink audio data and the downlink audio data of the target earphone end may still be considered completed.
In a specific implementation manner, generating the merged data with the set format according to the uplink audio data and the downlink audio data may be directly merging the two into one piece of audio data, which serves as the merged data.
If only one of the uplink audio data and the downlink audio data is collected in a collection period, that single stream may be used as the merged data, or the collected stream may be merged with null audio data to form the merged data.
Alternatively, generating the merged data with the set format according to the uplink audio data and the downlink audio data may include processing at least one of them and then merging the processed data. For example, the uplink audio data or the downlink audio data may be compressed, and the compressed uplink audio data may be merged with the uncompressed downlink audio data, the uncompressed uplink audio data with the compressed downlink audio data, or the compressed uplink audio data with the compressed downlink audio data, to obtain the merged data.
In this embodiment, the merged data may be data that can be independently transmitted.
In this embodiment, the designated receiving end may be a terminal connected to the earphone, for example, a mobile phone end, a desktop computer end, a notebook computer end, a tablet computer end, an intelligent wearable device end, and the like.
In one implementation, the merged data may be compressed before being sent to the designated receiving end.
The merged data may be sent to a designated receiving end, or the merged data may be sent to a designated terminal according to a set transmission manner, for example, the merged data may be sent to the designated terminal through bluetooth transmission, NFC (Near Field Communication) transmission, DMA (Direct Memory Access) protocol transmission, or the like. The transmission of the merged data may also be via other protocols recognized by both parties.
In this embodiment, the uplink audio data and the downlink audio data received at the target earphone end can be converted into merged data and sent to the designated receiving end, so that other receiving ends can obtain the uplink and downlink audio data (or at least one of them) and perform audio processing on them. When the target earphone end is the data processing end of a single earphone, receiving and sending the merged data can be accomplished with one earphone end alone, so audio data processing is unaffected by whether the user wears one earphone or two.
In an example of the present disclosure, the data processing method may be applied to voice call scenarios of True Wireless Stereo (TWS) smart headsets.
Specifically, the embodiment of the disclosure can be used to collect call audio from the earphone end when TWS or similar earphones are used, in the process of transcribing the call audio into text. In this way, the conversation content of both parties can be conveniently recorded with text as the carrier, and later review and sharing are also made convenient.
In a smart true wireless earphone call scenario, various methods may be adopted to transmit the uplink and downlink audio data from the earphone end to the mobile phone end. One method is to take the data processing ends corresponding to the left and right earphones as target earphone ends, collect and compress the two audio paths on the master earphone and the slave earphone respectively, and add a streamID (stream identification code) to the head of each compressed audio frame. The master earphone may be one of the left and right earphones, and the slave earphone the other. The slave earphone transmits its processed data to the master earphone through the TWS connection between the two ears, and the master earphone transmits both its local audio data and the slave earphone's audio data to a user terminal such as a mobile phone. The user terminal distinguishes the uplink audio data from the downlink audio data through the streamID, and sends the received audio data to the cloud for recognition, text conversion, or other operations.
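As a rough illustration of this step, the sketch below shows one way a compressed audio frame could be tagged with a streamID header before forwarding. The one-byte ID values, the header layout and the function names are assumptions for illustration; the patent does not specify a concrete format.

```python
# Hypothetical streamID tagging: a 1-byte stream ID plus a 2-byte length
# field prepended to each compressed audio frame. All values are assumed.
import struct

STREAM_UPLINK = 0x01    # assumed ID for microphone (uplink) audio
STREAM_DOWNLINK = 0x02  # assumed ID for speaker (downlink) audio

def add_stream_header(stream_id: int, compressed_frame: bytes) -> bytes:
    """Prepend the streamID and the payload length to a compressed frame."""
    return struct.pack("<BH", stream_id, len(compressed_frame)) + compressed_frame

def parse_stream_header(packet: bytes) -> tuple[int, bytes]:
    """On the phone side, recover the streamID and the compressed payload."""
    stream_id, length = struct.unpack_from("<BH", packet, 0)
    return stream_id, packet[3:3 + length]
```

With such a header, the user terminal can route each received frame to the correct stream before decompression.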
In this embodiment, a call transcription mode of the headset may be set through an APP (Application), so that the data processing method of the embodiment of the present disclosure is executed in this mode. Meanwhile, the APP can also turn the call transcription function on and off at any time, correspondingly starting or stopping execution of the data processing method of the embodiment of the present disclosure.
In other specific embodiments, the user may be allowed to freely choose between dual-earphone and single-earphone audio collection, so that in some cases (for example, when battery power is nearly exhausted but the other party's important speech still needs to be recorded), a single earphone can be used to collect and transmit only the uplink audio data or only the downlink audio data.
In other implementation manners, the embodiment of the disclosure can also be applied to activities related to the microphone function or the loudspeaker function of the earphone, such as unilateral recording.
In one embodiment, acquiring the uplink audio data and the downlink audio data of the target earphone end includes:
acquiring audio data received by a microphone at a target earphone end as uplink audio data;
and acquiring the loudspeaker data of the target earphone end as downlink audio data.
It can be understood that the step of acquiring the uplink audio data and the step of acquiring the downlink audio data may be performed simultaneously, or separately in any order.
In this embodiment, the speaker data at the target earphone end may be reference data received by the target earphone end from the other end, or data generated at the target earphone end for playing audio to the user.
The audio data received by the microphone at the target earphone end may be audio data received by a microphone having a connection relationship with the target earphone end, or audio data acquired by a microphone of the target earphone end itself.
In this embodiment, the microphone and the speaker are used to obtain the uplink audio data and the downlink audio data, so that the audio data can be conveniently merged and transmitted. When the target earphone end is the data processing end corresponding to the left earphone or the right earphone, the collection of the audio data to be converted into text can be completed on a single earphone end, so the text conversion function can be realized even when the user uses only one earphone. Moreover, if both the uplink audio data and the downlink audio data are acquired on a single earphone end, data transmission can be minimized, lowering the performance requirements on the earphone.
In one embodiment, generating the merged data with a set format according to the uplink audio data and the downlink audio data includes:
and carrying out interleaving combination on the uplink audio data and the downlink audio data to obtain combined data.
In another implementation, the uplink audio data and the downlink audio data may be concatenated one after the other, with an identifiable mark set at the splice position to distinguish the two audio streams. It is also possible to add start and/or end symbols at the start and stop positions of at least one of the uplink audio data and the downlink audio data, and then splice the two together.
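A sketch of this alternative, assuming a simple length-prefixed concatenation in which a 4-byte length field serves as the identifiable mark between the two streams (the field width and function names are assumptions):

```python
import struct

def splice(uplink: bytes, downlink: bytes) -> bytes:
    """Concatenate the two streams, prefixing each with its byte length."""
    return (struct.pack("<I", len(uplink)) + uplink +
            struct.pack("<I", len(downlink)) + downlink)

def unsplice(data: bytes) -> tuple[bytes, bytes]:
    """Recover the two streams from the spliced data."""
    up_len = struct.unpack_from("<I", data, 0)[0]
    uplink = data[4:4 + up_len]
    down_start = 4 + up_len
    down_len = struct.unpack_from("<I", data, down_start)[0]
    downlink = data[down_start + 4:down_start + 4 + down_len]
    return uplink, downlink
```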
In other implementation manners, two or more merging manners may be adopted, and flags may be added to the merged data to distinguish them, thereby supporting multiple audio-processing modes of the earphone, such as high-power/high-performance and low-power/low-performance modes.
In this embodiment, the uplink audio data and the downlink audio data are interleaved and combined into one audio stream, so only one compression and transmission instance needs to be created during subsequent compression and transmission, which reduces the data-processing performance required of the earphone-end processor.
In one embodiment, interleaving and combining the uplink audio data and the downlink audio data includes:
and combining the uplink audio data and the downlink audio data according to a mode that the single audio frames are sequentially arranged adjacently and in a staggered manner.
In one specific implementation, one audio frame may hold 16 bits of audio data; thus, the structure of the merged data may be organized, for example, as in Table 1, so that the merged data contains uplink audio data and downlink audio data interleaved frame by frame.
Table 1: uplink frame 1 | downlink frame 1 | uplink frame 2 | downlink frame 2 | ...
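A minimal sketch of this single-frame interleaving, assuming each frame is one 16-bit sample represented as a Python int (the frame granularity and function names are illustrative assumptions):

```python
def interleave(uplink: list[int], downlink: list[int]) -> list[int]:
    """Merge two streams as: up[0], down[0], up[1], down[1], ..."""
    # Assumes equal lengths; a real implementation would pad the shorter
    # stream (e.g. with silence) before merging.
    assert len(uplink) == len(downlink)
    merged = []
    for up_frame, down_frame in zip(uplink, downlink):
        merged.extend((up_frame, down_frame))
    return merged

print(interleave([1, 2, 3], [10, 20, 30]))  # [1, 10, 2, 20, 3, 30]
```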
In other implementation manners, the uplink audio data and the downlink audio data may be combined in a manner that two audio frames are sequentially arranged adjacently and alternately.
In other implementations, the interleaving and combining of the uplink audio data and the downlink audio data may be performed as a first set number of frames of uplink audio data followed by a second set number of frames of downlink audio data, repeated in turn.
In this embodiment, the uplink and downlink audio data are combined with single audio frames arranged adjacently and alternately in sequence, so that the two audio streams can be transmitted as one. This reduces the number of instances (such as compression instances) involved in the transmission operation and lowers the performance requirements on the target earphone end.
In one embodiment, the target earphone end is one of a left earphone end and a right earphone end.
In a specific implementation, the left and right earphone ends may be left and right earphone ends of a wireless earphone or an intelligent earphone.
The left and right earphone ends may also be two earphone ends connected by wire.
In this embodiment, the left earphone end may be a data processing end of an earphone worn on the left ear of the user; the right earphone end may be a data processing end of an earphone worn on the right ear of the user.
In this embodiment, the target earphone end is the data processing end of a single earphone, which eliminates data transmission steps between the two earphone ends; even when a user uses a single earphone, the uplink and downlink audio data can be sent to the designated receiving end for audio-to-text conversion or other operations.
In one embodiment, as shown in fig. 2, the data processing method further includes:
step S21: acquiring position information of a left earphone and a right earphone;
step S22: determining a main earphone in a worn and used state in the left earphone and the right earphone according to the position information of the left earphone and the right earphone;
step S23: and taking the earphone end corresponding to the main earphone as a target earphone end.
The left earphone and the right earphone can be earphones which are respectively worn on the left ear and the right ear of the human body.
The position information of the left earphone and the right earphone may be physical position information of the left earphone and the right earphone, for example, whether the left earphone is worn on the human ear, whether the right earphone is worn on the human ear, and the like.
In the present embodiment, the wearing state may be a state of being worn on the ear of a person. The in-use state may be a state in which audio is being transmitted.
The primary earpiece may be one of a left earpiece and a right earpiece.
The earphone end corresponding to the main earphone can be an earphone data processing end corresponding to the main earphone.
In this embodiment, the earphone end corresponding to the main earphone is used as the target earphone end, so that the uplink and downlink audio data are collected, merged and transmitted through a single earphone. This avoids the audio transmission function becoming unavailable when only one earphone is worn, and reduces the number of data transmissions compared with having both earphones collect audio data.
In other embodiments, the primary earphone of the left and right earphones may be determined only based on whether the earphones are worn.
In other implementations, the target earphone end may also be the earphone end corresponding to the slave earphone.
In one embodiment, determining a primary earphone in a worn and used state of a left earphone and a right earphone according to position information of the left earphone and the right earphone comprises:
determining that the earphone in the worn and used state is the main earphone under the condition that only one earphone in the left earphone and the right earphone is in the worn state according to the position information;
or, under the condition that the left earphone and the right earphone are both in the worn state according to the position information, determining the earphone in the use state as the main earphone;
or, under the condition that it is determined according to the position information that the left earphone and the right earphone are both worn, determining one of the two earphones in the use state as the main earphone according to a default setting.
In other implementations, the main earphone may also be determined based solely on earphone position. For example, when both earphones are worn, the main earphone may be determined by a default setting; when only one of the two earphones is worn, the worn earphone may be determined to be the main earphone.
In this embodiment, the main earphone of the left and right earphones is determined according to wearing and use states, so whether the user is using an earphone can be automatically identified. The earphone that can collect the uplink and downlink audio and send them in merged form can thus be accurately determined even when the user wears only one earphone or one earphone malfunctions, better matching how the user actually uses the product.
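A sketch of this selection logic under stated assumptions: the Earphone data model and the default setting below are illustrative, since the patent does not prescribe how the position information is represented.

```python
from dataclasses import dataclass

@dataclass
class Earphone:
    side: str      # "left" or "right"
    worn: bool     # worn on the ear, per the position information
    in_use: bool   # currently transmitting audio

DEFAULT_PRIMARY = "left"  # hypothetical default setting

def pick_primary(left: Earphone, right: Earphone) -> Earphone | None:
    worn = [e for e in (left, right) if e.worn]
    if len(worn) == 1:
        return worn[0]            # only one earphone worn: it is the primary
    if len(worn) == 2:
        in_use = [e for e in worn if e.in_use]
        if len(in_use) == 1:
            return in_use[0]      # both worn, exactly one in use
        # both worn and usage does not disambiguate: fall back to the default
        return left if DEFAULT_PRIMARY == "left" else right
    return None                   # neither earphone worn
```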
In one embodiment, sending the merged data to a designated receiving end includes:
and compressing the merged data, and sending the compressed merged data to a designated receiving end.
In this embodiment, compressing the merged data may mean that the uplink audio data and the downlink audio data are first merged into one piece of audio data as the merged data, and the merged data is then compressed.
Sending the compressed merged data to the designated receiving end saves transmission bandwidth between the target earphone end and the designated receiving end. Meanwhile, compressing the merged data requires starting only one compression instance, which, compared with compressing each stream separately, reduces the number of compression instances.
An embodiment of the present disclosure further provides a data processing method, as shown in fig. 3, including:
step S31: receiving merged data, wherein the merged data are generated in any embodiment of the disclosure;
step S32: decomposing the merged data into uplink audio data and downlink audio data according to a set format;
step S33: and sending the uplink audio data and the downlink audio data to a cloud audio processing module.
In this embodiment, the merged data may be merged data sent by the target earphone end, and may be merged data in a compressed format.
The compressed merged data may be decompressed before being decomposed into the uplink audio data and the downlink audio data.
Decomposing the merged data into the uplink audio data and the downlink audio data according to the set format may mean splitting the merged data according to the manner in which the uplink and downlink audio data were merged.
The set format may be a format agreed upon by the designated receiving end and the target earphone end. For example, if the uplink and downlink audio data were combined with single audio frames arranged adjacently and alternately in sequence, the odd-numbered audio frames are extracted as one of the uplink and downlink audio data, and the even-numbered audio frames as the other.
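A sketch of the corresponding de-interleaving at the receiving end, assuming the single-frame interleaved format from the earlier sketch; which parity maps to uplink is part of the agreed format and is assumed here:

```python
def deinterleave(merged: list[int]) -> tuple[list[int], list[int]]:
    """Split merged data back into the two original streams."""
    uplink = merged[0::2]    # odd-numbered frames (1st, 3rd, 5th, ...)
    downlink = merged[1::2]  # even-numbered frames (2nd, 4th, 6th, ...)
    return uplink, downlink

up, down = deinterleave([1, 10, 2, 20, 3, 30])
print(up, down)  # [1, 2, 3] [10, 20, 30]
```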
In this embodiment, the steps S31-S33 may be executed at the designated receiving end of the merged data, or may be executed at the cloud. That is, the operation of splitting the merged data into the upstream audio data and the downstream audio data may be performed at the cloud.
When the terminal executes the splitting operation, it can analyze the audio data, and when the uplink audio data or the downlink audio data is identified as empty, send only the non-empty audio data.
The designated receiving end can be a user terminal such as a mobile phone.
The cloud audio processing module can be used for converting at least part of audio data in the uplink audio data and the downlink audio data into characters or information required by other users.
In this embodiment, the merged data is split into the uplink audio data and the downlink audio data according to the set format, and the split data is sent to the cloud audio processing module, so that the audio can be converted into information required by the user, such as text.
In one embodiment, the data processing method further comprises:
and receiving the converted text information generated by the cloud audio processing module according to the uplink audio data and the downlink audio data.
In this embodiment, the converted text information generated according to the uplink audio data and the downlink audio data may be converted text information generated according to at least a part of audio in the uplink audio data and the downlink audio data.
In other embodiments, other processing may be performed on the upstream audio data and the downstream audio data, for example, adding an effect.
In this embodiment, in the process of converting the uplink audio data and the downlink audio data into text, VAD (Voice Activity Detection) and framing processing may be performed on the audio portion to be converted, feature values may be extracted from the framed audio, the audio frames with extracted feature values may then be recognized as states, the states assembled into phonemes, and finally the phonemes assembled into words, thereby realizing the conversion from audio to text.
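As a rough illustration of the front end of this pipeline, the sketch below shows framing plus a crude energy-based VAD. Real systems use more robust detectors and feature extraction (e.g. MFCCs); the frame sizes and threshold here are arbitrary assumptions.

```python
def frame_signal(samples: list[float], frame_len: int = 400, hop: int = 160):
    """Split audio into overlapping frames (e.g. 25 ms frames, 10 ms hop at 16 kHz)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def is_speech(frame: list[float], energy_threshold: float = 1e-4) -> bool:
    """Crude VAD: treat a frame as speech if its mean energy exceeds a threshold."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > energy_threshold

def vad_filter(samples: list[float]) -> list[list[float]]:
    """Keep only the frames likely to contain speech, ready for feature extraction."""
    return [f for f in frame_signal(samples) if is_speech(f)]
```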
Note that during speech recognition, not only can an acoustic model be used to convert audio frames into states, but a state network can also be constructed in which a best-matching path is found for the sound. Speech recognition and conversion to text may therefore require substantial computation and memory. Thus, in the embodiments of the present disclosure, this complex speech recognition processing can be placed in the cloud, and a user terminal with limited capability need only provide VAD-processed audio or the original audio to the cloud.
In this embodiment, the received converted text information may be displayed on a display interface of a designated terminal, or the converted text information may be stored in a file such as a document or a notepad for viewing by a user.
In one example of the present disclosure, a data processing method includes the steps shown in fig. 4:
step S41: and acquiring uplink audio data and downlink audio data in the call process.
In the disclosed example, the uplink and downlink audio data during a call can be collected on a single main earphone only. To collect the uplink audio data, a copy of the uplink data may be acquired at some point on the data link of the main earphone's microphone; this serves as the uplink audio data collected on the single main earphone in this example. To obtain the downlink call data, the reference data of the main earphone may be used. For processing convenience, the position at which the downlink data is acquired is kept consistent with the position at which the uplink data is acquired; that is, the uplink and downlink audio data are intercepted at the same position in the earphone software module.
In this example, by acquiring the uplink and downlink call audio on the single main earphone only, the audio collection task for call transcription and other functions can be completed even in a single-earphone call scenario, expanding the usage scenarios of the call transcription function.
Step S42: and (5) data compression.
In this example, after the uplink and downlink audio data of the call are collected, the data may be compressed. In specific implementations, audio data compression may be performed in a number of suitable ways. In this example, the Opus codec may be employed to compress the uplink and downlink audio data; Opus is a lossy audio coding format developed by the Xiph.Org Foundation.
In this example, the audio frames may be compressed in a unified manner: the uplink and downlink audio data are interleaved to generate the merged data, and then each frame of the merged data is compressed. With a unified compression mode, only one compression instance needs to be started to compress the audio data, which both reduces implementation complexity and lowers the computing-power demands of the compression algorithm on the earphone MCU (Micro Controller Unit).
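A structural sketch of the unified-versus-separate trade-off. The Encoder class is a stand-in for a real codec binding such as an Opus encoder; its interface and the parameters below are assumptions for illustration.

```python
class Encoder:
    """Stand-in for one codec instance (e.g. one Opus encoder state)."""
    def __init__(self, sample_rate: int, channels: int):
        self.sample_rate, self.channels = sample_rate, channels

    def encode(self, pcm_frame: bytes) -> bytes:
        # Placeholder passthrough; a real codec would return a compressed packet.
        return pcm_frame

# Unified mode: a single encoder instance handles the interleaved merged data.
unified_encoder = Encoder(sample_rate=16000, channels=1)

def compress_merged(merged_frames: list[bytes]) -> list[bytes]:
    return [unified_encoder.encode(f) for f in merged_frames]

# Separate mode (for comparison): one encoder per stream, doubling the
# number of codec instances the earphone MCU must run concurrently.
uplink_encoder = Encoder(16000, 1)
downlink_encoder = Encoder(16000, 1)
```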
In other implementation manners, the uplink audio data and the downlink audio data may instead be compressed separately: one compression instance is started for each data path, and the two paths are compressed independently.
Step S43: and audio data transmission.
To deliver the uplink and downlink audio data collected by the earphone during a call to the cloud, a DMA protocol may be used to transmit the compressed two-path call data to a designated receiving end such as a mobile phone. After receiving the compressed audio data, the designated receiving end decompresses it, restores the decompressed data into uplink audio data and downlink audio data, and transmits the uplink and/or downlink audio data to the cloud over a wireless or wired network.
After receiving the uplink and/or downlink audio data, the cloud performs speech recognition and returns the recognized text or other recognition results to a designated terminal such as a mobile phone. Finally, the recognized text is displayed in the call APP.
In one example of the present disclosure, a data processing method is shown in fig. 5, and includes:
step S51: and acquiring uplink audio data and downlink audio data in the voice or video call process at the target earphone end, interleaving and combining the acquired audio data, and compressing the combined data.
Step S52: and sending the compressed combined data to a specified terminal through a DMA protocol. The designated terminal may be, for example, a mobile phone or the like. The merged data can be transmitted from the target earphone end to the designated receiving end in a short-distance transmission mode.
Step S53: and transmitting the uplink audio data and the downlink audio data obtained according to the compressed combined data to a cloud terminal through a wired network or a wireless network.
Step S54: and converting the uplink audio data and the downlink audio data at the cloud. Specifically, it can be converted into text.
Step S55: and displaying the conversion result on a designated APP display interface or other designated display interfaces.
An embodiment of the present disclosure further provides a data processing apparatus, as shown in fig. 6, including:
an obtaining module 61, configured to obtain uplink audio data and downlink audio data of a target earphone end;
a merging module 62, configured to generate merged data in a set format according to the uplink audio data and the downlink audio data;
and a sending module 63, configured to send the merged data to a specified receiving end.
In one embodiment, as shown in fig. 7, the obtaining module includes:
an uplink unit 71, configured to acquire audio data received by a microphone at a target earphone end as uplink audio data;
the downstream unit 72 acquires speaker data at the target earphone end as downstream audio data.
In one embodiment, as shown in fig. 8, the merging module further comprises:
the interleaving and merging unit 81 is configured to interleave and merge the uplink audio data and the downlink audio data to obtain merged data.
In one embodiment, the interleaving-merging unit is further configured to:
and combining the uplink audio data and the downlink audio data according to a mode that the single audio frames are sequentially arranged adjacently and in a staggered manner.
In one embodiment, the target earphone end is one of a left earphone end and a right earphone end.
In one embodiment, as shown in fig. 9, the data processing apparatus further includes:
a position module 91, configured to obtain position information of the left earphone and the right earphone;
the main earphone module 92 is used for determining a main earphone in a worn and used state in the left earphone and the right earphone according to the position information of the left earphone and the right earphone;
and a target earphone end module 93, configured to use an earphone end corresponding to the main earphone as a target earphone end.
In one embodiment, as shown in fig. 10, the primary earpiece module includes at least one of:
a first unit 101, configured to determine that an earphone in a worn and used state is a main earphone when it is determined that only one of the left earphone and the right earphone is in a worn state according to the position information;
the second unit 102 is configured to determine that the earphone in the use state is the main earphone when it is determined that the left earphone and the right earphone are both in the worn state according to the position information;
a third unit 103, configured to determine, according to a default setting, that one of the two earphones which are in the use state is the main earphone if it is determined that both the left earphone and the right earphone are worn according to the position information.
In one embodiment, as shown in fig. 11, the sending module further includes:
a compression unit 111 for compressing the merged data;
and the execution unit 112 is configured to send the compressed merged data to a designated receiving end.
An embodiment of the present disclosure further provides a data processing apparatus, as shown in fig. 12, including:
a receiving module 121, configured to receive merged data, where the merged data is generated in any embodiment of the present disclosure;
a decomposition module 122, configured to decompose the merged data into uplink audio data and downlink audio data according to a set format;
and the forwarding module 123 is configured to send the uplink audio data and the downlink audio data to the cloud audio processing module.
In one embodiment, as shown in fig. 13, the data processing apparatus further includes:
the text receiving module 131 is configured to receive converted text information generated by the cloud audio processing module according to the uplink audio data and the downlink audio data.
The functions of each unit, module or sub-module in each apparatus in the embodiments of the present disclosure may refer to the corresponding description in the above method embodiments, and are not described herein again.
The embodiment of the disclosure relates to the technical field of computers, in particular to the fields of artificial intelligence, voice technology and the like.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 14 shows a schematic block diagram of an example electronic device 140 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 14, the electronic device 140 includes a computing unit 141, which can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 142 or a computer program loaded from a storage unit 148 into a Random Access Memory (RAM) 143. The RAM 143 can also store various programs and data required for the operation of the electronic device 140. The computing unit 141, the ROM 142, and the RAM 143 are connected to each other via a bus 144. An input/output (I/O) interface 145 is also connected to the bus 144.
A number of components in the electronic device 140 are connected to the I/O interface 145, including: an input unit 146 such as a keyboard, a mouse, or the like; an output unit 147 such as various types of displays, speakers, and the like; a storage unit 148 such as a magnetic disk, optical disk, or the like; and a communication unit 149 such as a network card, modem, wireless communication transceiver, etc. The communication unit 149 allows the electronic device 140 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 141 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 141 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 141 performs the respective methods and processes described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 148. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 140 via the ROM 142 and/or the communication unit 149. When loaded into the RAM 143 and executed by the computing unit 141, the computer program may perform one or more steps of the data processing method described above. Alternatively, in other embodiments, the computing unit 141 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A method of data processing, comprising:
acquiring uplink audio data and downlink audio data of a target earphone end;
generating combined data with a set format according to the uplink audio data and the downlink audio data;
and sending the merged data to a designated receiving end.
2. The method of claim 1, wherein the acquiring of the uplink audio data and the downlink audio data of the target earphone end comprises:
acquiring audio data received by a microphone at the target earphone end as the uplink audio data;
and acquiring the loudspeaker data of the target earphone end as the downlink audio data.
3. The method according to claim 1 or 2, wherein the generating of the merged data with a set format according to the uplink audio data and the downlink audio data comprises:
and carrying out staggered combination on the uplink audio data and the downlink audio data to obtain the combined data.
4. The method of claim 3, wherein the staggered combination of the uplink audio data and the downlink audio data comprises:
and combining the uplink audio data and the downlink audio data according to a mode that single audio frames are sequentially arranged adjacently and in a staggered manner.
5. The method of any of claims 1-4, wherein the target earphone end is one of a left earphone end and a right earphone end.
6. The method of claim 5, further comprising:
acquiring position information of a left earphone and a right earphone;
determining a main earphone which is worn and in a use state in the left earphone and the right earphone according to the position information of the left earphone and the right earphone;
and taking the earphone end corresponding to the main earphone as the target earphone end.
7. The method of claim 6, wherein the determining a primary earpiece of the left and right earpieces that is worn and in use from the position information of the left and right earpieces comprises:
determining that the earphone in the worn and used state is the main earphone under the condition that only one earphone in the left earphone and the right earphone is in the worn state according to the position information;
or, under the condition that it is determined according to the position information that the left earphone and the right earphone are both in the worn state, determining the earphone in the use state as the main earphone;
or, when the left earphone and the right earphone are both worn according to the position information, determining one of the two earphones in the use state as the main earphone according to default setting.
8. The method according to any one of claims 1-7, wherein the sending the merged data to a designated receiving end comprises:
and compressing the merged data, and sending the compressed merged data to a designated receiving end.
9. A method of data processing, comprising:
receiving merged data, the merged data being the merged data of any one of claims 1-8;
decomposing the merged data into uplink audio data and downlink audio data according to a set format;
and sending the uplink audio data and the downlink audio data to a cloud audio processing module.
10. The method of claim 9, further comprising:
and receiving converted text information generated by the cloud audio processing module according to the uplink audio data and the downlink audio data.
11. A data processing apparatus comprising:
the acquisition module is used for acquiring uplink audio data and downlink audio data of a target earphone end;
the merging module is used for generating merged data with a set format according to the uplink audio data and the downlink audio data;
and the sending module is used for sending the merged data to a designated receiving end.
12. The apparatus of claim 11, wherein the means for obtaining comprises:
an uplink unit, configured to acquire audio data received by a microphone at the target earphone end as the uplink audio data;
and the downlink unit is used for acquiring the loudspeaker data of the target earphone end as the downlink audio data.
13. The apparatus of claim 11 or 12, wherein the means for merging further comprises:
and the interleaving and merging unit is used for interleaving and merging the uplink audio data and the downlink audio data to obtain the merged data.
14. The apparatus of claim 13, wherein the interleaving unit is further configured to:
combine the uplink audio data and the downlink audio data such that single audio frames from the two streams are arranged adjacently in alternating order.
15. The apparatus of any of claims 11-14, wherein the target earphone end is one of a left earphone end and a right earphone end.
16. The apparatus of claim 15, further comprising:
a position module configured to acquire position information of the left earphone and the right earphone;
a main earphone module configured to determine, according to the position information, a main earphone of the left earphone and the right earphone that is worn and in use;
and a target earphone end module configured to take the earphone end corresponding to the main earphone as the target earphone end.
17. The apparatus of claim 16, wherein the main earphone module comprises at least one of:
a first unit configured to determine, when the position information indicates that only one of the left earphone and the right earphone is worn, that earphone to be the main earphone;
a second unit configured to determine, when the position information indicates that both earphones are worn and only one is in use, the earphone in use to be the main earphone;
and a third unit configured to determine, when the position information indicates that both earphones are worn and in use, one of the two earphones to be the main earphone according to a default setting.
18. The apparatus of any one of claims 11-17, wherein the sending module further comprises:
a compression unit configured to compress the merged data;
and an execution unit configured to send the compressed merged data to the designated receiving end.
19. A data processing apparatus, comprising:
a receiving module configured to receive merged data, the merged data being generated by the apparatus of any one of claims 11-18;
a decomposition module configured to decompose the merged data into uplink audio data and downlink audio data according to the set format;
and a forwarding module configured to send the uplink audio data and the downlink audio data to a cloud audio processing module.
20. The apparatus of claim 19, further comprising:
a text receiving module configured to receive converted text information generated by the cloud audio processing module from the uplink audio data and the downlink audio data.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
CN202110951832.1A 2021-08-18 2021-08-18 Data processing method and device, electronic equipment and computer storage medium Pending CN113643685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110951832.1A CN113643685A (en) 2021-08-18 2021-08-18 Data processing method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN113643685A 2021-11-12

Family

ID=78422835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110951832.1A Pending CN113643685A (en) 2021-08-18 2021-08-18 Data processing method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN113643685A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107105192A * 2016-02-20 2017-08-29 广州市番禺区西派电子电器厂(普通合伙) IP network security management monitor
CN109587085A * 2018-12-29 2019-04-05 深圳市网心科技有限公司 Data transmission method and related device
CN110012376A * 2019-03-25 2019-07-12 歌尔科技有限公司 Control method for earphone sound channel, earphone, and storage medium
CN111031168A (en) * 2019-12-09 2020-04-17 上海传英信息技术有限公司 Voice call recording method, device and readable storage medium
CN112770209A (en) * 2021-01-07 2021-05-07 深圳市天诺泰科技有限公司 Bluetooth headset, Bluetooth headset control system and method and storage medium
CN113014732A (en) * 2021-02-04 2021-06-22 腾讯科技(深圳)有限公司 Conference record processing method and device, computer equipment and storage medium
CN113132376A (en) * 2021-04-14 2021-07-16 腾讯科技(深圳)有限公司 Media data processing method, device and system, electronic equipment and storage medium
CN113242358A (en) * 2021-04-25 2021-08-10 百度在线网络技术(北京)有限公司 Audio data processing method, device and system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US9294834B2 (en) Method and apparatus for reducing noise in voices of mobile terminal
JP2015060423A (en) Voice translation system, method of voice translation and program
US9311920B2 (en) Voice processing method, apparatus, and system
WO2016101571A1 (en) Voice translation method, communication method and related device
CN108810860B (en) Audio transmission method, terminal equipment and main earphone
CN105793922B (en) Apparatus, method, and computer readable medium for multi-path audio processing
CN112995730A (en) Sound and picture synchronous adjustment method and device, electronic equipment and medium
CN109803189B Bluetooth headset based on BLE protocol
CN105825854B Audio signal processing method and device, and mobile terminal
WO2021103741A1 (en) Content processing method and apparatus, computer device, and storage medium
WO2018132737A1 (en) Wired wearable audio video to wireless audio video bridging device
CN113643685A (en) Data processing method and device, electronic equipment and computer storage medium
CN112259076A (en) Voice interaction method and device, electronic equipment and computer readable storage medium
US20190004766A1 (en) Wired wearable audio video to wireless audio video bridging device
US20160182599A1 (en) Remedying distortions in speech audios received by participants in conference calls using voice over internet protocol (voip)
CN110035308A (en) Data processing method, equipment and storage medium
CN113808592A (en) Method and device for transcribing call recording, electronic equipment and storage medium
US20220124193A1 (en) Presentation of communications
US10580410B2 (en) Transcription of communications
CN112804613A (en) Bone conduction communication device
CN114221940B (en) Audio data processing method, system, device, equipment and storage medium
CN116546126B (en) Noise suppression method and electronic equipment
CN114267358B (en) Audio processing method, device, equipment and storage medium
CN114448957B (en) Audio data transmission method and device
WO2024021270A1 (en) Voice activity detection method and apparatus, and terminal device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination