WO2020237886A1 - Voice and text conversion transmission method and system, and computer device and storage medium - Google Patents

Voice and text conversion transmission method and system, and computer device and storage medium

Info

Publication number
WO2020237886A1
WO2020237886A1 PCT/CN2019/103634 CN2019103634W WO2020237886A1 WO 2020237886 A1 WO2020237886 A1 WO 2020237886A1 CN 2019103634 W CN2019103634 W CN 2019103634W WO 2020237886 A1 WO2020237886 A1 WO 2020237886A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
voice
current network
extremely low
bandwidth
Prior art date
Application number
PCT/CN2019/103634
Other languages
French (fr)
Chinese (zh)
Inventor
齐燕
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020237886A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • This application relates to the field of communication technology, and in particular to a voice and text conversion transmission method, system, computer equipment and storage medium.
  • At present, audio and video conferencing usually copes with poor network transmission and low bandwidth by reducing the bit rate of the video and audio.
  • However, this approach does not work in extremely low bandwidth scenarios, because the lowest bit rate of audio and video encoding is still higher than the available bandwidth.
  • When bandwidth is low, audio information cannot be transmitted, or the transmitted audio information suffers packet loss.
  • As a result, audio and video may be interrupted and the purpose of transmitting information is not achieved. There is therefore an urgent need for a method that can communicate normally under extremely low bandwidth.
  • The main purpose of this application is to provide a voice and text conversion transmission method, system, computer equipment, and storage medium, aiming to solve the problem that audio conferences cannot be conducted under extremely low bandwidth.
  • To this end, this application provides a voice and text conversion transmission method, which includes the steps:
  • the sending end detects whether the first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth, and detects whether a signal whose second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth is received;
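  • if the first current network transmission bandwidth of the sending end is extremely low and/or a signal has been received indicating that the second current network transmission bandwidth of the receiving end is extremely low, the speech-to-text system is started and a signal to communicate through the speech-to-text system is sent to the receiving end;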
  • the voice information spoken by the user is recognized through the voice-to-text system, and converted into target text, and the target text is sent to the receiving end, where the target text includes a feature code and a text field.
  • the step of the sending end detecting whether the first current network transmission bandwidth of the sending end belongs to extremely low bandwidth includes:
  • if the current network speed is less than or equal to 10% of the preset network speed, it is determined that the first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth.
  • the step of recognizing the voice information spoken by the user and converting it into target text includes:
  • the voice information is converted into text fields, and the audio information features in the voice information are extracted to generate a feature code; the audio information features include a voiceprint spectrum and a PCM code stream, and the feature code is a string of symbols generated from the voiceprint;
  • the feature code is added to the text field in a preset manner to obtain the target text.
  • the method further includes:
  • This application also proposes a voice and text conversion transmission method, including the steps:
  • the receiving end detects whether the second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth, and detects whether a signal whose first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth is received;
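  • if the second current network transmission bandwidth of the receiving end is extremely low and/or a signal has been received indicating that the first current network transmission bandwidth of the sending end is extremely low, the text-to-speech system is started and a signal to communicate through the text-to-speech system is sent to the sending end;
  • the target text sent by the sending end is received and recognized, converted into voice information, and played.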
  • the step of the receiving end detecting whether the second current network transmission bandwidth of the receiving end belongs to extremely low bandwidth includes:
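  • if the current network speed is less than or equal to 10% of the preset network speed, it is determined that the second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth.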
  • the step of receiving the target text from the sending end, recognizing the target text, and converting the target text into voice information further includes:
  • the spectrum information and PCM code stream obtained by text conversion are exchanged with the spectrum information and PCM code stream in the voice model of the corresponding user to obtain the spectrum information and PCM code stream corresponding to the user and the text field.
  • This application also proposes a voice and text conversion transmission system, including: a sending end and a receiving end;
  • the sending end is used to detect whether the first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth, and to detect whether a signal whose second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth is received;
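  • if the first current network transmission bandwidth of the sending end is extremely low and/or a signal has been received indicating that the second current network transmission bandwidth of the receiving end is extremely low, the sending end starts the speech-to-text system and sends the receiving end a signal to communicate through the speech-to-text system;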
  • the receiving end is used to detect whether the second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth, and to detect whether a signal whose first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth is received;
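  • if the second current network transmission bandwidth of the receiving end is extremely low and/or a signal has been received indicating that the first current network transmission bandwidth of the sending end is extremely low, the receiving end starts the text-to-speech system and sends the sending end a signal to communicate through the text-to-speech system;
  • the receiving end receives the target text sent by the sending end, recognizes it, converts it into voice information, and plays it.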
  • This application also proposes a computer device including a memory and a processor; the memory stores computer-readable instructions, and the steps of any one of the above methods are implemented when the processor executes the computer-readable instructions.
  • This application also proposes a computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, the steps of any one of the above methods are implemented.
  • The voice and text conversion transmission system, method, computer equipment, and storage medium provided in this application detect whether the network transmission bandwidth is extremely low. If it is, the voice recognition system is started: the sending end recognizes the user's voice information, converts it into target text carrying feature information, and sends the target text to the receiving end; the receiving end receives the target text, recognizes it, converts it into voice information, and plays it.
  • the system automatically detects the network bandwidth, adaptively switches the transmission mode, and can still interact with the remote end smoothly when the network is not ideal, which solves the problem of voice transmission under extremely low bandwidth and achieves the purpose of information interaction.
  • the self-built speech model is used for conversion, which improves the fidelity.
  • Figure 1 is a schematic diagram of the steps of a voice and text conversion transmission method in an embodiment of the present application
  • Figure 2 is a schematic diagram of the steps of another voice and text conversion transmission method in an embodiment of the present application.
  • FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
  • this application proposes a voice and text conversion transmission method, including the steps:
  • the sending end detects whether the first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth, and detects whether a signal indicating that the second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth has been received;
  • if the first current network transmission bandwidth of the sending end is extremely low and/or a signal has been received indicating that the second current network transmission bandwidth of the receiving end is extremely low, the speech-to-text system is started and a signal to communicate through the speech-to-text system is sent to the receiving end;
  • The network transmission bandwidth refers to the data transmission capacity in actual signal transmission; extremely low bandwidth refers to a bandwidth at or below 10% of the theoretical value of the normal communication bandwidth.
  • For example, if the bandwidth rate is 4M/S (4 Mbit/s), the theoretical value is 512 KB/s and the actual value is about 400 KB/s; extremely low bandwidth then means a bandwidth rate below about 52 KB/s, roughly 10% of the theoretical 512 KB/s.
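The threshold arithmetic in this example can be checked with a minimal sketch. The function names and the default `ratio` parameter are illustrative assumptions, not part of the application; the only values taken from the text are the 4 Mbit/s line, the 512 KB/s theoretical value, and the 10% rule.

```python
def theoretical_rate_kb_per_s(nominal_mbit_per_s: float) -> float:
    """Convert a nominal line rate in Mbit/s to KB/s (1 Mbit = 1024 Kbit, 8 bits per byte)."""
    return nominal_mbit_per_s * 1024 / 8


def is_extremely_low(current_kb_per_s: float, preset_kb_per_s: float, ratio: float = 0.10) -> bool:
    """True when the measured speed is at or below `ratio` of the preset (theoretical) speed."""
    return current_kb_per_s <= preset_kb_per_s * ratio


preset = theoretical_rate_kb_per_s(4)  # 512.0 KB/s for a 4 Mbit/s line
threshold = preset * 0.10              # 51.2 KB/s, which the text rounds to about 52 KB/s
```

With these numbers the typical actual rate of about 400 KB/s is far above the threshold, so the switch to text only happens when the link has degraded badly.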
  • In step S2, after it is determined that the current network has extremely low bandwidth, the speech-to-text system is started. Because network speed is limited at extremely low bandwidth, video and audio transmission is very likely to lose packets; the role of the speech recognition system is to ensure that the information used for communication is still transmitted normally at extremely low bandwidth. The sending end therefore needs to start its speech-to-text client.
  • Sending the receiving end a signal to communicate through the speech-to-text system prompts or controls the receiving end to start the text-to-speech client installed on it.
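The switching rule described in steps S1 and S2 can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Mode` enum, the `SenderState` fields, and the `USE_TEXT_TO_SPEECH` message name are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AUDIO = "audio"          # normal audio/video transmission
    SPEECH_TO_TEXT = "stt"   # sender transcribes speech and sends target text


@dataclass
class SenderState:
    local_low: bool          # the sender's own bandwidth is extremely low
    peer_low_signal: bool    # the receiver signalled that its bandwidth is extremely low


def choose_mode(state: SenderState) -> Mode:
    """Switch to text when either side is at extremely low bandwidth (the "and/or" condition)."""
    if state.local_low or state.peer_low_signal:
        return Mode.SPEECH_TO_TEXT
    return Mode.AUDIO


def control_messages(state: SenderState) -> list[str]:
    """Messages the sender would send to the receiver; the message name is hypothetical."""
    if choose_mode(state) is Mode.SPEECH_TO_TEXT:
        return ["USE_TEXT_TO_SPEECH"]
    return []
```

Keeping the decision separate from the messaging makes the same rule reusable on the receiving end with the roles swapped.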
  • the sending end refers to a terminal that sends out the target text
  • the terminal may be a PC, a notebook computer, a tablet computer, and other intelligent terminal devices that can be connected to the network.
  • bandwidth is divided into uplink bandwidth and downlink bandwidth.
  • In principle, the uplink bandwidth and the downlink bandwidth do not affect each other, but IP transmission requires two-way interaction, so in practice there is some impact.
  • Extremely low bandwidth is not conducive to data transmission. Therefore, to improve transmission efficiency when the sending end sends the target text to the receiving end, the downlink bandwidth can be limited to a minimum before the target text is sent and restored once sending is complete.
  • this application receives the target text through the receiving end.
  • Corresponding clients are installed on the sender and receiver.
  • the receiving end also realizes the recognition of the target text through the client of the text-to-speech system, and converts the target text into voice information and plays it.
  • the step of the sending end detecting whether the first current network transmission bandwidth of the sending end belongs to extremely low bandwidth includes:
  • the aforementioned preset network speed is the theoretical value of the network speed actually accessed in normal communication. By detecting the proportion of the network speed in the preset network speed, you can know whether the network transmission bandwidth belongs to the extremely low bandwidth.
  • the step S3 of recognizing the voice information spoken by the user through the voice-to-text system and converting it into target text includes:
  • the audio information features include a voiceprint spectrum and a PCM code stream, and the feature code is a string of symbols generated from the voiceprint;
  • In step S31, the voice information refers to the words spoken by the user, and the text field refers to the text generated by recognizing the words spoken by the same user over a continuous period of time.
  • The purpose of this step is to recognize what the user said and convert the recognized content into a paragraph of text.
  • The audio information features are the user's voiceprint spectrum and the PCM code stream in the recording file generated when recognizing what the user said.
  • The feature code is the character string generated from the user's voiceprint features. Because a user's voiceprint features are unique, the generated character string is also unique, so it can serve as identification information for retrieving the corresponding speaker's voice model without error.
  • Special markers are added at the beginning and end of the feature-code character string (for example, ## feature code## text field). When the speech recognition system recognizes the text field, it automatically extracts the feature code, so the feature code does not affect recognition of the text field.
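The framing of a target text can be illustrated with a short sketch. It assumes a layout of `##<feature code>##<text field>`, read from the example in the text; the exact delimiter layout, the function names, and the rule that a feature code contains no `#` are assumptions made for illustration.

```python
import re

MARK = "##"  # delimiter taken from the example in the text; exact layout is an assumption
_PATTERN = re.compile(r"^##(?P<code>[^#]+)##(?P<text>.*)$", re.DOTALL)


def build_target_text(feature_code: str, text_field: str) -> str:
    """Prefix the text field with the delimited feature code."""
    if "#" in feature_code:
        raise ValueError("feature code must not contain '#'")
    return f"{MARK}{feature_code}{MARK}{text_field}"


def parse_target_text(target_text: str) -> tuple[str, str]:
    """Split a target text back into (feature code, text field)."""
    match = _PATTERN.match(target_text)
    if match is None:
        raise ValueError("missing feature code markers")
    return match.group("code"), match.group("text")
```

Because the feature code is stripped before the text field is handed on, the markers never reach the text-to-speech step.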
  • Multiple target texts can also be packaged and compressed together, which makes them convenient to send and further saves space. Packing and compressing multiple target texts in one batch can also help prevent data loss during transmission.
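Batching several target texts before sending might look like the sketch below. The application does not name a container or compression format; JSON plus zlib from the Python standard library are assumptions chosen only to make the idea concrete.

```python
import json
import zlib


def pack_target_texts(target_texts: list[str]) -> bytes:
    """Serialize a batch of target texts and compress it into one payload."""
    payload = json.dumps(target_texts, ensure_ascii=False).encode("utf-8")
    return zlib.compress(payload, level=9)


def unpack_target_texts(blob: bytes) -> list[str]:
    """Reverse of pack_target_texts: decompress and deserialize the batch."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Compression pays off most when a batch repeats the same feature codes and common words; a single short target text may not shrink at all.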
  • the method further includes:
  • Inputting the extracted audio information features into the preset voice model works as follows: since the pronunciation of every word is composed of syllables, the preset voice model records all the syllables voiced by the same user.
  • The audio information features of all syllables spoken by that user are extracted from the user's recording files and input into the preset voice model, so that the resulting voice model holds all the syllable features of the user's pronunciation.
  • the voice model is sent to the receiving end through step S3202.
  • the frequency characteristics of the pronunciation of the corresponding syllable can be synthesized from the syllable features, and these frequency points can be converted to a PCM signal (through the inverse Fourier transform) to synthesize a personalized voice with the user's voice characteristics for speech simulation.
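The conversion from frequency points back to a PCM signal can be sketched with a naive inverse discrete Fourier transform. The 8-point frame, the single-sinusoid spectrum, and the 16-bit scaling are illustrative assumptions; a real system would use an FFT library and much longer frames.

```python
import cmath
import math


def inverse_dft(spectrum: list[complex]) -> list[float]:
    """Naive O(n^2) inverse DFT; returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def to_pcm16(samples: list[float]) -> list[int]:
    """Clip samples to [-1, 1] and scale them to signed 16-bit integer codes."""
    return [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]


# Hypothetical stored syllable spectrum: one sinusoid in bin 1 of an 8-point frame.
n = 8
spectrum = [0j] * n
spectrum[1] = n / 2 * -1j      # bins 1 and n-1 together encode sin(2*pi*t/n)
spectrum[n - 1] = n / 2 * 1j
pcm = to_pcm16(inverse_dft(spectrum))
```

The resulting `pcm` list traces one cycle of a sine wave, peaking at the maximum 16-bit value at the quarter-cycle sample.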
  • this application also proposes a voice and text conversion transmission method, including the steps:
  • the receiving end detects whether the second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth, and detects whether a signal whose first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth is received;
  • S30 Receive the target text sent by the sender, recognize the target text, convert the target text into voice information, and play it.
  • the actual rate when the user goes online is usually lower than the theoretical rate value.
  • The network transmission bandwidth refers to the data transmission capacity in actual signal transmission; extremely low bandwidth refers to a bandwidth at or below 10% of the theoretical value of the normal communication bandwidth.
  • For example, if the bandwidth rate is 4M/S (4 Mbit/s), the theoretical value is 512 KB/s and the actual value is about 400 KB/s; extremely low bandwidth then means a bandwidth rate below about 52 KB/s, roughly 10% of the theoretical 512 KB/s.
  • In step S20, after it is determined that the current network has extremely low bandwidth, the text-to-speech system is started.
  • Because network speed is limited at extremely low bandwidth, video and audio transmission may lose packets.
  • The role of the text-to-speech system is to ensure that the information used for communication can still be transmitted normally at extremely low bandwidth. The receiving end therefore needs to start its text-to-speech client.
  • Sending the sending end a signal to communicate through the text-to-speech system prompts or controls the sending end to start the speech-to-text client installed on it.
  • the above sending end refers to a terminal that sends out the target text
  • the terminal may be a PC, a notebook computer, a tablet computer, and other intelligent terminal devices that can be connected to the network.
  • In principle, the uplink bandwidth and the downlink bandwidth do not affect each other, but IP transmission requires two-way interaction, so in practice there is some impact.
  • Extremely low bandwidth is not conducive to data transmission. Therefore, to improve transmission efficiency when the receiving end receives the target text from the sending end, the uplink bandwidth can be limited to a minimum while the target text is being received and restored once reception is complete.
  • this application sends the target text through the sender.
  • Corresponding clients are installed on the sender and receiver.
  • the sending end also recognizes the voice information spoken by the user through the speech-to-text system, converts it into target text, and sends the target text to the receiving end.
  • the step S10 of the sending end detecting whether the first current network transmission bandwidth of the sending end belongs to extremely low bandwidth includes:
  • S101 Monitor the current network speed of the sending end in real time, and compare the current network speed with a preset network speed;
  • the aforementioned preset network speed is the theoretical value of the network speed actually accessed in normal communication. By detecting the proportion of the network speed in the preset network speed, you can know whether the network transmission bandwidth belongs to the extremely low bandwidth.
  • the step S30 of receiving the target text from the sending end, recognizing the target text, and converting the target text into voice information further includes:
  • S301 Extract text fields according to feature information attached to the target text
  • S302 Convert the text in the text field into pronunciation syllables, and obtain spectrum information and PCM code streams corresponding to the syllables;
  • S303 Search for a voice model corresponding to the user in the local voice database according to the feature information attached to the target text;
  • S304 Exchange the spectrum information and PCM code stream obtained by text conversion with the spectrum information and PCM code stream in the corresponding user's voice model, to obtain the spectrum information and PCM code stream corresponding to both the user and the text field.
  • the above-mentioned target text is converted from the words spoken by the user by the sending end.
  • The feature information divides the target text into multiple segments.
  • Each segment contains the feature information of the corresponding user; that is, the target text is composed of multiple text fields, each carrying feature information that indicates which specific user's speech the text field was converted from.
  • For example, if the target text contains feature A, feature B, feature A, and feature C in that order, then the target text was converted from a passage spoken by user A, a passage spoken by user B, another passage spoken by user A, and a passage spoken by user C.
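Splitting a multi-speaker target text into per-user segments can be sketched as follows. The `##<feature code>##` delimiter layout follows the example given earlier in the text, but the concatenated format, the sample sentences, and the function name are assumptions for illustration.

```python
import re

# A segment is a delimited feature code followed by text up to the next feature code or the end.
_SEGMENT = re.compile(r"##(?P<code>[^#]+)##(?P<text>.*?)(?=##[^#]+##|\Z)", re.DOTALL)


def split_by_speaker(target_text: str) -> list[tuple[str, str]]:
    """Return (feature code, text field) pairs in the order they were spoken."""
    return [
        (match.group("code"), match.group("text").strip())
        for match in _SEGMENT.finditer(target_text)
    ]


target = "##A##Shall we start?##B##Yes, go ahead.##A##First item.##C##One question."
segments = split_by_speaker(target)
speakers = [code for code, _ in segments]  # ['A', 'B', 'A', 'C']
```

The order of the pairs preserves the conversation order, so the receiving end can play each passage with the matching user's voice model.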
  • In step S302, the text in the text field is converted into pronunciation syllables to obtain audio information, which includes the spectrum information and the PCM code stream corresponding to the syllables.
  • In step S303, besides being used to extract text fields, the feature information attached to the target text is also used to find a voice model.
  • The process is to compare the feature information attached to the target text with the user features contained in the voice models in the voice database. A successful match indicates that the text field was spoken by the user corresponding to that voice model.
  • In step S304, the spectrum information and PCM code stream are adjusted by exchanging the characteristic spectrum segments and PCM code stream in the user's voice model with the corresponding parts of the spectrum information and PCM code stream obtained by text conversion, that is, by replacing the corresponding syllables. This yields audio information close to what the real user said, so the sound heard on playback is close to the user's original speech.
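Steps S302 to S304 can be sketched with toy data structures. Everything here is an assumption for illustration: the `VoiceModel` class, the dictionary-based voice database, the space-separated syllables, and the reading of S304 as "use the user's own syllable spectrum where the model has one, otherwise keep the generic one".

```python
from dataclasses import dataclass, field

Spectrum = list[float]


@dataclass
class VoiceModel:
    feature_code: str
    syllable_spectra: dict[str, Spectrum] = field(default_factory=dict)


# Stand-in for the generic spectra a text-to-speech front end would produce (S302).
GENERIC_SPECTRA: dict[str, Spectrum] = {"ni": [1.0, 0.0], "hao": [0.5, 0.5]}


def text_to_syllables(text: str) -> list[str]:
    """S302 placeholder: assumes the text field is already space-separated syllables."""
    return text.split()


def synthesize(feature_code: str, text_field: str, voice_db: dict[str, VoiceModel]) -> list[Spectrum]:
    """Look up the speaker's model (S303) and swap in that user's syllable spectra (S304)."""
    model = voice_db.get(feature_code)
    frames = []
    for syllable in text_to_syllables(text_field):
        generic = GENERIC_SPECTRA.get(syllable, [])
        personal = model.syllable_spectra.get(syllable) if model else None
        frames.append(personal if personal is not None else generic)
    return frames
```

Falling back to the generic spectrum keeps playback working when the sender's voice model has not yet reached the receiving end.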
  • The generation of the above feature information can be summarized as: extract the speaker's audio information features, such as the PCM code stream of the audio signal and the spectral characteristics of the sound, and then aggregate statistics on this information over a long period.
  • The spectral characteristics are obtained as follows: the PCM signal of the speech is transformed into the frequency domain by the Fourier transform, and the value at each frequency point represents the magnitude of that frequency component.
  • Sound is composed of many sine waves of different frequencies, and the frequency characteristic refers to the magnitude of the sine wave at each frequency.
  • The PCM code stream is obtained by sampling an analog signal such as voice at regular intervals to discretize it, rounding each sampled value to the nearest quantization level, and representing the amplitude of each sampled pulse with a set of binary codes.
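The sampling and quantization steps of PCM can be illustrated with a small sketch. The 8 kHz sample rate, the 16-bit depth, and the 1 kHz test tone are assumptions chosen for the example, not values from the application.

```python
import math


def pcm_encode(signal, duration_s: float, sample_rate: int = 8000, bits: int = 16) -> list[int]:
    """Sample `signal(t)` (values in [-1, 1]) at regular intervals and round to integer levels."""
    max_level = 2 ** (bits - 1) - 1
    count = round(duration_s * sample_rate)
    return [
        round(max(-1.0, min(1.0, signal(i / sample_rate))) * max_level)
        for i in range(count)
    ]


def tone(t: float) -> float:
    """A 1 kHz sine wave standing in for the analog voice signal."""
    return math.sin(2 * math.pi * 1000 * t)


codes = pcm_encode(tone, duration_s=0.001)  # 8 samples covering one cycle at 8 kHz
```

Each integer in `codes` is one quantized sample; written out as fixed-width binary words, the sequence forms the PCM code stream.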
  • The user's voice characteristics can be extracted from the obtained frequency characteristics: for example, the energy value corresponding to each frequency, or the mean and variance of the energy over all frequency points.
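The energy statistics mentioned here can be computed with a naive discrete Fourier transform. This is a sketch under stated assumptions: the 8-sample frame, the use of only the non-redundant bins 0 to n/2, and the population variance are illustrative choices, and a real system would use an FFT library.

```python
import cmath
import math
from statistics import mean, pvariance


def dft(samples: list[float]) -> list[complex]:
    """Naive O(n^2) discrete Fourier transform of one frame of samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def energy_features(samples: list[float]) -> tuple[list[float], float, float]:
    """Return per-bin energies (bins 0..n/2) plus their mean and population variance."""
    half = dft(samples)[: len(samples) // 2 + 1]
    energies = [abs(x) ** 2 for x in half]
    return energies, mean(energies), pvariance(energies)


frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]  # one cycle of a sine wave
energies, avg, var = energy_features(frame)                # energy concentrates in bin 1
```

For a pure tone nearly all the energy lands in one bin, which is why the per-bin energies and their spread can help distinguish one voice from another.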
  • Using the received text together with the syllable features in the model, the receiving end can synthesize the frequency characteristics of the corresponding syllable pronunciation and convert these frequency points to a PCM signal (through the inverse Fourier transform), synthesizing a personalized voice with the user's voice characteristics.
  • The voice and text conversion transmission method, system, computer equipment and storage medium proposed in this application detect whether the network transmission bandwidth is extremely low. If it is, the voice recognition system is started: the sending end recognizes the user's voice information, converts it into target text carrying feature information, and sends the target text to the receiving end; the receiving end receives the target text, recognizes it, converts it into voice information, and plays it.
  • the system automatically detects the network bandwidth, adaptively switches the transmission mode, and can still interact with the remote end smoothly when the network is not ideal, which solves the problem of voice transmission under extremely low bandwidth and achieves the purpose of information interaction.
  • the self-built speech model is used for conversion, which improves the fidelity.
  • An embodiment of the present application also proposes a voice and text conversion transmission system, including: a sending end and a receiving end;
  • the sending end is used to detect whether the first current network transmission bandwidth of the sending end is extremely low, and to detect whether a signal indicating that the second current network transmission bandwidth of the receiving end is extremely low has been received; if the first current network transmission bandwidth is extremely low and/or such a signal has been received, to start the speech-to-text system and send the receiving end a signal to communicate through the speech-to-text system; and to recognize the voice information spoken by the user through the speech-to-text system, convert it into target text, and send the target text to the receiving end;
  • the receiving end is used to detect whether the second current network transmission bandwidth of the receiving end is extremely low, and to detect whether a signal indicating that the first current network transmission bandwidth of the sending end is extremely low has been received; if the second current network transmission bandwidth is extremely low and/or such a signal has been received, to start the text-to-speech system and send the sending end a signal to communicate through the text-to-speech system; and to receive the target text sent by the sending end, recognize it, convert it into voice information, and play it.
  • an embodiment of the present application also proposes a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • the computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as a guide plan library.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer readable instructions are executed by the processor to realize a voice and text conversion transmission method.
  • the above-mentioned processor executes the steps of the above-mentioned method:
  • the sending end detects whether the first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth, and detects whether a signal whose second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth is received;
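  • if the first current network transmission bandwidth of the sending end is extremely low and/or a signal has been received indicating that the second current network transmission bandwidth of the receiving end is extremely low, the speech-to-text system is started and a signal to communicate through the speech-to-text system is sent to the receiving end;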
  • the voice information spoken by the user is recognized through the voice-to-text system, and converted into target text, and the target text is sent to the receiving end.
  • the foregoing processor executes the steps of the foregoing method:
  • the receiving end detects whether the second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth, and detects whether a signal whose first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth is received;
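  • if the second current network transmission bandwidth of the receiving end is extremely low and/or a signal has been received indicating that the first current network transmission bandwidth of the sending end is extremely low, the text-to-speech system is started and a signal to communicate through the text-to-speech system is sent to the sending end;
  • the target text sent by the sending end is received and recognized, converted into voice information, and played.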
  • An embodiment of the present application also proposes a computer-readable storage medium on which computer-readable instructions are stored.
  • a method for voice and text conversion and transmission is realized, including the steps:
  • the sending end detects whether the first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth, and detects whether a signal whose second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth is received;
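  • if the first current network transmission bandwidth of the sending end is extremely low and/or a signal has been received indicating that the second current network transmission bandwidth of the receiving end is extremely low, the speech-to-text system is started and a signal to communicate through the speech-to-text system is sent to the receiving end;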
  • the voice information spoken by the user is recognized through the voice-to-text system, and converted into target text, and the target text is sent to the receiving end.
  • the step of the sending end detecting whether the first current network transmission bandwidth of the sending end belongs to extremely low bandwidth includes:
  • if the current network speed is less than or equal to 10% of the preset network speed, it is determined that the first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth.
  • the step of recognizing the voice information spoken by the user and converting it into target text includes:
  • the feature code is added to the text field in a preset manner to obtain the target text.
  • the method further includes:
  • Another embodiment of the present application also provides a computer-readable storage medium on which computer-readable instructions are stored.
  • a method for converting and transmitting voice to text is realized, including the steps:
  • the receiving end detects whether the second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth, and detects whether a signal whose first current network transmission bandwidth of the sending end belongs to an extremely low bandwidth is received;
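  • if the second current network transmission bandwidth of the receiving end is extremely low and/or a signal has been received indicating that the first current network transmission bandwidth of the sending end is extremely low, the text-to-speech system is started and a signal to communicate through the text-to-speech system is sent to the sending end;
  • the target text sent by the sending end is received and recognized, converted into voice information, and played.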
  • the step of the receiving end detecting whether the second current network transmission bandwidth of the receiving end belongs to extremely low bandwidth includes:
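  • if the current network speed is less than or equal to 10% of the preset network speed, it is determined that the second current network transmission bandwidth of the receiving end belongs to an extremely low bandwidth.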
  • the step of receiving the target text from the sending end, recognizing the target text, and converting the target text into voice information further includes:
  • the spectrum information and PCM code stream obtained by text conversion are exchanged with the spectrum information and PCM code stream in the voice model of the corresponding user to obtain the spectrum information and PCM code stream corresponding to the user and the text field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present application provides a voice and text conversion transmission method and system, and a computer device and a storage medium. The method comprises: detecting whether a network transmission bandwidth belongs to an extremely low bandwidth or not; and if yes, starting a voice recognition system. A sending end identifies user voice information, converts the voice information into a target text with characteristic information, and sends the target text to a receiving end; and the receiving end receives the target text sent by the sending end, identifies the target text, converts the target text into the voice information, and plays the voice information.

Description

Voice and text conversion transmission method, system, computer device, and storage medium
This application claims priority to Chinese patent application No. 201910465416.3, filed with the Chinese Patent Office on May 30, 2019 and entitled "Voice and text conversion transmission method, system, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of communication technology, and in particular to a voice and text conversion transmission method, system, computer device, and storage medium.
Background
At present, audio and video conferencing copes with poor network transmission and low bandwidth mainly by lowering the video and audio bit rates. This approach does not work at extremely low bandwidth, because even the lowest bit rate of the audio and video codecs is still higher than the available bandwidth. When bandwidth is low, audio either cannot be transmitted at all or suffers packet loss, so the audio and video may become choppy and the information fails to get through. A method that allows normal communication at extremely low bandwidth is therefore urgently needed.
Technical problem
The main purpose of this application is to provide a voice and text conversion transmission method, system, computer device, and storage medium, aiming to solve the problem that audio conferences cannot be held at extremely low bandwidth.
Technical solution
To achieve the above purpose, this application provides a voice and text conversion transmission method, comprising the steps of:
The sending end detects whether a first current network transmission bandwidth of the sending end is an extremely low bandwidth, and detects whether a signal has been received indicating that a second current network transmission bandwidth of the receiving end is an extremely low bandwidth;
If the first current network transmission bandwidth of the sending end is an extremely low bandwidth and/or the signal indicating that the second current network transmission bandwidth of the receiving end is an extremely low bandwidth has been received, the sending end starts the speech-to-text system and sends the receiving end a signal to communicate through the speech-to-text system;
The speech-to-text system recognizes the voice information spoken by the user, converts it into target text, and sends the target text to the receiving end, where the target text includes a feature code and a text field.
Further, the step of the sending end detecting whether the first current network transmission bandwidth of the sending end is an extremely low bandwidth includes:
Monitoring the current network speed of the sending end in real time, and comparing the current network speed with a preset network speed;
If the current network speed is greater than 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is not an extremely low bandwidth;
If the current network speed is less than or equal to 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is an extremely low bandwidth.
Further, the step of recognizing the voice information spoken by the user and converting it into target text includes:
Recognizing the voice information of the user, including semantic recognition and voiceprint recognition;
Converting the voice information into a text field, and extracting audio information features from the voice information to generate a feature code; the audio information features include a voiceprint spectrum and a PCM code stream, and the feature code is a string of symbols generated from the voiceprint;
Adding the feature code to the text field in a preset manner to obtain the target text.
Further, after the step of extracting the audio information features from the voice information and generating the feature code, the method further includes:
Inputting the extracted audio information features into a preset voice model, and naming the voice model with the generated feature code; the feature code serves as the unique identifier for invoking the voice model;
Sending the voice model to the receiving end.
This application also proposes a voice and text conversion transmission method, comprising the steps of:
The receiving end detects whether the second current network transmission bandwidth of the receiving end is an extremely low bandwidth, and detects whether a signal has been received indicating that the first current network transmission bandwidth of the sending end is an extremely low bandwidth;
If the second current network transmission bandwidth of the receiving end is an extremely low bandwidth and/or the signal indicating that the first current network transmission bandwidth of the sending end is an extremely low bandwidth has been received, the receiving end starts the text-to-speech system and sends the sending end a signal to communicate through the text-to-speech system;
The receiving end receives the target text sent by the sending end, recognizes the target text, converts the target text into voice information, and plays it.
Further, the step of the receiving end detecting whether the second current network transmission bandwidth of the receiving end is an extremely low bandwidth includes:
Monitoring the current network speed of the receiving end in real time, and comparing the current network speed with a preset network speed;
If the current network speed is greater than 10% of the preset network speed, determining that the second current network transmission bandwidth of the receiving end is not an extremely low bandwidth;
If the current network speed is less than or equal to 10% of the preset network speed, determining that the second current network transmission bandwidth of the receiving end is an extremely low bandwidth.
Further, the step of receiving the target text sent by the sending end, recognizing the target text, and converting the target text into voice information further includes:
Extracting the text field according to the feature information attached to the target text;
Converting the text in the text field into pronunciation syllables to obtain the spectrum information and PCM code stream corresponding to the syllables;
Looking up the voice model of the corresponding user in the local voice library according to the feature information attached to the target text;
Exchanging the spectrum information and PCM code stream obtained from the text conversion with the spectrum information and PCM code stream in the corresponding user's voice model, to obtain the spectrum information and PCM code stream that correspond to both the user and the text field.
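The receiving-end steps above can be pictured with a small sketch. It is only an illustration under stated assumptions: the `##feature code##text field` layout is the example this description gives elsewhere, while `parse_target_text`, `text_to_syllables`, `synthesize`, the whitespace-based syllable split, and the dictionary-shaped voice library are hypothetical names and simplifications that do not come from the application.

```python
from typing import Dict, List, Tuple

MARK = "##"  # delimiter taken from the "##feature code##text field" example

def parse_target_text(target_text: str) -> Tuple[str, str]:
    """Split a target text into (feature_code, text_field)."""
    if not target_text.startswith(MARK):
        raise ValueError("target text does not start with a feature code")
    end = target_text.find(MARK, len(MARK))
    if end == -1:
        raise ValueError("feature code is not terminated")
    return target_text[len(MARK):end], target_text[end + len(MARK):]

def text_to_syllables(text_field: str) -> List[str]:
    """Hypothetical stand-in: treat whitespace-separated tokens as syllables."""
    return text_field.split()

def synthesize(target_text: str,
               voice_library: Dict[str, Dict[str, bytes]],
               default_voice: Dict[str, bytes]) -> bytes:
    """Build a PCM stream, preferring the speaker's own per-syllable PCM."""
    feature_code, text_field = parse_target_text(target_text)
    # Look up the speaker's voice model by feature code in the local library.
    user_model = voice_library.get(feature_code, {})
    pcm = b""
    for syllable in text_to_syllables(text_field):
        # Swap the generic syllable audio for the speaker's audio when available.
        pcm += user_model.get(syllable, default_voice.get(syllable, b""))
    return pcm
```

In this sketch the "exchange" step is reduced to a per-syllable substitution; a real system would operate on spectrum information as well as PCM data.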
This application also proposes a voice and text conversion transmission system, comprising a sending end and a receiving end;
The sending end is configured to detect whether the first current network transmission bandwidth of the sending end is an extremely low bandwidth, and to detect whether a signal has been received indicating that the second current network transmission bandwidth of the receiving end is an extremely low bandwidth;
If the first current network transmission bandwidth of the sending end is an extremely low bandwidth and/or the signal indicating that the second current network transmission bandwidth of the receiving end is an extremely low bandwidth has been received, the sending end starts the speech-to-text system and sends the receiving end a signal to communicate through the speech-to-text system;
The speech-to-text system recognizes the voice information spoken by the user, converts it into target text, and sends the target text to the receiving end;
The receiving end is configured to detect whether the second current network transmission bandwidth of the receiving end is an extremely low bandwidth, and to detect whether a signal has been received indicating that the first current network transmission bandwidth of the sending end is an extremely low bandwidth;
If the second current network transmission bandwidth of the receiving end is an extremely low bandwidth and/or the signal indicating that the first current network transmission bandwidth of the sending end is an extremely low bandwidth has been received, the receiving end starts the text-to-speech system and sends the sending end a signal to communicate through the text-to-speech system;
The receiving end receives the target text sent by the sending end, recognizes the target text, converts the target text into voice information, and plays it.
This application also proposes a computer device, comprising a memory and a processor, where the memory stores computer-readable instructions and the processor implements the steps of any one of the above methods when executing the computer-readable instructions.
Beneficial effects
This application also proposes a computer-readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the steps of any one of the above methods are implemented.
The voice and text conversion transmission system, method, computer device, and storage medium provided in this application detect whether the network transmission bandwidth is an extremely low bandwidth. If it is, the speech recognition system is started. The sending end recognizes the user's voice information, converts it into target text carrying feature information, and sends the target text to the receiving end; the receiving end receives the target text, recognizes it, converts it into voice information, and plays it. Because the system detects the network bandwidth automatically and switches the transmission mode adaptively, users can still interact smoothly with the remote end when the network is poor, which solves the problem of transmitting voice at extremely low bandwidth and achieves the purpose of information exchange. In addition, text is converted back into speech using a self-built voice model, which improves fidelity.
Brief description of the drawings
Figure 1 is a schematic diagram of the steps of a voice and text conversion transmission method in an embodiment of this application;
Figure 2 is a schematic diagram of the steps of another voice and text conversion transmission method in an embodiment of this application;
Figure 3 is a schematic block diagram of the structure of a computer device in an embodiment of this application.
Best mode for carrying out the invention
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
Referring to Figure 1, this application proposes a voice and text conversion transmission method, comprising the steps of:
S1. The sending end detects whether the first current network transmission bandwidth of the sending end is an extremely low bandwidth, and detects whether a signal has been received indicating that the second current network transmission bandwidth of the receiving end is an extremely low bandwidth;
S2. If the first current network transmission bandwidth of the sending end is an extremely low bandwidth and/or the signal indicating that the second current network transmission bandwidth of the receiving end is an extremely low bandwidth has been received, the sending end starts the speech-to-text system and sends the receiving end a signal to communicate through the speech-to-text system;
S3. The speech-to-text system recognizes the voice information spoken by the user, converts it into target text, and sends the target text to the receiving end, where the target text includes a feature code and a text field.
As described in step S1 above, network transmission is affected by the hardware and software configuration of the user's computer, the address of the website being visited, the peer website, the bandwidth of the peer server, and similar factors, so the actual rate a user gets online is usually lower than the theoretical rate. The network transmission bandwidth here refers to the data transmission capacity in actual signal transmission; extremely low bandwidth refers to bandwidth below 10% of the theoretical value of the normal communication bandwidth. For example, if the bandwidth rate in normal communication is 4M/S (4 Mbit/s), the theoretical value is 512KB/S while the actual value is about 400KB/S, and extremely low bandwidth means a bandwidth rate below 52KB/S. When the network transmission bandwidth is extremely low, data transmission is unstable and the packet loss rate rises, so that much of the data cannot be transmitted normally.
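The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes a 1024-based conversion between Mbit and Kbit, chosen only because it reproduces the 512KB/S value quoted above; the function names are illustrative.

```python
# Illustrative arithmetic only. The 1024-based conversion is an assumption
# that reproduces the 512KB/S figure quoted in the description.
KBIT_PER_MBIT = 1024
BITS_PER_BYTE = 8

def mbit_to_kbyte_per_s(mbit_per_s: float) -> float:
    """Convert a link rate in Mbit/s to a throughput in KB/s."""
    return mbit_per_s * KBIT_PER_MBIT / BITS_PER_BYTE

def extremely_low_threshold(theoretical_kbyte_per_s: float,
                            ratio: float = 0.10) -> float:
    """Return the rate at or below which bandwidth counts as extremely low."""
    return theoretical_kbyte_per_s * ratio

theoretical = mbit_to_kbyte_per_s(4)              # 512.0 KB/s
threshold = extremely_low_threshold(theoretical)  # about 51.2 KB/s
```

The computed threshold of about 51.2 KB/s matches the "below 52KB/S" figure once rounded up, and the quoted real-world rate of about 400KB/S sits well above it.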
As described in step S2 above, once the current network is determined to be at extremely low bandwidth, the speech-to-text system is started. At extremely low bandwidth the network speed is limited and video and audio transmission is very likely to lose packets; the function of the speech recognition system is to ensure that the information used for communication can still be transmitted normally in that state. The speech-to-text system client therefore needs to be started to act as the sending end. Sending the receiving end a signal to communicate through the speech-to-text system prompts or controls the receiving end to start the text-to-speech system client installed on the receiving end for communication.
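The decision in steps S1 and S2 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: `EndpointState`, `should_use_text_mode`, `sender_actions`, and the action strings are hypothetical names, and the "and/or" condition is read as a logical OR.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EndpointState:
    # True when this end has measured its own bandwidth as extremely low.
    local_bandwidth_extremely_low: bool
    # True when the peer has signalled that its bandwidth is extremely low.
    peer_low_bandwidth_signal_received: bool

def should_use_text_mode(state: EndpointState) -> bool:
    """Either condition alone (or both) triggers the switch."""
    return (state.local_bandwidth_extremely_low
            or state.peer_low_bandwidth_signal_received)

def sender_actions(state: EndpointState) -> List[str]:
    """List the actions the sending end would take in step S2."""
    if not should_use_text_mode(state):
        return []
    return [
        "start_speech_to_text",       # start the local speech-to-text client
        "signal_peer_use_text_mode",  # prompt the peer to start text-to-speech
    ]
```

For example, `sender_actions(EndpointState(False, True))` returns both actions, because a low-bandwidth signal from the peer is enough to trigger the switch even when the local link is healthy.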
As described in step S3 above, the sending end refers to the terminal that sends out the target text; the terminal may be a PC, a notebook computer, a tablet computer, or another network-capable smart terminal device. In this embodiment, bandwidth is divided into uplink bandwidth and downlink bandwidth. In theory, uplink and downlink bandwidth do not affect each other, but IP transmission requires two-way interaction, so in practice there is some effect. Extremely low bandwidth is also unfavorable for data transmission. Therefore, when the sending end sends the target text to the receiving end, the downlink bandwidth can be limited to a very small value before the target text is sent and restored after sending completes, which improves data transmission efficiency. Correspondingly, in this application the receiving end receives the target text. Corresponding clients are installed on both the sending end and the receiving end. The receiving end also uses the text-to-speech system client to recognize the target text, convert it into voice information, and play it.
In one embodiment, the step of the sending end detecting whether the first current network transmission bandwidth of the sending end is an extremely low bandwidth includes:
S11. Monitoring the current network speed of the sending end in real time, and comparing the current network speed with a preset network speed;
S12. If the current network speed is greater than 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is not an extremely low bandwidth;
S13. If the current network speed is less than or equal to 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is an extremely low bandwidth.
In steps S11 to S13, put simply, network transmission bandwidth is measured in bits and network speed is measured in bytes, and the relationship between the two is 1 Byte = 8 bit. The network transmission bandwidth is therefore directly proportional to the network speed, and because detecting the network speed is much more convenient than detecting the network transmission bandwidth, this embodiment detects the network transmission bandwidth by detecting the network speed. The preset network speed is the theoretical value of the network speed actually accessed in normal communication. Measuring the current network speed as a proportion of the preset network speed shows whether the network transmission bandwidth is an extremely low bandwidth.
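A minimal sketch of the comparison in steps S11 to S13 is shown below. The function names and the integer `threshold_percent` parameter are illustrative assumptions; the description itself only fixes the 10% figure and the "less than or equal to" boundary.

```python
from typing import Iterable, List

def is_extremely_low_bandwidth(current_speed: float,
                               preset_speed: float,
                               threshold_percent: int = 10) -> bool:
    """Return True when current_speed <= threshold_percent% of preset_speed.

    Both speeds must use the same unit (for example KB/s).
    """
    if preset_speed <= 0:
        raise ValueError("preset_speed must be positive")
    # Multiply instead of dividing so the boundary case stays exact
    # for integer inputs.
    return current_speed * 100 <= threshold_percent * preset_speed

def classify_samples(samples: Iterable[float], preset_speed: float) -> List[bool]:
    """Classify a series of real-time speed measurements (step S11)."""
    return [is_extremely_low_bandwidth(s, preset_speed) for s in samples]
```

With a preset speed of 512 KB/s, a measured 400 KB/s is classified as normal, while 51 KB/s is classified as extremely low.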
In one embodiment, the step S3 of recognizing the voice information spoken by the user through the speech-to-text system and converting it into target text includes:
S31. Recognizing the voice information of the user, including semantic recognition and voiceprint recognition;
S32. Converting the voice information into a text field, and extracting audio information features from the voice information to generate a feature code; the audio information features include a voiceprint spectrum and a PCM code stream, and the feature code is a string of symbols generated from the voiceprint;
S33. Adding the feature code to the text field in a preset manner to obtain the target text.
In step S31, the voice information refers to the words spoken by the user, and the text field refers to the text generated by recognizing the words spoken by the same user over a continuous period of time. The purpose of this step is to recognize what the user says and convert the recognized content into a piece of text.
In steps S32 to S33, the audio information features refer to the user's voiceprint spectrum information and the PCM code stream in the recording file generated when recognizing what the user says. The feature code refers to a character string generated from the user's voiceprint features. Because the user's voiceprint features are unique, the generated character string is correspondingly unique and can serve as identity information for retrieving the voice model of the corresponding speaker, ensuring that no mistake occurs. In addition, special markers are added from the start of the character string to its end so that it can be identified (for example, ##feature code##text field). The feature code is then extracted automatically when the speech recognition system recognizes the text field, and it does not affect recognition of the text field. Furthermore, multiple target texts can be packed and compressed together, which makes sending convenient and saves space. Packing and compressing multiple target texts and sending them in one go can prevent data loss during transmission.
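The framing and packing described in steps S32 and S33 can be sketched as follows. Several parts are assumptions made for illustration: the description does not say how the feature code is derived from the voiceprint, so a truncated SHA-256 digest of the feature bytes stands in for it, and JSON plus zlib is just one possible way to pack and compress several target texts.

```python
import hashlib
import json
import zlib
from typing import List

MARK = "##"  # delimiter from the "##feature code##text field" example

def make_feature_code(voiceprint_features: bytes) -> str:
    """Stand-in feature code: a short hex digest of the voiceprint features.

    Assumption for illustration only; the source does not specify the
    algorithm that turns a voiceprint into a string of symbols.
    """
    return hashlib.sha256(voiceprint_features).hexdigest()[:16]

def build_target_text(feature_code: str, text_field: str) -> str:
    """Prefix the text field with the delimited feature code."""
    return f"{MARK}{feature_code}{MARK}{text_field}"

def pack_target_texts(target_texts: List[str]) -> bytes:
    """Pack several target texts into one compressed payload."""
    payload = json.dumps(target_texts, ensure_ascii=False).encode("utf-8")
    return zlib.compress(payload)

def unpack_target_texts(blob: bytes) -> List[str]:
    """Reverse pack_target_texts on the receiving end."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Because the feature code sits between fixed markers at the start of the string, the receiving end can strip it before reading the text field, which is why it does not interfere with recognition of the text.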
In one embodiment, after the step S32 of extracting the audio information features from the voice information and generating the feature code, the method further includes:
S3201. Inputting the extracted audio information features into a preset voice model, and naming the voice model with the generated feature code; the feature code serves as the unique identifier for invoking the voice model;
S3202. Sending the voice model to the receiving end.
In steps S3201 to S3202, inputting the extracted audio information features into the preset voice model means the following. The pronunciation of every word is made up of syllables, and the preset voice model records the audio information features of all the syllables spoken by one user. The audio information features of all the syllables spoken by the same user are extracted from the user's recording file and input into the preset voice model, so that the resulting voice model contains the features of all the syllables pronounced by that user. In step S3202 the voice model is sent to the receiving end. Once the receiving end has the user's voice model, the frequency characteristics of the pronunciation of each syllable can be synthesized from the syllable features, and converting these frequency points into a PCM signal (by inverse Fourier transform) synthesizes personalized speech with the user's voice characteristics for speech simulation.
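The conversion from frequency points to a PCM signal mentioned above can be illustrated with a direct inverse discrete Fourier transform. This is a toy sketch under stated assumptions: a naive O(n²) transform is used for clarity rather than speed, samples are assumed to lie in [-1, 1], and 16-bit little-endian PCM is chosen as the output format; none of these details are specified in the description.

```python
import cmath
import struct
from typing import List

def inverse_dft(spectrum: List[complex]) -> List[float]:
    """Naive inverse DFT returning the real part of each time-domain sample."""
    n = len(spectrum)
    samples = []
    for t in range(n):
        acc = sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                  for k in range(n))
        samples.append((acc / n).real)
    return samples

def to_pcm16(samples: List[float]) -> bytes:
    """Quantize samples in [-1, 1] to 16-bit little-endian PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)

# A spectrum with energy only in bins 1 and n-1 is one cosine cycle.
spectrum = [0j] * 8
spectrum[1] = 4 + 0j
spectrum[7] = 4 + 0j
pcm = to_pcm16(inverse_dft(spectrum))  # 8 samples, 16 bytes
```

In a voice model keyed by feature code, each syllable would map to stored frequency characteristics of this kind, and the receiving end would run the inverse transform per syllable to obtain playable PCM.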
Referring to Figure 2, this application also proposes a voice and text conversion transmission method, comprising the steps of:
S10. The receiving end detects whether the second current network transmission bandwidth of the receiving end is an extremely low bandwidth, and detects whether a signal has been received indicating that the first current network transmission bandwidth of the sending end is an extremely low bandwidth;
S20. If the second current network transmission bandwidth of the receiving end is an extremely low bandwidth and/or the signal indicating that the first current network transmission bandwidth of the sending end is an extremely low bandwidth has been received, the receiving end starts the text-to-speech system and sends the sending end a signal to communicate through the text-to-speech system;
S30. The receiving end receives the target text sent by the sending end, recognizes the target text, converts the target text into voice information, and plays it.
As described in step S10 above, network transmission is affected by the hardware and software configuration of the user's computer, the address of the website being visited, the peer website, the bandwidth of the peer server, and similar factors, so the actual rate a user gets online is usually lower than the theoretical rate. The network transmission bandwidth here refers to the data transmission capacity in actual signal transmission; extremely low bandwidth refers to bandwidth below 10% of the theoretical value of the normal communication bandwidth. For example, if the bandwidth rate in normal communication is 4M/S (4 Mbit/s), the theoretical value is 512KB/S while the actual value is about 400KB/S, and extremely low bandwidth means a bandwidth rate below 52KB/S. When the network transmission bandwidth is extremely low, data transmission is unstable and the packet loss rate rises, so that much of the data cannot be transmitted normally.
As described in step S20 above, once the current network is determined to be at extremely low bandwidth, the text-to-speech system is started. At extremely low bandwidth the network speed is limited and video and audio transmission is very likely to lose packets; the function of the text-to-speech system is to ensure that the information used for communication can still be transmitted normally in that state. The text-to-speech system client therefore needs to be started to act as the receiving end. Sending the sending end a signal to communicate through the text-to-speech system prompts or controls the sending end to start the speech-to-text system client installed on the sending end for communication.
As described in step S30 above, the sending end refers to the terminal that sends out the target text; the terminal may be a PC, a notebook computer, a tablet computer, or another network-capable smart terminal device. In theory, uplink and downlink bandwidth do not affect each other, but IP transmission requires two-way interaction, so in practice there is some effect. Extremely low bandwidth is also unfavorable for data transmission. Therefore, when the receiving end receives the target text sent by the sending end, the uplink bandwidth can be limited to a very small value while the target text is being received and restored after reception completes, which improves data transmission efficiency. Correspondingly, in this application the sending end sends the target text. Corresponding clients are installed on both the sending end and the receiving end. The sending end also uses the speech-to-text system to recognize the voice information spoken by the user, convert it into target text, and send the target text to the receiving end.
In one embodiment, the step S10 of the sending end detecting whether the first current network transmission bandwidth of the sending end is an extremely low bandwidth includes:
S101. Monitoring the current network speed of the sending end in real time, and comparing the current network speed with a preset network speed;
S102. If the current network speed is greater than 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is not an extremely low bandwidth;
S103. If the current network speed is less than or equal to 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is an extremely low bandwidth.
In steps S101 to S103, put simply, network transmission bandwidth is measured in bits and network speed is measured in bytes, and the relationship between the two is 1 Byte = 8 bits. Network transmission bandwidth is therefore directly proportional to network speed. Because measuring network speed is much more convenient than measuring network transmission bandwidth, this embodiment measures network speed as a way of measuring network transmission bandwidth. The preset network speed is the theoretical value of the network speed actually provisioned for normal communication. Determining what fraction of the preset network speed the current network speed represents shows whether the network transmission bandwidth is extremely low bandwidth.
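The comparison in steps S101 to S103 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names, the `percent` parameter, and the use of bytes per second as the speed unit are assumptions made for the example; only the 10% threshold and the 1 Byte = 8 bits relationship come from the text.

```python
def is_extremely_low_bandwidth(current_speed, preset_speed, percent=10):
    """Return True when current_speed <= percent% of preset_speed (S103),
    and False when it is greater (S102). Both speeds use the same unit."""
    if preset_speed <= 0:
        raise ValueError("preset_speed must be positive")
    # Integer-friendly form of: current_speed <= (percent / 100) * preset_speed
    return current_speed * 100 <= percent * preset_speed


def bytes_per_s_to_bits_per_s(speed_bytes_per_s):
    """Convert a speed in bytes per second to bits per second (1 Byte = 8 bits)."""
    return speed_bytes_per_s * 8


if __name__ == "__main__":
    preset = 1_000_000  # hypothetical preset speed, bytes per second
    for current in (500_000, 100_000, 20_000):
        print(current, is_extremely_low_bandwidth(current, preset))
```

Because the two quantities differ only by the constant factor 8, applying the same 10% test to either one gives the same result, which is why the embodiment can measure speed in place of bandwidth.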
In one embodiment, step S30, in which the target text sent by the sending end is received, recognized, and converted into voice information, further includes:
S301: Extract text fields according to the feature information attached to the target text;
S302: Convert the text in each text field into pronunciation syllables, and obtain the spectrum information and PCM code stream corresponding to the syllables;
S303: Look up the corresponding user's voice model in the local voice library according to the feature information attached to the target text;
S304: Swap the spectrum information and PCM code stream obtained from the text conversion with the spectrum information and PCM code stream in the corresponding user's voice model, to obtain the user's spectrum information and PCM code stream corresponding to the text field.
In step S301, the target text is produced by the sending end from what the user said. When the target text contains speech from multiple users, the feature information divides the target text into multiple text fields. In other words, the target text is composed of multiple text fields, and each text field carries the feature information of the corresponding user, indicating that the text field was converted from the speech of one specific user. For example, analysis of the feature information may show that the target text contains feature A, feature B, feature A, and feature C; the target text was therefore converted from a passage spoken by user A, a passage spoken by user B, another passage spoken by user A, and a passage spoken by user C.
In step S302, the text in the text field is converted into pronunciation syllables to obtain audio information, which contains the spectrum information and PCM code stream corresponding to the syllables.
In step S303, the feature information attached to the target text is used not only to extract text fields but also to find the voice model. The feature information attached to the target text is compared with the user features contained in the voice models in the voice library; a successful match indicates that the text field was spoken by the user to whom that voice model corresponds.
In step S304, swapping the spectrum information and PCM code stream means replacing the corresponding parts of the spectrum information and PCM code stream obtained from the text conversion with the characteristic spectrum segments and PCM code stream from the user's voice model, that is, replacing them syllable by syllable. This yields audio information close to what the user actually said; when played, it sounds close to the user's original speech.
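The data flow of steps S301 to S304 can be sketched as follows. The sketch makes several assumptions that are not in the source: the feature information is a bracketed code placed before each text field (for example `[A]ni hao`), syllables are separated by spaces, and "spectrum information and PCM code stream" are represented by opaque placeholder strings. Real synthesis would operate on audio data; only the lookup-and-swap logic is illustrated.

```python
import re

# Hypothetical wire format: "[feature_code]text field[feature_code]text field..."
SEGMENT = re.compile(r"\[(?P<code>[^\]]+)\](?P<text>[^\[]*)")


def extract_text_fields(target_text):
    """S301: split the target text into (feature_code, text_field) pairs."""
    return [(m.group("code"), m.group("text")) for m in SEGMENT.finditer(target_text)]


def synthesize(target_text, generic_units, voice_library):
    """S302-S304: map each syllable to a generic audio unit, then swap in the
    matching unit from the speaker's voice model when one exists."""
    result = []
    for code, field in extract_text_fields(target_text):
        user_model = voice_library.get(code, {})      # S303: look up the voice model
        for syllable in field.split():
            unit = generic_units[syllable]            # S302: text -> generic unit
            unit = user_model.get(syllable, unit)     # S304: swap per syllable
            result.append((code, syllable, unit))
    return result


if __name__ == "__main__":
    generic = {"ni": "g-ni", "hao": "g-hao", "zai": "g-zai", "jian": "g-jian"}
    library = {"A": {"ni": "A-ni", "hao": "A-hao"}}   # only user A has a model
    for row in synthesize("[A]ni hao[B]hao[A]zai jian", generic, library):
        print(row)
```

Syllables with no entry in the speaker's model keep the generic unit, which is one plausible fallback; the source does not say what happens when no voice model matches.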
The generation of the feature information can be summarized as follows: extract the speaker's audio information features, such as the PCM code stream of the audio signal and the spectral characteristics of the voice, and then aggregate and compile statistics on this information over a long period. The spectral characteristics are obtained by applying a Fourier transform to the PCM signal of the speech to convert it into the frequency domain, where the value at each frequency point represents the magnitude of that frequency. Sound is composed of many sine waves of different frequencies, and the frequency characteristics are the magnitudes of the sine waves at each frequency. To obtain the PCM signal, an analog signal such as voice is sampled at regular intervals to discretize it, each sampled value is rounded to the nearest quantization level, and a group of binary codes represents the amplitude of each sampled pulse. The user's voice features can then be extracted from the frequency characteristics, for example by taking the energy corresponding to each frequency, or the mean and variance of the energy over all frequency points.
The user's voice PCM signal is divided into small syllables, such as a, u, e, i, u, yu, and the features of these syllables are extracted and transmitted to the receiving end, where a corresponding model is built. Using the received text together with the syllable features of the model, the receiving end can synthesize the frequency characteristics of the pronunciation of each syllable; converting these frequency points back into a PCM signal (through an inverse Fourier transform) produces personalized speech with the user's voice characteristics.
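The quantization and frequency-domain steps described above can be illustrated with a small sketch. It uses a direct discrete Fourier transform written with the standard library so that it is self-contained; a production system would more likely use an FFT library. The function names, the 8-bit quantization depth, and the 64-sample test tone are assumptions chosen for the example.

```python
import cmath
import math


def quantize(samples, levels=256):
    """Round samples in [-1, 1] to signed integer PCM codes (8-bit by default)."""
    half = levels // 2
    return [max(-half, min(half - 1, round(s * half))) for s in samples]


def dft_energy(pcm):
    """Energy (squared magnitude) at each frequency bin below the Nyquist bin."""
    n = len(pcm)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(pcm))) ** 2
        for k in range(n // 2)
    ]


def spectral_features(pcm):
    """Per-bin energy plus the mean and variance of the energy over all bins."""
    energy = dft_energy(pcm)
    mean = sum(energy) / len(energy)
    variance = sum((e - mean) ** 2 for e in energy) / len(energy)
    return {"energy": energy, "mean": mean, "variance": variance}


if __name__ == "__main__":
    n = 64
    # A pure tone that completes 4 cycles in the window, so it falls in bin 4.
    tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
    features = spectral_features(quantize(tone))
    peak = max(range(len(features["energy"])), key=features["energy"].__getitem__)
    print("dominant bin:", peak)
```

For a single sine wave nearly all of the energy lands in one bin; real speech spreads energy across many bins, and it is that distribution which the text proposes to summarize per syllable.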
This application proposes a voice and text conversion transmission method, system, computer device, and storage medium that detect whether the network transmission bandwidth is extremely low bandwidth. If it is, the speech recognition system is started. The sending end recognizes the user's voice information, converts it into target text carrying feature information, and sends the target text to the receiving end; the receiving end receives the target text, recognizes it, converts it into voice information, and plays it. Because the system detects the network bandwidth automatically and switches the transmission mode adaptively, the user can still interact smoothly with the remote end when network conditions are poor. This solves the problem of transmitting voice under extremely low bandwidth and achieves the purpose of information exchange. In addition, converting the text to speech with a self-built voice model improves fidelity.
An embodiment of this application also proposes a voice and text conversion transmission system, including a sending end and a receiving end;
The sending end is used to detect whether the first current network transmission bandwidth of the sending end is extremely low bandwidth, and to detect whether a signal has been received indicating that the second current network transmission bandwidth of the receiving end is extremely low bandwidth; if the first current network transmission bandwidth of the sending end is extremely low bandwidth and/or the signal indicating that the second current network transmission bandwidth of the receiving end is extremely low bandwidth has been received, to start the speech-to-text system and send the receiving end a signal to communicate through the speech-to-text system; and to recognize the voice information spoken by the user through the speech-to-text system, convert it into target text, and send the target text to the receiving end;
The receiving end is used to detect whether the second current network transmission bandwidth of the receiving end is extremely low bandwidth, and to detect whether a signal has been received indicating that the first current network transmission bandwidth of the sending end is extremely low bandwidth; if the second current network transmission bandwidth of the receiving end is extremely low bandwidth and/or the signal indicating that the first current network transmission bandwidth of the sending end is extremely low bandwidth has been received, to start the text-to-speech system and send the sending end a signal to communicate through the text-to-speech system; and to receive the target text sent by the sending end, recognize it, convert it into voice information, and play it.
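The switching condition shared by both ends (local bandwidth extremely low, and/or a low-bandwidth signal received from the peer) can be sketched as follows. The `Endpoint` class, its field names, and the string labels are hypothetical; the source specifies only the condition and which system each end starts.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    role: str                                # "sender" or "receiver"
    local_low_bandwidth: bool = False        # result of this end's own detection
    peer_low_bandwidth_signal: bool = False  # signal received from the other end

    def should_use_text_mode(self):
        """True if either end has reported extremely low bandwidth."""
        return self.local_low_bandwidth or self.peer_low_bandwidth_signal

    def system_to_start(self):
        """Sender starts speech-to-text; receiver starts text-to-speech."""
        if not self.should_use_text_mode():
            return None
        return "speech-to-text" if self.role == "sender" else "text-to-speech"


if __name__ == "__main__":
    sender = Endpoint("sender", peer_low_bandwidth_signal=True)
    receiver = Endpoint("receiver", local_low_bandwidth=True)
    print(sender.system_to_start(), receiver.system_to_start())
```

Because the condition is an inclusive "and/or", a single end detecting low bandwidth is enough to move both ends into text mode once the signal has been exchanged.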
Referring to FIG. 3, an embodiment of this application also proposes a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device is used to store data such as a guidance scheme library. The network interface of the computer device is used to communicate with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a voice and text conversion transmission method.
The processor executes the following steps of the method:
The sending end detects whether the first current network transmission bandwidth of the sending end is extremely low bandwidth, and detects whether a signal has been received indicating that the second current network transmission bandwidth of the receiving end is extremely low bandwidth;
If the first current network transmission bandwidth of the sending end is extremely low bandwidth and/or the signal indicating that the second current network transmission bandwidth of the receiving end is extremely low bandwidth has been received, the speech-to-text system is started, and a signal to communicate through the speech-to-text system is sent to the receiving end;
The voice information spoken by the user is recognized through the speech-to-text system and converted into target text, and the target text is sent to the receiving end.
In another embodiment, the processor executes the following steps of the method:
The receiving end detects whether the second current network transmission bandwidth of the receiving end is extremely low bandwidth, and detects whether a signal has been received indicating that the first current network transmission bandwidth of the sending end is extremely low bandwidth;
If the second current network transmission bandwidth of the receiving end is extremely low bandwidth and/or the signal indicating that the first current network transmission bandwidth of the sending end is extremely low bandwidth has been received, the text-to-speech system is started, and a signal to communicate through the text-to-speech system is sent to the sending end;
The target text sent by the sending end is received, recognized, converted into voice information, and played.
An embodiment of this application also proposes a computer-readable storage medium on which computer-readable instructions are stored. When executed by a processor, the computer-readable instructions implement a voice and text conversion transmission method including the steps:
The sending end detects whether the first current network transmission bandwidth of the sending end is extremely low bandwidth, and detects whether a signal has been received indicating that the second current network transmission bandwidth of the receiving end is extremely low bandwidth;
If the first current network transmission bandwidth of the sending end is extremely low bandwidth and/or the signal indicating that the second current network transmission bandwidth of the receiving end is extremely low bandwidth has been received, the speech-to-text system is started, and a signal to communicate through the speech-to-text system is sent to the receiving end;
The voice information spoken by the user is recognized through the speech-to-text system and converted into target text, and the target text is sent to the receiving end.
In one embodiment, the step in which the sending end detects whether the first current network transmission bandwidth of the sending end is extremely low bandwidth includes:
Monitoring the current network speed of the sending end in real time, and comparing the current network speed with a preset network speed;
If the current network speed is greater than 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is not extremely low bandwidth;
If the current network speed is less than or equal to 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is extremely low bandwidth.
In one embodiment, the step of recognizing the voice information spoken by the user and converting it into target text includes:
Recognizing the voice information of the user;
Converting the voice information into a text field, and extracting audio information features from the voice information to generate a feature code;
Adding the feature code to the text field in a preset manner to obtain the target text.
In one embodiment, after the step of extracting audio information features from the voice information to generate a feature code, the method further includes:
Inputting the extracted audio information features into a preset voice model, and naming the voice model with the generated feature code;
Sending the voice model to the receiving end.
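The sending-end steps above (generate a feature code, add it to the text field, and name the voice model after the code) can be sketched as follows. The source does not define how the feature code is derived or what the "preset manner" is, so both are assumptions here: the code is a truncated SHA-256 digest of the rounded feature values, and it is prepended to the text field in square brackets. The voice model is represented as a plain list of features.

```python
import hashlib


def make_feature_code(audio_features, length=8):
    """Derive a short, repeatable symbol string from the audio features
    (hypothetical scheme: truncated SHA-256 of the rounded values)."""
    payload = ",".join(f"{value:.3f}" for value in audio_features)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


def build_target_text(text_field, feature_code):
    """Add the feature code to the text field (hypothetical preset manner:
    a bracketed prefix)."""
    return f"[{feature_code}]{text_field}"


def register_voice_model(models, feature_code, audio_features):
    """Store the voice model under its feature code so the code can later
    serve as the lookup key on the receiving end."""
    models[feature_code] = list(audio_features)
    return models


if __name__ == "__main__":
    features = [0.125, 3.5, 0.75]  # placeholder audio information features
    code = make_feature_code(features)
    models = register_voice_model({}, code, features)
    print(build_target_text("ni hao", code), sorted(models))
```

Using the same code both as the tag in the target text and as the model name is what lets the receiving end find the right voice model from the text alone.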
Another embodiment of this application also proposes a computer-readable storage medium on which computer-readable instructions are stored. When executed by a processor, the computer-readable instructions implement a voice and text conversion transmission method including the steps:
The receiving end detects whether the second current network transmission bandwidth of the receiving end is extremely low bandwidth, and detects whether a signal has been received indicating that the first current network transmission bandwidth of the sending end is extremely low bandwidth;
If the second current network transmission bandwidth of the receiving end is extremely low bandwidth and/or the signal indicating that the first current network transmission bandwidth of the sending end is extremely low bandwidth has been received, the text-to-speech system is started, and a signal to communicate through the text-to-speech system is sent to the sending end;
The target text sent by the sending end is received, recognized, converted into voice information, and played.
In one embodiment, the step in which the receiving end detects whether the second current network transmission bandwidth of the receiving end is extremely low bandwidth includes:
Monitoring the current network speed of the receiving end in real time, and comparing the current network speed with a preset network speed;
If the current network speed is greater than 10% of the preset network speed, determining that the second current network transmission bandwidth of the receiving end is not extremely low bandwidth;
If the current network speed is less than or equal to 10% of the preset network speed, determining that the second current network transmission bandwidth of the receiving end is extremely low bandwidth.
In one embodiment, the step of receiving the target text sent by the sending end, recognizing the target text, and converting the target text into voice information further includes:
Extracting text fields according to the feature information attached to the target text;
Converting the text in each text field into pronunciation syllables, and obtaining the spectrum information and PCM code stream corresponding to the syllables;
Looking up the corresponding user's voice model in the local voice library according to the feature information attached to the target text;
Swapping the spectrum information and PCM code stream obtained from the text conversion with the spectrum information and PCM code stream in the corresponding user's voice model, to obtain the user's spectrum information and PCM code stream corresponding to the text field.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims (20)

  1. A voice and text conversion transmission method, characterized by comprising the steps of:
    a sending end detecting whether a first current network transmission bandwidth of the sending end is extremely low bandwidth, and detecting whether a signal has been received indicating that a second current network transmission bandwidth of a receiving end is extremely low bandwidth;
    if the first current network transmission bandwidth of the sending end is extremely low bandwidth and/or the signal indicating that the second current network transmission bandwidth of the receiving end is extremely low bandwidth has been received, starting a speech-to-text system, and sending the receiving end a signal to communicate through the speech-to-text system;
    recognizing voice information spoken by a user through the speech-to-text system, converting it into target text, and sending the target text to the receiving end, wherein the target text includes a feature code and a text field.
  2. The voice and text conversion transmission method according to claim 1, characterized in that the step of the sending end detecting whether the first current network transmission bandwidth of the sending end is extremely low bandwidth comprises:
    monitoring the current network speed of the sending end in real time, and comparing the current network speed with a preset network speed;
    if the current network speed is greater than 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is not extremely low bandwidth;
    if the current network speed is less than or equal to 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end is extremely low bandwidth.
  3. The voice and text conversion transmission method according to claim 1, characterized in that the step of recognizing the voice information spoken by the user and converting it into target text comprises:
    recognizing the voice information of the user, including semantic recognition and voiceprint recognition;
    converting the voice information into a text field, and extracting audio information features from the voice information to generate a feature code, wherein the audio information features include a voiceprint spectrum and a PCM code stream, and the feature code is a string of symbols generated from the voiceprint;
    adding the feature code to the text field in a preset manner to obtain the target text.
  4. The voice and text conversion transmission method according to claim 3, characterized in that, after the step of extracting audio information features from the voice information to generate a feature code, the method further comprises:
    inputting the extracted audio information features into a preset voice model, and naming the voice model with the generated feature code, wherein the feature code serves as the unique identifier for invoking the voice model;
    sending the voice model to the receiving end.
  5. A voice and text conversion transmission method, characterized by comprising the steps of:
    a receiving end detecting whether a second current network transmission bandwidth of the receiving end is extremely low bandwidth, and detecting whether a signal has been received indicating that a first current network transmission bandwidth of a sending end is extremely low bandwidth;
    if the second current network transmission bandwidth of the receiving end is extremely low bandwidth and/or the signal indicating that the first current network transmission bandwidth of the sending end is extremely low bandwidth has been received, starting a text-to-speech system, and sending the sending end a signal to communicate through the text-to-speech system;
    receiving target text sent by the sending end, recognizing the target text, converting the target text into voice information, and playing it.
  6. The voice and text conversion transmission method according to claim 5, characterized in that the step of the receiving end detecting whether the second current network transmission bandwidth of the receiving end is extremely low bandwidth comprises:
    monitoring the current network speed of the receiving end in real time, and comparing the current network speed with a preset network speed;
    if the current network speed is greater than 10% of the preset network speed, determining that the second current network transmission bandwidth of the receiving end is not extremely low bandwidth;
    if the current network speed is less than or equal to 10% of the preset network speed, determining that the second current network transmission bandwidth of the receiving end is extremely low bandwidth.
  7. The voice and text conversion transmission method according to claim 5, characterized in that the step of receiving the target text sent by the sending end, recognizing the target text, and converting the target text into voice information further comprises:
    extracting text fields according to the feature information attached to the target text;
    converting the text in each text field into pronunciation syllables, and obtaining the spectrum information and PCM code stream corresponding to the syllables;
    looking up the corresponding user's voice model in the local voice library according to the feature information attached to the target text;
    swapping the spectrum information and PCM code stream obtained from the text conversion with the spectrum information and PCM code stream in the corresponding user's voice model, to obtain the user's spectrum information and PCM code stream corresponding to the text field.
  8. A voice and text conversion transmission system, characterized by comprising a sending end and a receiving end;
    the sending end is used to detect whether a first current network transmission bandwidth of the sending end is extremely low bandwidth, and to detect whether a signal has been received indicating that a second current network transmission bandwidth of the receiving end is extremely low bandwidth;
    if the first current network transmission bandwidth of the sending end is extremely low bandwidth and/or the signal indicating that the second current network transmission bandwidth of the receiving end is extremely low bandwidth has been received, to start a speech-to-text system, and send the receiving end a signal to communicate through the speech-to-text system;
    to recognize voice information spoken by a user through the speech-to-text system, convert it into target text, and send the target text to the receiving end;
    the receiving end is used to detect whether the second current network transmission bandwidth of the receiving end is extremely low bandwidth, and to detect whether a signal has been received indicating that the first current network transmission bandwidth of the sending end is extremely low bandwidth;
    if the second current network transmission bandwidth of the receiving end is extremely low bandwidth and/or the signal indicating that the first current network transmission bandwidth of the sending end is extremely low bandwidth has been received, to start a text-to-speech system, and send the sending end a signal to communicate through the text-to-speech system;
    to receive the target text sent by the sending end, recognize the target text, convert the target text into voice information, and play it.
  9. The voice and text conversion transmission system according to claim 8, characterized in that the sending end is further used to:
    monitor the current network speed of the sending end in real time, and compare the current network speed with a preset network speed;
    if the current network speed is greater than 10% of the preset network speed, determine that the first current network transmission bandwidth of the sending end is not extremely low bandwidth;
    if the current network speed is less than or equal to 10% of the preset network speed, determine that the first current network transmission bandwidth of the sending end is extremely low bandwidth.
  10. The voice and text conversion transmission system according to claim 8, wherein the sending end is further configured to:
    recognize the voice information of the user, the recognition including semantic recognition and voiceprint recognition;
    convert the voice information into a text field, and extract audio information features from the voice information to generate a feature code, wherein the audio information features include a voiceprint spectrum and a PCM code stream, and the feature code is a string of symbols generated from the voiceprint;
    add the feature code to the text field in a preset manner to obtain the target text.
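One way to picture "adding the feature code to the text field in a preset manner" is the Python sketch below. Everything concrete in it is an assumption: the claim does not say how the feature code is derived or embedded, so the truncated SHA-256 digest and the `[[fc:...]]` header are hypothetical stand-ins, not the applicant's method.

```python
import hashlib
from typing import Tuple

# Hypothetical delimiters for the "preset manner" of embedding the feature code.
PREFIX = "[[fc:"
SUFFIX = "]]"


def make_feature_code(voiceprint: bytes) -> str:
    """Derive a short symbol string from voiceprint data (assumed: truncated SHA-256)."""
    return hashlib.sha256(voiceprint).hexdigest()[:16]


def build_target_text(feature_code: str, text_field: str) -> str:
    """Prepend the feature code header to the recognized text field."""
    return f"{PREFIX}{feature_code}{SUFFIX}{text_field}"


def split_target_text(target_text: str) -> Tuple[str, str]:
    """Recover (feature_code, text_field) from a target text built above."""
    if not target_text.startswith(PREFIX):
        raise ValueError("missing feature code header")
    end = target_text.index(SUFFIX, len(PREFIX))
    return target_text[len(PREFIX):end], target_text[end + len(SUFFIX):]
```

A fixed header keeps the added payload to a few bytes, which is consistent with the goal of transmitting text instead of audio over an extremely low bandwidth link.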
  11. The voice and text conversion transmission system according to claim 8, wherein the sending end is further configured to:
    input the extracted audio information features into a preset voice model, and name the voice model with the generated feature code, the feature code serving as the unique identifier for invoking the voice model;
    send the voice model to the receiving end.
  12. The voice and text conversion transmission system according to claim 8, wherein the receiving end is further configured to:
    extract the text field according to the feature information attached to the target text;
    convert the text in the text field into pronunciation syllables to obtain spectrum information and a PCM code stream corresponding to the syllables;
    look up the voice model of the corresponding user in a local voice library according to the feature information attached to the target text;
    exchange the spectrum information and PCM code stream obtained from the text conversion with the spectrum information and PCM code stream in the voice model of the corresponding user, to obtain the spectrum information and PCM code stream of the user corresponding to the text field.
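The receiving-end steps can be sketched in Python as a lookup followed by a substitution. This is only an illustration under strong assumptions: the `VoiceModel` class, the per-syllable PCM dictionary, and the fallback to generic audio when no model is found are all hypothetical, and real spectrum processing is replaced by plain lists of integer samples.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VoiceModel:
    feature_code: str
    # Hypothetical store: syllable -> PCM samples in the user's own voice.
    syllable_pcm: Dict[str, List[int]]


def render_pcm(
    syllables: List[str],
    generic_pcm: Dict[str, List[int]],
    model: Optional[VoiceModel],
) -> List[int]:
    """Concatenate PCM per syllable, preferring the user's model when it has the syllable."""
    out: List[int] = []
    for syllable in syllables:
        if model is not None and syllable in model.syllable_pcm:
            out.extend(model.syllable_pcm[syllable])  # exchanged for the user's audio
        else:
            out.extend(generic_pcm[syllable])  # assumed fallback: generic synthesis
    return out


# Local voice library keyed by feature code, as in the lookup step.
library = {"ab12": VoiceModel("ab12", {"ni": [1, 2]})}
generic = {"ni": [9, 9], "hao": [8]}
user_audio = render_pcm(["ni", "hao"], generic, library.get("ab12"))
```

Keying the library by feature code mirrors the claim language in which the feature information attached to the target text identifies which user's voice model to use.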
  13. A computer device comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the following steps of a voice and text conversion transmission method:
    a sending end detects whether a first current network transmission bandwidth of the sending end qualifies as extremely low bandwidth, and detects whether a signal has been received indicating that a second current network transmission bandwidth of a receiving end qualifies as extremely low bandwidth;
    if the first current network transmission bandwidth of the sending end qualifies as extremely low bandwidth and/or the signal indicating that the second current network transmission bandwidth of the receiving end qualifies as extremely low bandwidth has been received, starting a speech-to-text system and sending to the receiving end a signal indicating communication through the speech-to-text system;
    recognizing the voice information spoken by the user through the speech-to-text system, converting it into target text, and sending the target text to the receiving end, wherein the target text includes a feature code and a text field.
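The "and/or" condition in the steps above amounts to a logical OR of two inputs: the sending end's own bandwidth classification and the signal received from the receiving end. The Python sketch below illustrates that decision. The action names are hypothetical labels, and the `send_voice` fallback for the normal-bandwidth case is an assumption, since the claim only describes the extremely low bandwidth branch.

```python
from typing import List


def sender_actions(local_is_extremely_low: bool, peer_low_signal_received: bool) -> List[str]:
    """Return the ordered sender-side actions for one decision.

    Text mode is chosen when either end reports extremely low bandwidth.
    """
    if local_is_extremely_low or peer_low_signal_received:
        return [
            "start_speech_to_text",
            "notify_receiver_text_mode",
            "send_target_text",
        ]
    # Assumed fallback, not recited in the claim: ordinary voice transmission.
    return ["send_voice"]
```

Because either condition alone is sufficient, a sender on a healthy link still switches to text when the receiver reports that its own link is extremely slow.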
  14. The computer device according to claim 13, wherein the step of the sending end detecting whether the first current network transmission bandwidth of the sending end qualifies as extremely low bandwidth comprises:
    monitoring the current network speed of the sending end in real time, and comparing the current network speed with a preset network speed;
    if the current network speed is greater than 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end does not qualify as extremely low bandwidth;
    if the current network speed is less than or equal to 10% of the preset network speed, determining that the first current network transmission bandwidth of the sending end qualifies as extremely low bandwidth.
  15. The computer device according to claim 13, wherein the step of recognizing the voice information spoken by the user and converting it into target text comprises:
    recognizing the voice information of the user, the recognition including semantic recognition and voiceprint recognition;
    converting the voice information into a text field, and extracting audio information features from the voice information to generate a feature code, wherein the audio information features include a voiceprint spectrum and a PCM code stream, and the feature code is a string of symbols generated from the voiceprint;
    adding the feature code to the text field in a preset manner to obtain the target text.
  16. The computer device according to claim 15, wherein, after the step of extracting the audio information features from the voice information and generating the feature code, the method further comprises:
    inputting the extracted audio information features into a preset voice model, and naming the voice model with the generated feature code, the feature code serving as the unique identifier for invoking the voice model;
    sending the voice model to the receiving end.
  17. A computer device comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the following steps of a voice and text conversion transmission method:
    a receiving end detects whether a second current network transmission bandwidth of the receiving end qualifies as extremely low bandwidth, and detects whether a signal has been received indicating that a first current network transmission bandwidth of a sending end qualifies as extremely low bandwidth;
    if the second current network transmission bandwidth of the receiving end qualifies as extremely low bandwidth and/or the signal indicating that the first current network transmission bandwidth of the sending end qualifies as extremely low bandwidth has been received, starting a text-to-speech system and sending to the sending end a signal indicating communication through the text-to-speech system;
    receiving the target text sent by the sending end, recognizing the target text, converting the target text into voice information, and playing it.
  18. The computer device according to claim 17, wherein the step of the receiving end detecting whether the second current network transmission bandwidth of the receiving end qualifies as extremely low bandwidth comprises:
    monitoring the current network speed of the receiving end in real time, and comparing the current network speed with a preset network speed;
    if the current network speed is greater than 10% of the preset network speed, determining that the second current network transmission bandwidth of the receiving end does not qualify as extremely low bandwidth;
    if the current network speed is less than or equal to 10% of the preset network speed, determining that the second current network transmission bandwidth of the receiving end qualifies as extremely low bandwidth.
  19. The computer device according to claim 17, wherein the step of receiving the target text sent by the sending end, recognizing the target text, and converting the target text into voice information further comprises:
    extracting the text field according to the feature information attached to the target text;
    converting the text in the text field into pronunciation syllables to obtain spectrum information and a PCM code stream corresponding to the syllables;
    looking up the voice model of the corresponding user in a local voice library according to the feature information attached to the target text;
    exchanging the spectrum information and PCM code stream obtained from the text conversion with the spectrum information and PCM code stream in the voice model of the corresponding user, to obtain the spectrum information and PCM code stream of the user corresponding to the text field.
  20. A computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
PCT/CN2019/103634 2019-05-30 2019-08-30 Voice and text conversion transmission method and system, and computer device and storage medium WO2020237886A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910465416.3 2019-05-30
CN201910465416.3A CN110349581B (en) 2019-05-30 2019-05-30 Voice and character conversion transmission method, system, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020237886A1 (en) 2020-12-03

Family

ID=68174517

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103634 WO2020237886A1 (en) 2019-05-30 2019-08-30 Voice and text conversion transmission method and system, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110349581B (en)
WO (1) WO2020237886A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270919B (en) * 2020-09-14 2022-11-22 深圳随锐视听科技有限公司 Method, system, storage medium and electronic device for automatically complementing sound of video conference
CN112637613A (en) * 2020-11-16 2021-04-09 深圳市声扬科技有限公司 Live broadcast audio processing method and device, computer equipment and storage medium
CN112992149B (en) * 2021-03-05 2024-04-16 中海油信息科技有限公司 Information transmission method and system for very high frequency radio station of offshore oil platform
CN113066497A (en) * 2021-03-18 2021-07-02 Oppo广东移动通信有限公司 Data processing method, device, system, electronic equipment and readable storage medium
CN112822297A (en) * 2021-04-01 2021-05-18 深圳市顺易通信息科技有限公司 Parking lot service data transmission method and related equipment
CN113257271B (en) * 2021-05-17 2023-01-10 浙江大学 Method and device for acquiring sounding motion characteristic waveform of multi-sounder and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020143543A1 (en) * 2001-03-30 2002-10-03 Sudheer Sirivara Compressing & using a concatenative speech database in text-to-speech systems
CN102223406A (en) * 2011-06-09 2011-10-19 华平信息技术股份有限公司 System and method for network-based digitalized real-time transmission of video information
CN102710539A (en) * 2012-05-02 2012-10-03 中兴通讯股份有限公司 Method and device for transferring voice messages
CN104285428A (en) * 2012-05-08 2015-01-14 三星电子株式会社 Method and system for operating communication service
CN108173740A (en) * 2017-11-30 2018-06-15 维沃移动通信有限公司 A kind of method and apparatus of voice communication

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1993018505A1 (en) * 1992-03-02 1993-09-16 The Walt Disney Company Voice transformation system
US9106616B2 (en) * 2005-07-27 2015-08-11 International Business Machines Corporation Systems and method for secure delivery of files to authorized recipients
CN102348117A (en) * 2010-08-03 2012-02-08 深圳Tcl新技术有限公司 System of transmitting digital high definition signal with low bandwidth, method thereof and network multimedia television
CN102968991B (en) * 2012-11-29 2015-01-21 华为技术有限公司 Method, device and system for sorting voice conference minutes
CN106683682A (en) * 2015-11-05 2017-05-17 湖南德海通信设备制造有限公司 Method for improving speech transmission efficiency
CN107438056B (en) * 2016-05-26 2021-02-09 深圳富泰宏精密工业有限公司 VoIP communication module, electronic device and VoIP communication method
KR101874451B1 (en) * 2017-08-07 2018-08-02 시스템베이스 주식회사 Method and device for processing voice based on low bandwidth wireless communication

Also Published As

Publication number Publication date
CN110349581B (en) 2023-04-18
CN110349581A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
WO2020237886A1 (en) Voice and text conversion transmission method and system, and computer device and storage medium
CN110832579B (en) Audio playing system, streaming audio player and related methods
JP6113302B2 (en) Audio data transmission method and apparatus
US6173250B1 (en) Apparatus and method for speech-text-transmit communication over data networks
US20070274296A1 (en) Voip barge-in support for half-duplex dsr client on a full-duplex network
US20020152076A1 (en) System for permanent alignment of text utterances to their associated audio utterances
US20020103646A1 (en) Method and apparatus for performing text-to-speech conversion in a client/server environment
US20070050188A1 (en) Tone contour transformation of speech
US20190304472A1 (en) User authentication
CN110992955A (en) Voice operation method, device, equipment and storage medium of intelligent equipment
US20080059197A1 (en) System and method for providing real-time communication of high quality audio
CN111107284B (en) Real-time generation system and generation method for video subtitles
US6615173B1 (en) Real time audio transmission system supporting asynchronous input from a text-to-speech (TTS) engine
US11328131B2 (en) Real-time chat and voice translator
US6501751B1 (en) Voice communication with simulated speech data
KR102181583B1 (en) System for voice recognition of interactive robot and the method therof
CN110534084B (en) Intelligent voice control method and system based on FreeWITCH
KR101952730B1 (en) Radio Communication Systems capable of Voice Recognition with Voting Technology for Communication Contents
CN114648989A (en) Voice information processing method and device implemented in electronic equipment and storage medium
US6980957B1 (en) Audio transmission system with reduced bandwidth consumption
Maes et al. Conversational networking: conversational protocols for transport, coding, and control.
CN115457977A (en) Two-way dual-mode audio interaction system of transceiving end
US11367446B2 (en) Information dissemination system and method thereof
Liu et al. Design and realization of dialect interaction system based on VAD
Dantas Communications Through Speech-to-speech Piplines

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 19930497
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 19930497
    Country of ref document: EP
    Kind code of ref document: A1