CN107863981B - Method for shortening call setup time and interphone - Google Patents
- Publication number
- CN107863981B (application number CN201711404367.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B1/00—Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
- H04B1/38—Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
- H04B1/3827—Portable transceivers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/0018—Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/06—Selective distribution of broadcast services, e.g. multimedia broadcast multicast service [MBMS]; Services to user groups; One-way selective calling services
- H04W4/10—Push-to-Talk [PTT] or Push-On-Call services
Abstract
The application provides a method for shortening call setup time and an interphone, which improve on the existing experience in which a user can transmit voice only after a prompt tone sounds at the end of call setup. When the PTT key has not yet been pressed but the user's finger has been in contact with it for longer than a preset value, voice collection starts, so voice input by the user before the prompt tone is played is not lost. After collection, the voice can be stored directly, or it can first be speech-coded and the coded voice stored. The voice (or coded voice) is then split into active voice and inactive voice, and a preset proportion of the inactive voice is discarded, which reduces transmission delay. From the user's point of view, the call setup time is effectively shortened to 0, and the user can truly push to talk.
Description
Technical Field
The invention relates to the field of communication, in particular to a method for shortening call setup time and an interphone.
Background
The interphone is a two-way mobile communication tool. It can communicate without any network support, incurs no call charges, and is suited to scenarios where communication is relatively fixed and frequent.
At present, an interphone can collect the user's voice only after call setup is complete. Call setup proceeds as follows: the processor in the interphone detects that the PTT key has been pressed, performs PTT key recognition processing, and, once a connection with the called party has been established, plays a voice prompt tone telling the user to start speaking. In other words, only voice input after the prompt tone has been played is collected and sent to the called party.
However, users generally start speaking as soon as they press the PTT key. Speech uttered before the prompt tone is played is not collected and is therefore lost, which undermines the integrity of the conversation between the two parties.
Disclosure of Invention
In view of this, the present invention provides a method for shortening call setup time and an interphone, to solve the problem that users start speaking as soon as they press the PTT key, so that speech uttered before the prompt tone is played is not collected and is lost, undermining the integrity of the conversation between the two parties.
In order to solve the technical problems, the invention adopts the following technical scheme:
A method for shortening call setup time, applied to an interphone, comprising the following steps:
detecting whether a touch point exists on a PTT key of the interphone, the touch point being a point of contact between a finger of a user and the PTT key;
when it is detected that a touch point exists on the PTT key, judging whether the time for which the touch point has existed is greater than a preset value;
when it is judged that the time for which the touch point has existed is greater than the preset value, collecting voice;
and when it is detected that the PTT key has been pressed and the interphone has established a connection with a receiving device, sending the collected voice to the receiving device.
Preferably, after the voice is collected, the method further comprises:
saving the voice;
splitting the voice into a plurality of active voices and a plurality of inactive voices;
discarding a preset proportion of each inactive voice to obtain a plurality of segment voices;
combining the plurality of active voices and the plurality of segment voices in order of voice generation time to obtain a user input voice;
correspondingly, sending the collected voice to the receiving device specifically comprises:
sending the user input voice to the receiving device.
Preferably, splitting the voice into a plurality of active voices and a plurality of inactive voices comprises:
splitting the voice into a plurality of the active voices and a plurality of the inactive voices by adopting a voice activity detection algorithm.
Preferably, sending the user input voice to the receiving device comprises:
performing speech coding and channel coding on the user input voice to obtain coded voice;
sending the coded voice to the receiving device.
Preferably, after the voice is collected, the method further comprises:
performing speech coding on the voice to obtain coded voice;
saving the coded voice;
splitting the coded voice into a plurality of active voices and a plurality of inactive voices based on the voice activity detection result produced during speech coding;
discarding a preset proportion of each inactive voice to obtain a plurality of segment voices;
combining the plurality of active voices and the plurality of segment voices in order of voice generation time to obtain a user input voice;
correspondingly, sending the collected voice to the receiving device specifically comprises:
sending the user input voice to the receiving device.
Preferably, sending the user input voice to the receiving device comprises:
performing channel coding on the user input voice to obtain channel-coded voice;
sending the channel-coded voice to the receiving device.
An interphone, comprising a memory and a processor;
wherein the memory is used for storing a program;
the processor is used for executing the program, and when executing the program the processor performs:
detecting whether a touch point exists on a PTT key of the interphone, the touch point being a point of contact between a finger of a user and the PTT key;
when it is detected that a touch point exists on the PTT key, judging whether the time for which the touch point has existed is greater than a preset value;
when it is judged that the time for which the touch point has existed is greater than the preset value, collecting voice;
and when it is detected that the PTT key has been pressed and the interphone has established a connection with a receiving device, sending the collected voice to the receiving device.
Preferably, after collecting the voice, the processor is further configured for:
saving the voice;
splitting the voice into a plurality of active voices and a plurality of inactive voices;
discarding a preset proportion of each inactive voice to obtain a plurality of segment voices;
combining the plurality of active voices and the plurality of segment voices in order of voice generation time to obtain a user input voice;
correspondingly, when sending the collected voice to the receiving device, the processor is specifically configured for:
sending the user input voice to the receiving device.
Preferably, when splitting the voice into a plurality of active voices and a plurality of inactive voices, the processor is specifically configured for:
splitting the voice into a plurality of active voices and a plurality of inactive voices using a voice activity detection algorithm.
Preferably, when sending the user input voice to the receiving device, the processor is specifically configured for:
performing speech coding and channel coding on the user input voice to obtain coded voice;
sending the coded voice to the receiving device.
Preferably, after collecting the voice, the processor is further configured for:
performing speech coding on the voice to obtain coded voice;
saving the coded voice;
splitting the coded voice into a plurality of active voices and a plurality of inactive voices based on the voice activity detection result produced during speech coding;
discarding a preset proportion of each inactive voice to obtain a plurality of segment voices;
combining the plurality of active voices and the plurality of segment voices in order of voice generation time to obtain a user input voice;
correspondingly, when sending the collected voice to the receiving device, the processor is specifically configured for:
sending the user input voice to the receiving device.
Preferably, when sending the user input voice to the receiving device, the processor is specifically configured for:
performing channel coding on the user input voice to obtain channel-coded voice;
sending the channel-coded voice to the receiving device.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a method for shortening call setup time and an interphone. Because voice collection starts once the user's finger has rested on the PTT key for longer than a preset value, before the key is pressed, the invention solves the problem that users generally start speaking as soon as they press the PTT key, so that speech uttered before the prompt tone is played is not collected and is lost, undermining the integrity of the conversation between the two parties.
In addition, discarding a preset proportion of each inactive voice shortens the playback time at the called party, which further reduces voice delay, that is, transmission delay.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for shortening call setup time according to the present invention;
FIG. 2 is a flowchart of another method for shortening call setup time according to the present invention;
FIG. 3 is a flowchart of yet another method for shortening call setup time according to the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for shortening call setup time according to the present invention;
FIG. 5 is a schematic structural diagram of another apparatus for shortening call setup time according to the present invention;
FIG. 6 is a schematic structural diagram of yet another apparatus for shortening call setup time according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the invention provides a method for shortening call setup time, applied to an interphone. Referring to FIG. 1, the method comprises the following steps:
S101: Detect whether a touch point exists on the PTT key of the interphone; when a touch point is detected on the PTT key, execute step S102.
The touch point is the point of contact between the user's finger and the PTT key, so detecting whether a touch point exists amounts to detecting whether the user's finger is touching the PTT key. The PTT key in this embodiment has a touch-sensing function.
It should be noted that the interphone operates in PTT mode, which is half-duplex: the user must press the PTT key to speak throughout the conversation. When using an interphone, a user typically first rests a finger on the PTT key and, once the key has been located, presses it and starts speaking; tens of milliseconds elapse between placing the finger on the PTT key and pressing it.
S102: Judge whether the time for which the touch point has existed is greater than a preset value; when it is, execute step S103.
Specifically, this step judges whether, before the user presses the PTT key, the user's finger has been in contact with the PTT key for longer than the preset value. The preset value may be several seconds or tens of seconds.
This check prevents voice collection from starting when the user touches the PTT key accidentally without intending to speak.
S103: Collect voice.
Specifically, collecting voice involves switching on the microphone of the interphone and the analog-to-digital converter (ADC) of the CODEC. The microphone picks up the sound signal from the user and converts it into an analog signal; the ADC converts the analog signal into a digital signal, which is then stored.
S104: When it is detected that the PTT key has been pressed and the interphone has established a connection with the receiving device, send the collected voice to the receiving device.
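The S101 to S103 trigger amounts to a small dwell-time check on the touch sensor. The sketch below is a minimal Python illustration under stated assumptions: the class name `TouchCaptureTrigger`, the millisecond timestamps, and the parameter `threshold_ms` (standing in for the preset value) are hypothetical, and since the patent does not say what happens when the finger is lifted, this sketch simply resets the timer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TouchCaptureTrigger:
    """Hypothetical dwell-time check for a touch-sensing PTT key (S101 to S103)."""
    threshold_ms: int                      # stands in for the patent's "preset value"
    touch_start_ms: Optional[int] = None   # when the current touch point appeared
    capturing: bool = False                # True once voice collection should run

    def on_touch_sample(self, now_ms: int, touching: bool) -> bool:
        """Feed one touch-sensor reading; return whether voice capture should be on."""
        if not touching:
            # No touch point: reset, so a brief accidental touch never starts capture.
            self.touch_start_ms = None
            self.capturing = False
            return False
        if self.touch_start_ms is None:
            # S101: a touch point has just been detected; remember when.
            self.touch_start_ms = now_ms
        # S102: has the touch point existed for longer than the preset value?
        if now_ms - self.touch_start_ms > self.threshold_ms:
            self.capturing = True          # S103: start collecting voice
        return self.capturing
```

A touch that ends before `threshold_ms` has elapsed never turns capture on, which is the accidental-touch protection described above.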
Optionally, in another embodiment of the present invention, establishing the connection between the interphone and the receiving device comprises:
generating call request information and sending it to the receiving device, and receiving connection confirmation information sent by the receiving device.
After the call request information is sent, the receiving device sends the connection confirmation information to the interphone over the air interface in the time slot immediately following the one in which it received the call request. The air interface is the interface between a base station and a mobile terminal in a mobile communication network.
The receiving device may be a base station or another interphone. The base station may be a Digital Mobile Radio (DMR), digital trunking (PDT), or Terrestrial Trunked Radio (TETRA) base station, and the interphone may be a mobile station or a vehicle-mounted station.
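Step S104 releases the buffered voice only once two conditions hold: the PTT key is pressed and the connection to the receiving device is confirmed. The following Python sketch models that gate under assumptions: the class `PttSession`, its callback names, and the `send` function passed in are hypothetical, and the actual call request, air-interface signalling, and time-slot timing are left out.

```python
from collections import deque
from typing import Callable, Deque


class PttSession:
    """Hypothetical S104 gate: buffered voice frames go out only after
    the PTT key is pressed AND the receiving device confirms the connection."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self.send = send
        self.buffer: Deque[bytes] = deque()   # FIFO: the oldest frame is sent first
        self.ptt_pressed = False
        self.connected = False

    def on_voice_frame(self, frame: bytes) -> None:
        # Capture already started in S103; buffer until transmission is allowed.
        self.buffer.append(frame)
        self._flush_if_ready()

    def on_ptt_pressed(self) -> None:
        # A real radio would now generate and send call request information.
        self.ptt_pressed = True
        self._flush_if_ready()

    def on_connection_confirmed(self) -> None:
        # Connection confirmation information received from the receiving device.
        self.connected = True
        self._flush_if_ready()

    def _flush_if_ready(self) -> None:
        if self.ptt_pressed and self.connected:
            while self.buffer:
                self.send(self.buffer.popleft())
```

Voice captured while the finger rested on the key is held back during setup and then sent in order, so nothing spoken before the prompt tone is dropped.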
In this embodiment, voice collection starts when the PTT key has not yet been pressed but the user's finger has been in contact with it for longer than the preset value, so voice input by the user before the prompt tone is played is not lost. This solves the problem that users generally start speaking as soon as they press the PTT key, so that speech uttered before the prompt tone is played is not collected and is lost, undermining the integrity of the conversation between the two parties.
Optionally, in another embodiment of the present invention, after step S103, the method further includes:
S204: Save the voice.
The interphone has a buffer for storing collected voice data, managed first-in first-out: voice data stored first is sent to the receiving device first.
The buffer size can be chosen according to the usage scenario, mainly considering the allowable voice delay, the statistical length of lost information, and the memory space available. Taking the worst case of DMR with 1:4 power saving as an example, a 480 ms pre-carrier is sent and PTT key software processing takes 100 ms, so a buffer holding about 580 ms of voice data is allocated. This space is not always fully used, because some inactive voice is removed before storage; 580 ms is the largest, and safest, buffer size to allocate.
When the buffer can hold 580 ms of voice data and more than 580 ms of voice data is received, the excess data overwrites the earliest stored data in order.
For example, if 600 ms of voice data is received, the buffer can hold only 580 ms, so the data for 580-600 ms overwrites the data stored for 0-20 ms.
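The overwrite rule above is the behaviour of a fixed-capacity ring buffer. A minimal Python sketch, assuming one stored value per millisecond purely for readability (a real buffer would hold PCM samples or coded frames); the class name `OverwritingFifo` is hypothetical.

```python
from typing import List


class OverwritingFifo:
    """Fixed-capacity FIFO: once full, each new value overwrites the oldest one."""

    def __init__(self, capacity: int) -> None:
        self.data = [0] * capacity
        self.capacity = capacity
        self.write = 0   # index of the next slot to write
        self.count = 0   # number of valid values currently stored

    def push(self, value: int) -> None:
        self.data[self.write] = value
        self.write = (self.write + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def drain(self) -> List[int]:
        """Return the stored values oldest-first and empty the buffer."""
        start = (self.write - self.count) % self.capacity
        out = [self.data[(start + i) % self.capacity] for i in range(self.count)]
        self.count = 0
        return out


buf = OverwritingFifo(580)
for ms in range(600):        # 600 ms of data into a 580 ms buffer
    buf.push(ms)
held = buf.drain()
print(held[0], held[-1])     # 20 599
```

After 600 pushes, slots 0 to 19 hold the values for 580 to 599 ms, which is exactly the "580-600 ms overwrites 0-20 ms" case in the example.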
Because memory is addressed in bytes, the space needed for 580 ms of voice data must be calculated. With 16-bit (2-byte) samples at an 8 kHz sampling rate, 100 ms of voice occupies 800 × 2 = 1600 bytes, and 580 ms occupies 800 × 2 × 5.8 = 9280 bytes.
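The same arithmetic written as a small helper, assuming uncompressed mono PCM; the function name and default arguments are illustrative rather than taken from the source.

```python
def pcm_buffer_bytes(duration_ms: int, sample_rate_hz: int = 8000,
                     bits_per_sample: int = 16) -> int:
    """Bytes needed to store duration_ms of mono PCM audio."""
    samples = sample_rate_hz * duration_ms // 1000   # 8000 Hz * 0.1 s = 800 samples
    return samples * (bits_per_sample // 8)          # 16-bit samples take 2 bytes each


# Worst-case DMR 1:4 power saving: 480 ms pre-carrier + 100 ms PTT processing.
BUFFER_MS = 480 + 100

print(pcm_buffer_bytes(100))        # 1600
print(pcm_buffer_bytes(BUFFER_MS))  # 9280
```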
S205: Split the voice into a plurality of active voices and a plurality of inactive voices.
Optionally, in another embodiment of the present invention, step S205 comprises:
splitting the voice into a plurality of active voices and a plurality of inactive voices using a voice activity detection algorithm.
Voice Activity Detection (VAD), also called voice endpoint detection or voice boundary detection, aims to identify and remove long silent periods from a sound signal stream, saving voice channel resources without degrading quality of service and helping to reduce the end-to-end delay perceived by the user.
Active voice contains the user's speech; inactive voice is silence, that is, it contains no speech from the user.
Active voice must not be discarded, whereas inactive voice may be.
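To make the split concrete, the sketch below uses a deliberately simple energy-threshold detector in place of a real VAD algorithm. The patent does not say which VAD it uses; the function name `split_by_energy`, the frame length, and the threshold are assumptions, and production detectors typically rely on more robust features and hangover smoothing.

```python
from typing import List, Tuple

Segment = Tuple[bool, int, List[int]]   # (is_active, start_sample_index, samples)


def split_by_energy(samples: List[int], frame_len: int,
                    threshold: float) -> List[Segment]:
    """Split PCM samples into alternating runs of active and inactive voice."""
    segments: List[Segment] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)   # mean energy of the frame
        active = energy > threshold
        if segments and segments[-1][0] == active:
            segments[-1][2].extend(frame)                 # same class: extend the run
        else:
            segments.append((active, start, list(frame)))  # class changed: new run
    return segments


pcm = [0, 0, 0, 0, 100, 100, 100, 100, 0, 0]
print([(a, s, len(x)) for a, s, x in split_by_energy(pcm, 2, 10.0)])
```

Each run keeps its start index, which later serves as the voice generation time when the pieces are recombined.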
S206: Discard a preset proportion of each inactive voice to obtain a plurality of segment voices.
Specifically, a preset proportion of each inactive voice is discarded; the preset proportion may be, for example, 0.5, 0.8, or 1. Taking a 10 ms inactive voice and a preset proportion of 0.5 as an example, half of it is discarded, so the 10 ms inactive voice becomes a 5 ms segment voice.
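A sketch of S206 on PCM samples. The source does not specify which part of an inactive voice is dropped; this hypothetical `trim_inactive` helper simply keeps the leading part and discards the tail.

```python
from typing import List


def trim_inactive(segment: List[int], proportion: float) -> List[int]:
    """Discard `proportion` of an inactive segment, keeping its leading part."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    keep = round(len(segment) * (1.0 - proportion))
    return segment[:keep]


# 10 ms of silence at 8 kHz is 80 samples; a proportion of 0.5 leaves 40 samples (5 ms).
silence = [0] * 80
print(len(trim_inactive(silence, 0.5)))  # 40
```

With a proportion of 1 the inactive voice disappears entirely, which is the most aggressive setting the text mentions.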
Experiments show that when 120 ms of inactive voice is removed from 1 s of speech and the remaining voice is spliced into continuous speech before transmission, users hardly perceive any difference from the original speech.
S207: Combine the plurality of active voices and the plurality of segment voices in order of voice generation time to obtain the user input voice.
The voice generation time is the time at which the user uttered the voice. The combination process is illustrated below.
Suppose a voice is split into 3 active voices and 1 inactive voice, whose generation order is active voice 1, the inactive voice, active voice 2, active voice 3.
A preset proportion of the inactive voice is discarded to obtain 1 segment voice. When combining, active voice 1 is placed first and the segment voice second, and active voice 2 and active voice 3 are then copied in after them, in order.
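The combination in S207 is a merge by generation time. In this minimal sketch each piece is a hypothetical `(generation_time_ms, samples)` pair; sorting by time and concatenating reproduces the order in the example above.

```python
from typing import List, Tuple

Piece = Tuple[int, List[int]]   # (voice generation time in ms, samples)


def combine(active: List[Piece], segments: List[Piece]) -> List[int]:
    """Splice active voices and trimmed segment voices back together in time order."""
    out: List[int] = []
    for _, samples in sorted(active + segments, key=lambda piece: piece[0]):
        out.extend(samples)
    return out


# Example order: active voice 1, segment voice, active voice 2, active voice 3.
active = [(0, [1, 1]), (40, [2, 2]), (60, [3])]
segments = [(20, [0])]
print(combine(active, segments))  # [1, 1, 0, 2, 2, 3]
```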
Accordingly, step S104 becomes step S208:
send the user input voice to the receiving device.
Optionally, in another embodiment of the present invention, sending the user input voice to the receiving device comprises:
performing speech coding and channel coding on the user input voice to obtain coded voice, and sending the coded voice to the receiving device.
Speech coding, that is, vocoder coding, analyzes digitized speech using a model of how speech is produced and extracts a set of feature parameters. These are mainly excitation parameters, representing glottal vibration, and vocal tract parameters, representing the characteristics of the vocal tract. They carry the essential information of the speech signal, need only a small number of bits to encode, and after decoding can be used to resynthesize the speech signal; coding rates can be as low as 2.4 kbit/s or below.
Channel coding is error-correcting and error-detecting coding applied to a digital signal before it is transmitted over a channel. Specifically, during channel coding, forward error correction (FEC) check bits may be added to the digital signal produced by the analog-to-digital converter.
In this embodiment, discarding a preset proportion of each inactive voice shortens the playback time at the called party and thus reduces voice delay, that is, transmission delay.
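As a generic illustration of forward error correction, the sketch below implements a Hamming(7,4) code, which adds three parity bits to every four data bits and corrects any single flipped bit. It is a textbook stand-in chosen for brevity, not the FEC scheme used by DMR or by any particular vocoder.

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4 (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: List[int]) -> List[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                        # the channel flips one bit
print(hamming74_decode(received))       # [1, 0, 1, 1]
```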
Optionally, in another embodiment of the present invention, referring to FIG. 3, after the voice is collected the method further comprises:
S304: Perform speech coding on the voice to obtain coded voice.
For the speech coding process, refer to the description in the embodiments above; it is not repeated here.
S305: Save the coded voice.
Taking a 2.4 kbit/s vocoder with FEC as an example, the overall rate is 3.6 kbit/s, so 580 ms of speech occupies 261 bytes of storage.
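The 261-byte figure follows from rate times duration. A small helper, with a hypothetical name, that rounds up to whole bytes:

```python
def coded_buffer_bytes(duration_ms: int, bitrate_bps: int) -> int:
    """Bytes needed to store duration_ms of a constant-bitrate stream, rounded up."""
    scaled_bits = bitrate_bps * duration_ms   # bits multiplied by 1000, kept as an integer
    return (scaled_bits + 7999) // 8000       # divide by 1000 (ms to s) and 8 (bits to bytes)


print(coded_buffer_bytes(580, 3600))  # 261, since 3600 bit/s * 0.58 s = 2088 bits
```

For comparison, the same 580 ms stored as 16-bit, 8 kHz PCM takes 800 × 2 × 5.8 = 9280 bytes, so saving the coded voice needs roughly one thirty-fifth of the space.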
S306: Split the coded voice into a plurality of active voices and a plurality of inactive voices based on the voice activity detection result produced during speech coding.
During speech coding, the encoder can already determine whether each part of the coded voice is active voice or inactive voice.
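When the encoder already makes a VAD decision for each frame, S306 reduces to grouping consecutive frames by that flag. The sketch assumes a hypothetical `CodedFrame` record carrying the encoder's VAD decision; real vocoder output formats differ, and with coded voice the discarding in S307 would operate on whole frames rather than samples.

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple


class CodedFrame(NamedTuple):
    time_ms: int       # voice generation time of the frame
    payload: bytes     # vocoder output bits (placeholder)
    vad_active: bool   # VAD decision made inside the encoder


def split_coded(frames: List[CodedFrame]) -> List[Tuple[bool, List[CodedFrame]]]:
    """Group consecutive coded frames into active / inactive runs by the encoder's VAD flag."""
    return [(active, list(run))
            for active, run in groupby(frames, key=lambda f: f.vad_active)]


frames = [CodedFrame(0, b"a", True), CodedFrame(20, b"b", False),
          CodedFrame(40, b"c", False), CodedFrame(60, b"d", True)]
print([(active, len(run)) for active, run in split_coded(frames)])
```

No second pass over the audio is needed here, which is the practical difference from the PCM-domain split of step S205.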
S307: Discard a preset proportion of each inactive voice to obtain a plurality of segment voices.
S308: Combine the plurality of active voices and the plurality of segment voices in order of voice generation time to obtain the user input voice.
Step S104 becomes step S309:
send the user input voice to the receiving device.
Optionally, in another embodiment of the present invention, sending the user input voice to the receiving device comprises:
performing channel coding on the user input voice to obtain channel-coded voice;
sending the channel-coded voice to the receiving device.
In this embodiment, speech coding is performed first and the coded voice is then split into active and inactive voices. Compared with the previous embodiment, the split therefore happens at a different point in the processing chain, and the resulting active and inactive voices are processed differently.
Optionally, another embodiment of the present invention provides an apparatus for shortening call setup time, applied to an interphone. Referring to FIG. 4, the apparatus comprises:
a detection unit 101, configured to detect whether a touch point exists on the PTT key of the interphone, the touch point being the point of contact between the user's finger and the PTT key;
a judging unit 102, configured to judge, when the detection unit 101 detects a touch point on the PTT key, whether the time for which the touch point has existed is greater than a preset value;
an acquisition unit 103, configured to collect voice when the judging unit 102 judges that the time for which the touch point has existed is greater than the preset value;
and a sending unit 104, configured to send the collected voice to the receiving device after it is detected that the PTT key has been pressed and the interphone has established a connection with the receiving device.
Optionally, in another embodiment of the present invention, when sending the collected voice to the receiving device, the sending unit is specifically configured to:
perform speech coding and channel coding on the user input voice to obtain coded voice;
send the coded voice to the receiving device.
In this embodiment, voice collection starts when the PTT key has not yet been pressed but the user's finger has been in contact with it for longer than the preset value, so voice input by the user before the prompt tone is played is not lost. This solves the problem that users generally start speaking as soon as they press the PTT key, so that speech uttered before the prompt tone is played is not collected and is lost, undermining the integrity of the conversation between the two parties.
It should be noted that, for the working process of each unit in this embodiment, please refer to the description in the above embodiments, which is not described herein again.
Optionally, in another embodiment of the present invention, referring to FIG. 5, the apparatus further comprises:
a storage unit 105, configured to save the voice after the acquisition unit 103 collects it;
a splitting unit 106, configured to split the voice into a plurality of active voices and a plurality of inactive voices;
a voice discarding unit 107, configured to discard a preset proportion of each inactive voice to obtain a plurality of segment voices;
a voice combining unit 108, configured to combine the plurality of active voices and the plurality of segment voices in order of voice generation time to obtain the user input voice;
correspondingly, when sending the collected voice to the receiving device, the sending unit 104 is specifically configured to send the user input voice to the receiving device.
Optionally, in another embodiment of the present invention, the splitting unit 106 comprises:
a splitting subunit, configured to split the voice into a plurality of active voices and a plurality of inactive voices using a voice activity detection algorithm.
In this embodiment, discarding a preset proportion of each inactive voice shortens the playback time at the called party and further reduces voice delay.
It should be noted that, for the working process of each unit in this embodiment, please refer to the description in the above embodiments, which is not described herein again.
Optionally, in another embodiment of the present invention, the apparatus for shortening call setup time further includes:
the voice coding unit 109 is configured to perform voice coding on the voice after the acquisition unit 103 acquires the voice, so as to obtain a coded voice;
a speech saving unit 110 for saving the encoding-processed speech;
a voice splitting unit 111, configured to split the encoded voice into a plurality of active voices and a plurality of inactive voices based on a voice activity detection result in the voice encoding process;
a discarding unit 112, configured to discard a preset proportion of inactive voices of each inactive voice to obtain multiple segment voices;
a combining unit 113, configured to combine the multiple active voices and the multiple segment voices according to the voice generation time to obtain a user input voice;
correspondingly, when the sending unit 104 sends the collected voice to the receiving device, it is specifically configured to:
the user input speech is transmitted to the receiving device.
Optionally, in another embodiment of the present invention, the sending unit 104 includes:
an encoding unit, configured to perform channel coding on the user input voice to obtain channel-coded voice;
a voice transmitting unit, configured to transmit the channel-coded voice to the receiving device.
In this embodiment, voice coding is performed first, and the coded voice is then split into a plurality of active voices and a plurality of inactive voices. Compared with the above embodiment, the splitting therefore happens at a different point in the processing chain (after voice coding rather than before it), and the active and inactive voices obtained by splitting are processed differently afterwards.
It should be noted that, for the working process of each unit in this embodiment, reference may be made to the description in the above embodiments; details are not repeated here.
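For the embodiment in which voice coding comes first, a similar sketch can operate on coded frames instead of raw samples. It assumes, hypothetically, that the speech encoder tags each frame with a VAD flag (`EncodedFrame.vad`), and it uses a simple byte-repetition code purely as a placeholder for channel coding; a real interphone would use the forward error correction scheme of its air interface. `EncodedFrame`, `split_runs`, `build_user_input` and `channel_encode` are illustrative names, not from the source.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class EncodedFrame:
    seq: int        # frame sequence number, used as voice generation time
    vad: bool       # hypothetical VAD flag emitted by the speech encoder
    payload: bytes  # coded speech bits for this frame


def split_runs(frames: List[EncodedFrame]) -> List[Tuple[bool, List[EncodedFrame]]]:
    """Split coded voice into runs of active/inactive frames using the encoder's VAD flags."""
    return [(flag, list(run)) for flag, run in groupby(frames, key=lambda f: f.vad)]


def build_user_input(frames: List[EncodedFrame], ratio: float) -> List[EncodedFrame]:
    """Drop `ratio` of each inactive run (leading frames kept, an assumption), then recombine."""
    kept: List[EncodedFrame] = []
    for flag, run in split_runs(frames):
        if not flag:
            run = run[: len(run) - int(len(run) * ratio)]
        kept.extend(run)
    return sorted(kept, key=lambda f: f.seq)  # combine by generation time


def channel_encode(frames: List[EncodedFrame], n: int = 3) -> bytes:
    """Stand-in channel code: repeat every payload byte n times (not a real radio FEC)."""
    data = b"".join(f.payload for f in frames)
    return bytes(b for b in data for _ in range(n))


# Usage: 4 inactive frames, 2 active frames, 2 inactive frames; drop half of each pause.
flags = [False] * 4 + [True] * 2 + [False] * 2
frames = [EncodedFrame(i, v, bytes([i])) for i, v in enumerate(flags)]
user_input = build_user_input(frames, ratio=0.5)
print([f.seq for f in user_input])  # [0, 1, 4, 5, 6]
tx = channel_encode(user_input)
```

Working on frames lets the VAD decisions the encoder already makes be reused, so no separate VAD pass over raw audio is needed in this variant.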
Optionally, another embodiment of the present invention provides an interphone, including a memory and a processor;
wherein the memory is used for storing programs;
a processor for executing a program, wherein when the processor executes the program:
detecting whether a touch point exists on the PTT key of the interphone, the touch point being the point of contact between the user's finger and the PTT key;
when a touch point is detected on the PTT key, determining whether the duration of the touch point exceeds a preset value;
when the duration of the touch point is determined to exceed the preset value, collecting voice;
when it is detected that the PTT key is pressed and the interphone is connected to a receiving device, sending the collected voice to the receiving device.
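The touch-hold logic above can be sketched as a small event-driven controller in Python. The class `PttController`, its event methods, and the millisecond unit for the preset value are hypothetical. The source does not say what happens if the finger is lifted before the PTT key is pressed, so the sketch assumes the buffered voice is discarded. The helper `is_connected` follows the connection check in claims 1 and 7 (confirmation received in the time slot after the one carrying the call request), assuming the receiving device receives the request in the slot in which it was sent.

```python
from typing import List, Optional


def is_connected(request_slot: int, confirm_slot: Optional[int]) -> bool:
    """Connected only if the confirmation arrives in the slot right after the call request.

    Assumes the receiving device receives the request in the same slot it was sent.
    """
    return confirm_slot is not None and confirm_slot == request_slot + 1


class PttController:
    """Hypothetical controller: pre-collect voice while a finger rests on the PTT key."""

    def __init__(self, hold_threshold_ms: int) -> None:
        self.hold_threshold_ms = hold_threshold_ms   # the "preset value" (unit assumed: ms)
        self.touch_start_ms: Optional[int] = None
        self.collecting = False
        self.connected = False
        self.buffer: List[bytes] = []   # voice saved before the call is connected
        self.sent: List[bytes] = []     # stand-in for the radio transmit path

    def on_touch_sample(self, now_ms: int, touching: bool) -> None:
        if not touching:
            # Assumption: lifting the finger ends the attempt and discards unsent voice.
            self.touch_start_ms = None
            self.collecting = False
            self.connected = False
            self.buffer.clear()
            return
        if self.touch_start_ms is None:
            self.touch_start_ms = now_ms
        if now_ms - self.touch_start_ms > self.hold_threshold_ms:
            self.collecting = True      # touch held longer than the preset value

    def on_audio(self, chunk: bytes) -> None:
        if self.connected:
            self.sent.append(chunk)     # live voice after the call is set up
        elif self.collecting:
            self.buffer.append(chunk)   # pre-collected voice, held until connection

    def on_link_state(self, ptt_pressed: bool, connected: bool) -> None:
        if ptt_pressed and connected and not self.connected:
            self.connected = True
            self.sent.extend(self.buffer)  # send the voice collected before the press
            self.buffer.clear()


# Usage
ptt = PttController(hold_threshold_ms=100)
ptt.on_touch_sample(0, True)
ptt.on_audio(b"a")                 # touch too short: not collected
ptt.on_touch_sample(150, True)     # held > 100 ms: collection starts
ptt.on_audio(b"b")
ptt.on_audio(b"c")
ptt.on_link_state(True, is_connected(request_slot=7, confirm_slot=8))
ptt.on_audio(b"d")
print(ptt.sent)  # [b'b', b'c', b'd']
```

Chunks `b"b"` and `b"c"` are captured before the call is connected and would otherwise have been lost; they are flushed first, ahead of the live chunk `b"d"`.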
On the basis of the above embodiment, after collecting the speech, the processor is further configured to:
saving the voice;
splitting the voice into a plurality of active voices and a plurality of inactive voices;
discarding a preset proportion of each inactive voice to obtain a plurality of fragment voices;
combining the plurality of active voices and the plurality of fragment voices in order of voice generation time to obtain the user input voice;
correspondingly, when the processor is configured to send the collected voice to the receiving device, the processor is specifically configured to:
transmitting the user input speech to the receiving device.
On the basis of the foregoing embodiment, when the processor is configured to split the speech into a plurality of active speeches and a plurality of inactive speeches, the processor is specifically configured to:
splitting the voice into a plurality of the active voices and a plurality of the inactive voices by adopting a voice activity detection algorithm.
On the basis of the foregoing embodiment, when the processor is configured to send the user input speech to the receiving device, the processor is specifically configured to:
performing voice coding and channel coding on the user input voice to obtain coded voice;
transmitting the encoded speech to the receiving device.
On the basis of the above embodiment, after collecting the speech, the processor is further configured to:
performing voice coding on the voice to obtain coded voice;
saving the coded voice;
splitting the coded voice into a plurality of active voices and a plurality of inactive voices based on the voice activity detection result produced during voice coding;
discarding a preset proportion of each inactive voice to obtain a plurality of fragment voices;
combining the plurality of active voices and the plurality of fragment voices in order of voice generation time to obtain the user input voice;
correspondingly, when the processor is configured to send the collected voice to the receiving device, the processor is specifically configured to:
transmitting the user input speech to the receiving device.
On the basis of the foregoing embodiment, when the processor is configured to send the user input speech to the receiving device, the processor is specifically configured to:
performing channel coding on the user input voice to obtain channel-coded voice;
transmitting the channel-coded speech to the receiving device.
In this embodiment, voice collection starts before the PTT key is pressed, as soon as the user's finger has been in contact with the PTT key for longer than the preset value, so the voice the user inputs before the prompt tone is played is not lost. This solves the following problem: users generally start speaking right after pressing the PTT key, but the voice spoken before the prompt tone is played cannot be collected, so that part of the user's input is lost and the integrity of the conversation between the two parties suffers.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (12)
1. A method for shortening call setup time, applied to an interphone, comprising: detecting whether a touch point exists on a PTT key of the interphone, the touch point being the point of contact between a finger of a user and the PTT key;
when the touch point is detected on the PTT key, determining whether the duration for which the touch point exists before the PTT key is pressed exceeds a preset value;
when the duration of the touch point is determined to exceed the preset value, collecting voice;
when it is detected that the PTT key is pressed and the interphone is connected to a receiving device, sending the collected voice to the receiving device; wherein detecting that the interphone is connected to the receiving device comprises: generating call request information and sending it to the receiving device; and receiving connection confirmation information that the receiving device sends in the time slot following the time slot in which it received the call request information;
wherein the PTT key is used to establish a call connection, and the call connection is used to send voice.
2. The method of claim 1, further comprising, after collecting the voice:
saving the voice;
splitting the voice into a plurality of active voices and a plurality of inactive voices;
discarding a preset proportion of each inactive voice to obtain a plurality of fragment voices;
combining the plurality of active voices and the plurality of fragment voices in order of voice generation time to obtain user input voice;
correspondingly, the sending the collected voice to the receiving device specifically includes:
transmitting the user input speech to the receiving device.
3. The method of claim 2, wherein splitting the speech into a plurality of active speech and a plurality of inactive speech comprises:
splitting the voice into a plurality of the active voices and a plurality of the inactive voices by adopting a voice activity detection algorithm.
4. The method of claim 2, wherein transmitting the user input speech to the receiving device comprises:
performing voice coding and channel coding on the user input voice to obtain coded voice;
transmitting the encoded speech to the receiving device.
5. The method of claim 1, further comprising, after collecting the voice:
performing voice coding on the voice to obtain coded voice;
saving the coded voice;
splitting the coded voice into a plurality of active voices and a plurality of inactive voices based on the voice activity detection result produced during voice coding;
discarding a preset proportion of each inactive voice to obtain a plurality of fragment voices;
combining the plurality of active voices and the plurality of fragment voices in order of voice generation time to obtain user input voice;
correspondingly, the sending the collected voice to the receiving device specifically includes:
transmitting the user input speech to the receiving device.
6. The method of claim 5, wherein sending the user input speech to the receiving device comprises:
performing channel coding on the user input voice to obtain channel-coded voice;
transmitting the channel-coded speech to the receiving device.
7. An interphone, characterized by comprising a memory and a processor;
wherein the memory is used for storing programs;
a processor for executing a program, wherein when the processor executes the program:
detecting whether a touch point exists on a PTT key of the interphone, the touch point being the point of contact between a finger of a user and the PTT key;
when the touch point is detected on the PTT key, determining whether the duration for which the touch point exists before the PTT key is pressed exceeds a preset value;
when the duration of the touch point is determined to exceed the preset value, collecting voice;
when it is detected that the PTT key is pressed and the interphone is connected to a receiving device, sending the collected voice to the receiving device; wherein detecting that the interphone is connected to the receiving device comprises: generating call request information and sending it to the receiving device; and receiving connection confirmation information that the receiving device sends in the time slot following the time slot in which it received the call request information;
wherein the PTT key is used to establish a call connection, and the call connection is used to send voice.
8. The interphone of claim 7, wherein, after collecting the voice, the processor is further configured to:
saving the voice;
splitting the voice into a plurality of active voices and a plurality of inactive voices;
discarding a preset proportion of each inactive voice to obtain a plurality of fragment voices;
combining the plurality of active voices and the plurality of fragment voices in order of voice generation time to obtain user input voice;
correspondingly, when the processor is configured to send the collected voice to the receiving device, the processor is specifically configured to:
transmitting the user input speech to the receiving device.
9. The interphone of claim 8, wherein, when splitting the voice into a plurality of active voices and a plurality of inactive voices, the processor is specifically configured to:
splitting the voice into a plurality of the active voices and a plurality of the inactive voices by adopting a voice activity detection algorithm.
10. The interphone of claim 8, wherein, when sending the user input speech to the receiving device, the processor is specifically configured to:
performing voice coding and channel coding on the user input voice to obtain coded voice;
transmitting the encoded speech to the receiving device.
11. The interphone of claim 7, wherein, after collecting the voice, the processor is further configured to:
performing voice coding on the voice to obtain coded voice;
saving the coded voice;
splitting the coded voice into a plurality of active voices and a plurality of inactive voices based on the voice activity detection result produced during voice coding;
discarding a preset proportion of each inactive voice to obtain a plurality of fragment voices;
combining the plurality of active voices and the plurality of fragment voices in order of voice generation time to obtain user input voice;
correspondingly, when the processor is configured to send the collected voice to the receiving device, the processor is specifically configured to:
transmitting the user input speech to the receiving device.
12. The interphone of claim 11, wherein, when sending the user input speech to the receiving device, the processor is specifically configured to:
performing channel coding on the user input voice to obtain channel-coded voice;
transmitting the channel-coded speech to the receiving device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711404367.XA CN107863981B (en) | 2017-12-22 | 2017-12-22 | Method for shortening call setup time and interphone |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107863981A CN107863981A (en) | 2018-03-30 |
CN107863981B true CN107863981B (en) | 2020-08-28 |
Family ID: 61706999
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711404367.XA Active CN107863981B (en) | 2017-12-22 | 2017-12-22 | Method for shortening call setup time and interphone |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107863981B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019119406A1 (en) * | 2017-12-22 | 2019-06-27 | 海能达通信股份有限公司 | Method, device and two-way radio for shortening call establishment time |
CN116095054A (en) * | 2022-11-03 | 2023-05-09 | 国网北京市电力公司 | Voice playing method and device, computer readable storage medium and computer equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008109187A (en) * | 2006-10-23 | 2008-05-08 | Sharp Corp | Mobile information terminal and control method therefor |
CN102761658A (en) * | 2011-10-26 | 2012-10-31 | 北京推博信息技术有限公司 | Method and device for performing voice talkback through mobile terminal |
WO2017019508A1 (en) * | 2015-07-29 | 2017-02-02 | Motorola Solutions, Inc. | Push-to-talk function enabled by eye and voice detection in a mobile device |
CN106799736A (en) * | 2017-01-19 | 2017-06-06 | 深圳市鑫益嘉科技股份有限公司 | The interactive triggering method and robot of a kind of robot |
CN107040359A (en) * | 2017-05-08 | 2017-08-11 | 海能达通信股份有限公司 | Method, device and the equipment of channel associated signalling are carried in a kind of voice call procedure |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101764882A (en) * | 2009-12-31 | 2010-06-30 | 深圳市戴文科技有限公司 | PTT conversation device and method for realizing PTT conversation |
CN106303644B (en) * | 2016-09-08 | 2020-03-31 | 康佳集团股份有限公司 | Voice remote controller and voice acquisition method and system thereof |
CN106604096A (en) * | 2016-12-30 | 2017-04-26 | 深圳Tcl数字技术有限公司 | Sound-recording control method and apparatus |
2017-12-22: CN201711404367.XA filed in CN (CN107863981B, status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN107863981A (en) | 2018-03-30 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |