CN107863981B

CN107863981B - Method for shortening call setup time and interphone

Info

Publication number: CN107863981B
Application number: CN201711404367.XA
Authority: CN
Inventors: 黄妮; 谢汉雄; 郭飞
Original assignee: Hytera Communications Corp Ltd
Current assignee: Hytera Communications Corp Ltd
Priority date: 2017-12-22
Filing date: 2017-12-22
Publication date: 2020-08-28
Anticipated expiration: 2037-12-22
Also published as: CN107863981A

Abstract

The application provides a method for shortening call setup time and an interphone, which address the poor user experience of being able to transmit voice only after a prompt tone sounds once call setup completes. Voice collection starts while the PTT key is not yet pressed, as soon as the contact time between the user's finger and the PTT key exceeds a preset value, so that voice input by the user before the prompt tone is played is not lost. After the voice is collected, it can be stored directly, or it can first be voice-coded and the coded voice stored. The voice, or the coded voice, is split into active voice and inactive voice, and a preset proportion of the inactive voice is discarded, which reduces the transmission delay. From the user's point of view, the call setup time is thereby effectively shortened to 0, and the user can talk as soon as the key is pushed.

Description

Method for shortening call setup time and interphone

Technical Field

The invention relates to the field of communication, in particular to a method for shortening call setup time and an interphone.

Background

An interphone is a two-way mobile communication tool that can be used without any network support, incurs no call charges, and is suited to settings where communication is relatively fixed and frequent.

At present, an interphone collects the voice input by a user only after call setup is complete. Call setup proceeds as follows: a processor in the interphone detects that the PTT key is pressed, performs identification processing of the PTT key, and, after the connection with the called party is established, plays a voice prompt tone informing the user that he or she can start speaking. In other words, the voice input by the user is collected and sent to the called party only after the prompt tone has been played.

However, users generally start speaking as soon as they press the PTT button. Speech spoken before the prompt tone is played is not collected, so that speech is lost and the integrity of the conversation between the two parties is affected.

Disclosure of Invention

In view of this, the present invention provides a method for shortening call setup time and an intercom, so as to solve the problem that a user starts speaking after pressing a PTT button, and then the voice spoken by the user before the prompt tone is played is not collected, thereby causing the voice input by the user before the prompt tone is played to be lost, and affecting the integrity of the two parties' conversation.

In order to solve the technical problems, the invention adopts the following technical scheme:

a method for shortening call setup time is applied to an interphone and comprises the following steps:

detecting whether a touch point exists on a PTT key of the interphone; the touch point is a contact point of a finger of a user and the PTT key;

when the touch point on the PTT key is detected to exist, judging whether the time for the touch point to exist is greater than a preset numerical value or not;

when the time for the touch point to exist is judged to be greater than the preset value, voice is collected;

and when the fact that the PTT key is pressed down and the interphone is connected with a receiving device is detected, the collected voice is sent to the receiving device.

Preferably, after the voice is collected, the method further comprises:

saving the voice;

splitting the speech into a plurality of active speech and a plurality of inactive speech;

discarding a part of inactive voices in a preset proportion from each inactive voice to obtain a plurality of fragment voices;

combining the plurality of active voices and the plurality of segment voices according to voice generation time to obtain user input voice;

correspondingly, the sending the collected voice to the receiving device specifically includes:

transmitting the user input speech to the receiving device.

Preferably, splitting the speech into a plurality of active speech and a plurality of inactive speech comprises:

splitting the voice into a plurality of the active voices and a plurality of the inactive voices by adopting a voice activity detection algorithm.

Preferably, transmitting the user input voice to the receiving apparatus includes:

carrying out voice coding and channel coding on the voice input by the user to obtain coded voice;

transmitting the encoded speech to the receiving device.

Preferably, after the voice is collected, the method further comprises:

carrying out voice coding on the voice to obtain coded voice;

saving the encoded processed speech;

splitting the coded voice into a plurality of active voices and a plurality of inactive voices based on a voice activity detection result in a voice coding process;

transmitting the user input speech to the receiving device.

carrying out channel coding on the user input voice to obtain channel coded voice;

transmitting the channel-coded speech to the receiving device.

An interphone comprises a memory and a processor;

wherein the memory is used for storing programs;

a processor for executing a program, wherein when the processor executes the program:

when the time for the touch point to exist is judged to be longer than the preset value, voice is collected;

Preferably, after the processor is configured to collect the speech, the processor is further configured to:

saving the voice;

correspondingly, when the processor is configured to send the collected voice to the receiving device, the processor is specifically configured to:

transmitting the user input speech to the receiving device.

Preferably, the processor is configured to, when splitting the speech into a plurality of active speeches and a plurality of inactive speeches, specifically:

Preferably, the processor is configured to, when sending the user input speech to the receiving device, specifically:

transmitting the encoded speech to the receiving device.

carrying out voice coding on the voice to obtain coded voice;

saving the encoded processed speech;

transmitting the user input speech to the receiving device.

transmitting the channel-coded speech to the receiving device.

Compared with the prior art, the invention has the following beneficial effects:

The invention provides a method for shortening call setup time and an interphone. They solve the problem that users generally start speaking as soon as they press the PTT button, so that speech spoken before the prompt tone is played is not collected and is lost, affecting the integrity of the conversation between the two parties.

In addition, a preset proportion of each inactive voice is discarded, which shortens the voice playback time at the called party and thereby reduces the voice delay, that is, the transmission delay.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flowchart of a method for shortening call setup time according to the present invention;

FIG. 2 is a flow chart of another method for reducing call setup time according to the present invention;

FIG. 3 is a flowchart of a method for reducing call setup time according to another embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an apparatus for shortening call setup time according to the present invention;

FIG. 5 is a schematic structural diagram of another apparatus for shortening call setup time according to the present invention;

FIG. 6 is a schematic structural diagram of another apparatus for shortening call setup time according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a method for shortening call setup time, which is applied to an interphone, and with reference to FIG. 1, the method for shortening the call setup time comprises the following steps:

S101, detecting whether a touch point exists on a PTT key of the interphone; when the touch point on the PTT key is detected, step S102 is executed.

The touch point is a contact point between a finger of a user and the PTT key. Whether a touch point exists on a PTT key of the interphone is detected, namely whether a finger of a user is in contact with the PTT key is detected. The PTT key in this embodiment is a PTT key having a touch sensing function.

It should be noted that the interphone operates in PTT mode, that is, half-duplex conversation, in which the user must press the PTT button to speak throughout the call. In typical use, the user first places a finger on the PTT button, confirms the position of the button, and then presses it and starts speaking. Several tens of milliseconds elapse between placing the finger on the PTT button and pressing it.

S102, judging whether the existing time of the touch point is larger than a preset numerical value or not; when it is determined that the time for which the touch point exists is greater than the preset value, step S103 is performed.

Specifically, it is determined whether the time that the touch point exists is greater than a preset value, that is, it is determined whether the contact time between the user's finger and the PTT key is greater than the preset value before the user presses the PTT key, where the preset value may be several seconds or tens of seconds.

It should be noted that the determination of whether the touch point exists for a time longer than the preset value is to prevent the user from accidentally touching the PTT button but not speaking.

S103, voice is collected.

Specifically, the process of collecting speech includes switching on the microphone of the interphone and the analog-to-digital converter (ADC) of the CODEC. The microphone receives the sound input by the user and converts it into an analog signal, and the ADC converts the analog signal into a digital signal, which is then stored.

S104, when it is detected that the PTT key is pressed and the interphone is connected to the receiving device, the collected voice is sent to the receiving device.
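Steps S101 to S104 can be pictured as a small state machine. The following is only a minimal sketch under stated assumptions: the class name `PreCaptureController`, its methods, the millisecond time base, and the choice to discard buffered voice when the finger is lifted are all hypothetical and are not specified by this description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PreCaptureController:
    """Hypothetical controller for touch-triggered voice pre-capture."""
    dwell_threshold_ms: int                 # the "preset value" (unit assumed to be ms)
    touch_start_ms: Optional[int] = None    # when the touch point first appeared
    capturing: bool = False
    buffered: List[bytes] = field(default_factory=list)

    def on_tick(self, now_ms: int, touched: bool, frame: bytes) -> None:
        """S101-S103: call periodically with the touch state and one audio frame."""
        if not touched:
            # Assumption: a lifted finger is treated as an accidental touch.
            self.touch_start_ms = None
            self.capturing = False
            self.buffered.clear()
            return
        if self.touch_start_ms is None:
            self.touch_start_ms = now_ms    # S101: touch point detected
        if now_ms - self.touch_start_ms > self.dwell_threshold_ms:
            self.capturing = True           # S102 satisfied, so S103 starts
        if self.capturing:
            self.buffered.append(frame)     # S103: collect voice

    def flush_if_ready(self, ptt_pressed: bool, connected: bool) -> List[bytes]:
        """S104: hand over collected voice only once PTT is pressed and the link is up."""
        if ptt_pressed and connected:
            out, self.buffered = self.buffered, []
            return out
        return []
```

In this sketch, frames that arrive before the dwell threshold is exceeded are not kept, and nothing is handed to the transmitter until both conditions of S104 hold.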

Optionally, in another embodiment of the present invention, the process of establishing connection between the intercom and the receiving device includes:

generating and sending call request information to the receiving device, and receiving the connection confirmation information sent by the receiving device.

After the call request information is sent to the receiving equipment, the receiving equipment sends the connection confirmation information to the interphone through the air interface in the next time slot of the time slot in which the call request information is received. The air interface is an interface between a base station and a mobile terminal in a mobile communication network.

The receiving device may be a base station or another interphone. The base station may be a Digital Mobile Radio (DMR) base station, a PDT digital trunking base station, or a Terrestrial Trunked Radio (TETRA) base station, and the interphone may be a mobile station or a vehicle-mounted station.

In this embodiment, voice collection starts while the PTT button is not yet pressed, once the contact time between the user's finger and the PTT button exceeds the preset value, so that voice input by the user before the prompt tone is played is not lost. This solves the problem that users generally start speaking as soon as they press the PTT button, so that speech spoken before the prompt tone is played is not collected and is lost, affecting the integrity of the conversation between the two parties.

Optionally, in another embodiment of the present invention, after step S103, the method further includes:

S204, voice is saved;

The interphone is provided with a buffer for storing the collected voice data. Voice data in the buffer is stored in a first-in first-out manner; that is, voice data stored first is transmitted to the receiving device first.

The size of the buffer can be determined according to the usage scenario, mainly considering the allowable voice delay, the statistical length of the lost speech, and the space the memory can provide. Taking the worst case, DMR 1:4 power saving, as an example: a 480 ms pre-carrier is sent and the software processing of the PTT button takes 100 ms, so space for about 580 ms of voice data is reserved. This space is not always fully used, because some inactive voice is removed before the voice is stored again; 580 ms is the largest and safest allocation.

When the storage space can store 580ms of voice data, if the received voice data exceeds 580ms, the voice data that exceeds the storage space sequentially overwrites the first stored voice data.

For example, assume that 600 ms of voice data is to be stored. Since the storage space can hold only 580 ms of voice data, the data from 580 ms to 600 ms overwrites the data stored for 0 ms to 20 ms.
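The first-in first-out storage with overwriting of the oldest data behaves like a circular buffer. The following is a minimal sketch of that behaviour, not the interphone's actual implementation; the class name `OverwritingVoiceBuffer` and its methods are hypothetical.

```python
class OverwritingVoiceBuffer:
    """Fixed-size FIFO byte buffer; new data overwrites the oldest when full."""

    def __init__(self, capacity_bytes: int) -> None:
        self._buf = bytearray(capacity_bytes)
        self._capacity = capacity_bytes
        self._write = 0   # index of the next byte to write
        self._size = 0    # number of valid bytes currently stored

    def write(self, data: bytes) -> None:
        """Append data, wrapping around and overwriting the oldest bytes."""
        for b in data:
            self._buf[self._write] = b
            self._write = (self._write + 1) % self._capacity
            if self._size < self._capacity:
                self._size += 1

    def read_all(self) -> bytes:
        """Return the stored bytes oldest-first (transmission order)."""
        start = (self._write - self._size) % self._capacity
        return bytes(
            self._buf[(start + i) % self._capacity] for i in range(self._size)
        )


# A 5-byte buffer receiving 7 bytes keeps only the newest 5.
buf = OverwritingVoiceBuffer(5)
buf.write(b"abcdefg")
print(buf.read_all())  # b'cdefg'
```

With a capacity sized for 580 ms of voice, writing 600 ms would in the same way leave only the most recent 580 ms in the buffer.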

Since storage is generally measured in bytes, it is necessary to work out how much space 580 ms of voice data occupies when it is stored. With 16-bit samples at an 8 kHz sampling rate, 100 ms of voice occupies 800 × 2 = 1600 bytes, and 580 ms occupies 800 × 2 × 5.8 = 9280 bytes.
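The buffer-size arithmetic can be checked with a few lines of code. This sketch assumes 2 bytes (16 bits) per sample, which is what the 800 × 2 figure implies; the function name `pcm_bytes` and its parameters are hypothetical.

```python
def pcm_bytes(duration_ms: int, sample_rate_hz: int = 8000,
              bits_per_sample: int = 16) -> int:
    """Bytes needed to store uncompressed PCM audio of the given duration."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * bits_per_sample // 8


print(pcm_bytes(100))  # 1600
print(pcm_bytes(580))  # 9280
```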

S205, splitting the voice into a plurality of active voices and a plurality of inactive voices;

optionally, in another embodiment of the present invention, step S205 includes:

splitting the voice into a plurality of active voices and a plurality of inactive voices by adopting a voice activity detection algorithm.

Voice Activity Detection (VAD) is also called voice endpoint detection or voice boundary detection. It aims to identify and eliminate long silent periods in a sound signal stream, which saves channel resources without reducing service quality and helps reduce the end-to-end delay perceived by the user.

The active speech includes speech of the user speaking, and the inactive speech is silent, that is, the inactive speech does not include speech of the user speaking.

Active speech must not be discarded, whereas inactive speech may be discarded.
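The description does not name a specific voice activity detection algorithm. As an illustration only, the sketch below substitutes a very simple frame-energy threshold detector and merges consecutive frames of the same class into segments; the function name `split_by_activity`, the frame length, and the threshold are all assumptions.

```python
from typing import List, Tuple

Segment = Tuple[bool, List[int]]  # (is_active, samples)


def split_by_activity(samples: List[int], frame_len: int,
                      threshold: float) -> List[Segment]:
    """Split PCM samples into alternating active and inactive segments."""
    segments: List[Segment] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Mean-square energy of the frame decides whether it counts as speech.
        energy = sum(s * s for s in frame) / len(frame)
        active = energy > threshold
        if segments and segments[-1][0] == active:
            segments[-1][1].extend(frame)   # same class: extend current segment
        else:
            segments.append((active, list(frame)))
    return segments
```

Because the segments are produced in input order, their order also reflects the voice generation time.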

S206, discarding a part of inactive voice with a preset proportion from each inactive voice to obtain a plurality of segment voices;

Specifically, a preset proportion of each inactive voice is discarded, where the preset proportion may be 0.5, 0.8, or 1. Taking a 10 ms segment of inactive voice and a preset proportion of 0.5 as an example, half of the segment is discarded, so the 10 ms segment becomes a 5 ms segment.

It should be noted that experiments show that when 120 ms of inactive speech is removed from 1 s of speech and the remaining speech is spliced into continuous speech before transmission, the user hardly perceives any difference from the original speech.
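Step S206 can be sketched as follows. The description does not say which part of an inactive voice is dropped, so this sketch assumes the leading part is kept and the tail discarded; the function name `trim_inactive` and the `(is_active, samples)` segment representation are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[bool, List[int]]  # (is_active, samples)


def trim_inactive(segments: List[Segment], proportion: float) -> List[Segment]:
    """Discard the given proportion of every inactive segment; keep active ones."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    result: List[Segment] = []
    for active, samples in segments:
        if active:
            result.append((True, list(samples)))       # active voice is never cut
        else:
            keep = len(samples) - round(len(samples) * proportion)
            result.append((False, list(samples[:keep])))
    return result


# 10 ms of silence at 8 kHz is 80 samples; a proportion of 0.5 leaves 40 (5 ms).
trimmed = trim_inactive([(True, [7, 7]), (False, [0] * 80)], 0.5)
print(len(trimmed[1][1]))  # 40
```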

S207, combining the plurality of active voices and the plurality of segment voices according to voice generation time to obtain user input voice;

the voice generation time is a time when the user utters the voice. Now, a process of combining a plurality of active voices and a plurality of segment voices according to a voice generation time to obtain a user input voice will be described.

Suppose that a voice is split into 3 active voices and 1 inactive voice, and the voice generation time sequence of the 3 active voices and 1 inactive voice is active voice 1, inactive voice, active voice 2, and active voice 3.

A preset proportion of the inactive voice is discarded to obtain 1 segment voice. When the active voices and the segment voice are combined, active voice 1 is placed first and the segment voice second; active voice 2 and active voice 3 are then copied and placed after them in order.
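The combination in step S207 amounts to ordering the pieces by voice generation time and concatenating them. A minimal sketch, assuming each piece carries a hypothetical generation timestamp in milliseconds (the function name `combine_by_time` is also an assumption):

```python
from typing import List, Tuple

Piece = Tuple[int, List[int]]  # (generation_time_ms, samples)


def combine_by_time(pieces: List[Piece]) -> List[int]:
    """Concatenate active voices and segment voices in generation-time order."""
    combined: List[int] = []
    for _, samples in sorted(pieces, key=lambda piece: piece[0]):
        combined.extend(samples)
    return combined


# Active voice 1, the trimmed segment voice, active voice 2, active voice 3,
# supplied out of order on purpose.
pieces = [(30, [3]), (0, [1, 1]), (10, [0]), (20, [2])]
print(combine_by_time(pieces))  # [1, 1, 0, 2, 3]
```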

Accordingly, step S104 is modified to step S208:

the user input speech is transmitted to the receiving device.

Optionally, in another embodiment of the present invention, sending the user input voice to the receiving device includes:

carrying out voice coding and channel coding on the voice input by the user to obtain coded voice, and sending the coded voice to the receiving device.

Speech coding, i.e. vocoder coding, is based on a digital model of speech production: it analyzes the digital speech and extracts a set of characteristic parameters. These are mainly excitation parameters representing glottal vibration and vocal tract parameters representing vocal tract characteristics. The parameters carry the main information of the speech signal, need only a small number of bits when coded, and can be used to re-synthesize the speech signal after decoding; the coding rate can be as low as 2.4 kbit/s or below.

Channel coding is error detection and correction coding of the digital signals to be transmitted in a channel. Specifically, during channel coding, forward error correction (FEC) check bits may be added to the digital signal converted by the analog-to-digital converter.
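To illustrate the idea of forward error correction, the sketch below uses a 3x repetition code with majority-vote decoding. This is a deliberately simple stand-in chosen for clarity: it is not the FEC scheme used by DMR, PDT, or TETRA, and its overhead is far higher than that of practical codes. The function names are hypothetical.

```python
from typing import List


def fec_encode(bits: List[int]) -> List[int]:
    """Repeat every data bit three times."""
    return [copy for bit in bits for copy in (bit, bit, bit)]


def fec_decode(coded: List[int]) -> List[int]:
    """Majority-vote each group of three bits; corrects one flipped bit per group."""
    if len(coded) % 3 != 0:
        raise ValueError("coded length must be a multiple of 3")
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]


data = [1, 0, 1, 1]
coded = fec_encode(data)
coded[4] ^= 1                      # simulate a single-bit channel error
print(fec_decode(coded) == data)   # True
```

The receiver recovers the original bits without retransmission, which is the property that makes FEC useful on a half-duplex radio channel.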

In this embodiment, a preset proportion of each inactive voice is discarded, which shortens the voice playback time at the called party and thereby reduces the voice delay, that is, the transmission delay.

Optionally, in another embodiment of the present invention, referring to fig. 3, after acquiring the speech, the method further includes:

S304, carrying out voice coding on the voice to obtain a coded voice;

for the process of speech coding, please refer to the description in the above embodiments, which is not repeated herein.

S305, saving the coded voice;

Taking a 2.4 kbps vocoder with FEC as an example, the overall rate is 3.6 kbps, so 580 ms of speech occupies 261 bytes of storage.
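The 261-byte figure follows directly from the bit rate and the duration, as the short sketch below shows. The function name `coded_bytes` is hypothetical; the result is rounded up to whole bytes.

```python
def coded_bytes(duration_ms: int, bitrate_bps: int) -> int:
    """Bytes needed to store coded speech of the given duration and bit rate."""
    bits = bitrate_bps * duration_ms // 1000
    return (bits + 7) // 8   # round up to a whole number of bytes


print(coded_bytes(580, 3600))  # 261
```

Compared with the 9280 bytes needed for 580 ms of 16-bit PCM at 8 kHz, storing coded speech reduces the buffer requirement by a factor of roughly 35.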

S306, based on the voice activity detection result in the voice coding process, splitting the coded voice into a plurality of active voices and a plurality of inactive voices;

in the process of encoding speech, it is possible to distinguish whether speech included in the encoded speech is active speech or inactive speech.
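When the vocoder already reports a voice activity decision for each coded frame, the split in step S306 reduces to grouping consecutive frames that share the same flag. A minimal sketch, assuming each coded frame is represented as a hypothetical `(vad_flag, payload)` pair:

```python
from itertools import groupby
from typing import List, Tuple

CodedFrame = Tuple[bool, bytes]  # (vad_flag from the vocoder, coded payload)


def group_coded_frames(frames: List[CodedFrame]) -> List[Tuple[bool, List[bytes]]]:
    """Group consecutive coded frames into active and inactive voices."""
    return [
        (flag, [payload for _, payload in run])
        for flag, run in groupby(frames, key=lambda frame: frame[0])
    ]


frames = [(True, b"a"), (True, b"b"), (False, b"s"), (True, b"c")]
print(group_coded_frames(frames))
# [(True, [b'a', b'b']), (False, [b's']), (True, [b'c'])]
```

Discarding a preset proportion of an inactive voice then means dropping whole coded frames from the inactive groups, since coded frames cannot be cut at arbitrary sample positions.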

S307, discarding a part of inactive voices in a preset proportion from each inactive voice to obtain a plurality of fragment voices;

S308, combining the plurality of active voices and the plurality of segment voices according to voice generation time to obtain user input voice;

step S104 is changed to step S309:

the user input speech is transmitted to the receiving device.

carrying out channel coding on the voice input by the user to obtain channel coded voice;

the channel coded speech is transmitted to a receiving device.

In this embodiment, speech coding is performed first and the coded speech is then split into a plurality of active voices and a plurality of inactive voices. The splitting therefore takes place at a different point than in the previous embodiment, which provides an alternative way of obtaining the active voices and inactive voices.

Optionally, another embodiment of the present invention provides a device for shortening call setup time, which is applied to an intercom, and referring to fig. 4, the device for shortening call setup time includes:

the detection unit 101 is used for detecting whether a touch point exists on a PTT key of the interphone; the touch point is a contact point between a finger of a user and the PTT key;

the judging unit 102 is configured to, when the detecting unit detects that a touch point exists on the PTT key, judge whether time for which the touch point exists is greater than a preset value;

the acquisition unit 103 is used for acquiring voice when the judgment unit 102 judges that the time for the touch point to exist is greater than a preset value;

and the sending unit 104 is configured to send the collected voice to the receiving device after detecting that the PTT key is pressed and the intercom is connected to the receiving device.

Optionally, in another embodiment of the present invention, when the sending unit sends the collected voice to the receiving device, the sending unit is specifically configured to:

carrying out voice coding and channel coding on voice input by a user to obtain coded voice;

the encoded speech is transmitted to a receiving device.

In this embodiment, voice collection starts while the PTT button is not yet pressed, once the contact time between the user's finger and the PTT button exceeds the preset value, so that voice input by the user before the prompt tone is played is not lost. This solves the problem that users generally start speaking as soon as they press the PTT button, so that speech spoken before the prompt tone is played is not collected and is lost, affecting the integrity of the conversation between the two parties.

It should be noted that, for the working process of each unit in this embodiment, please refer to the description in the above embodiments, which is not described herein again.

Optionally, in another embodiment of the present invention, referring to fig. 5, further including:

the storage unit 105 is used for storing the voice after the acquisition unit 103 acquires the voice;

a splitting unit 106, configured to split the voice into a plurality of active voices and a plurality of inactive voices;

a voice discarding unit 107, configured to discard a preset proportion of inactive voices of each inactive voice to obtain multiple segment voices;

a voice combining unit 108, configured to combine the multiple active voices and the multiple segment voices according to the voice generation time to obtain a user input voice;

correspondingly, when the sending unit 104 is configured to send the collected voice to the receiving device, it is specifically configured to: the user input speech is transmitted to the receiving device.

Optionally, in another embodiment of the present invention, the splitting unit 106 includes:

and the splitting subunit is used for splitting the voice into a plurality of active voices and a plurality of inactive voices by adopting a voice activity detection algorithm.

In this embodiment, a preset proportion of each inactive voice is discarded, which shortens the voice playback time at the called party and thereby reduces the voice delay.

Optionally, in another embodiment of the present invention, the apparatus for shortening call setup time further includes:

the voice coding unit 109 is configured to perform voice coding on the voice after the acquisition unit 103 acquires the voice, so as to obtain a coded voice;

a speech saving unit 110 for saving the encoding-processed speech;

a voice splitting unit 111, configured to split the encoded voice into a plurality of active voices and a plurality of inactive voices based on a voice activity detection result in the voice encoding process;

a discarding unit 112, configured to discard a preset proportion of inactive voices of each inactive voice to obtain multiple segment voices;

a combining unit 113, configured to combine the multiple active voices and the multiple segment voices according to the voice generation time to obtain a user input voice;

correspondingly, when the sending unit 104 sends the collected voice to the receiving device, it is specifically configured to:

the user input speech is transmitted to the receiving device.

Optionally, in another embodiment of the present invention, the sending unit 104 includes:

the encoding unit is used for carrying out channel encoding on the voice input by the user to obtain channel encoded voice;

a voice transmitting unit for transmitting the channel-coded voice to the receiving apparatus.

Optionally, another embodiment of the present invention provides an intercom, including a memory and a processor;

wherein the memory is used for storing programs;

On the basis of the above embodiment, after the processor is configured to collect the speech, the processor is further configured to:

saving the voice;

transmitting the user input speech to the receiving device.

On the basis of the foregoing embodiment, when the processor is configured to split the speech into a plurality of active speeches and a plurality of inactive speeches, the processor is specifically configured to:

On the basis of the foregoing embodiment, when the processor is configured to send the user input speech to the receiving device, the processor is specifically configured to:

transmitting the encoded speech to the receiving device.

carrying out voice coding on the voice to obtain coded voice;

saving the encoded processed speech;

transmitting the user input speech to the receiving device.

transmitting the channel-coded speech to the receiving device.

In this embodiment, voice collection starts while the PTT key is not yet pressed, once the contact time between the user's finger and the PTT key exceeds the preset value, so that voice input by the user before the prompt tone is played is not lost. This solves the problem that users generally start speaking as soon as they press the PTT button, so that speech spoken before the prompt tone is played is not collected and is lost, affecting the integrity of the conversation between the two parties.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for shortening call setup time is applied to an interphone and comprises the following steps: detecting whether a touch point exists on a PTT key of the interphone; the touch point is a contact point of a finger of a user and the PTT key;

when the touch point on the PTT key is detected to exist, judging whether the time for the touch point to exist before the PTT key is pressed is greater than a preset numerical value or not;

when the fact that the PTT key is pressed down and the interphone is connected with a receiving device is detected, the collected voice is sent to the receiving device; the process for detecting that the interphone is connected with the receiving equipment comprises the following steps: generating and transmitting call request information to a receiving device; receiving the connection confirmation information sent by the receiving equipment in the next time slot of the time slot receiving the call request information;

the PTT button is used for establishing a call connection, and the call connection is used for sending voice.

2. The method of claim 1, wherein after the collecting the speech, further comprising:

saving the voice;

transmitting the user input speech to the receiving device.

3. The method of claim 2, wherein splitting the speech into a plurality of active speech and a plurality of inactive speech comprises:

4. The method of claim 2, wherein transmitting the user input speech to the receiving device comprises:

transmitting the encoded speech to the receiving device.

5. The method of claim 1, wherein after the collecting the speech, further comprising:

carrying out voice coding on the voice to obtain coded voice;

saving the encoded processed speech;

transmitting the user input speech to the receiving device.

6. The method of claim 5, wherein sending the user input speech to the receiving device comprises:

transmitting the channel-coded speech to the receiving device.

7. An interphone is characterized by comprising a memory and a processor;

wherein the memory is used for storing programs;

8. The intercom of claim 7, wherein the processor, after being configured to collect the voice, is further configured to:

saving the voice;

transmitting the user input speech to the receiving device.

9. The intercom of claim 8, wherein the processor is configured to, when splitting the voice into a plurality of active voices and a plurality of inactive voices, specifically:

10. The intercom of claim 8, wherein the processor, when configured to send the user input speech to the receiving device, is specifically configured to:

transmitting the encoded speech to the receiving device.

11. The intercom of claim 7, wherein the processor, after being configured to collect the voice, is further configured to:

carrying out voice coding on the voice to obtain coded voice;

saving the encoded processed speech;

transmitting the user input speech to the receiving device.

12. The intercom of claim 11, wherein the processor, when configured to send the user input speech to the receiving device, is specifically configured to:

transmitting the channel-coded speech to the receiving device.