CN102523329A - Recording method based on voice communication, recording system and communication terminals

Info

Publication number: CN102523329A
Application number: CN2011103421070A
Authority: CN
Inventors: 徐晶明; 林福辉; 李昙; 韩大晗; 吴晟; 张本好
Original assignee: Spreadtrum Communications Shanghai Co Ltd
Current assignee: Spreadtrum Communications Shanghai Co Ltd
Priority date: 2011-11-02
Filing date: 2011-11-02
Publication date: 2012-06-27
Anticipated expiration: 2031-11-02
Also published as: CN102523329B

Abstract

The invention discloses a recording method based on voice communication, a recording system, and a communication terminal. The recording method comprises: acquiring a voice data segment from a call between communication terminals; performing voice detection on the voice data segment to determine whether it contains voice information; and forming recording data based on the voice data segment that contains voice information. This technical solution reduces the amount of voice encoding performed by a recording system based on voice communication, and thereby reduces the computational complexity of the recording process.

Description

Recording method based on voice communication, recording system, and communication terminal

Technical field

The present invention relates to the technical field of voice communication, and in particular to a recording method based on voice communication, a recording system, and a communication terminal.

Background technology

A recording system based on voice communication is equipment that can record multiple telephone channels in real time while also playing back voice; it combines computer technology with voice technology. By adopting advanced digital recording technology, powerful and reliable software, and a large-capacity computer hard disk as the storage medium, such a system goes well beyond the traditional notion of telephone recording. It can automatically log the calling and called numbers, record or monitor multiple voice channels simultaneously, back up recordings automatically, and support flexible recording queries. It can also maintain the hard disk automatically according to customer requirements, which ensures uninterrupted and stable operation of the system.

Figure 1 is a structural diagram of an existing recording system based on voice communication. As shown in Figure 1, the recording system 10 comprises a voice decoder 101, a voice mixer 102, and a voice encoder 103. The recording method usually adopted with the recording system 10 is as follows: first, the undecoded (that is, encoded) downlink voice data is decoded by the voice decoder 101 to generate decoded downlink voice data; then the decoded downlink voice data and the uncoded uplink voice data are mixed by the voice mixer 102 and encoded by the voice encoder 103 to generate an encoded recording file.
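The prior-art flow of Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any real recorder: `toy_encode` and `toy_decode` merely pack and unpack 16-bit samples and stand in for a real codec such as AMR-NB, the mixer simply averages the two inputs, and all function names are hypothetical. The point is that decoding, mixing, and re-encoding run for every segment, whether or not anyone is speaking.

```python
import struct

def toy_encode(pcm):
    """Stand-in for a voice encoder: pack 16-bit samples (NOT a real codec)."""
    return struct.pack(f"<{len(pcm)}h", *pcm)

def toy_decode(coded):
    """Stand-in for a voice decoder: unpack 16-bit samples."""
    return list(struct.unpack(f"<{len(coded) // 2}h", coded))

def toy_mix(uplink_pcm, downlink_pcm):
    """Stand-in for a voice mixer: sample-wise average of the two inputs."""
    return [(u + d) // 2 for u, d in zip(uplink_pcm, downlink_pcm)]

def record_prior_art(uplink_pcm_segments, downlink_coded_segments):
    """Decode the downlink, mix with the uncoded uplink, re-encode: every segment."""
    recording = []
    for up_pcm, down_coded in zip(uplink_pcm_segments, downlink_coded_segments):
        down_pcm = toy_decode(down_coded)       # voice decoder 101
        mixed = toy_mix(up_pcm, down_pcm)       # voice mixer 102
        recording.append(toy_encode(mixed))     # voice encoder 103
    return recording
```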

The drawback of the existing recording method is that, in addition to the encoding of the uplink voice data, a further voice encoding pass (voice encoder 103) must be introduced. Because voice encoding, and high-quality voice encoding in particular (for example AMR-NB and AMR-WB), has high computational complexity, this recording method places pressure on the computing resources and power consumption of the voice communication system, especially of low-end mobile voice communication devices.

Further information on recording methods and recording systems for voice communication can be found in U.S. patent application publication US2006173563A1, "Sound recording communication system and method", which likewise does not solve the above problem.

Summary of the invention

The problem addressed by the present invention is to reduce the voice encoding performed in a recording system for voice communication, and thereby to reduce the computational complexity of the recording flow.

To solve the above problem, an embodiment of the present invention provides a recording method based on voice communication, comprising: acquiring a voice data segment from a call between communication terminals; performing voice detection on the voice data segment to determine whether the voice data segment contains voice information; and forming recording data based on the voice data segment that contains voice information.

Optionally, the voice data segment comprises at least one frame of voice data.

Optionally, the voice data segment comprises an uplink voice data segment and a downlink voice data segment.

Optionally, performing voice detection on the voice data segment comprises: performing voice detection on the encoded or uncoded uplink voice data segment; and performing voice detection on the decoded or undecoded downlink voice data segment.

Optionally, forming recording data based on the voice data segment that contains voice information comprises:

if only the downlink voice data segment contains voice information, using the undecoded downlink voice data segment as the recording data;

if only the uplink voice data segment contains voice information, using the encoded uplink voice data segment as the recording data;

if both the uplink voice data segment and the corresponding downlink voice data segment contain voice information, mixing the decoded downlink voice data segment with the uncoded uplink voice data segment, encoding the mixed result, and using the encoded result as the recording data.

Optionally, the method further comprises: if neither the uplink voice data segment nor the downlink voice data segment contains voice information, using either the encoded uplink voice data segment or the undecoded downlink voice data segment as the recording data, or setting silence data as the recording data.

Optionally, the communication terminal comprises a mobile terminal and/or a fixed-line telephone terminal.

An embodiment of the present invention further provides a recording system adapted to record voice data, comprising: an acquiring unit, configured to acquire a voice data segment from a call between communication terminals; a detecting unit, configured to perform voice detection on the voice data segment acquired by the acquiring unit, to determine whether the voice data segment contains voice information; and a processing unit, configured to form recording data based on the voice data segment that the detecting unit has found to contain voice information.

Optionally, the detecting unit comprises a first detecting unit and a second detecting unit, wherein the first detecting unit is configured to perform voice detection on the encoded or uncoded uplink voice data segment, and the second detecting unit is configured to perform voice detection on the decoded or undecoded downlink voice data segment.

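Optionally, the processing unit is configured to: if only the downlink voice data segment contains voice information, use the undecoded downlink voice data segment as the recording data; if only the uplink voice data segment contains voice information, use the encoded uplink voice data segment as the recording data; and if both the uplink voice data segment and the corresponding downlink voice data segment contain voice information, mix the decoded downlink voice data segment with the uncoded uplink voice data segment, encode the mixed result, and use the encoded result as the recording data.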

Optionally, the processing unit is further configured to: if neither the uplink voice data segment nor the downlink voice data segment contains voice information, use either the encoded uplink voice data segment or the undecoded downlink voice data segment as the recording data, or set silence data as the recording data.

An embodiment of the present invention further provides a communication terminal comprising the above recording system.

Compared with the prior art, the technical solution of the present invention has the following beneficial effects:

Voice detection is performed on the voice data segment acquired from a call between communication terminals to determine whether it contains voice information, and recording data is then formed based on the voice data segment that contains voice information. It is therefore no longer necessary to mix all of the acquired voice data and then encode it to form the recording data, which reduces the voice encoding performed during recording and lowers the computational complexity of the recording system.

In a specific embodiment, voice detection is performed separately on the uplink voice data segment and the downlink voice data segment that correspond to the same time period. Depending on the voice coding format, some encoded voice data segments contain the information required for voice detection; in that case, this information is obtained from the encoded uplink voice data segment and the undecoded downlink voice data segment, and voice detection is performed on it. Other encoded voice data segments do not contain the information required for voice detection; in that case, the information must be obtained from the uncoded uplink voice data segment and the decoded downlink voice data segment before voice detection is performed. According to the detection result, there are four cases: 1) if only the downlink voice data segment contains voice information, the undecoded downlink voice data segment is used as the recording data; 2) if only the uplink voice data segment contains voice information, the encoded uplink voice data segment is used as the recording data; 3) if both the uplink voice data segment and the corresponding downlink voice data segment contain voice information, the decoded downlink voice data segment and the uncoded uplink voice data segment are mixed, the mixed result is encoded, and the encoded result is used as the recording data; 4) if neither the uplink voice data segment nor the downlink voice data segment contains voice information, either the encoded uplink voice data segment or the undecoded downlink voice data segment is used as the recording data, or silence data is set as the recording data. This technical solution forms the recording data based on the voice data segments that contain voice information and, wherever possible, uses the already encoded uplink voice data segment and the undecoded downlink voice data segment to form the recording data, which reduces the encoding work of the recording system and lowers its computational complexity.

Description of drawings

Fig. 1 is a structural diagram of a prior-art recording system based on voice communication;

Fig. 2 is a flowchart of an embodiment of a recording method based on voice communication according to the present invention;

Fig. 3 is a flowchart of a specific embodiment of a recording method based on voice communication according to the present invention;

Fig. 4 is a structural diagram of a specific embodiment of a recording system according to the present invention.

Embodiment

The inventors found that an existing recording system based on voice communication must introduce a further voice encoding pass in addition to the encoding of the uplink voice data. Because voice encoding, and high-quality voice encoding in particular (for example AMR-NB and AMR-WB), has high computational complexity, this recording method places pressure on the computing resources and power consumption of the voice communication system, especially of low-end mobile voice communication devices.

To address this problem, the inventors, through research, provide a recording method and a recording system based on voice communication. The recording system performs voice detection on the voice data segment acquired from a call between communication terminals to determine whether it contains voice information, and then forms recording data based on the voice data segment that contains voice information. It is therefore no longer necessary to mix all of the acquired voice data and then encode it to form the recording data, which reduces the voice encoding performed during recording and lowers the computational complexity of the recording system.

To make the above objects, features, and advantages of the present invention easier to understand, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.

Fig. 2 is a flowchart of an embodiment of a recording method based on voice communication according to the present invention. As shown in Fig. 2, the recording method based on voice communication comprises:

Step S1: acquire a voice data segment from a call between communication terminals.

Specifically, the voice data segment is the voice data transmitted between the communication terminals during a certain time period of the call. In this embodiment, voice data is transmitted in a frame structure. Accordingly, the voice data may be acquired one frame at a time, in which case each voice data segment contains one frame of voice data; or it may be acquired several consecutive frames (two or more) at a time, in which case each voice data segment contains multiple frames of voice data. In practice, the way of acquiring the voice data segment is not limited to these options.
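The two acquisition granularities described above (one frame per segment, or several consecutive frames per segment) can be sketched as a simple grouping step. The function name `group_frames` and its parameter are hypothetical and chosen only for illustration; a frame is treated as an opaque object.

```python
def group_frames(frames, frames_per_segment=1):
    """Split a list of frames into consecutive segments of N frames each.

    With frames_per_segment=1 every segment holds a single frame.
    The last segment may be shorter if the frame count is not a multiple of N.
    """
    if frames_per_segment < 1:
        raise ValueError("frames_per_segment must be at least 1")
    return [frames[i:i + frames_per_segment]
            for i in range(0, len(frames), frames_per_segment)]
```

For example, five frames grouped two at a time yield three segments, the last of which holds a single frame.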

Further, the voice data segment comprises an uplink voice data segment and a downlink voice data segment. The uplink voice data segment is a voice data segment acquired from the voice data sent by the communication terminal, and the downlink voice data segment is a voice data segment acquired from the voice data received by the communication terminal. In this embodiment, the uplink voice data and the downlink voice data use the same voice coding/decoding algorithm or standard, and the recording system accordingly records with a matching algorithm or standard.

In this embodiment, the communication terminal comprises a mobile terminal and/or a fixed-line telephone terminal. The mobile terminal may be a mobile phone, and the fixed-line telephone terminal is a landline (desk) telephone. In practice, the communication terminal may also be any other device with a communication function.

The acquisition in this step may fetch at least one frame of corresponding uplink and downlink voice data, for example the first frame of uplink voice data and the first frame of downlink voice data. Alternatively, it may fetch the uplink voice data and the downlink voice data within a predetermined time period as the acquired uplink and downlink voice data segments. For example, the uplink and downlink voice data between the 1st second and the 10th second are acquired separately to form the uplink voice data segment and the downlink voice data segment for those 10 seconds.
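Acquisition over a predetermined time period can be sketched as slicing two timestamped frame streams into matching windows. This is only an illustration: the representation of a frame as a `(timestamp_seconds, payload)` tuple and the names `slice_window` and `paired_segments` are assumptions, not something the source specifies.

```python
def slice_window(frames, start_s, end_s):
    """Return the payloads of frames whose timestamp lies in [start_s, end_s)."""
    return [payload for t, payload in frames if start_s <= t < end_s]

def paired_segments(uplink, downlink, window_s, total_s):
    """Yield (start, uplink_segment, downlink_segment) for each time window.

    Both streams are cut with the same window boundaries, so the two
    segments of a pair always cover the same time period.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    start = 0.0
    while start < total_s:
        end = min(start + window_s, total_s)
        yield start, slice_window(uplink, start, end), slice_window(downlink, start, end)
        start = end
```

With `window_s=10.0`, the first pair corresponds to the 10-second example in the text.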

Step S2: perform voice detection on the voice data segment to determine whether the voice data segment contains voice information.

Specifically, in this step, performing voice detection on the voice data segment comprises: performing voice detection on the encoded or uncoded uplink voice data segment, and performing voice detection on the decoded or undecoded downlink voice data segment. The uplink voice data segment and the downlink voice data segment are the uplink and downlink voice data that correspond to the same time period. As those skilled in the art will appreciate, for different voice coding formats (for example AMR-NB, AMR-WB, and so on), some encoded voice data segments carry a flag indicating that the segment contains the information required for voice detection. In that case, the required information can be obtained from the encoded uplink voice data segment and the undecoded downlink voice data segment, and voice detection is performed on it. Other encoded voice data segments carry no such flag. In that case, the required information must be obtained from the uncoded uplink voice data segment and the decoded downlink voice data segment before voice detection is performed.
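The choice between detecting on coded data and detecting on PCM data can be sketched as a small dispatch. The `Segment` type, its `has_detection_info` field, and the function name are hypothetical; how such a flag is actually carried depends on the coding format and is not specified in the source.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Segment:
    coded: bytes              # encoded uplink, or undecoded downlink
    pcm: List[int]            # uncoded uplink, or decoded downlink
    has_detection_info: bool  # hypothetical flag: coded form carries detection info

def detection_input(segment: Segment) -> Tuple[str, Union[bytes, List[int]]]:
    """Pick the representation on which voice detection should run."""
    if segment.has_detection_info:
        # The coded form already carries what the detector needs.
        return "coded", segment.coded
    # Otherwise fall back to the uncoded/decoded samples.
    return "pcm", segment.pcm
```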

In this step, the voice detection method first removes noise from the acquired voice data segment, then analyzes the acquired voice and detects the required mathematical features or parameters, and finally classifies the voice data segment as either containing voice information or not containing voice information. Commonly used mathematical features or parameters include the spectral peak and slope in the frequency domain (spectrum peak and slope analysis), correlation coefficients, and the signal-to-noise ratio. For example, when the spectral peak and slope exceed a set threshold, the voice data segment is considered to contain voice information; otherwise, it is considered not to contain voice information. Note that this voice detection method applies to both the uplink voice data segment and the downlink voice data segment, and the detection process is the same for both, so they are not described separately here. In practice, the voice detection method is not limited to the above embodiment; those skilled in the art may use any other feasible voice detection method, which is not detailed here.
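A threshold-based detector of the kind described above can be sketched as follows. This sketch deliberately uses two of the listed features, the signal-to-noise ratio and a correlation coefficient (lag-1 autocorrelation), rather than the spectral peak and slope of the example. The noise-floor power, both thresholds, and all function names are assumptions chosen for illustration; a production detector would estimate the noise floor adaptively.

```python
import math

def mean_power(samples):
    """Average power of a PCM segment (0.0 for an empty segment)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0

def snr_db(samples, noise_power):
    """Segment power relative to an assumed noise-floor power, in dB."""
    power = mean_power(samples)
    if power <= 0.0:
        return float("-inf")
    return 10.0 * math.log10(power / noise_power)

def lag1_correlation(samples):
    """Normalized lag-1 autocorrelation of the segment."""
    energy = sum(s * s for s in samples)
    if energy == 0:
        return 0.0
    return sum(a * b for a, b in zip(samples, samples[1:])) / energy

def contains_voice(samples, noise_power=100.0,
                   snr_threshold_db=10.0, corr_threshold=0.5):
    """Classify a segment: True only if both features exceed their thresholds."""
    return (snr_db(samples, noise_power) > snr_threshold_db
            and lag1_correlation(samples) > corr_threshold)
```

A segment of all zeros, or a weak rapidly alternating signal, is classified as not containing voice, while a strong low-frequency tone passes both thresholds.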

Step S3: form recording data based on the voice data segment that contains voice information.

Specifically, depending on the detection result of step S2, this step covers the following cases. Refer also to Fig. 3, a flowchart of a specific embodiment of the recording method based on voice communication. As shown in Fig. 3, voice detection is performed on the uplink voice data segment and on the downlink voice data segment respectively (steps S21 and S22, which correspond to the voice detection of step S2). A recording decision is then made according to the voice detection results (step S23, which corresponds to determining in step S2 whether the voice data segment contains voice information). The recording decision distinguishes the following cases, and the corresponding step is then executed:

Case (1): if, of the uplink and downlink voice data segments that correspond to the same time period, only the downlink voice data segment contains voice information, step S31 is executed and the undecoded downlink voice data segment is used as the recording data. Note that, regardless of whether step S2 performed detection on the undecoded or on the decoded downlink voice data segment, the recording system normally retains the undecoded downlink voice data. The undecoded downlink voice data segment therefore only needs to be copied directly into the recording file as the recording data for this time period.

Case (2): if, of the uplink and downlink voice data segments that correspond to the same time period, only the uplink voice data segment contains voice information, step S32 is executed and the encoded uplink voice data segment is used as the recording data. Similarly, regardless of whether step S2 performed detection on the encoded or on the uncoded uplink voice data segment, the recording system normally retains the encoded uplink voice data. The encoded uplink voice data segment therefore only needs to be copied directly into the recording file as the recording data for this time period.

Case (3): if the uplink and downlink voice data segments that correspond to the same time period both contain voice information, step S331 (voice mixing) and step S332 (voice encoding) are executed. Similarly, regardless of whether step S2 performed detection on the decoded downlink and uncoded uplink voice data segments or on the undecoded downlink and encoded uplink voice data segments, the recording system normally retains the decoded downlink voice data segment and the uncoded uplink voice data segment. These two segments therefore only need to be mixed and the mixed result encoded; the encoded result is added to the recording file as the recording data for this time period.

Case (4): if neither of the uplink and downlink voice data segments that correspond to the same time period contains voice information, step S34 is executed: either one of the voice data segments is selected as the recording data, or silence data is used as the recording data. Similarly, regardless of whether step S2 performed detection on the decoded downlink and uncoded uplink voice data segments or on the encoded uplink and undecoded downlink voice data segments, the recording system normally retains the encoded uplink voice data segment and the undecoded downlink voice data segment. Either one of these segments can therefore be selected and copied into the recording file as the recording data for this time period.

The reason one of the voice data segments is still placed in the recording file even when neither segment contains voice information is to ensure the continuity and completeness of the recording data in the recording file, that is, so that every time period has recording data. Note that, when the format of the recording file supports silence, the recording data for the time period may instead be set to silence data. Setting silence data is a technique well known to those skilled in the art and is not detailed here.
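The four cases of the recording decision can be sketched as a single function. The function name, its parameters, and the injected `mix` and `encode` callables are hypothetical; for case (4) the sketch arbitrarily picks the encoded uplink segment when no silence payload is supplied, which is one of the options the text allows.

```python
def form_recording_data(uplink_has_voice, downlink_has_voice,
                        uplink_coded, downlink_coded,
                        uplink_pcm, downlink_pcm,
                        mix, encode, silence=None):
    """Return the recording data for one time period.

    uplink_coded / downlink_coded: encoded uplink, undecoded downlink.
    uplink_pcm / downlink_pcm:     uncoded uplink, decoded downlink.
    mix, encode:                   callables supplied by the caller.
    silence:                       optional silence payload for case (4).
    """
    if downlink_has_voice and not uplink_has_voice:
        return downlink_coded                          # case (1), step S31: copy
    if uplink_has_voice and not downlink_has_voice:
        return uplink_coded                            # case (2), step S32: copy
    if uplink_has_voice and downlink_has_voice:
        return encode(mix(uplink_pcm, downlink_pcm))   # case (3), steps S331 + S332
    # case (4), step S34: silence if available, otherwise either coded segment
    return silence if silence is not None else uplink_coded
```

Only the third branch invokes the encoder; the other three reuse data that already exists in coded form.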

Note that, in this embodiment, the recording data formed in step S3 for each time period must be placed in the recording file in chronological order. In addition, the voice encoding format or standard of the recording file usually supports only a single stream of voice data segments; that is, it cannot record several voice data segments (such as uplink and downlink) for the same time period. In cases (1), (2), and (4), only one segment, either the encoded uplink voice data segment or the undecoded downlink voice data segment, needs to be copied into the recording file for a given time period, which is compatible with the encoding format or standard of the recording file. In case (3), where both the uplink and the downlink voice data segments contain voice information, two voice data segments would be needed for the same time period, which the encoding format or standard of the recording file does not allow. In this embodiment, the decoded downlink voice data segment and the uncoded uplink voice data segment are therefore mixed by a mixer, and the mixed voice data segment is then encoded to form the recording data.
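The two constraints above (a single stream per time period, placed in chronological order) and the mixing step of case (3) can be sketched as follows. The sketch assumes 16-bit PCM samples and mixes by saturated addition, and it assumes the recording file is a plain concatenation of per-period payloads; real container formats may need headers or framing. All names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def mix_pcm(uplink, downlink):
    """Sample-wise sum of two 16-bit PCM segments, saturated to the int16 range."""
    return [max(INT16_MIN, min(INT16_MAX, u + d))
            for u, d in zip(uplink, downlink)]

def assemble_recording(entries):
    """Concatenate (slot_index, payload) entries in chronological order.

    Raises ValueError unless every slot 0..n-1 has exactly one payload,
    which mirrors the single-stream constraint of the recording file.
    """
    ordered = sorted(entries, key=lambda entry: entry[0])
    slots = [slot for slot, _ in ordered]
    if slots != list(range(len(slots))):
        raise ValueError("each time slot must contain exactly one recording entry")
    return b"".join(payload for _, payload in ordered)
```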

An embodiment of the present invention further provides a recording system adapted to record voice data. Fig. 4 is a structural diagram of a specific embodiment of a recording system according to the present invention.

Referring to Fig. 4, the recording system 20 comprises an acquiring unit 201, a detecting unit 202, and a processing unit 203. The detecting unit 202 further comprises a first detecting unit 2021 and a second detecting unit 2022, adapted to detect the uplink voice data segment and the downlink voice data segment respectively.

Specifically, the acquiring unit 201 is configured to acquire a voice data segment from a call between communication terminals. The voice data segment is the voice data transmitted between the communication terminals during a certain time period of the call. In this embodiment, voice data is transmitted in a frame structure. The acquiring unit 201 may acquire the voice data one frame at a time, in which case each voice data segment contains one frame of voice data; or it may acquire the voice data several consecutive frames (two or more) at a time, in which case each voice data segment contains multiple frames of voice data. In practice, the way of acquiring the voice data segment is not limited to these options.

The detecting unit 202 is configured to perform voice detection on the voice data segment acquired by the acquiring unit 201, to determine whether the voice data segment contains voice information.

In a specific embodiment, the detecting unit 202 comprises: a first detecting unit 2021, configured to perform voice detection on the encoded or uncoded uplink voice data segment; and a second detecting unit 2022, configured to perform voice detection on the decoded or undecoded downlink voice data segment. The uplink voice data segment and the downlink voice data segment are the uplink and downlink voice data that correspond to the same time period.

The processing unit 203 is configured to form recording data based on the voice data segment that the detecting unit 202 has found to contain voice information.

In a specific embodiment, the processing unit 203 carries out one of the following four processing procedures, depending on the detection result of the detecting unit 202: 1) if only the downlink voice data segment contains voice information, the undecoded downlink voice data segment is used as the recording data; 2) if only the uplink voice data segment contains voice information, the encoded uplink voice data segment is used as the recording data; 3) if both the uplink voice data segment and the corresponding downlink voice data segment contain voice information, the decoded downlink voice data segment and the uncoded uplink voice data segment are mixed, the mixed result is encoded, and the encoded result is used as the recording data; 4) if neither the uplink voice data segment nor the downlink voice data segment contains voice information, either the encoded uplink voice data segment or the undecoded downlink voice data segment is used as the recording data, or silence data is set as the recording data. For the specific implementation of these four processing procedures, refer to the description of step S3 above; it is not repeated here.
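The division of the recording system 20 into units can be sketched as a class that wires together four injected callables, one per unit. The class name, method names, and callable signatures are hypothetical; the sketch only shows how data flows from the acquiring unit through the two detecting units to the processing unit.

```python
class RecordingSystem:
    """Wires an acquiring unit, two detecting units, and a processing unit."""

    def __init__(self, acquire, detect_uplink, detect_downlink, process):
        self.acquire = acquire                  # acquiring unit 201
        self.detect_uplink = detect_uplink      # first detecting unit 2021
        self.detect_downlink = detect_downlink  # second detecting unit 2022
        self.process = process                  # processing unit 203

    def run(self):
        """Process every (uplink, downlink) pair and collect the recording data."""
        recording = []
        for uplink, downlink in self.acquire():
            up_voice = self.detect_uplink(uplink)
            down_voice = self.detect_downlink(downlink)
            recording.append(self.process(uplink, downlink, up_voice, down_voice))
        return recording
```

Injecting the units as callables keeps the detection method and the processing policy replaceable, in line with the text's remark that other feasible detection methods may be used.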

An embodiment of the present invention further provides a communication terminal comprising the recording system 20 shown in Fig. 4. The recording system 20 is usually arranged inside the communication terminal. While the communication terminal is in a call with other communication terminals, the recording system 20 can acquire the voice data segments of the call and form the recording data according to the recording method of this technical solution. Here, the uplink voice data segment is voice data sent by this communication terminal, and the downlink voice data segment is voice data that this communication terminal receives from the other communication terminals.

In summary, this technical solution performs voice detection on the voice data segment acquired from a call between communication terminals to determine whether it contains voice information, and then forms recording data based on the voice data segment that contains voice information. It is therefore no longer necessary to mix all of the acquired voice data and then encode it to form the recording data, which reduces the voice encoding performed during recording and lowers the computational complexity of the recording system.
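The saving in encoder work can be illustrated by counting encoder invocations per time period. In the sketch below, each period is described by a hypothetical pair of detection flags `(uplink_has_voice, downlink_has_voice)`; the prior-art flow encodes every period, whereas the described method encodes only the periods in which both directions contain voice. The flag values used in any example are made up for illustration and are not measurements.

```python
def encoder_calls_prior_art(flags):
    """Prior art: one recording-encoder pass for every time period."""
    return len(flags)

def encoder_calls_proposed(flags):
    """Described method: encode only when uplink and downlink both contain voice."""
    return sum(1 for up, down in flags if up and down)
```

For an illustrative call of five periods in which only one period has both parties speaking, the prior-art count is 5 and the count for the described method is 1.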

Although the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit the invention. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible changes and modifications to the technical solution of the present invention. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of its technical solution, falls within the scope of protection of the technical solution of the present invention.

Claims

1. A recording method based on voice communication, characterized by comprising:

acquiring a voice data segment from a call between communication terminals;

performing voice detection on the voice data segment to determine whether the voice data segment contains voice information;

forming recording data based on the voice data segment that contains voice information.

2. The recording method based on voice communication according to claim 1, characterized in that the voice data segment comprises at least one frame of voice data.

3. The recording method based on voice communication according to claim 1, characterized in that the voice data segment comprises an uplink voice data segment and a downlink voice data segment.

4. The recording method based on voice communication according to claim 3, characterized in that performing voice detection on the voice data segment comprises: performing voice detection on the encoded or uncoded uplink voice data segment; and performing voice detection on the decoded or undecoded downlink voice data segment.

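5. The recording method based on voice communication according to claim 4, characterized in that forming recording data based on the voice data segment that contains voice information comprises: if only the downlink voice data segment contains voice information, using the undecoded downlink voice data segment as the recording data; if only the uplink voice data segment contains voice information, using the encoded uplink voice data segment as the recording data; and if both the uplink voice data segment and the corresponding downlink voice data segment contain voice information, mixing the decoded downlink voice data segment with the uncoded uplink voice data segment, encoding the mixed result, and using the encoded result as the recording data.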

6. The recording method based on voice communication according to claim 5, characterized by further comprising:

if neither the uplink voice data segment nor the downlink voice data segment contains voice information, using either the encoded uplink voice data segment or the undecoded downlink voice data segment as the recording data, or setting silence data as the recording data.

7. The recording method based on voice communication according to claim 1, characterized in that the communication terminal comprises a mobile terminal and/or a fixed-line telephone terminal.

8. A recording system adapted to record voice data, characterized by comprising:

an acquiring unit, configured to acquire a voice data segment from a call between communication terminals;

a detecting unit, configured to perform voice detection on the voice data segment acquired by the acquiring unit, to determine whether the voice data segment contains voice information;

a processing unit, configured to form recording data based on the voice data segment that the detecting unit has found to contain voice information.

9. The recording system according to claim 8, characterized in that the voice data segment comprises at least one frame of voice data.

10. The recording system according to claim 8, characterized in that the voice data segment comprises an uplink voice data segment and a downlink voice data segment.

11. The recording system according to claim 10, characterized in that the detecting unit comprises a first detecting unit and a second detecting unit, wherein the first detecting unit is configured to perform voice detection on the encoded or uncoded uplink voice data segment, and the second detecting unit is configured to perform voice detection on the decoded or undecoded downlink voice data segment.

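12. The recording system according to claim 11, characterized in that the processing unit is configured to: if only the downlink voice data segment contains voice information, use the undecoded downlink voice data segment as the recording data; if only the uplink voice data segment contains voice information, use the encoded uplink voice data segment as the recording data; and if both the uplink voice data segment and the corresponding downlink voice data segment contain voice information, mix the decoded downlink voice data segment with the uncoded uplink voice data segment, encode the mixed result, and use the encoded result as the recording data.

13. The recording system according to claim 12, characterized in that the processing unit is further configured to: if neither the uplink voice data segment nor the downlink voice data segment contains voice information, use either the encoded uplink voice data segment or the undecoded downlink voice data segment as the recording data, or set silence data as the recording data.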

14. A communication terminal, characterized by comprising the recording system according to any one of claims 8 to 13.

15. The communication terminal according to claim 14, characterized in that the communication terminal comprises a mobile terminal and/or a fixed-line telephone terminal.