CN110827838A

CN110827838A - Opus-based voice coding method and apparatus

Info

Publication number: CN110827838A
Application number: CN201910984964.7A
Authority: CN
Inventors: 梁波
Original assignee: Unisound Intelligent Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Priority date: 2019-10-16
Filing date: 2019-10-16
Publication date: 2020-02-21

Abstract

The invention provides an opus-based speech coding method comprising the following steps. Step S1: acquiring a current speech frame to be coded in a preset speech segment. Step S2: coding the acquired current speech frame based on the opus coding technology to obtain an encoded speech frame. Step S3: representing and determining the byte size of the obtained encoded speech frame based on a preset byte header. Because the opus coding technology is used and the size of each encoded speech frame is represented by the preset byte header, the proportion of header information is effectively reduced, and so is the byte size of the encoded speech.

Description

Opus-based voice coding method and apparatus

Technical Field

The present invention relates to the field of speech coding technology, and in particular, to a speech coding method and apparatus based on opus.

Background

In currently used speech coding technology, each encoded frame of speech generally carries 8 bytes of header information, so an encoded speech segment consists of a sequence of header information + encoded speech data units. With an 8-byte header, the header information occupies a large proportion of the encoded bytes. For example, at a sampling rate of 16k, one 20ms frame of speech is about 80 bytes after 8-times compression coding, and the header information occupies about 10% of the total size; at a sampling rate of 8k, one 20ms frame of speech is about 40 bytes after 8-times compression coding, and the header information occupies about 16% of the total size. Because the header information occupies such a large proportion, reducing its percentage is important for reducing the byte size of the encoded speech.

Disclosure of Invention

The invention provides an opus-based speech coding method which adopts the opus coding technology and represents the size of each encoded speech frame with a preset byte header, thereby effectively reducing the proportion of header information and, in turn, the byte size of the encoded speech.

The embodiment of the invention provides an opus-based speech coding method, which comprises the following steps:

step S1: acquiring a current speech frame to be coded in a preset speech section;

step S2: based on the opus coding technology, coding the obtained current speech frame to be coded to obtain a coded speech frame;

step S3: the byte size of the obtained encoded speech frame is represented and determined based on a preset byte header.
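
Steps S2 and S3 can be sketched as follows. This is only an illustration under stated assumptions: the `encode` callable is a hypothetical stand-in for an opus encoder, and the big-endian layout of the 2-byte header is an assumption, since the text specifies only the size of the preset byte header.

```python
import struct
from typing import Callable

# Assumption: the preset byte header is one unsigned 16-bit big-endian length field.
HEADER_FORMAT = ">H"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 2 bytes


def frame_with_header(pcm_frame: bytes, encode: Callable[[bytes], bytes]) -> bytes:
    """Encode one frame (step S2) and prefix it with its byte size (step S3)."""
    payload = encode(pcm_frame)  # hypothetical opus encoder supplied by the caller
    if len(payload) > 0xFFFF:
        raise ValueError("encoded frame does not fit in a 2-byte length field")
    return struct.pack(HEADER_FORMAT, len(payload)) + payload
```

With a dummy encoder that returns 80 bytes for a 640-byte input frame, the result is 82 bytes long, of which the first 2 bytes hold the value 80.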

In a possible implementation manner, before performing step S1, the method further includes:

step S11: acquiring a preset voice section input by a user;

step S12: segmenting the acquired preset voice section input by the user according to a preset time length, to obtain a plurality of voice frames to be coded.
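
The segmentation in step S12 might look like the sketch below. It assumes 16-bit mono PCM input and zero-pads the last partial frame; neither detail is stated in the text. The 60ms upper bound follows the maximum opus frame size given in the detailed description.

```python
from typing import List


def split_into_frames(pcm: bytes, sample_rate: int, frame_ms: int = 20,
                      bytes_per_sample: int = 2) -> List[bytes]:
    """Cut a PCM speech segment into fixed-duration frames (step S12)."""
    if not 0 < frame_ms <= 60:
        raise ValueError("preset time length must be in (0, 60] ms")
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    frames = [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]
    # Assumption: the final partial frame is padded with silence to full length.
    if frames and len(frames[-1]) < frame_bytes:
        frames[-1] = frames[-1].ljust(frame_bytes, b"\x00")
    return frames
```

At a sampling rate of 16k and a preset time length of 20ms, each frame holds 640 bytes of raw audio.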

In a possible implementation manner, after performing step S3, the method further includes:

step S31: acquiring a next speech frame to be coded of the current speech frame to be coded;

step S32: controlling the obtained next speech frame to be encoded to execute steps S2-S3;

step S33: based on the preset arrangement order of the speech frames to be coded, repeating steps S31-S32 until all speech frames to be coded in the preset speech segment have been processed.
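
The loop of steps S31 to S33 can be sketched as below, together with a reader that uses the 2-byte headers to split the stream back into encoded frames. The `encode` callable is again a hypothetical stand-in for an opus encoder, and the big-endian header layout is an assumption.

```python
import struct
from typing import Callable, Iterable, List


def encode_segment(frames: Iterable[bytes], encode: Callable[[bytes], bytes]) -> bytes:
    """Apply steps S2-S3 to every frame, in the order the frames are supplied."""
    out = bytearray()
    for frame in frames:
        payload = encode(frame)
        out += struct.pack(">H", len(payload)) + payload  # 2-byte size, then data
    return bytes(out)


def split_stream(stream: bytes) -> List[bytes]:
    """Recover the encoded frames by reading each 2-byte size header in turn."""
    payloads = []
    pos = 0
    while pos < len(stream):
        if pos + 2 > len(stream):
            raise ValueError("truncated header")
        (size,) = struct.unpack_from(">H", stream, pos)
        pos += 2
        if pos + size > len(stream):
            raise ValueError("truncated frame")
        payloads.append(stream[pos:pos + size])
        pos += size
    return payloads
```

Because each header states the exact length of the frame that follows, the reader needs no other delimiter between frames.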

In a possible implementation manner, after performing step S3, the method further includes:

step S41: determining the proportional size of the preset byte header in the obtained byte size of the encoded voice frame;

step S42: judging whether the determined proportion size is smaller than a preset proportion size;

if yes, executing a first alarm operation;

otherwise, determining the byte size of the preset byte header, judging whether the determined byte size is smaller than the preset byte size, and if so, executing a second alarm operation;

otherwise, executing a third alarm operation.
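
The decision in steps S41 and S42 reduces to two comparisons, sketched below. The function name and the string return values are illustrative only, and the proportion is assumed to be the header size divided by the encoded frame size.

```python
def select_alarm(header_size: int, encoded_size: int,
                 preset_ratio: float, preset_header_size: int) -> str:
    """Return which alarm operation applies under steps S41-S42."""
    if encoded_size <= 0:
        raise ValueError("encoded_size must be positive")
    ratio = header_size / encoded_size      # step S41 (assumed definition)
    if ratio < preset_ratio:                # step S42, first branch
        return "first"
    if header_size < preset_header_size:    # fallback check on the header size itself
        return "second"
    return "third"
```

For instance, a 2-byte header on an 80-byte frame with a preset proportion of 10% takes the first branch, while an 8-byte header on the same frame with a preset byte size of 8 falls through to the third.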

In a possible implementation manner, after the step S11 is executed and before the step S12 is executed, the method further includes:

step S111: judging whether blank voice exists in the acquired preset voice section input by the user, and if so, sending a discarding instruction;

step S112: based on a voice position database, determining the position information of the blank voice in the preset voice section according to the sent discarding instruction;

step S113: based on the position information determined in step S112, deleting the corresponding blank speech, and recombining into a new preset speech segment.
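
Steps S111 to S113 might be approximated as in the sketch below. The text does not say how blank voice is detected or how the voice position database is organised, so the amplitude threshold, the fixed window length and the 16-bit little-endian mono PCM format are all assumptions made for illustration.

```python
import struct
from typing import List, Tuple


def remove_blank_speech(pcm: bytes, sample_rate: int, window_ms: int = 20,
                        threshold: int = 100) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Drop windows whose peak amplitude is below threshold; report their positions."""
    window_bytes = sample_rate * window_ms // 1000 * 2  # 16-bit mono samples
    kept = bytearray()
    blank_positions: List[Tuple[int, int]] = []  # (start_ms, end_ms) of each blank window
    for index, start in enumerate(range(0, len(pcm), window_bytes)):
        window = pcm[start:start + window_bytes]
        usable = len(window) - len(window) % 2  # ignore a dangling odd byte
        samples = struct.unpack("<%dh" % (usable // 2), window[:usable])
        if samples and max(abs(s) for s in samples) < threshold:
            blank_positions.append((index * window_ms, (index + 1) * window_ms))
        else:
            kept += window  # non-blank audio is recombined in its original order
    return bytes(kept), blank_positions
```

The returned position list plays the role of the position information in step S112, and the returned bytes are the recombined segment of step S113.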

The embodiment of the invention provides an opus-based speech coding device, which comprises:

the first acquisition module is used for acquiring a current speech frame to be coded in a preset speech segment;

the encoding module is used for encoding the current speech frame to be encoded acquired by the first acquisition module based on the opus encoding technology to acquire an encoded speech frame;

a first determining module, configured to indicate and determine a byte size of the encoded speech frame obtained by the encoding module based on a preset byte header.

In one possible implementation manner, the apparatus further includes:

the second acquisition module is used for acquiring a preset voice segment input by a user before the first acquisition module acquires the current voice frame to be coded;

and the segmentation module is used for segmenting the preset voice segment input by the user and acquired by the second acquisition module according to the preset time length and acquiring a plurality of voice frames to be coded.

In one possible implementation manner, the apparatus further includes:

a third obtaining module, configured to obtain a next speech frame to be encoded of the current speech frame to be encoded;

the first control module is used for controlling the next speech frame to be coded, which is acquired by the third acquisition module, to execute corresponding subsequent operations;

and controlling the rest speech frames to be coded in the preset speech segment based on the preset arrangement sequence of the speech frames to be coded, and executing corresponding subsequent operations.

In one possible implementation manner, the apparatus further includes:

a second determining module, configured to determine a proportional size of the preset byte header in the obtained byte size of the encoded speech frame;

the second control module is used for judging whether the proportion size determined by the second determination module is smaller than a preset proportion size or not;

if so, controlling an alarm module to execute a first alarm operation;

otherwise, determining the byte size of the preset byte header, judging whether the determined byte size is smaller than the preset byte size, and if so, controlling an alarm module to execute a second alarm operation;

otherwise, controlling the alarm module to execute a third alarm operation.

In one possible implementation manner, the apparatus further includes:

the judging module is used for judging whether blank voice exists in the preset voice section input by the user and acquired by the second acquiring module, and if yes, a discarding instruction is sent out;

the determining module is used for determining the position information of the blank voice in the preset voice section based on a voice position database and according to the discarding instruction sent by the judging module;

and the recombination module is used for deleting the corresponding blank voice according to the position information of the blank voice determined by the determination module in the preset voice section and recombining the blank voice into a new preset voice section.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow chart of a method for opus-based speech coding according to an embodiment of the present invention;

FIG. 2 is a block diagram of an opus-based speech coder according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

An embodiment of the present invention provides an opus-based speech coding method which, as shown in fig. 1, includes steps S1 to S3 set out above, wherein:

The opus is a sound coding format, and the opus coding technology adopted here is used for compressing and coding a speech frame to be coded;

the current speech frame to be encoded is one of the frames in the preset speech segment, and the frames can be obtained, for example, by taking every 20ms as one frame;

the preset byte header is 2 bytes in size, which has the advantage of effectively reducing the proportion of the encoded speech frame that the byte header occupies, for example:

the maximum sampling rate supported by opus is 48k and the maximum frame size is 60ms, so the corresponding size of one frame of speech is: sampling rate / 1000 × 2 × frame size = 48000 / 1000 × 2 × 60, i.e. the size of a frame of speech is at most 5760 bytes; a preset byte header of, for example, 2 bytes is therefore sufficient to represent and determine the byte size of the obtained encoded speech frame;
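
The bound above can be checked with a few lines of arithmetic. The sketch assumes 2 bytes per sample, which is what the factor of 2 in the formula implies, and an unsigned 2-byte header; the function name is illustrative.

```python
def raw_frame_bytes(sample_rate_hz: int, frame_ms: int, bytes_per_sample: int = 2) -> int:
    """Size of one uncompressed frame: sampling rate / 1000 x 2 x frame size."""
    return sample_rate_hz // 1000 * bytes_per_sample * frame_ms


MAX_TWO_BYTE_VALUE = 2 ** 16 - 1           # largest size an unsigned 2-byte header can hold
largest_frame = raw_frame_bytes(48000, 60)  # maximum rate and frame size given in the text

# Even an uncompressed maximum-size frame fits in the 2-byte header.
assert largest_frame <= MAX_TWO_BYTE_VALUE
```

Since the encoded frame is no larger than the raw frame in the examples given, a 2-byte header leaves ample headroom.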

and, for example: in the prior art, assuming a sampling rate of 16k, one frame of 20ms speech has a size of 80 bytes after 8-times compression coding, and the header information accounts for 10% of the total size of the encoded speech frame; at a sampling rate of 8k, one frame of 20ms speech has a size of 40 bytes after 8-times compression coding, and the header information accounts for 16% of the total size of the encoded speech frame. Compared with the prior art, the preset byte header reduces the size of each frame of speech by 6 bytes. Thus, at a sampling rate of 16k, where one frame of 20ms speech is 80 bytes after 8-times compression coding, the preset byte header accounts for 2-3%, which is 6-7% lower than the proportion of the header information in the prior art;

similarly, at a sampling rate of 8k, where one frame of 20ms speech is 40 bytes after 8-times compression coding, the preset byte header accounts for 4-6%, which is 10-12% lower than the proportion of the header information in the prior art; the higher the compression factor, the larger the reduction in the proportion of the preset byte header compared with the proportion of the header information in the prior art.
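
The comparison can be reproduced numerically. The text does not state whether the total size includes the header; the sketch below assumes it does, so its results only approximate the rounded percentages quoted above.

```python
def header_share(header_bytes: int, payload_bytes: int) -> float:
    """Fraction of one encoded unit (header + payload) taken up by the header."""
    return header_bytes / (header_bytes + payload_bytes)


# Payload sizes from the examples: 80 bytes at 16k, 40 bytes at 8k (20ms frames, 8x compression).
for payload in (80, 40):
    old = header_share(8, payload)   # 8-byte header information of the prior art
    new = header_share(2, payload)   # 2-byte preset byte header
    print(f"payload {payload} bytes: {old:.1%} -> {new:.1%}, saving {8 - 2} bytes per frame")
```

Under this assumption the share falls from roughly 9% to 2.4% for the 80-byte frame and from roughly 17% to 4.8% for the 40-byte frame.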

The beneficial effects of the above technical scheme are: by adopting the opus coding technology and representing the size of the encoded speech frame with the preset byte header, the proportion of header information is effectively reduced, and the byte size of the encoded speech is effectively reduced in turn.

The embodiment of the invention provides an opus-based speech coding method, which further comprises the following steps before the step S1 is executed:

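step S11: acquiring a preset voice section input by a user;

step S12: segmenting the acquired preset voice section input by the user according to a preset time length, to obtain a plurality of voice frames to be coded.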

The preset voice segment input by the user can be a segment of audio information;

the preset time length may be a frame length less than or equal to 60ms, for example 20ms, because the maximum frame size supported by opus is 60ms.

The beneficial effects of the above technical scheme are: the preset voice segment is segmented, so that a plurality of voice frames to be coded can be conveniently obtained, and convenience is brought to the coding processing of the subsequent voice frames to be coded.

The embodiment of the present invention provides an opus-based speech encoding method, which further includes, after performing step S3, steps S31 to S33 set out above.

The preset arrangement order of the speech frames to be encoded may be, for example, chronological order.

The beneficial effects of the above technical scheme are: processing the frames according to the preset arrangement order makes it convenient to process all the segmented frames, and avoids omissions and information loss.

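The embodiment of the present invention provides an opus-based speech encoding method, which further includes, after performing step S3:

step S41: determining the proportional size of the preset byte header in the obtained byte size of the encoded voice frame;

step S42: judging whether the determined proportion size is smaller than a preset proportion size;

if yes, executing a first alarm operation;

otherwise, determining the byte size of the preset byte header, judging whether the determined byte size is smaller than the preset byte size, and if so, executing a second alarm operation;

otherwise, executing a third alarm operation.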

For example: the sampling rate is 16k, the size of a frame of 20ms voice after 8 times of compression coding is 80 bytes, wherein the size of header information is 8 bytes, and the header information accounts for 10 percent of the total size of the coded voice frame;

or the sampling rate is 16k, the size of a frame of 20ms voice after 8 times of compression coding is 80 bytes, wherein the preset byte header is 2 bytes in size, and the preset byte header accounts for 2-3% of the total size;

wherein a frame of 20ms speech is one speech frame to be coded; the preset proportion size is the 10% of the total size occupied by the header information; and the determined proportion size is the 2-3% of the total size occupied by the preset byte header;

the determined byte size is the size of a preset byte header;

the preset byte size may be 8 bytes of header information;

the first alarm operation may be an alarm indicating that the preset byte header is qualified;

the second alarm operation may be an alarm indicating that the byte size of the preset byte header is qualified;

the third alarm operation may be an alarm indicating that the byte size of the preset byte header is not acceptable.

The beneficial effects of the above technical scheme are: by checking the byte size of the preset byte header, representation errors caused by misoperation can be avoided, and it can be effectively ensured that a qualified preset byte header is used for the representation.

The embodiment of the present invention provides an opus-based speech encoding method, which, after performing step S11 and before performing step S12, further includes steps S111 to S113 set out above.

The discarding instruction may be an instruction concerning the blank voice, and may include the start time and the end time of the blank voice within the preset voice section;

the new preset voice section formed by the recombination does not include the blank voice.

The beneficial effects of the above technical scheme are: by deleting the blank speech, the encoding compression work of the blank speech can be reduced, the efficiency of encoding compression of the reconstructed preset speech segment is improved, and meanwhile, the storage space of the speech segment after encoding compression can be saved.

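An embodiment of the present invention provides an opus-based speech encoding apparatus, as shown in fig. 2, including:

the first acquisition module is used for acquiring a current speech frame to be coded in a preset speech segment;

the encoding module is used for encoding the current speech frame to be encoded acquired by the first acquisition module based on the opus encoding technology to acquire an encoded speech frame;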

a first determining module, configured to determine, based on a preset byte header, a byte size of the encoded speech frame obtained by the encoding module.

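The embodiment of the invention provides an opus-based speech coding device, which further comprises:

a second determining module, configured to determine a proportional size of the preset byte header in the obtained byte size of the encoded speech frame;

the second control module is used for judging whether the proportion size determined by the second determination module is smaller than a preset proportion size or not;

if so, controlling an alarm module to execute a first alarm operation;

otherwise, determining the byte size of the preset byte header, judging whether the determined byte size is smaller than the preset byte size, and if so, controlling an alarm module to execute a second alarm operation;

otherwise, controlling the alarm module to execute a third alarm operation.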

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. An opus-based speech coding method, comprising:

2. The opus-based speech coding method of claim 1, further comprising, before performing step S1:

step S11: acquiring a preset voice section input by a user;

3. The opus-based speech coding method of claim 2, further comprising, after performing step S3:

4. The opus-based speech coding method of claim 1, further comprising, after performing step S3:

if yes, executing a first alarm operation;

otherwise, executing a third alarm operation.

5. The opus-based speech coding method of claim 2, wherein after the step S11 is performed and before the step S12 is performed, further comprising:

6. An opus-based speech coder, comprising:

7. The opus-based speech coder of claim 6, further comprising:

8. The opus-based speech coder of claim 7, further comprising:

9. The opus-based speech coder of claim 6, further comprising:

if so, controlling an alarm module to execute a first alarm operation;

otherwise, controlling the alarm module to execute a third alarm operation.

10. The opus-based speech coder of claim 7, further comprising: