CN112511407A

CN112511407A - Self-adaptive voice playing method and system

Info

Publication number: CN112511407A
Application number: CN202011196492.8A
Authority: CN
Inventors: 刘祥国; 张营; 杜慧珺; 李文敬; 雷现惠; 彭佳; 杨坤; 周佳; 淳于岳松
Original assignee: State Grid Corp of China SGCC; TaiAn Power Supply Co of State Grid Shandong Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; TaiAn Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority date: 2020-10-30
Filing date: 2020-10-30
Publication date: 2021-03-16
Anticipated expiration: 2040-10-30
Also published as: CN112511407B

Abstract

The invention provides a self-adaptive voice playing method. Among unplayed voice messages, the method accumulates the voice durations of consecutive voice data that belong to the same user ID and whose adjacent time interval is not greater than a preset interval threshold; when the accumulated voice duration exceeds a preset time threshold, the voice data is played at an accelerated multiple speed. The invention also provides a self-adaptive voice playing system. The invention enables adaptive accelerated (multiple-speed) playback of voice data.

Description

Self-adaptive voice playing method and system

Technical Field

The invention relates to a voice playing method and a voice playing system, in particular to a self-adaptive voice playing method and a self-adaptive voice playing system.

Background

With the rapid development of mobile internet technology, users increasingly communicate through instant messaging software. Such software supports sending voice messages, but some senders send overly long voice messages, which makes it slow for recipients to obtain the information they contain. For example, some users send several voice messages of 60 seconds or more each; to understand what the sender means, the recipient must listen to all of them in full, which takes a long time. One solution is the speech-to-text function of instant messaging software, which converts the audio into text that can be read more quickly. The drawback of this approach is also obvious: when audio is converted into text, some information is lost or transcribed incorrectly.

Some existing playback software can also increase the playback speed, for example to 1.25 times or 1.5 times the original speed, as shown in Fig. 1. Increasing the playback speed lets the user grasp the information in the audio more quickly. However, such software only offers fixed playback speeds and cannot adapt the speed to the voice duration, so the user cannot obtain much information in a short time and the user experience is poor.

Disclosure of Invention

The invention provides a self-adaptive voice playing method that plays longer voice messages in instant messaging software at a speed matched to their voice duration, thereby accelerating playback so that a user can obtain more information in a short time. The invention also provides a self-adaptive voice playing system.

The technical scheme adopted by the invention is as follows:

In one aspect, the invention provides a self-adaptive voice playing method, comprising the following steps:

S401, acquiring the first unplayed voice data in an instant messaging software chat, wherein the voice data comprises a voice duration and a corresponding user ID;

S402, acquiring the next unplayed voice data in the instant messaging software chat;

S403, if the user ID corresponding to the voice data acquired in step S402 is the same as the user ID corresponding to the previously acquired voice data, and the time interval between the two is less than the preset interval threshold, executing S404; otherwise, executing S405;

S404, repeating steps S402 and S403 until the user ID of the acquired voice data differs from that of the previously acquired voice data, or the time interval between them is greater than the preset interval threshold; then executing S405;

S405, accumulating the voice durations of the acquired voice data having the same user ID to obtain a total voice duration Z; executing S406;

S406, if the total voice duration Z is greater than a preset time threshold Z0, playing the acquired voice data having the same user ID at a preset multiple speed Q, where Q > 1 and Q is determined based on the total voice duration Z.
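The grouping loop of steps S401 to S406 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the `VoiceMessage` record, its `start` field, and the function names are hypothetical; a message's end time is assumed to be its start time plus its duration; an interval exactly equal to the threshold is treated as continuing the group (following the abstract's "not greater than"); and `speed_for` is a placeholder for whatever rule derives Q from Z.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class VoiceMessage:
    user_id: str
    start: float     # hypothetical field: start time in seconds
    duration: float  # voice duration in seconds


def plan_playback(
    unplayed: List[VoiceMessage],
    p: float,                             # preset interval threshold, e.g. 3-5 s
    z0: float,                            # preset time threshold Z0
    speed_for: Callable[[float], float],  # placeholder rule mapping Z to Q > 1
) -> List[Tuple[List[VoiceMessage], float]]:
    """Split unplayed messages into groups and choose a playback speed per group."""
    plan: List[Tuple[List[VoiceMessage], float]] = []
    i = 0
    while i < len(unplayed):
        group = [unplayed[i]]  # S401: first unplayed message of a new group
        j = i + 1
        while j < len(unplayed):  # S402: next unplayed message
            prev, nxt = group[-1], unplayed[j]
            # Interval Ts2 - Te1, assuming end time = start + duration.
            gap = nxt.start - (prev.start + prev.duration)
            if nxt.user_id != prev.user_id or gap > p:  # S403/S404 stop condition
                break
            group.append(nxt)
            j += 1
        total = sum(m.duration for m in group)  # S405: total voice duration Z
        speed = speed_for(total) if total > z0 else 1.0  # S406: 1.0 = normal speed
        plan.append((group, speed))
        i = j
    return plan
```

For instance, two messages from the same user of 19 s and 17 s separated by a 2 s pause form one group (Z = 36 s), so with Z0 = 30 s they are played at the speed returned by `speed_for`, while a following message from a different user is evaluated on its own.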

Optionally, the preset interval threshold is 3-5 seconds.

Optionally, the preset multiple speed Q is determined by formula (1), wherein Q_min is a preset minimum multiple speed, Q_max is a preset maximum multiple speed, and f(Q_max, Q_min) is a speed compensation function associated with Q_min and Q_max.

Optionally, f(Q_max, Q_min) is positively correlated with Q_max and negatively correlated with Q_min.

Optionally, f(Q_max, Q_min) is positively correlated with (Q_max - Q_min).

Optionally, Z0 = max(a preset threshold, k × the maximum single voice duration allowed by the instant messaging software), where k is a coefficient less than 1.

Optionally, the preset threshold is 10-30 seconds.

Optionally, Q_min = 1.1 and Q_max = 1.5.

Optionally, Q_min and Q_max are each set based on the corresponding user ID.

In another aspect, the present invention provides an adaptive voice playing system comprising a processor and a storage medium on which a computer program is stored; when a voice playing instruction is obtained, the processor executes the computer program to implement the above method.

In the self-adaptive voice playing method and system provided by the embodiments of the invention, among the unplayed voice data, the voice durations of consecutive voice data that belong to the same user ID and whose adjacent time interval does not exceed the preset interval threshold are summed; when the summed voice duration is greater than the preset time threshold, the voice data is played at the multiple speed. This accelerates playback while keeping additional processing to a minimum.

Drawings

Fig. 1 is a schematic diagram of conventional playback software;

Fig. 2 is a schematic flowchart of an adaptive voice playing method according to an embodiment of the present invention;

Fig. 3 is a schematic flowchart of an adaptive voice playing method according to another embodiment of the present invention;

Fig. 4 is a schematic flowchart of an adaptive voice playing method according to another embodiment of the present invention;

Fig. 5 shows a set of voice data having the same user ID but different voice durations;

Fig. 6 is a schematic flowchart of an adaptive voice playing method according to another embodiment of the present invention;

Fig. 7 shows a set of voice data transmitted at different times.

Detailed Description

In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.

Some of the flows described in this specification, the claims, and the above figures include operations that appear in a particular order. It should be clearly understood, however, that these operations may be performed out of the order in which they appear herein, or in parallel. Operation labels such as S101 and S102 merely distinguish the operations from one another and do not by themselves represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 2 illustrates an adaptive voice playing method according to an embodiment of the present invention, suitable for voice playback in instant messaging software. As shown in Fig. 2, the adaptive voice playing method provided in this embodiment includes the following steps:

S101, acquiring unplayed voice data X in the instant messaging software chat, wherein the voice data X comprises a voice duration Z and a corresponding user ID.

S102, if the acquired voice duration Z is greater than a preset time threshold Z0, playing the corresponding voice data at a preset multiple speed Q, where Q > 1 and Q is determined based on the acquired voice duration.
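The per-message decision in S101 and S102 reduces to a single comparison. The sketch below is illustrative only: the function name is hypothetical, and `q` is passed in as a fixed value, whereas in the method Q is derived from the voice duration. The comparison is strict, so a message exactly as long as Z0 stays at normal speed.

```python
def playback_speed(duration_s: float, z0: float, q: float) -> float:
    """Steps S101/S102: accelerate only voice data longer than the threshold Z0.

    `q` is a fixed stand-in here; in the described method it depends on Z.
    """
    if q <= 1.0:
        raise ValueError("the multiple speed Q must be greater than 1")
    return q if duration_s > z0 else 1.0  # 1.0 means normal speed
```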

In practice, the duration of voice messages sent in instant messaging chats varies, and accelerated playback requires additional processing of the audio. For a short voice message, accelerated playback saves little time; for a long one, it saves considerably more. The embodiment of the invention therefore uses a time threshold Z0 and plays only the longer voice messages at the multiple speed, accelerating playback while minimizing the additional processing.

Further, in the embodiment of the present invention, the preset time threshold Z0 may be, for example, 10 to 30 seconds. Preferably, Z0 = max(a preset threshold, k × the maximum single voice duration allowed by the instant messaging software), where k is a coefficient less than 1, for example k = 0.5. The preset threshold may be 10 to 30 seconds.
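The preferred threshold Z0 = max(preset threshold, k × maximum single voice duration) can be computed as follows. This is a minimal sketch; the function and parameter names are hypothetical, and the single-message limit used in the example is an assumed value rather than one taken from any particular application.

```python
def time_threshold(preset_s: float, max_single_s: float, k: float = 0.5) -> float:
    """Z0 = max(preset threshold, k * maximum single voice duration), with 0 < k < 1."""
    if not 0.0 < k < 1.0:
        raise ValueError("k must satisfy 0 < k < 1")
    return max(preset_s, k * max_single_s)
```

Assuming a 60-second single-message limit and a 20-second preset threshold, `time_threshold(20, 60)` yields 30 seconds; with a 30-second limit, 0.5 × 30 = 15 is below the preset value, so Z0 stays at 20 seconds. The `max` keeps Z0 from becoming too small on platforms with short message limits.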

Further, in an embodiment of the present invention, the preset multiple speed Q may be 1.25 or 1.5. In another embodiment, the preset multiple speed Q may lie between a preset minimum multiple speed Q_min and a preset maximum multiple speed Q_max. Preferably, in one embodiment, Q_min = 1.1 and Q_max = 1.5.

Further, the preset multiple speed Q may be determined by the following formula (1):

where f(Q_max, Q_min) is a speed compensation function associated with Q_min and Q_max. f(Q_max, Q_min) is positively correlated with Q_max, i.e., the larger Q_max, the larger f(Q_max, Q_min); and negatively correlated with Q_min, i.e., the larger Q_min, the smaller f(Q_max, Q_min). Further, f(Q_max, Q_min) is positively correlated with (Q_max - Q_min), i.e., the larger (Q_max - Q_min), the larger f(Q_max, Q_min). The technical effect of determining Q with formula (1) is as follows: the larger Z/Z0, the longer the voice and the faster it should be played, which saves the user time; moreover, because the information in a longer voice message tends to be more spread out, there is little risk of missing details. Conversely, the smaller Z/Z0, the shorter the voice and the slower the playback, which ensures that the user does not miss information in the voice.
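Formula (1) itself does not appear in this text, so the sketch below is a hypothetical stand-in that only reproduces the qualitative behaviour described above: Q grows with Z/Z0 and stays within [Q_min, Q_max]. The linear form, the function name, and the use of f as the slope are all assumptions for illustration, not the patent's formula.

```python
def hypothetical_speed(z: float, z0: float, q_min: float, q_max: float, f: float) -> float:
    """Illustrative stand-in for formula (1); NOT the patent's formula.

    Q starts at q_min when Z = Z0, rises by f for each additional multiple
    of Z0, and is clamped to the range [q_min, q_max].
    """
    if z0 <= 0:
        raise ValueError("z0 must be positive")
    q = q_min + f * (z / z0 - 1.0)
    return min(q_max, max(q_min, q))
```

With Q_min = 1.1, Q_max = 1.5, f = 0.15 and Z0 = 30 s, a 60-second group would be played at about 1.25 times speed under this assumed mapping, and very long groups saturate at 1.5.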

In one embodiment, f(Q_max, Q_min) = Q_max/N1, for example with N1 = 10.

In one embodiment, f(Q_max, Q_min) = (Q_max - Q_min)/N2, for example with N2 = 5.

In one embodiment, f(Q_max, Q_min) = min(Q_max/N1, (Q_max - Q_min)/N2), where N1 and N2 take the same values as above.
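The three compensation functions above translate directly into code. In this sketch the function names are hypothetical; `f_by_max` accepts an unused `q_min` only so that all three share one signature. With the preferred Q_min = 1.1 and Q_max = 1.5 they give about 0.15, 0.08 and 0.08 respectively.

```python
def f_by_max(q_max: float, q_min: float, n1: float = 10.0) -> float:
    """f = Q_max / N1 (grows with Q_max; q_min is unused)."""
    return q_max / n1


def f_by_range(q_max: float, q_min: float, n2: float = 5.0) -> float:
    """f = (Q_max - Q_min) / N2 (grows with Q_max, shrinks as Q_min grows)."""
    return (q_max - q_min) / n2


def f_smaller(q_max: float, q_min: float, n1: float = 10.0, n2: float = 5.0) -> float:
    """f = min(Q_max / N1, (Q_max - Q_min) / N2)."""
    return min(f_by_max(q_max, q_min, n1), f_by_range(q_max, q_min, n2))
```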

The rationale for these embodiments is that accelerated audio playback generally neither exceeds 2 times nor falls below 1.1 times the original speed: above 2 times, the speech becomes hard to understand, and below 1.1 times, the speed-up is hardly meaningful. On this basis, the compensation function f(Q_max, Q_min) may be 1/10 of Q_max, i.e., about 0.15-0.2, or 1/5 of Q_min, i.e., about 0.22-0.3, or the smaller of the two; all of these settings are reasonable.

In another embodiment of the present invention, Q_max = 2 and Q_min = 1.2.

Further, Q_min and Q_max can be set based on the corresponding user ID. They can be set by the user, for example through a settings interface in which the user enters Q_min and Q_max. Because people in a private chat usually know each other well, the user knows how fast a given contact speaks: for a fast speaker, Q_min and Q_max can be set smaller, and for a slow speaker, larger, so that the information in the voice can be understood more accurately.
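A per-contact speed range can be kept as a simple lookup with a fallback default. This is a minimal sketch; the function name, the override dictionary, and the validation rule (1 < Q_min ≤ Q_max) are assumptions for illustration, and the default pair is the preferred Q_min = 1.1, Q_max = 1.5 from the text.

```python
from typing import Dict, Tuple

# Default (Q_min, Q_max) from the preferred embodiment.
DEFAULT_RANGE: Tuple[float, float] = (1.1, 1.5)


def speed_range_for(
    user_id: str,
    overrides: Dict[str, Tuple[float, float]],
) -> Tuple[float, float]:
    """Return the (Q_min, Q_max) pair configured for a contact, or the default."""
    q_min, q_max = overrides.get(user_id, DEFAULT_RANGE)
    if not 1.0 < q_min <= q_max:
        raise ValueError(f"invalid speed range for {user_id}: {(q_min, q_max)}")
    return q_min, q_max
```

For example, with the hypothetical entries `{"fast_talker": (1.1, 1.3), "slow_talker": (1.2, 2.0)}`, a fast speaker is capped at 1.3 times speed, a slow speaker may go up to 2 times, and any contact without an entry uses the default range.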

Fig. 3 illustrates an adaptive voice playing method according to another embodiment of the present invention.

As shown in Fig. 3, in one embodiment, step S101 preferably includes:

S201, sequentially acquiring the unplayed voice data in the instant messaging software chat.

Preferably, step S102 includes:

S202, if the voice duration of a piece of acquired voice data is greater than the preset time threshold Z0, playing that voice data at the preset multiple speed Q.

In this embodiment, the unplayed voice data may be acquired one by one in sequence, and a piece of voice data is played at the preset multiple speed Q only when its voice duration Z is greater than the preset time threshold Z0. This increases the playback speed while minimizing additional processing. The preset time threshold Z0 and the preset multiple speed Q in this embodiment are defined as in the previous embodiment, and their detailed descriptions are omitted here to avoid redundancy.

Fig. 4 illustrates an adaptive voice playing method according to another embodiment of the present invention.

As shown in Fig. 4, in one embodiment, steps S101 and S102 may preferably further include:

S301, acquiring the first unplayed voice data in the instant messaging software chat;

S302, acquiring the next unplayed voice data in the instant messaging software chat;

S303, if the user ID corresponding to the voice data acquired in step S302 is the same as the user ID corresponding to the previously acquired voice data, i.e., the voice data acquired in the current step and the voice data acquired in the previous step belong to the same user ID, executing S304;

S304, repeating steps S302 and S303 until the user ID of the acquired voice data differs from that of the previously acquired voice data; then executing S305;

S305, accumulating the voice durations of the acquired voice data having the same user ID to obtain a total voice duration Z. For example, if n pieces of voice data X1, X2, …, Xn with the same user ID are acquired, with voice durations Z1, Z2, …, Zn respectively, then Z = Z1 + Z2 + … + Zn; executing S306;

S306, if the total voice duration Z is greater than the preset time threshold Z0, playing the acquired voice data having the same user ID at the preset multiple speed Q, for example by playing X1, X2, …, Xn in sequence; if Z is not greater than Z0, not performing accelerated playback.
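Because steps S302 to S304 merge only adjacent messages with the same user ID, Python's `itertools.groupby` is a natural fit: it groups consecutive items with equal keys and starts a new group whenever the key changes. In this sketch the function name and the `(user_id, duration)` tuple layout are assumptions, and a fixed `q` stands in for the Z-dependent multiple speed Q.

```python
from itertools import groupby
from typing import List, Tuple


def plan_by_user_runs(
    unplayed: List[Tuple[str, float]],  # (user_id, voice duration in seconds)
    z0: float,
    q: float,
) -> List[Tuple[str, List[float], float]]:
    """Steps S301-S306: merge consecutive messages from the same user ID."""
    plan: List[Tuple[str, List[float], float]] = []
    # groupby only merges *adjacent* items with equal keys, matching S302-S304.
    for user_id, run in groupby(unplayed, key=lambda m: m[0]):
        durations = [d for _, d in run]
        total = sum(durations)  # S305: Z = Z1 + Z2 + ... + Zn
        plan.append((user_id, durations, q if total > z0 else 1.0))  # S306
    return plan
```

Using the Fig. 5 durations with Z0 = 30 seconds, `plan_by_user_runs([("A", 19), ("A", 17)], 30, 1.25)` merges the two messages into Z = 36 seconds and plays them at 1.25 times speed, whereas evaluating each message alone would leave both at normal speed.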

In this embodiment, considering a set of unplayed voice data as a whole makes it possible to distinguish effectively between short voice and long voice: short voice does not need accelerated playback, while long voice does. Taking Z0 = 30 seconds as an example, the two voice messages of 19 seconds and 17 seconds in Fig. 5 would not be played at the multiple speed without S302 to S304, but are played at the multiple speed with S302 to S304 (19 + 17 = 36 seconds > 30 seconds). Clearly, these 4 voice messages were spoken in one continuous stretch; if only the simple comparison against the time threshold Z0 were used, the speech speed would change from message to message, degrading the user experience. Such consecutive voice messages reflect the user's habits in instant messaging chats and therefore deserve full consideration and respect. The preset time threshold Z0 and the preset multiple speed Q in this embodiment are defined as in the previous embodiment, and their detailed descriptions are omitted here to avoid redundancy.

Fig. 6 illustrates an adaptive voice playing method according to another embodiment of the present invention.

As shown in Fig. 6, in one embodiment, steps S101 and S102 may preferably further include:

S401, acquiring the first unplayed voice data in the instant messaging software chat;

S402, acquiring the next unplayed voice data in the instant messaging software chat;

S403, if the user ID corresponding to the voice data acquired in step S402 is the same as the user ID corresponding to the previously acquired voice data, and the time interval between the two is less than the preset interval threshold P, executing S404; otherwise, executing S405;

that is, if the voice data acquired in the current step and the voice data acquired in the previous step belong to the same user ID, and the time interval Ts2 - Te1 between them is less than the preset interval threshold P, where Ts2 is the start time of the voice data acquired in the current step and Te1 is the end time of the voice data acquired in the previous step, S404 is executed;

S404, repeating steps S402 and S403 until the user ID of the acquired voice data differs from that of the previously acquired voice data, or the time interval between them is greater than the preset interval threshold P; then executing S405;

S405, accumulating the voice durations of the acquired voice data having the same user ID to obtain a total voice duration Z. For example, if n pieces of voice data X1, X2, …, Xn with the same user ID are acquired, with voice durations Z1, Z2, …, Zn respectively, then Z = Z1 + Z2 + … + Zn; executing S406;

S406, if the total voice duration Z is greater than the preset time threshold Z0, playing the acquired voice data having the same user ID at the preset multiple speed Q, for example by playing X1, X2, …, Xn in sequence, where Q > 1 and Q is determined based on the total voice duration Z; if Z is not greater than Z0, not performing accelerated playback.

In this embodiment, the preset interval threshold P may be, for example, 3 to 5 seconds. The preset time threshold Z0 and the preset multiple speed Q in this embodiment are defined as in the previous embodiment, and their detailed descriptions are omitted here to avoid redundancy.

In this embodiment, for a group of unplayed voice data, only the voice durations of consecutive voice data that belong to the same user ID and whose adjacent time interval does not exceed the preset interval threshold are summed, and accelerated playback is performed when the summed duration is greater than the preset time threshold. For example, as shown in Fig. 7, the durations of the 11-second and 4-second voice messages at the upper right can be summed and the messages played continuously at the multiple speed, whereas the 11-second and 5-second voice messages on the left cannot be played continuously. This improves both playback efficiency and the accuracy of information extraction.

An embodiment of the present invention further provides a self-adaptive voice playing system, including a memory, a processor, and a computer program stored in the memory; when a voice playing instruction is obtained, the processor executes the computer program to implement the steps of the self-adaptive voice playing method. The self-adaptive voice playing system provided by the embodiment of the invention can be deployed on a mobile terminal.

Specifically, the memory and the processor may be a general-purpose memory and processor, without particular limitation. When the processor runs the computer program stored in the memory, it executes the adaptive voice playing method, thereby solving the problem in the related art that voice cannot be played adaptively at a multiple speed.

The above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A time-based adaptive voice playing method, characterized by comprising the following steps:

2. The time-based adaptive voice playing method according to claim 1, wherein the preset interval threshold is 3-5 seconds.

3. The time-based adaptive voice playing method according to claim 1, wherein the preset multiple speed is set

4. The time-based adaptive voice playing method according to claim 3, wherein f(Q_max, Q_min) is positively correlated with Q_max and negatively correlated with Q_min.

5. The time-based adaptive voice playing method according to claim 3, wherein f(Q_max, Q_min) is positively correlated with (Q_max - Q_min).

6. The time-based adaptive voice playing method according to claim 3, wherein Z0 = max(a preset threshold, k × the maximum single voice duration allowed by the instant messaging software), and k is a coefficient less than 1.

7. The time-based adaptive voice playing method according to claim 6, wherein the preset threshold is 10-30 seconds.

8. The time-based adaptive voice playing method according to claim 3, wherein Q_min = 1.1 and Q_max = 1.5.

9. The time-based adaptive voice playing method according to claim 3, wherein Q_min and Q_max are each set based on the corresponding user ID.

10. An adaptive voice playing system, comprising a processor and a storage medium on which a computer program is stored, wherein, when a voice playing instruction is obtained, the computer program is executed by the processor to perform the method of any one of claims 1 to 9.