CN112750436B - Method and equipment for determining target playing speed of voice message

Info

Publication number: CN112750436B
Application number: CN202011597149.4A
Authority: CN
Inventors: 孙洋
Original assignee: Shanghai Zhangmen Science and Technology Co Ltd
Current assignee: Shanghai Zhangmen Science and Technology Co Ltd
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2022-12-30
Anticipated expiration: 2040-12-29
Also published as: CN112750436A

Abstract

The application aims to provide a method and equipment for determining a target playing speed of a voice message, wherein the method comprises the following steps: receiving a voice message sent by a first user to a second user; determining an original playing speed of the voice message; and determining a target playing speed of target content information of the voice message, wherein the target content information comprises partial content information or all content information of the voice message, and the target playing speed is a target playing multiple of the original playing speed. Part or all of the content information of the voice message is thereby accelerated or decelerated to adapt to the actual needs of the user.

Description

Method and equipment for determining target playing speed of voice message

Technical Field

The present application relates to the field of communications, and in particular, to a technique for determining a target play speed of a voice message.

Background

As the pace of life grows faster, more and more users choose to send voice messages during online social interaction in order to communicate more efficiently.

Disclosure of Invention

An object of the present application is to provide a method and apparatus for determining a target play speed of a voice message.

According to an aspect of the present application, there is provided a method for determining a target play speed of a voice message, the method comprising:

receiving a voice message sent by a first user to a second user;

determining an original play speed of the voice message;

and determining a target playing speed of target content information of the voice message, wherein the target content information comprises partial content information or all content information of the voice message, and the target playing speed is a target playing multiple of the original playing speed.

According to one aspect of the present application, there is provided an apparatus for determining a target play speed of a voice message, the apparatus comprising:

a first module for receiving a voice message sent by a first user to a second user;

a second module for determining an original play speed of the voice message;

and a third module, configured to determine a target play speed of target content information of the voice message, where the target content information includes partial content information or all content information of the voice message, and the target play speed is a target play multiple of the original play speed.

According to an aspect of the present application, there is provided an apparatus for determining a target play speed of a voice message, wherein the apparatus comprises:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the operations of any of the methods described above.

According to one aspect of the application, there is provided a computer-readable medium storing instructions that, when executed, cause a system to perform the operations of any of the methods described above.

According to an aspect of the application, there is provided a computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the operations of any of the methods described above.

Compared with the prior art, after a voice message is received, the present application determines the original playing speed of the voice message and determines the target playing speed of the target content information of the voice message, wherein the target content information comprises part or all of the content information of the voice message, and the target playing speed is a target playing multiple of the original playing speed. Part or all of the content information of the voice message is thereby accelerated or decelerated to adapt to the actual needs of the user: when the target content information needs to be played faster, it is accelerated to save the user's time; when the target content information needs to be played more slowly, it is decelerated to ensure that the user can hear the content of the voice message clearly.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 illustrates a flow chart of a method for determining a target play speed for a voice message according to one embodiment of the present application;

FIG. 2 illustrates a block diagram of an apparatus for determining a target play speed for a voice message according to one embodiment of the present application;

FIG. 3 illustrates an exemplary system that can be used to implement the various embodiments described in this application.

The same or similar reference numbers in the drawings identify the same or similar elements.

Detailed Description

The present application is described in further detail below with reference to the attached figures.

In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., central processing units (CPUs)), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PCM), programmable random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.

The device referred to in the present application includes, but is not limited to, a terminal, a network device, or a device formed by integrating a terminal and a network device through a network. The terminal includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smart phone or a tablet computer, and the mobile electronic product may employ any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, which is a kind of distributed computing in which a collection of loosely coupled computers forms one virtual supercomputer. The network includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad hoc network, and the like. Preferably, the device may also be a program running on the terminal, the network device, or a device formed by integrating the terminal and the network device, the touch terminal, or the network device and the touch terminal through a network.

Of course, those skilled in the art will appreciate that the foregoing is by way of example only, and that other existing or future devices, which may be suitable for use in the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.

In the description of the present application, "a plurality" means two or more unless specifically defined otherwise.

Fig. 1 shows a flowchart of a method for determining a target play speed of a voice message according to an embodiment of the present application, the method including step S11, step S12, and step S13. In step S11, a voice message sent by the first user to the second user is received. In step S12, the original play speed of the voice message is determined. In step S13, a target playing speed of target content information of the voice message is determined, where the target content information includes partial content information or all content information of the voice message, and the target playing speed is a target playing multiple of the original playing speed. In some embodiments, the executing subject of the solution described in this embodiment includes, but is not limited to, a user equipment and a network device. In some embodiments, the user device includes, but is not limited to, a computing device such as a cell phone, computer, tablet, and the like. For example, the scheme of the present embodiment is executed by the user equipment of the second user. In some embodiments, the network device includes, but is not limited to, a server corresponding to a social application. For example, when the execution subject is a user device, the user device receives a voice message sent by a corresponding network device (e.g., a server corresponding to a social application), and determines a target play speed of target content information of the voice message. For another example, when the execution subject is the network device, the network device determines, when receiving a voice message (e.g., a voice message sent by a first user to a second user), a target play speed of target content information of the voice message.

Specifically, in step S11, a voice message sent by the first user to the second user is received. For example, user A (e.g., the first user) sends a voice message to user B (e.g., the second user) through a social application.

In step S12, the original play speed of the voice message is determined. In some embodiments, the original play speed is determined based on the original speech rate of the voice message. In some embodiments, the units of the original play speed include, but are not limited to: number of words per second (e.g., n words played per second).

In step S13, a target playing speed of target content information of the voice message is determined, where the target content information includes partial content information or all content information of the voice message, and the target playing speed is a target playing multiple of the original playing speed. In some embodiments, only the target content information of the voice message is accelerated or decelerated. In some embodiments, the target content information includes all content information of the voice message, e.g., all content information of the voice message is accelerated or decelerated. In other embodiments, the target content information includes part of the content information of the voice message, e.g., part of the content information of the voice message is accelerated or decelerated. In some embodiments, the target playing multiple includes, but is not limited to, specific multiples such as 0.1, 0.5, 0.75, 1.2, and 2.5. In some embodiments, the target playing speed = the original playing speed × the target playing multiple. For example, when the target playing multiple is greater than 1, the target content information of the voice message is accelerated. For another example, when the target playing multiple is less than 1, the target content information of the voice message is decelerated.
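The relationship in step S13 between the original playing speed, the target playing multiple, and the target playing speed can be illustrated with a minimal Python sketch. The function names `target_play_speed` and `processing_kind` are hypothetical and chosen only for illustration; treating a multiple of exactly 1 as "unchanged" is an assumption, since the text only describes multiples greater than or less than 1.

```python
def target_play_speed(original_speed: float, multiple: float) -> float:
    """Return the original playing speed multiplied by the target playing multiple."""
    if original_speed <= 0 or multiple <= 0:
        raise ValueError("speed and multiple must both be positive")
    return original_speed * multiple


def processing_kind(multiple: float) -> str:
    """Classify the processing implied by a target playing multiple."""
    if multiple > 1:
        return "accelerate"
    if multiple < 1:
        return "decelerate"
    # Assumption: a multiple of exactly 1 leaves the playback unchanged.
    return "unchanged"
```

With an original playing speed of 2 words/second, a multiple of 1.5 gives 3 words/second and a multiple of 0.5 gives 1 word/second.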

For example, user A (e.g., the first user) sends a voice message to user B (e.g., the second user) through a social application. After receiving the voice message, the user equipment of user B or the server corresponding to the social application determines an original playing speed (e.g., 2 words/second) of the voice message. Further, the user equipment or the server determines a target playing speed of the target content information of the voice message so as to accelerate or decelerate the target content information. For example, when the voice message contains many filler words such as "um" and "oh", the network device determines to accelerate the part in which the filler words appear (e.g., the target content information), with a target playing multiple greater than 1, so as to save user B's time. For another example, the original playing speed of the voice message is compared with the historical voice speed of user B, and if the original playing speed is greater than the historical voice speed, the target content information of the voice message (e.g., all content information of the voice message) is decelerated, with a target playing multiple less than 1, to ensure that user B can clearly hear the content the voice message is meant to express. In this embodiment, by determining the target playing speed of the voice message, the voice message is accelerated when acceleration is needed, so as to save the second user's social time, and is decelerated when deceleration is needed, so as to ensure that the second user can clearly hear the content the voice message is meant to express.

In some embodiments, the step S12 includes a step S121 (not shown) and a step S122 (not shown). In step S121, content information of the voice message is identified, wherein the content information includes one or more pieces of text information; in step S122, an original playing speed of the voice message is determined according to the one or more pieces of text information. The step S13 includes: determining a target playing speed of target content information of the voice message based on the one or more pieces of text information, wherein the target content information comprises partial content information or all content information of the voice message, and the target playing speed is a target playing multiple of the original playing speed. The execution subject of this embodiment includes, but is not limited to, the user equipment or the network equipment. In some embodiments, the user equipment includes, but is not limited to, a computing device such as a cell phone, computer, tablet, and the like. In some embodiments, the network equipment includes, but is not limited to, a server corresponding to a social application. For example, when the execution subject is the user equipment, the user equipment determines the original playing speed of the voice message and then the target playing speed of the voice message. For another example, when the execution subject is the network equipment, the network equipment determines the original playing speed of the voice message and then the target playing speed of the voice message. In some embodiments, the user equipment or the network equipment identifies content information of the voice message (e.g., based on voice recognition technology), wherein the content information of the voice message includes one or more pieces of text information.
For example, the voice message is "um, oh, that, I have to work overtime this weekend". The user equipment or the network equipment identifies the content information of the voice message based on a voice recognition technology, where the content information includes 10 pieces of text information, and then determines the original playing speed of the voice message based on the 10 recognized pieces of text information. Further, the user equipment or the network equipment determines a target playing speed of the target content information of the voice message based on the 10 pieces of text information, wherein the target playing speed is a target playing multiple of the original playing speed of the voice message. In this embodiment, the user equipment or the network equipment determines the original playing speed of the voice message and the target playing speed of the target content information by recognizing the content information of the voice message, so as to intelligently accelerate or decelerate the target content information of the voice message, thereby improving user experience.

In some embodiments, the step S122 includes: determining the original playing speed of the voice message according to the number of the one or more pieces of text information and the time interval between the first piece and the last piece of text information among them. The execution subject of this embodiment includes, but is not limited to, the user equipment or the network equipment. For example, the user equipment or the network equipment determines the original playing speed of the voice message according to the number of recognized pieces of text information and the time interval between the first piece and the last piece (for example, the original playing speed = the number of pieces of text information / the time interval). For example, the voice message "um, oh, that, I have to work overtime this weekend" includes 10 pieces of text information, and the time interval between the first piece ("um") and the last piece is 10 seconds, so the original playing speed of the voice message is determined to be 1 word/second. For another example, the voice message "that, I have to work overtime this weekend" includes 8 pieces of text information, and the time interval between the first piece ("that") and the last piece is 10 seconds, so the original playing speed of the voice message is determined to be 0.8 words/second.
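The calculation in step S122 can be sketched in Python as follows. This is only an illustrative sketch: the `TextUnit` type and its `time_s` field (the time offset at which a recognized piece of text information is spoken) are assumptions, since the text does not specify how a voice recognition result is represented.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TextUnit:
    text: str      # one recognized piece of text information
    time_s: float  # hypothetical: offset in seconds at which it is spoken


def original_play_speed(units: Sequence[TextUnit]) -> float:
    """Number of text units divided by the first-to-last time interval."""
    if len(units) < 2:
        raise ValueError("need at least two text units to measure an interval")
    interval = units[-1].time_s - units[0].time_s
    if interval <= 0:
        raise ValueError("time interval must be positive")
    return len(units) / interval
```

For 10 units whose first and last offsets are 10 seconds apart the function returns 1.0, and for 8 units over the same interval it returns 0.8, matching the two examples in the text.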

In some embodiments, the step S13 includes a step S131 (not shown), a step S132 (not shown), and a step S133 (not shown). In step S131, target content information of the voice message is determined based on the number of pieces of target text information appearing in the one or more pieces of text information, wherein the target content information includes partial content information or all content information of the voice message; in step S132, a target playing multiple corresponding to the target content information is obtained; in step S133, a target playing speed of the target content information is determined according to the original playing speed of the voice message and the target playing multiple. In some embodiments, the user equipment or the network equipment needs to determine which target content information in the voice message is to be accelerated or decelerated, so as to accelerate or decelerate that target content information. In some embodiments, the user equipment or the network equipment determines the part of the voice message that needs to be accelerated or decelerated according to the number of pieces of target text information appearing in the one or more pieces of text information of the voice message, and takes that part as the target content information. In some embodiments, the target text information includes, but is not limited to, filler words such as "um", "oh", "this", and "that".
Further, the user equipment or the network equipment obtains a target playing multiple corresponding to the target content information (for example, a playing multiple greater than 1 is preset in the user equipment or the network equipment, and when the number of pieces of target text information is equal to or greater than a number threshold, the preset playing multiple greater than 1 is obtained as the target playing multiple), where the target playing multiple is relative to the original playing speed of the voice message; for example, the target playing speed of the target content information is obtained by multiplying the original playing speed of the voice message by the target playing multiple.
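Steps S132 and S133 might be sketched in Python as follows. The filler-word set `FILLERS`, the preset value `PRESET_MULTIPLE = 1.5`, and the function names are assumptions made for illustration; the text only requires a preset multiple greater than 1. Returning a multiple of 1.0 when the threshold is not reached is also an assumption, as the text does not describe that case.

```python
from typing import Sequence

# Assumption: target text information is matched against a fixed filler-word set.
FILLERS = {"um", "oh", "this", "that"}
# Hypothetical preset playing multiple greater than 1.
PRESET_MULTIPLE = 1.5


def count_target_text(units: Sequence[str]) -> int:
    """Count the recognized text units that are target text information."""
    return sum(1 for unit in units if unit in FILLERS)


def target_multiple(target_count: int, threshold: int) -> float:
    """S132: obtain the preset multiple once the count reaches the threshold."""
    # Assumption: below the threshold the speed is left unchanged.
    return PRESET_MULTIPLE if target_count >= threshold else 1.0


def target_speed(original_speed: float, units: Sequence[str], threshold: int) -> float:
    """S133: original playing speed multiplied by the target playing multiple."""
    return original_speed * target_multiple(count_target_text(units), threshold)
```

For instance, with an original playing speed of 2 words/second and three filler words in the message, a threshold of 3 yields a target playing speed of 3 words/second, while a threshold of 4 leaves the speed at 2 words/second.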

In some embodiments, the determining the target content information of the voice message based on the number of the target text information appearing in the one or more text information comprises any one of:

(1) If the number of pieces of target text information appearing in the one or more pieces of text information of the voice message is equal to or greater than a first number threshold, all content information of the voice message is taken as the target content information of the voice message. For example, taking the whole voice message as the reference, it is detected whether the number of pieces of target text information appearing in the voice message is equal to or greater than the first number threshold (e.g., 10), and if so, the whole voice message is taken as the target content information. A target playing multiple greater than 1 corresponding to the target content information is then obtained.

(2) The voice message is divided evenly into at least two parts, each part including one or more pieces of text information, and any part in which the number of pieces of target text information is equal to or greater than a second number threshold is taken as the target content information of the voice message. For example, the voice message is divided evenly into a plurality of parts (e.g., two parts, three parts, etc.), and the number of pieces of target text information appearing in each part is detected separately. If the number of pieces of target text information appearing in a certain part is equal to or greater than the second number threshold (e.g., 5), that part is determined as the target content information; or, if the number appearing in every part is equal to or greater than the second number threshold (e.g., 5), all content information of the voice message is determined as the target content information of the voice message. A target playing multiple greater than 1 corresponding to the target content information is then obtained, and the target playing speed of each such part of the voice message is determined according to the original playing speed of the voice message and the target playing multiple.

(3) A part of the voice message in which pieces of target text information appear consecutively, in a number equal to or greater than a third number threshold, is taken as the target content information of the voice message. For example, if the number of pieces of target text information appearing consecutively in the voice message is equal to or greater than the third number threshold, the part in which they appear consecutively is determined as the target content information of the voice message. For example, the third number threshold is 3, and four pieces of target text information, "um, oh, that", appear consecutively in the voice message "um, oh, that, I have to work overtime this weekend"; the part "um, oh, that" is then taken as the target content information of the voice message, and a target playing multiple greater than 1 corresponding to the target content information is obtained.

Here, it should be understood by those skilled in the art that the specific operations for determining the target content information are only examples, and other specific operations, existing now or arising later, are also within the scope of the present application if applicable to it. The specific operations for determining the target content information are parallel alternatives, and in practical applications the target content information of the voice message can be determined in any one of these ways.
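The three alternatives above can be sketched in Python as three independent selection functions. This is a minimal sketch under stated assumptions: text units are plain strings, target text information is matched against a hypothetical `FILLERS` set, selected parts are returned as half-open index ranges, and "dividing evenly" is approximated by ceiling division so that the last part may be shorter.

```python
from typing import List, Optional, Sequence, Tuple

FILLERS = {"um", "oh", "this", "that"}  # assumed target text information
Span = Tuple[int, int]                  # half-open range [start, end) of unit indices


def _count(units: Sequence[str]) -> int:
    return sum(1 for unit in units if unit in FILLERS)


def select_whole(units: Sequence[str], first_threshold: int) -> List[Span]:
    """Mode (1): the whole message is selected if it has enough target units."""
    return [(0, len(units))] if units and _count(units) >= first_threshold else []


def select_by_parts(units: Sequence[str], parts: int, second_threshold: int) -> List[Span]:
    """Mode (2): split into roughly equal parts and keep those with enough target units."""
    if not units or parts < 1:
        return []
    size = -(-len(units) // parts)  # ceiling division
    spans = []
    for start in range(0, len(units), size):
        end = min(start + size, len(units))
        if _count(units[start:end]) >= second_threshold:
            spans.append((start, end))
    return spans


def select_runs(units: Sequence[str], third_threshold: int) -> List[Span]:
    """Mode (3): keep runs of consecutive target units that are long enough."""
    spans: List[Span] = []
    run_start: Optional[int] = None
    for i in range(len(units) + 1):  # one extra step closes a trailing run
        if i < len(units) and units[i] in FILLERS:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= third_threshold:
                spans.append((run_start, i))
            run_start = None
    return spans
```

Each function returns the index ranges to which a target playing multiple greater than 1 would then be applied; an empty list means no part of the message is selected.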

In some embodiments, the target content information includes all content information of the voice message, and the determining a target play speed of the target content information of the voice message, wherein the target play speed is a target play multiple of the original play speed, includes: and determining the target playing speed of all content information of the voice message according to the user related information of the second user, wherein the target playing speed is a target playing multiple of the original playing speed. The execution subject of this embodiment includes, but is not limited to, the user equipment or the network equipment. In some embodiments, the target content information includes the entire content information of the voice message, e.g., the entire content information of the voice message is accelerated or decelerated. In some embodiments, the process of accelerating or decelerating the overall content information of the voice message is determined based on the user-related information of the second user, so as to adapt the target playing speed of the target content information after the process of accelerating or decelerating to the second user. In some embodiments, the user-related information includes, but is not limited to, a historical speech speed of the second user.

In some embodiments, the determining the target play speed of all content information of the voice message according to the user-related information of the second user includes: if the original playing speed of the voice message is greater than the historical voice speed, obtaining a target playing multiple corresponding to all content information of the voice message, wherein the target playing multiple is less than 1; if the original playing speed of the voice message is less than the historical voice speed, obtaining a target playing multiple corresponding to all content information of the voice message, wherein the target playing multiple is greater than 1; and determining the target playing speed of all content information of the voice message according to the original playing speed of the voice message and the target playing multiple. In some embodiments, the historical voice speed of the second user comprises an average speech rate of one or more voice messages that the second user has historically sent (e.g., the average speech rate at which the second user ordinarily sends voice messages). In some embodiments, the user equipment or the network equipment compares the original playing speed of the voice message sent by the first user to the second user with the historical voice speed of the second user. If the original playing speed of the voice message is greater than the historical voice speed, a target playing multiple less than 1 is obtained (for example, a playing multiple less than 1 is preset in the user equipment or the network equipment), so as to decelerate all content information of the voice message; if the original playing speed of the voice message is less than the historical voice speed, a target playing multiple greater than 1 is obtained (for example, a playing multiple greater than 1 is preset in the user equipment or the network equipment), so as to accelerate all content information of the voice message.
In this embodiment, for the voice message sent to the second user, the target playing speed of all content information of the voice message is determined based on the historical voice speed of the second user, so that the target playing speed of all content information of the voice message after the acceleration or deceleration processing is adapted to the second user, and the user experience is improved.
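The comparison with the second user's historical voice speed can be illustrated with the following Python sketch. The preset multiples 0.75 and 1.25 and the function names are hypothetical; the text only requires one preset multiple less than 1 and one greater than 1. The case where the two speeds are equal is not covered by the text, so returning 1.0 there is an assumption.

```python
from typing import Sequence


def historical_voice_speed(past_speeds: Sequence[float]) -> float:
    """Average speech rate of voice messages the second user has sent."""
    if not past_speeds:
        raise ValueError("no historical voice messages available")
    return sum(past_speeds) / len(past_speeds)


def multiple_from_history(
    original_speed: float,
    historical_speed: float,
    slow_multiple: float = 0.75,  # hypothetical preset, less than 1
    fast_multiple: float = 1.25,  # hypothetical preset, greater than 1
) -> float:
    """Pick a target playing multiple by comparing against the listener's own pace."""
    if original_speed > historical_speed:
        return slow_multiple   # sender speaks faster than the listener: decelerate
    if original_speed < historical_speed:
        return fast_multiple   # sender speaks slower than the listener: accelerate
    return 1.0                 # assumption: equal speeds need no adjustment
```

The target playing speed of all content information is then the original playing speed multiplied by the returned multiple.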

In some embodiments, the user-related information of the second user includes information about the number of repeated plays of the second user with respect to the historical voice message, and the determining a target play speed of the entire content information of the voice message according to the user-related information of the second user, where the target play speed is a target play multiple of the original play speed, includes: if the repeated playing frequency information of the second user about the historical voice message is equal to or greater than the repeated frequency threshold value, acquiring a target playing multiple corresponding to the original playing speed of the voice message, wherein the target playing multiple is less than 1; otherwise, determining that the target playing multiple is equal to 1; and determining the target playing speed of all content information of the voice message according to the original playing speed of the voice message and the target playing multiple. In some embodiments, the replay number information of the second user about the historical voice message includes an average replay number of the second user on the historical voice message received by the second user, the replay number information may reflect whether the second user frequently replays the voice message repeatedly, and if so, the received voice message needs to be decelerated. 
For example, a repetition threshold (e.g., 5 times) is preset in the user equipment or the network equipment. If the repeated-play count information (e.g., the second user replays each voice message 5 times on average) is equal to or greater than the repetition threshold, this indicates that the second user frequently replays a voice message, and the voice message sent by the first user to the second user is then decelerated (for example, a playing multiple less than 1 is obtained as the target playing multiple of all content information of the voice message, and the target playing speed of all content information of the voice message is then determined according to the original playing speed of the voice message and the target playing multiple).
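The repeated-play rule can be sketched in Python as follows. The default threshold of 5 follows the example in the text, while the preset multiple 0.75 and the function name are hypothetical. Treating a user with no replay history as needing no deceleration is an assumption.

```python
from typing import Sequence


def multiple_from_replays(
    replay_counts: Sequence[int],
    repeat_threshold: float = 5.0,  # example threshold from the text
    slow_multiple: float = 0.75,    # hypothetical preset, less than 1
) -> float:
    """Decelerate when the listener's average replay count reaches the threshold."""
    if not replay_counts:
        return 1.0  # assumption: no history means no adjustment
    average = sum(replay_counts) / len(replay_counts)
    return slow_multiple if average >= repeat_threshold else 1.0
```

With an average of 5 replays per historical voice message and an original playing speed of 2 words/second, the target playing speed becomes 1.5 words/second; with fewer replays the multiple stays at 1 and the speed is unchanged.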

In some embodiments, the target content information includes all content information of the voice message, the method further comprising: acquiring multiple setting information of the second user on the voice message, wherein the multiple setting information comprises the target playing multiple; the determining a target play speed of target content information of the voice message, wherein the target play speed is a target play multiple of the original play speed, includes: and determining the target playing speed of all content information of the voice message according to the target playing multiple in the multiple setting information. In some embodiments, the target playback multiple for the voice message is set by the second user. For example, a setting window of the play speed of the voice message is set in the social application, and the second user may set a target play multiple (e.g., a specific multiple such as 1.2, 0.75, etc.) of the entire content information of the voice message in the setting window. Further, if the execution subject is the user equipment, the user equipment may directly obtain the target play multiple based on the setting operation of the second user, and determine the target play speed of the voice message according to the target play multiple and the original play speed of the voice message. If the execution subject is the network device, the user device of the second user may obtain the target play multiple based on the setting operation of the second user, and send the target play multiple to the network device, so that the network device determines the target play speed of the voice message according to the target play multiple and the original play speed of the voice message. In this embodiment, the target play speed of the voice message may be determined based on the setting of the second user, so that the second user may set according to actual needs, thereby improving user experience.

In some embodiments, the multiple setting information corresponds to a target dialog window for presenting the voice message. For example, the target play multiple in the multiple setting information applies to voice messages presented in the target dialog window. For example, if the second user knows that the first user speaks fast, the second user sets the target play multiple of the dialog window with the first user to 0.7, so that the target play speed is 0.7 times the original play speed. For another example, if the second user knows that the first user speaks slowly, the second user sets the target play multiple of the dialog window with the first user to 1.7, so that the target play speed is 1.7 times the original play speed. When the execution subject is the user equipment, the user equipment can directly acquire the target dialog window set by the second user, and determine the target play speed of the voice message presented in the target dialog window based on the target play multiple in the multiple setting information corresponding to the target dialog window. When the execution subject is the network device, the user equipment of the second user may obtain the identification information of the target dialog window and the corresponding target play multiple based on the setting operation of the second user, and send both to the network device, so that the network device determines the target play speed of the voice message according to the target play multiple corresponding to the target dialog window and the original play speed of the voice message to be presented in the target dialog window.
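A per-dialog-window setting of this kind could be kept in a simple lookup table. The sketch below is an assumption-laden illustration: the class name, the window identifiers, and the default multiple of 1.0 for windows without a setting are all hypothetical.

```python
from typing import Dict


class MultipleSettings:
    """Stores the second user's target play multiple per dialog window."""

    def __init__(self, default_multiple: float = 1.0) -> None:
        # Assumption: windows without an explicit setting play unchanged.
        self._default = default_multiple
        self._by_window: Dict[str, float] = {}

    def set_for_window(self, window_id: str, multiple: float) -> None:
        if multiple <= 0:
            raise ValueError("play multiple must be positive")
        self._by_window[window_id] = multiple

    def target_speed(self, window_id: str, original_speed: float) -> float:
        # target play speed = original play speed * target play multiple
        return original_speed * self._by_window.get(window_id, self._default)


if __name__ == "__main__":
    settings = MultipleSettings()
    settings.set_for_window("chat-with-first-user", 0.7)  # fast talker
    print(settings.target_speed("chat-with-first-user", 2.0))
    print(settings.target_speed("some-other-chat", 2.0))
```

When the network device is the execution subject, the same table would live on the server and be keyed by the window identification information sent by the user equipment.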

In some embodiments, the method further includes step S14 (not shown). In step S14, a new voice message sent by a third user is received, and the target play speed of the voice message sent by the first user to the second user is taken as the target play speed of the new voice message. For example, the execution subject is the user equipment or the network device. After the user equipment or the network device determines, based on the user-related information or the multiple setting information of the second user, a target play speed (e.g., 2 words/second) of the voice message sent by the first user to the second user, a voice message subsequently sent to the second user by another user (e.g., the third user) is also played at that target play speed.

In some embodiments, the method further comprises step S15 (not shown). In step S15, a new voice message sent by a third user is received; an original play speed of the new voice message is determined; and the original play speed of the new voice message is taken as the target play speed of the new voice message. For example, the execution subject is the user equipment or the network device. Even if the user equipment or the network device has determined, based on the user-related information or the multiple setting information of the second user, a target play speed (e.g., 2 words/second) of the voice message sent by the first user to the second user, a voice message subsequently sent to the second user by another user (e.g., the third user) is still played at the original play speed of that new voice message.
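Steps S14 and S15 describe two alternative policies for a new voice message from a third user. A minimal sketch of the choice between them follows; the enum and function names are hypothetical, and how a deployment selects the policy is not specified in the text.

```python
from enum import Enum


class NewMessagePolicy(Enum):
    REUSE_TARGET_SPEED = "S14"      # carry over the earlier target play speed
    USE_OWN_ORIGINAL_SPEED = "S15"  # play the new message at its own speed


def speed_for_new_message(policy: NewMessagePolicy,
                          previous_target_speed: float,
                          new_original_speed: float) -> float:
    """Return the target play speed (words/second) for the new message."""
    if policy is NewMessagePolicy.REUSE_TARGET_SPEED:
        return previous_target_speed
    return new_original_speed


if __name__ == "__main__":
    # Earlier target speed: 2 words/second; new message spoken at 3 words/second.
    print(speed_for_new_message(NewMessagePolicy.REUSE_TARGET_SPEED, 2.0, 3.0))
    print(speed_for_new_message(NewMessagePolicy.USE_OWN_ORIGINAL_SPEED, 2.0, 3.0))
```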

In some embodiments, the method further includes step S16 (not shown). In step S16, in response to a play operation performed by the second user on the voice message, the target content information of the voice message is played according to the target play speed. It should be understood by those skilled in the art that the target play speed of the voice message (e.g., the voice message sent by the first user to the second user) may be determined by the user equipment or by the network device. If it is determined by the user equipment, then, in response to a play operation by the second user on the voice message (e.g., clicking the voice message), the user equipment plays the target content information of the voice message according to the target play speed. For example, if the target content information is part of the content information of the voice message, that part is played at the target play speed and the remaining part is played at the original play speed of the voice message; if the target content information is the entire content information of the voice message, the entire content information is played at the target play speed.
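The partial-content case can be illustrated by computing a per-segment playback plan. This sketch assumes the message has already been split into time segments flagged as target or non-target; the `Segment` type and the rate representation (a multiple applied to the original speed) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start_s: float   # segment start within the original recording, seconds
    end_s: float     # segment end within the original recording, seconds
    is_target: bool  # True if this segment is target content information


def playback_plan(segments: List[Segment],
                  target_multiple: float) -> List[Tuple[float, float]]:
    """Return (rate, playback_duration_s) for each segment.

    Target segments play at target_multiple times the original speed;
    all other segments play at the original speed (rate 1.0).
    """
    if target_multiple <= 0:
        raise ValueError("target_multiple must be positive")
    plan = []
    for seg in segments:
        rate = target_multiple if seg.is_target else 1.0
        plan.append((rate, (seg.end_s - seg.start_s) / rate))
    return plan


if __name__ == "__main__":
    message = [Segment(0.0, 4.0, True), Segment(4.0, 10.0, False)]
    for rate, duration in playback_plan(message, 2.0):
        print(f"rate x{rate}: {duration:.1f} s")
```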

In some embodiments, the method further comprises step S17 (not shown). In step S17, the voice message and the target play speed of the target content information of the voice message are sent to the user equipment of the second user, so that the user equipment, in response to a play operation by the second user on the voice message, plays the target content information of the voice message according to the target play speed. It should be understood by those skilled in the art that the target play speed of the voice message (for example, the voice message sent by the first user to the second user) may be determined by the user equipment or by the network device. If it is determined by the network device, the network device sends the voice message addressed to the second user, together with the determined target play speed of the target content information of the voice message, to the user equipment of the second user, so that the user equipment plays the voice message according to the target play speed in response to the play operation by the second user. In some embodiments, the network device marks the target content information, so that the user equipment can identify which content of the voice message needs to be played at the target play speed.
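One way the network device might mark the target content information is to attach time ranges to the delivery metadata. The payload below is purely a sketch: the field names (`message_id`, `target_play_speed`, `target_ranges`) and the JSON encoding are assumptions, since the text does not define a wire format.

```python
import json
from typing import List, Tuple


def build_delivery(message_id: str,
                   target_play_speed: float,
                   target_ranges: List[Tuple[float, float]]) -> str:
    """Serialize hypothetical metadata sent alongside the voice message.

    target_ranges marks which (start_s, end_s) spans are target content.
    """
    payload = {
        "message_id": message_id,
        "target_play_speed": target_play_speed,  # e.g. words/second
        "target_ranges": [
            {"start_s": start, "end_s": end} for start, end in target_ranges
        ],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_delivery("msg-001", 3.0, [(0.0, 4.0)]))
```

The user equipment would parse this metadata and apply the target play speed only inside the marked ranges.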

Fig. 2 is a structural diagram of a device for determining a target play speed of a voice message according to an embodiment of the present application. The device includes a one-one module, a one-two module, and a one-three module. The one-one module is configured to receive a voice message sent by a first user to a second user. The one-two module is configured to determine an original play speed of the voice message. The one-three module is configured to determine a target play speed of target content information of the voice message, where the target content information includes partial content information or all content information of the voice message, and the target play speed is a target play multiple of the original play speed. In some embodiments, the execution subject of the solution described in this embodiment includes, but is not limited to, user equipment and a network device. In some embodiments, the user equipment includes, but is not limited to, a computing device such as a mobile phone, a computer, or a tablet. For example, the solution of this embodiment is executed by the user equipment of the second user. In some embodiments, the network device includes, but is not limited to, a server corresponding to a social application. For example, when the execution subject is the user equipment, the user equipment receives a voice message sent by a corresponding network device (e.g., a server corresponding to a social application), and determines a target play speed of target content information of the voice message. For another example, when the execution subject is the network device, the network device determines, upon receiving a voice message (e.g., a voice message sent by a first user to a second user), a target play speed of target content information of the voice message.

Specifically, the one-one module is configured to receive a voice message sent by a first user to a second user. For example, user A (the first user) sends a voice message to user B (the second user) through a social application.

The one-two module is configured to determine an original play speed of the voice message. In some embodiments, the original play speed is determined based on the original speech rate of the voice message. In some embodiments, the unit of the original play speed includes, but is not limited to, words per second (e.g., n words played per second).

The one-three module is configured to determine a target play speed of target content information of the voice message, where the target content information includes partial content information or all content information of the voice message, and the target play speed is a target play multiple of the original play speed. In some embodiments, only the target content information of the voice message is accelerated or decelerated. In some embodiments, the target content information includes all content information of the voice message, i.e., all content information of the voice message is accelerated or decelerated. In other embodiments, the target content information includes part of the content information of the voice message, i.e., only that part is accelerated or decelerated. In some embodiments, the target play multiple includes, but is not limited to, specific multiples such as 0.1, 0.5, 0.75, 1.2, and 2.5. In some embodiments, the target play speed = the original play speed × the target play multiple. For example, when the target play multiple is greater than 1, the target content information of the voice message is accelerated. For another example, when the target play multiple is smaller than 1, the target content information of the voice message is decelerated.
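The relation between the original play speed, the target play multiple, and the target play speed is a single multiplication. A small worked sketch follows; the function names are illustrative, and treating a multiple of exactly 1 as "unchanged" is an assumption the text does not spell out.

```python
def target_play_speed(original_speed: float, target_multiple: float) -> float:
    """target play speed = original play speed * target play multiple."""
    if original_speed <= 0 or target_multiple <= 0:
        raise ValueError("speed and multiple must be positive")
    return original_speed * target_multiple


def processing_kind(target_multiple: float) -> str:
    """Classify the processing implied by the target play multiple."""
    if target_multiple > 1:
        return "accelerate"
    if target_multiple < 1:
        return "decelerate"
    return "unchanged"  # assumption: a multiple of exactly 1 changes nothing


if __name__ == "__main__":
    # A message spoken at 2 words/second with a multiple of 2.5.
    print(target_play_speed(2.0, 2.5), processing_kind(2.5))
    print(target_play_speed(2.0, 0.5), processing_kind(0.5))
```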

For example, user A (the first user) sends a voice message to user B (the second user) through a social application. After receiving the voice message, the user equipment of user B or the server corresponding to the social application determines the original play speed (e.g., 2 words/second) of the voice message. The user equipment or the server then determines a target play speed of the target content information of the voice message, so as to accelerate or decelerate the target content information. For example, when the voice message contains many filler words such as "kay" and "o", the network device determines to accelerate the part in which the filler words are concentrated (i.e., the target content information), with a target play multiple greater than 1, to save user B's time. For another example, the original play speed of the voice message is compared with the historical speech speed of user B; if the original play speed is greater than the historical speech speed, the target content information of the voice message (for example, all content information of the voice message) is decelerated, with a target play multiple less than 1, to ensure that user B can make out the content the voice message is meant to express. In this embodiment, by determining the target play speed of the voice message, the voice message is accelerated when acceleration is needed, which saves the second user's social time, and decelerated when deceleration is needed, which ensures that the second user can clearly hear the content the voice message is meant to express.

In some embodiments, the one-two module includes a one-two-one module (not shown) and a one-two-two module. The one-two-one module is configured to recognize content information of the voice message, wherein the content information includes one or more text messages. The one-two-two module is configured to determine the original play speed of the voice message according to the one or more text messages. The one-three module is configured to determine a target play speed of target content information of the voice message based on the one or more text messages, wherein the target content information comprises part of the content information or all content information of the voice message, and the target play speed is a target play multiple of the original play speed.

Here, the specific implementations of the one-two-one module, the one-two-two module, and the one-three module are the same as or similar to those of steps S121, S122, and S13, respectively; they are not repeated here and are incorporated herein by reference.

In some embodiments, the one-two-two module is configured to determine the original play speed of the voice message according to the number of the one or more text messages and the time interval between the first text message and the last text message among the one or more text messages.
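Read literally, this gives a speed of "number of text items divided by the interval from the first to the last item". The sketch below assumes the speech recognizer supplies one timestamp per recognized text item; that input format, the function name, and returning `None` when the interval is not positive are all assumptions for illustration.

```python
from typing import Optional, Sequence


def original_play_speed(timestamps_s: Sequence[float]) -> Optional[float]:
    """Estimate the original play speed in text items per second.

    timestamps_s holds one timestamp (seconds) per recognized text item,
    in spoken order. Returns None if no positive interval can be measured.
    """
    if len(timestamps_s) < 2:
        return None
    interval = timestamps_s[-1] - timestamps_s[0]  # first item to last item
    if interval <= 0:
        return None
    return len(timestamps_s) / interval


if __name__ == "__main__":
    # Five recognized items spread over two seconds.
    print(original_play_speed([0.0, 0.5, 1.0, 1.5, 2.0]))
```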

Here, the specific implementation of the one-two-two module is the same as or similar to that of step S122; it is not repeated here and is incorporated herein by reference.

In some embodiments, the one-three module includes a one-three-one module (not shown), a one-three-two module, and a one-three-three module. The one-three-one module is configured to determine target content information of the voice message based on the number of target text messages appearing in the one or more text messages, where the target content information includes partial content information or all content information of the voice message. The one-three-two module is configured to acquire a target play multiple corresponding to the target content information. The one-three-three module is configured to determine the target play speed of the target content information according to the original play speed of the voice message and the target play multiple.

Here, the specific implementations of the one-three-one module, the one-three-two module, and the one-three-three module are the same as or similar to those of steps S131, S132, and S133, respectively; they are not repeated here and are incorporated herein by reference.

(1) If the number of target text messages appearing in the one or more text messages of the voice message is equal to or greater than a first number threshold, all content information of the voice message is taken as the target content information of the voice message. For example, taking the whole voice message as the reference, it is detected whether the number of target text messages appearing in the voice message is equal to or greater than a first number threshold (e.g., 10); if so, the whole voice message is taken as the target content information, and a target play multiple greater than 1 corresponding to the target content information is acquired.

(2) The voice message is divided equally into at least two parts, each part comprising one or more text messages, and each part in which the number of target text messages is equal to or greater than a second number threshold is taken as the target content information of the voice message. For example, the voice message is divided equally into a plurality of parts (e.g., two parts, three parts, etc.), and the number of target text messages appearing in each part is detected. If the number of target text messages appearing in a certain part is equal to or greater than a second number threshold (e.g., 5), that part is determined as the target content information; if the number of target text messages appearing in every part is equal to or greater than the second number threshold (e.g., 5), the whole content information of the voice message is determined as the target content information of the voice message. A target play multiple greater than 1 corresponding to the target content information is acquired, and a target play speed is determined for each such part of the voice message according to the original play speed of the voice message and the target play multiple.

(3) The part of the voice message in which target text messages appear consecutively, in a number equal to or greater than a third number threshold, is taken as the target content information of the voice message. For example, if the third number threshold is 3 and four target text messages ("kay" repeated) appear consecutively in the voice message, the consecutive "kay" part is taken as the target content information of the voice message, and a target play multiple greater than 1 corresponding to the target content information is acquired.

It should be understood by those skilled in the art that the above operations for determining the target content information are only examples; other operations, whether existing now or developed later, that are applicable to the present application also fall within its scope of protection. The operations listed above are alternatives to one another, and in practical applications the target content information of the voice message can be determined in any one of these ways.
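The three alternative operations (1) to (3) can be sketched over a list of recognized words. Everything here is illustrative: the word-index spans, the function names, and the default thresholds (10, 5, 3, taken from the examples above) are assumptions, and the set of target text items is supplied by the caller.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int]  # half-open word-index range [start, end)


def whole_message(words: Sequence[str], targets: Set[str],
                  first_threshold: int = 10) -> List[Span]:
    """(1) Whole message is target content if enough target items occur."""
    count = sum(w in targets for w in words)
    return [(0, len(words))] if count >= first_threshold else []


def equal_parts(words: Sequence[str], targets: Set[str],
                parts: int = 2, second_threshold: int = 5) -> List[Span]:
    """(2) Split into equal parts; keep parts with enough target items."""
    if parts < 2:
        raise ValueError("the message is divided into at least two parts")
    n = len(words)
    bounds = [i * n // parts for i in range(parts + 1)]
    return [(s, e) for s, e in zip(bounds, bounds[1:])
            if sum(w in targets for w in words[s:e]) >= second_threshold]


def consecutive_runs(words: Sequence[str], targets: Set[str],
                     third_threshold: int = 3) -> List[Span]:
    """(3) Keep runs of consecutive target items that are long enough."""
    spans: List[Span] = []
    run_start = None
    for i, w in enumerate(list(words) + [None]):  # None ends a trailing run
        if w in targets:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= third_threshold:
                spans.append((run_start, i))
            run_start = None
    return spans


if __name__ == "__main__":
    msg = ["kay"] * 4 + ["see", "you", "at", "noon"]
    print(consecutive_runs(msg, {"kay", "o"}))
```

Each function returns the spans to be treated as target content information; a caller would then apply a target play multiple greater than 1 to those spans only.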

In some embodiments, the target content information includes all content information of the voice message, and the determining a target play speed of the target content information of the voice message, wherein the target play speed is a target play multiple of the original play speed, includes: and determining the target playing speed of all content information of the voice message according to the user related information of the second user, wherein the target playing speed is a target playing multiple of the original playing speed. The execution subject of this embodiment includes, but is not limited to, the user equipment or the network equipment. In some embodiments, the target content information includes the entire content information of the voice message, e.g., the entire content information of the voice message is accelerated or decelerated. In some embodiments, the accelerating or decelerating process for the entire content information of the voice message is determined based on the user-related information of the second user, so as to adapt the target playing speed of the target content information after the accelerating or decelerating process to the second user. In some embodiments, the user-related information includes, but is not limited to, a historical speech speed of the second user.

In some embodiments, the determining the target play speed of the entire content information of the voice message according to the user-related information of the second user includes: if the original play speed of the voice message is greater than the historical speech speed, obtaining a target play multiple corresponding to all content information of the voice message, wherein the target play multiple is less than 1; if the original play speed of the voice message is less than the historical speech speed, obtaining a target play multiple corresponding to all content information of the voice message, wherein the target play multiple is greater than 1; and determining the target play speed of all content information of the voice message according to the original play speed of the voice message and the target play multiple. In some embodiments, the historical speech speed of the second user comprises an average speech speed of one or more voice messages that the second user has historically sent (e.g., the average speech speed at which the second user ordinarily sends voice messages).
In some embodiments, the user equipment or the network device compares the original play speed of the voice message sent by the first user to the second user with the historical speech speed of the second user. If the original play speed of the voice message is greater than the historical speech speed, a target play multiple smaller than 1 is obtained (for example, a play multiple smaller than 1 is preset in the user equipment or the network device), so as to decelerate all content information of the voice message; if the original play speed of the voice message is less than the historical speech speed, a target play multiple greater than 1 is obtained (for example, a play multiple greater than 1 is preset in the user equipment or the network device), so as to accelerate all content information of the voice message. In this embodiment, for a voice message sent to the second user, the target play speed of all content information of the voice message is determined based on the historical speech speed of the second user, so that the play speed after acceleration or deceleration suits the second user, which improves the user experience.
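The comparison against the second user's historical speech speed might look like the sketch below. The preset multiples (0.75 and 1.2) and the handling of the equal case (multiple 1.0) are assumptions; the text only specifies "less than 1" and "greater than 1" for the two unequal cases.

```python
from statistics import mean
from typing import Sequence


def historical_speech_speed(sent_message_speeds: Sequence[float]) -> float:
    """Average speech speed of voice messages the second user has sent."""
    return mean(sent_message_speeds)


def multiple_from_history(original_speed: float,
                          historical_speed: float,
                          slow_multiple: float = 0.75,
                          fast_multiple: float = 1.2) -> float:
    """Choose a target play multiple by comparing the two speeds."""
    if original_speed > historical_speed:
        return slow_multiple  # sender speaks faster than the listener: slow down
    if original_speed < historical_speed:
        return fast_multiple  # sender speaks slower than the listener: speed up
    return 1.0                # assumption: equal speeds are left unchanged


if __name__ == "__main__":
    history = historical_speech_speed([2.0, 3.0, 4.0])
    multiple = multiple_from_history(4.0, history)
    print(history, multiple, 4.0 * multiple)
```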

In some embodiments, the user-related information of the second user includes repeated-play count information of the second user for historical voice messages, and the determining a target play speed of the entire content information of the voice message according to the user-related information of the second user, where the target play speed is a target play multiple of the original play speed, includes: if the repeated-play count information of the second user for historical voice messages is equal to or greater than a repetition count threshold, acquiring a target play multiple corresponding to the original play speed of the voice message, wherein the target play multiple is less than 1; otherwise, determining that the target play multiple is equal to 1; and determining the target play speed of all content information of the voice message according to the original play speed of the voice message and the target play multiple. In some embodiments, the repeated-play count information includes the average number of times the second user has replayed the historical voice messages that the second user received. This information reflects whether the second user frequently replays voice messages; if so, received voice messages need to be decelerated.
For example, a repetition count threshold (for example, 5) is preset in the user equipment or the network device. If the repeated-play count information (for example, the second user replays each voice message 5 times on average) is equal to or greater than the repetition count threshold, the second user frequently replays voice messages, so the voice message sent by the first user to the second user is slowed down: a play multiple smaller than 1 is acquired as the target play multiple of all content information of the voice message, and the target play speed of all content information of the voice message is then determined from the original play speed of the voice message and the target play multiple.

In some embodiments, the target content information includes all content information of the voice message, and the method further comprises: acquiring multiple setting information of the second user for the voice message, wherein the multiple setting information comprises the target play multiple. The determining a target play speed of target content information of the voice message, wherein the target play speed is a target play multiple of the original play speed, includes: determining the target play speed of all content information of the voice message according to the target play multiple in the multiple setting information. In some embodiments, the target play multiple for the voice message is set by the second user. For example, a setting window for the play speed of voice messages is provided in the social application, and the second user may set in that window a target play multiple (e.g., 1.2 or 0.75) for the entire content information of the voice message. If the execution subject is the user equipment, the user equipment may directly obtain the target play multiple based on the setting operation of the second user, and determine the target play speed of the voice message according to the target play multiple and the original play speed of the voice message. If the execution subject is the network device, the user equipment of the second user may obtain the target play multiple based on the setting operation of the second user and send it to the network device, so that the network device determines the target play speed of the voice message according to the target play multiple and the original play speed of the voice message. In this embodiment, the target play speed of the voice message is determined based on the setting of the second user, so the second user can set it according to actual needs, which improves the user experience.

In some embodiments, the multiple setting information corresponds to a target dialog window in which the voice message is presented. For example, the target play multiple in the multiple setting information applies to voice messages presented in the target dialog window. For example, if the second user knows that the first user speaks fast, the second user sets the target play multiple of the dialog window with the first user to 0.7, so that the target play speed is 0.7 times the original play speed. For another example, if the second user knows that the first user speaks slowly, the second user sets the target play multiple of the dialog window with the first user to 1.7, so that the target play speed is 1.7 times the original play speed. When the execution subject is the user equipment, the user equipment can directly acquire the target dialog window set by the second user, and determine the target play speed of the voice message presented in the target dialog window based on the target play multiple in the multiple setting information corresponding to the target dialog window. When the execution subject is the network device, the user equipment of the second user may obtain the identification information of the target dialog window and the corresponding target play multiple based on the setting operation of the second user, and send both to the network device, so that the network device determines the target play speed of the voice message according to the target play multiple corresponding to the target dialog window and the original play speed of the voice message to be presented in the target dialog window.

In some embodiments, the apparatus further comprises a one-four module (not shown). The one-four module is configured to receive a new voice message sent by a third user, and to take the target play speed of the voice message sent by the first user to the second user as the target play speed of the new voice message. For example, the execution subject is the user equipment or the network device.

Here, the specific implementation of the one-four module is the same as or similar to that of step S14; it is not repeated here and is incorporated herein by reference.

In some embodiments, the apparatus further comprises a one-five module (not shown). The one-five module is configured to receive a new voice message sent by a third user, determine an original play speed of the new voice message, and take the original play speed of the new voice message as the target play speed of the new voice message.

Here, the specific implementation of the one-five module is the same as or similar to that of step S15; it is not repeated here and is incorporated herein by reference.

In some embodiments, the apparatus further includes a one-six module (not shown). The one-six module is configured to, in response to a play operation by the second user on the voice message, play the target content information of the voice message according to the target play speed.

Here, the specific implementation of the one-six module is the same as or similar to that of step S16; it is not repeated here and is incorporated herein by reference.

In some embodiments, the apparatus further comprises a one-seven module (not shown). The one-seven module is configured to send the voice message and the target play speed of the target content information of the voice message to the user equipment of the second user, so that the user equipment, in response to a play operation by the second user on the voice message, plays the target content information of the voice message according to the target play speed.

Here, the specific implementation of the one-seven module is the same as or similar to that of step S17; it is not repeated here and is incorporated herein by reference.

In addition to the methods and apparatus described in the embodiments above, the present application also provides a computer-readable storage medium storing computer code that, when executed, performs the method described in any one of the foregoing embodiments.

The present application also provides a computer program product which, when executed by a computer device, performs the method described in any one of the foregoing embodiments.

The present application further provides a computer device, comprising:

one or more processors;

a memory for storing one or more computer programs;

the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any one of the foregoing embodiments.

FIG. 3 illustrates an exemplary system that can be used to implement the various embodiments described herein.

In some embodiments, as illustrated in FIG. 3, the system 300 can be implemented as any of the devices in each of the described embodiments. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage 320) having instructions and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement modules to perform the actions described herein.

For one embodiment, system control module 310 may include any suitable interface controllers to provide any suitable interface to at least one of processor(s) 305 and/or to any suitable device or component in communication with system control module 310.

The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.

System memory 315 may be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, the system memory 315 may comprise double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).

For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.

For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).

NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 320 may be accessible over a network via communication interface(s) 325.

Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. System 300 may wirelessly communicate with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols.

For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) (e.g., memory controller module 330) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310 to form a system on a chip (SoC).

In various embodiments, system 300 may be, but is not limited to being: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.

It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. As such, the software programs (including associated data structures) of the present application can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.

In addition, portions of the present application may be implemented as a computer program product, for example as computer program instructions which, when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Those skilled in the art will appreciate that the forms in which computer program instructions reside on a computer-readable medium include, but are not limited to, source files, executable files, installation package files, and the like, and that the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instructions, or the computer compiles the instructions and then executes the corresponding compiled program, or the computer reads and executes the instructions, or the computer reads and installs the instructions and then executes the corresponding installed program. Computer-readable media herein can be any available computer-readable storage media or communication media that can be accessed by a computer.

Communication media include media by which communication signals containing, for example, computer-readable instructions, data structures, program modules, or other data are transmitted from one system to another. Communication media may include conductive transmission media such as cables and wires (e.g., fiber optics, coaxial, etc.) and wireless (non-conductive transmission) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer-readable instructions, data structures, program modules, or other data may be embodied in a modulated data signal, for example in a wireless medium such as a carrier wave or a similar mechanism such as one embodied as part of a spread spectrum technique. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may use analog, digital, or hybrid techniques.

By way of example, and not limitation, computer-readable storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); non-volatile memory such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), and magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); magnetic and optical storage devices (hard disk, tape, CD, DVD); and other media, now known or later developed, that can store computer-readable information/data for use by a computer system.

An embodiment according to the present application herein comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or solution according to embodiments of the present application as described above.

It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims

1. A method for determining a target play speed for a voice message, wherein the method comprises:

receiving a voice message sent by a first user to a second user;

determining an original play speed of the voice message;

determining a target playing speed of target content information of the voice message, wherein the target content information comprises partial content information or all content information of the voice message, and the target playing speed is a target playing multiple of the original playing speed;

wherein the determining an original play speed of the voice message comprises: identifying content information of the voice message, wherein the content information comprises one or more text messages; determining the original playing speed of the voice message according to the one or more text messages;

the determining a target play speed of target content information of the voice message, where the target content information includes partial content information or entire content information of the voice message, and the target play speed is a target play multiple of the original play speed includes: determining a target playing speed of target content information of the voice message based on the one or more text messages, wherein the target content information comprises partial content information or all content information of the voice message, and the target playing speed is a target playing multiple of the original playing speed;

wherein the determining a target playing speed of target content information of the voice message based on the one or more text messages, wherein the target content information includes partial content information or all content information of the voice message, and the target playing speed is a target playing multiple of the original playing speed includes: determining target content information of the voice message based on the number of target text information appearing in the one or more text information, wherein the target text information comprises an anaglyphic word, and the target content information comprises partial content information or all content information of the voice message; acquiring a target playing multiple corresponding to the target content information; determining a target playing speed of the target content information according to the original playing speed of the voice message and the target playing multiple, wherein the target playing speed is determined based on user related information of the second user or the setting of the second user;

wherein the determining the target content information of the voice message based on the number of the target text information appearing in the one or more text information comprises any one of:

if the number of target text messages appearing in one or more text messages of the voice message is equal to or greater than a first number threshold, taking all content information of the voice message as the target content information of the voice message;

equally dividing the voice message into at least two parts, wherein each part comprises one or more text messages, and taking the part of the at least two parts, in which the number of target text messages is equal to or greater than a second number threshold value, as the target content information of the voice message;

and taking, as the target content information of the voice message, the part of the voice message in which target text information appears consecutively a number of times equal to or greater than a third number threshold.
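For illustration only, and not as part of the claim language, the three alternative selection rules of claim 1 can be sketched as follows. The sketch assumes that the recognized text is available as a list of words and that the target text information is given as a set of words; the function names, the word-level granularity, and the representation of a selected part as a half-open index range are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of word indices (assumed representation)


def count_targets(words: List[str], targets: Set[str]) -> int:
    """Count how many recognized words are target words."""
    return sum(1 for w in words if w in targets)


def select_whole_message(words: List[str], targets: Set[str],
                         first_threshold: int) -> List[Span]:
    """Rule 1: the whole message is selected if it holds enough target words."""
    if count_targets(words, targets) >= first_threshold:
        return [(0, len(words))]
    return []


def select_equal_parts(words: List[str], targets: Set[str],
                       parts: int, second_threshold: int) -> List[Span]:
    """Rule 2: divide the message into near-equal parts and keep qualifying parts."""
    if parts < 2:
        raise ValueError("the message must be divided into at least two parts")
    n = len(words)
    bounds = [i * n // parts for i in range(parts + 1)]
    spans = []
    for start, end in zip(bounds, bounds[1:]):
        # Skip empty parts, which occur when there are fewer words than parts.
        if end > start and count_targets(words[start:end], targets) >= second_threshold:
            spans.append((start, end))
    return spans


def select_consecutive_runs(words: List[str], targets: Set[str],
                            third_threshold: int) -> List[Span]:
    """Rule 3: keep runs of consecutive target words that are long enough."""
    spans = []
    run_start: Optional[int] = None
    for i, w in enumerate(words):
        if w in targets:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= third_threshold:
                spans.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the message.
    if run_start is not None and len(words) - run_start >= third_threshold:
        spans.append((run_start, len(words)))
    return spans
```

Each function returns the index ranges that would be treated as target content; an empty list means that the rule selects nothing for that message.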

2. The method of claim 1, wherein said determining an original play speed of said voice message from said one or more text messages comprises:

and determining the original playing speed of the voice message according to the quantity of the one or more text messages and the time interval between the first text message and the last text message in the one or more text messages.
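As a hedged illustration of claim 2, the original playing speed can be read as the number of recognized text items divided by the time between the first and the last item. The sketch below assumes that each text item carries a start timestamp in seconds and that the speed is expressed in items per second; the function name and the units are assumptions, not taken from the claims.

```python
from typing import Sequence


def original_play_speed(timestamps: Sequence[float]) -> float:
    """Return recognized text items per second between the first and last item.

    `timestamps` holds the (assumed) start time in seconds of each text item,
    in playback order.
    """
    if len(timestamps) < 2:
        raise ValueError("at least two text items are needed to measure an interval")
    interval = timestamps[-1] - timestamps[0]
    if interval <= 0:
        raise ValueError("the last item must start after the first item")
    return len(timestamps) / interval
```

For five items spread over two seconds, this gives 2.5 items per second.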

3. The method of claim 1, wherein the target content information comprises all content information of the voice message, the determining a target play speed of the target content information of the voice message, wherein the target play speed is a target play multiple of the original play speed comprises:

and determining the target playing speed of all content information of the voice message according to the user related information of the second user, wherein the target playing speed is a target playing multiple of the original playing speed.

4. The method of claim 3, wherein the user-related information of the second user comprises a historical voice speed of the second user, and the determining a target play speed of the entire content information of the voice message according to the user-related information of the second user, wherein the target play speed is a target play multiple of the original play speed comprises:

if the original playing speed of the voice message is greater than the historical voice speed, acquiring a target playing multiple corresponding to all content information of the voice message, wherein the target playing multiple is less than 1, and if the original playing speed of the voice message is less than the historical voice speed, acquiring a target playing multiple corresponding to all content information of the voice message, wherein the target playing multiple is greater than 1;

and determining the target playing speed of all content information of the voice message according to the original playing speed of the voice message and the target playing multiple.
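Claim 4 fixes only the direction of the adjustment: a multiple below 1 when the message is faster than the second user's historical voice speed, and above 1 when it is slower. One possible choice, shown here purely as an assumption, is the ratio of the historical speed to the original speed, which satisfies both conditions and brings the playback to the historical speed. The function names are hypothetical.

```python
def target_multiple_from_history(original_speed: float, historical_speed: float) -> float:
    """Assumed rule: scale playback toward the listener's historical voice speed.

    The result is below 1 when the message is faster than the historical speed
    and above 1 when it is slower, as claim 4 requires.
    """
    if original_speed <= 0 or historical_speed <= 0:
        raise ValueError("speeds must be positive")
    return historical_speed / original_speed


def target_play_speed(original_speed: float, target_multiple: float) -> float:
    """The target playing speed is the original speed times the target multiple."""
    return original_speed * target_multiple
```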

5. The method of claim 3, wherein the user-related information of the second user comprises information of repeated playing times of the second user with respect to a historical voice message, and the determining a target playing speed of the entire content information of the voice message according to the user-related information of the second user, wherein the target playing speed is a target playing multiple of the original playing speed comprises:

if the repeated playing frequency information of the second user about the historical voice message is equal to or greater than the repeated frequency threshold value, acquiring a target playing multiple corresponding to the original playing speed of the voice message, wherein the target playing multiple is less than 1; otherwise, determining that the target playing multiple is equal to 1;
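The threshold rule of claim 5 can be sketched in a few lines. The claim requires only that the multiple be less than 1 when the second user has replayed historical voice messages often enough; the concrete value 0.8 and the parameter names below are hypothetical.

```python
def target_multiple_from_replays(replay_count: int, repeat_threshold: int,
                                 slow_multiple: float = 0.8) -> float:
    """Slow playback down for listeners who often replay voice messages.

    `slow_multiple` is an assumed value; the claim only requires it to be below 1.
    """
    if not 0 < slow_multiple < 1:
        raise ValueError("slow_multiple must be between 0 and 1")
    return slow_multiple if replay_count >= repeat_threshold else 1.0
```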

6. The method of claim 1, wherein the target content information comprises all content information of the voice message, the method further comprising:

acquiring multiple setting information of the second user on the voice message, wherein the multiple setting information comprises the target playing multiple;

the determining a target play speed of target content information of the voice message, where the target play speed is a target play multiple of the original play speed, includes:

and determining the target playing speed of all content information of the voice message according to the original playing speed of the voice message and the target playing multiple in the multiple setting information.

7. The method of claim 6, wherein the multiple setting information corresponds to a target dialog window in which the voice message is presented.
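As an illustrative sketch of claims 6 and 7, the multiple setting information can be kept per dialog window and applied to the original playing speed of each message shown in that window. The class name, the use of a string window identifier, and the default multiple of 1.0 for windows without a setting are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MultipleSettings:
    """Target playing multiples chosen by the second user, keyed by dialog window."""
    per_window: Dict[str, float] = field(default_factory=dict)

    def set_multiple(self, window_id: str, multiple: float) -> None:
        if multiple <= 0:
            raise ValueError("the target playing multiple must be positive")
        self.per_window[window_id] = multiple

    def target_speed(self, window_id: str, original_speed: float) -> float:
        # Windows without a stored setting fall back to the original speed (assumed default).
        return original_speed * self.per_window.get(window_id, 1.0)
```

For example, after `set_multiple("chat-with-alice", 1.5)`, a message in that window with an original speed of 2.0 would be played at 3.0, while messages in other windows keep their original speed.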

8. The method of any of claims 3 to 7, wherein the method further comprises:

receiving a new voice message sent by a third user;

and taking the target playing speed of the voice message sent by the first user to the second user as the target playing speed of the new voice message.

9. The method of any of claims 3 to 7, wherein the method further comprises:

receiving a new voice message sent by a third user;

determining an original play speed of the new voice message;

and taking the original playing speed of the new voice message as the target playing speed of the new voice message.

10. The method of any of claims 1 to 7, wherein the method further comprises:

and responding to the playing operation of the second user on the voice message, and playing the target content information of the voice message according to the target playing speed of the voice message.

11. The method of any of claims 1 to 7, wherein the method further comprises:

and sending the voice message and the target playing speed of the target content information of the voice message to the user equipment of the second user, so that the user equipment responds to the playing operation of the second user on the voice message and plays the target content information of the voice message according to the target playing speed.

12. An apparatus for determining a target play speed for a voice message, wherein the apparatus comprises:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the operations of the method of any of claims 1 to 11.

13. A computer-readable medium storing instructions that, when executed, cause a system to perform the operations of the method of any of claims 1 to 11.