CN108133719B - Voice playing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN108133719B
CN108133719B
Authority
CN
China
Prior art keywords
playing
voice data
time
interface
play
Prior art date
Legal status
Active
Application number
CN201711326476.4A
Other languages
Chinese (zh)
Other versions
CN108133719A (en)
Inventor
勇幸
李玥亭
邢鑫岩
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201711326476.4A
Publication of CN108133719A
Application granted
Publication of CN108133719B
Current legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure provides a voice playing method, apparatus, electronic device and storage medium. The method comprises: acquiring the playing duration of voice data and displaying the voice data in a session interface; determining that the playing duration is greater than a preset duration threshold and, after detecting that a set interface start condition is met, providing a play time trigger interface for the voice data in the session interface; acquiring, through the play time trigger interface, the play start time specified by the user for the voice data; and playing the voice data from the play start time. The method and the apparatus enable the electronic device to play voice flexibly according to the user's needs, improve the processing efficiency of voice playing, and bring convenience to the user.

Description

Voice playing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular to a voice playing method and apparatus, an electronic device, and a storage medium.
Background
With the development of the internet and terminal technology, instant messaging has become an essential mode of communication in people's lives. Instant messaging programs allow two or more people to exchange text messages, files, voice, and video in real time over a network. Voice transmission is a common function: a user can record voice data on a device and send it to the user at the other end to listen to. However, the existing voice messaging function is limited and cannot meet users' broader needs.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a voice playing method, apparatus, electronic device, and storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a voice playing method, including:
acquiring the playing duration of voice data, and displaying the voice data in a session interface;
determining that the playing duration is greater than a preset duration threshold, and providing a play time trigger interface for the voice data in the session interface after detecting that a set interface start condition is met;
acquiring, through the play time trigger interface, the play start time specified by the user for the voice data;
and playing the voice data from the play start time.
In an optional implementation, displaying the voice data in the session interface includes:
displaying the voice data in the session interface as a data bar, wherein the length of the data bar matches the playing duration;
and obtaining, through the play time trigger interface, the play start time specified by the user for the voice data includes:
acquiring, through the play time trigger interface, the trigger position of the user on the data bar, and determining the play start time according to the trigger position, the length of the data bar, and the playing duration.
In an optional implementation, determining the play start time according to the trigger position, the length of the data bar, and the playing duration includes:
determining the ratio of the trigger position to the length of the data bar, and determining the play start time according to the ratio and the playing duration;
or,
dividing the data bar into a plurality of length ranges according to its length, wherein each length range is configured with a target play start time determined according to the playing duration;
and determining the target play start time corresponding to the length range to which the trigger position belongs as the play start time.
In an optional implementation, providing the play time trigger interface for the voice data in the session interface includes:
showing, on the data bar, a movable interface that can be moved along the data bar.
In an optional implementation, detecting that the set interface start condition is met includes:
detecting that the first playback of the voice data has finished.
According to a second aspect of the embodiments of the present disclosure, there is provided a voice playing apparatus, the apparatus including:
a voice data processing module configured to: acquire the playing duration of voice data and display the voice data in a session interface;
an interface providing module configured to: determine that the playing duration is greater than a preset duration threshold, and provide a play time trigger interface for the voice data in the session interface after detecting that a set interface start condition is met;
a time acquisition module configured to: acquire, through the play time trigger interface, the play start time specified by the user for the voice data;
a play module configured to: play the voice data from the play start time.
In an optional implementation, the voice data processing module includes:
a presentation submodule configured to: display the voice data in the session interface as a data bar, wherein the length of the data bar matches the playing duration;
and the time acquisition module includes a time determination submodule configured to:
acquire, through the play time trigger interface, the trigger position of the user on the data bar, and determine the play start time according to the trigger position, the length of the data bar, and the playing duration.
In an optional implementation, the time determination submodule includes:
a first determination submodule configured to: determine the ratio of the trigger position to the length of the data bar, and determine the play start time according to the ratio and the playing duration;
or,
a second determination submodule configured to: divide the data bar into a plurality of length ranges according to its length, wherein each length range is configured with a target play start time determined according to the playing duration; and determine the target play start time corresponding to the length range to which the trigger position belongs as the play start time.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the aforementioned voice playing method.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the aforementioned voice playing method.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
In the present disclosure, a play time trigger interface can be provided for the voice data when the user needs it, and the play start time specified by the user for the voice data can be obtained through the play time trigger interface; the electronic device can then play the voice data from the play start time specified by the user. The electronic device can thus play voice flexibly according to the user's needs, which improves the processing efficiency of voice playing and brings convenience to the user.
In the present disclosure, the voice data can be displayed in the session interface as a data bar, the trigger position of the user on the data bar can be obtained through the play time trigger interface, and the play start time can be determined according to the trigger position, the length of the data bar, and the playing duration.
In the present disclosure, the ratio of the trigger position to the length of the data bar may be determined, and the play start time may be determined according to the ratio and the playing duration, so that the play start time desired by the user can be determined accurately; or, the data bar may be divided into a plurality of length ranges according to its length, and the target play start time corresponding to the length range to which the trigger position belongs may be determined as the play start time.
In the present disclosure, the play time trigger interface may be a movable interface that moves along the data bar. Through the movable interface, the user can see that the play time trigger interface has been enabled, and can move it as needed to input the desired play start time.
In the present disclosure, the play time trigger interface may be provided after the first playback of the voice data has finished, which reduces the data processing burden on the electronic device.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1A is a schematic diagram illustrating an instant messaging system according to an exemplary embodiment of the present disclosure.
Fig. 1B is a flowchart illustrating a voice playback method according to an exemplary embodiment of the present disclosure.
Fig. 1C is a schematic diagram illustrating one presentation of voice data according to an exemplary embodiment of the present disclosure.
Fig. 1D is a schematic diagram of voice playback according to an exemplary embodiment of the present disclosure.
Fig. 1E is a schematic diagram of voice playback according to an exemplary embodiment of the present disclosure.
Fig. 2 is a block diagram of a voice playing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 3 is a block diagram of another voice playing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram of another voice playing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of another voice playing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 6 is a block diagram of another voice playing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of a voice playing apparatus according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
As shown in Fig. 1A, which is a schematic diagram of an instant messaging system, the electronic device in Fig. 1A is illustrated by taking a smartphone as an example. User A and user B are communicating by instant messaging, each holding a smartphone, and Fig. 1A shows the session interfaces of both parties. User A records 60 seconds of voice data using the voice function provided by the instant messaging program on user A's smartphone; the voice data is displayed in the session interface of user A's smartphone, and the instant messaging program also sends the voice data to user B. After the instant messaging program on user B's smartphone receives the voice data, it displays the voice data in the session interface.
User A and user B can tap the voice data shown in the session interface, and the instant messaging program plays the voice data after detecting the user's trigger. In the related art, after a user triggers voice data shown in a session interface, the instant messaging program usually plays it from the initial time of the voice data until playback finishes. Even if the user only needs to listen to the middle or latter part of the voice data, the instant messaging program still plays from the initial time, and the user still has to listen from the beginning. This way of handling voice playback forces the electronic device to process data repeatedly, which places a heavy data processing burden on the electronic device, lowers the processing efficiency of voice playback, and greatly inconveniences the user.
On this basis, the embodiments of the present disclosure provide a voice playing scheme: for voice data, a play time trigger interface can be provided when the user needs it, and the play start time specified by the user for the voice data can be obtained through the play time trigger interface; the electronic device can then play the voice data from the play start time specified by the user, so that the electronic device plays voice flexibly according to the user's needs, improves the processing efficiency of voice playing, and brings convenience to the user.
As shown in Fig. 1B, which is a flowchart of a voice playing method according to an exemplary embodiment of the present disclosure, the method can be applied to an electronic device with an audio processing function and includes the following steps:
in step 101, a playing duration of the voice data is obtained, and the voice data is displayed in a session interface.
In step 102, it is determined that the playing duration is greater than a preset duration threshold, and after it is detected that a set interface start condition is met, a play time trigger interface for the voice data is provided in the session interface.
In step 103, a play start time for the voice data specified by the user is obtained through the play time trigger interface.
In step 104, the voice data is played according to the play start time.
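As a high-level sketch of these four steps, the following assumes the voice message is stored as a local audio file and that playback uses Android's MediaPlayer, whose seekTo call jumps to the user-chosen start time; all class and method names here are illustrative assumptions, not details taken from the patent.

```java
import android.media.MediaPlayer;

import java.io.IOException;

/** Illustrative sketch of steps 101-104; names are hypothetical, not from the patent. */
public class VoicePlaybackFlow {

    private static final long DURATION_THRESHOLD_MS = 30_000; // preset duration threshold, e.g. 30 seconds

    private final MediaPlayer player = new MediaPlayer();

    /** Steps 101-102: a play time trigger interface is only needed for messages longer than the threshold. */
    public boolean needsPlayTimeTrigger(long playDurationMs) {
        return playDurationMs > DURATION_THRESHOLD_MS;
    }

    /** Steps 103-104: play the voice data from the start time the user picked through the trigger interface. */
    public void playFrom(String voiceFilePath, int playStartTimeMs) throws IOException {
        player.reset();
        player.setDataSource(voiceFilePath);
        player.prepare();                  // synchronous prepare, kept simple for the sketch
        player.seekTo(playStartTimeMs);    // jump to the user-specified play start time
        player.start();
    }
}
```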
The electronic device of the disclosed embodiments may be a personal computer, a notebook computer, a smartphone, a tablet computer, a personal digital assistant, or any other device with audio processing capabilities. The embodiments of the present disclosure can be applied to instant messaging applications, which have a voice function and can record voice data and send it to other users in a session.
Generally, an electronic device with an audio processing function is provided with a microphone, a transducer that converts sound into an electrical signal. The microphone converts the sound it picks up into an analog electrical signal, which is then digitized into a signal the device can process.
Taking the Android operating system as an example, sound can be recorded in Android with AudioRecord (an Android class dedicated to audio processing), and the recording can be captured as PCM audio. To be represented in a computer, sound must be digitized, and the most common way to do so is Pulse Code Modulation (PCM). The microphone converts sound into a continuously varying voltage signal; converting that signal into a PCM signal involves three steps: sampling, quantization, and encoding. In this way, AudioRecord can capture sound and generate voice data. On this basis, the instant messaging program can obtain the voice data recorded by the user and determine its playing duration, for example by reading the attribute information of the voice data.
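As a rough illustration of the recording path just described (sampling, quantization, and PCM encoding via AudioRecord), the sketch below records 16-bit mono PCM and derives the playing duration from the number of bytes captured. The sample rate, buffer handling, and class name are illustrative assumptions, not details taken from the patent.

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

import java.io.ByteArrayOutputStream;

/** Illustrative PCM recorder; not the patent's implementation. */
public class PcmRecorder {

    private static final int SAMPLE_RATE = 16_000;   // Hz, illustrative choice
    private static final int BYTES_PER_SAMPLE = 2;   // 16-bit mono PCM

    private volatile boolean recording;

    /** Records 16-bit mono PCM until stop() is called and returns the raw bytes. */
    public byte[] record() {
        int minBuf = AudioRecord.getMinBufferSize(
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf);
        ByteArrayOutputStream pcm = new ByteArrayOutputStream();
        byte[] buffer = new byte[minBuf];

        recording = true;
        recorder.startRecording();
        while (recording) {
            int read = recorder.read(buffer, 0, buffer.length); // blocking read of PCM samples
            if (read > 0) {
                pcm.write(buffer, 0, read);
            }
        }
        recorder.stop();
        recorder.release();
        return pcm.toByteArray();
    }

    public void stop() {
        recording = false;
    }

    /** Playing duration in milliseconds, derived from the PCM byte count. */
    public static long playingDurationMs(byte[] pcmBytes) {
        return pcmBytes.length * 1000L / (SAMPLE_RATE * BYTES_PER_SAMPLE);
    }
}
```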
Considering that the voice data recorded by the user may be long or short, the embodiments of the present disclosure provide a play time trigger interface for longer voice data. Specifically, a play time trigger interface may be provided when the voice data is long, and what counts as long can be configured flexibly according to actual needs. One implementation is to preset a duration threshold, for example 30, 40, or 60 seconds; if the playing duration of the voice data exceeds the preset duration threshold, the voice data is considered long and a play time trigger interface is provided so that the user can flexibly select a play start time.
The embodiments of the present disclosure provide the play time trigger interface after detecting that a set interface start condition is satisfied. As an example, a user typically listens from the beginning after receiving voice data; after listening to all of it, the user may want to hear part of it again, or may have been interrupted partway through and need to resume from that point. In this case, the play time trigger interface need not be provided during the first playback of the voice data; after it is detected that the first playback has finished, the set interface start condition is deemed satisfied, and the play time trigger interface for the voice data is then provided in the session interface.
In other examples, the play time trigger interface may be provided once the voice data is received; in that case, detecting that the set interface start condition is met simply means determining that the voice data has been received.
In other examples, the play time trigger interface may be provided after determining that the user needs to specify a play start time for the voice data. In practice this can be determined flexibly in several ways: for example, a button or option that the user can trigger may be shown alongside the voice data, or a dedicated key may be configured; when the user triggers the button or option or presses the configured key, it is determined that the user needs to specify a play start time, and the play time trigger interface is started. In practical applications, the interface start condition may be configured flexibly as needed, which is not limited in the embodiments of the present disclosure.
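The following sketch combines the duration check with a configurable start condition drawn from the three examples above. The class, enum, and method names are hypothetical, introduced only to make the decision logic concrete.

```java
/** Hypothetical policy combining the duration threshold with a configurable interface start condition. */
public final class TriggerInterfacePolicy {

    /** The three start conditions discussed above. */
    public enum StartCondition { FIRST_PLAYBACK_FINISHED, VOICE_DATA_RECEIVED, USER_PRESSED_SEEK_CONTROL }

    private final long durationThresholdMs;
    private final StartCondition configuredCondition;

    public TriggerInterfacePolicy(long durationThresholdMs, StartCondition configuredCondition) {
        this.durationThresholdMs = durationThresholdMs;
        this.configuredCondition = configuredCondition;
    }

    /** Show the play time trigger interface only for long messages, and only once the start condition fires. */
    public boolean shouldShow(long playDurationMs, StartCondition observedEvent) {
        return playDurationMs > durationThresholdMs && observedEvent == configuredCondition;
    }
}
```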
Upon receiving the voice data, the electronic device may present it in the session interface for the user to view and listen to. In practical applications, the display form of the voice data can be configured flexibly; for example, the voice data can be represented by a pre-designed icon, picture, window, or shape. The play time trigger interface can likewise be implemented in various ways, such as a line or a circle indicating different play times, or an option allowing the user to select among different play times.
To reduce implementation difficulty and the processing load on the electronic device, in an embodiment of the present disclosure, displaying the voice data in the session interface may include displaying the voice data as a data bar. Fig. 1C is a schematic diagram of voice data according to an exemplary embodiment of the present disclosure; the data bar in Fig. 1C has a length that matches the playing duration of the voice data. As an example, the length of the data bar may be positively correlated with the playing duration, so that the user can select the play start time of the voice data more flexibly.
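One possible way to realize the positive correlation between bar length and playing duration is a clamped linear mapping, as sketched below; the width constants and growth rate are illustrative assumptions, not values from the patent.

```java
/** Illustrative sizing rule: the data bar grows with playing duration but stays within the session interface. */
public final class DataBarSizer {

    private static final float MIN_WIDTH_DP = 60f;   // width for very short messages (assumed)
    private static final float MAX_WIDTH_DP = 240f;  // cap so long messages still fit on screen (assumed)
    private static final float DP_PER_SECOND = 3f;   // growth rate per second of audio (assumed)

    private DataBarSizer() {}

    /** Returns a bar width that is positively correlated with the playing duration. */
    public static float barWidthDp(long playDurationMs) {
        float width = MIN_WIDTH_DP + DP_PER_SECOND * (playDurationMs / 1000f);
        return Math.min(MAX_WIDTH_DP, width);
    }
}
```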
The play start time in the embodiments of the present disclosure refers to any time between the initial time and the end time of the voice data; for example, if the playing duration of the voice data is 120 seconds, the play start time may be any time between 0 and 120 seconds. Based on the data bar in Fig. 1C, a play time trigger interface can be implemented to detect whether the user triggers the data bar and to capture, through that interface, the time at which the user wants playback to start.
In practical applications, the play time trigger interface may be a visual interface. For example, as shown in Fig. 1D, it may be a movable interface that moves along the data bar; through this movable interface the user can see that the play time trigger interface has been enabled, and can move it as needed to input the desired play start time. In other implementations, the play time trigger interface may be non-visual, for example a detection program that watches for trigger events on the data bar and detects the position at which the user triggered it.
To determine the play start time desired by the user, the trigger position of the user on the data bar is obtained through the play time trigger interface, and the play start time is then determined from the trigger position, the length of the data bar, and the playing duration. In practice, when the trigger position on the data bar is obtained, the time corresponding to that position may also be displayed; as shown in Fig. 1E, when the user triggers the data bar, the corresponding time is determined from the trigger position and shown, so that the user can accurately select the desired play start time.
In practical applications, in order to accurately determine the play start time desired by the user, the ratio of the trigger position to the length of the data bar may be determined, and the play start time may be determined according to the ratio and the playing duration.
For example, suppose the playing duration of the voice data is 60 seconds, the data bar is 30 millimeters long, and the user wants playback to start from the 30-second mark. The user can trigger the middle of the data bar; since the ratio of the trigger position to the bar length is about one half and the playing duration is 60 seconds, the electronic device determines the play start time to be 30 seconds and plays the voice data from that position.
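A small sketch of this proportional mapping follows, assuming the trigger position and bar length are reported in the same unit (pixels, millimeters, or similar); the class and parameter names are illustrative. Plugging in the example above, a touch at the midpoint of a bar representing 60 seconds yields a 30-second start time.

```java
/** Illustrative ratio-based mapping from a trigger position on the data bar to a play start time. */
public final class RatioSeekMapper {

    private RatioSeekMapper() {}

    /**
     * @param triggerPositionPx horizontal offset of the touch from the bar's left edge
     * @param barLengthPx       total length of the data bar, in the same unit
     * @param playDurationMs    playing duration of the voice data, in milliseconds
     */
    public static long playStartTimeMs(float triggerPositionPx, float barLengthPx, long playDurationMs) {
        double ratio = Math.max(0.0, Math.min(1.0, triggerPositionPx / (double) barLengthPx)); // clamp to [0, 1]
        return Math.round(ratio * playDurationMs);
    }

    public static void main(String[] args) {
        // Example from the text: 60-second message, touch at the middle of a 30 mm bar -> 30 seconds.
        System.out.println(playStartTimeMs(15f, 30f, 60_000L)); // prints 30000
    }
}
```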
In other examples, consider an electronic device with a touch screen, where the user typically triggers the data bar with a finger. Because fingers differ in size, the session interface on the electronic device is limited in size, and the area of the data bar displaying the voice data is also limited, the detected trigger position may cover a certain range. To reduce implementation difficulty and improve voice playing efficiency, the data bar may be divided into a plurality of length ranges according to its length, with each length range configured with a target play start time determined according to the playing duration; the target play start time corresponding to the length range containing the trigger position is then taken as the play start time.
As an example, suppose the playing duration of the voice data is 60 seconds and the data bar is 30 millimeters long; the data bar is divided into 6 length ranges of 5 millimeters each, so that each length range corresponds to 10 seconds of the playing duration. The play start time configured for the first length range is second 0, that for the second length range is the 11th second, and so on. By detecting the position of the user's finger on the data bar, and assuming that the trigger position falls in the 4th length range, the play start time is determined to be the 31st second.
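The range-based mapping can be sketched as below, under the assumption that each length range covers an equal share of the playing duration and that its target start time is the beginning of that share; with the 60-second, 6-range example above, a touch in the 4th range maps to an offset of 30 seconds, i.e. playback resumes at the 31st second. The names and the bucketing rule are illustrative.

```java
/** Illustrative range-based mapping: bucket the data bar into equal length ranges with preconfigured start times. */
public final class RangeSeekMapper {

    private RangeSeekMapper() {}

    public static long targetStartTimeMs(float triggerPositionPx, float barLengthPx,
                                         long playDurationMs, int rangeCount) {
        float clamped = Math.max(0f, Math.min(triggerPositionPx, barLengthPx));
        int rangeIndex = Math.min(rangeCount - 1, (int) (clamped / (barLengthPx / rangeCount)));
        long msPerRange = playDurationMs / rangeCount;  // each range covers an equal share of the duration
        return rangeIndex * msPerRange;                 // target start time configured for that range
    }

    public static void main(String[] args) {
        // 60-second message, 30 mm bar, 6 ranges of 5 mm: a touch at 17 mm falls in the 4th range.
        System.out.println(targetStartTimeMs(17f, 30f, 60_000L, 6)); // prints 30000
    }
}
```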
Corresponding to the foregoing embodiments of the voice playing method, the present disclosure also provides embodiments of a voice playing apparatus and of the electronic device to which the apparatus is applied.
As shown in Fig. 2, which is a block diagram of a voice playing apparatus according to an exemplary embodiment of the present disclosure, the apparatus includes:
a voice data processing module 21 configured to: acquire the playing duration of voice data and display the voice data in a session interface;
an interface providing module 22 configured to: determine that the playing duration is greater than a preset duration threshold, and provide a play time trigger interface for the voice data in the session interface after detecting that a set interface start condition is met;
a time acquisition module 23 configured to: acquire, through the play time trigger interface, the play start time specified by the user for the voice data;
a play module 24 configured to: play the voice data from the play start time.
As can be seen from the above embodiment, a play time trigger interface can be provided for the voice data when the user needs it, and the play start time specified by the user for the voice data can be obtained through the play time trigger interface; the electronic device can then play the voice data from the play start time specified by the user, so that the electronic device plays voice flexibly according to the user's needs, improves the processing efficiency of voice playing, and brings convenience to the user.
As shown in Fig. 3, which is a block diagram of another voice playing apparatus according to an exemplary embodiment of the present disclosure, on the basis of the embodiment shown in Fig. 2 the voice data processing module 21 includes:
a presentation submodule 211 configured to: display the voice data in the session interface as a data bar, wherein the length of the data bar matches the playing duration;
and the time acquisition module 23 includes a time determination submodule 231 configured to:
acquire, through the play time trigger interface, the trigger position of the user on the data bar, and determine the play start time according to the trigger position, the length of the data bar, and the playing duration.
It can be seen from the above embodiment that the voice data can be displayed in the session interface as a data bar, the trigger position of the user on the data bar can be obtained through the play time trigger interface, and the play start time can be determined according to the trigger position, the length of the data bar, and the playing duration.
As shown in Fig. 4, which is a block diagram of another voice playing apparatus according to an exemplary embodiment of the present disclosure, on the basis of the embodiment shown in Fig. 3 the time determination submodule 231 includes:
a first determination submodule 2311 configured to: determine the ratio of the trigger position to the length of the data bar, and determine the play start time according to the ratio and the playing duration;
or,
a second determination submodule 2312 configured to: divide the data bar into a plurality of length ranges according to its length, wherein each length range is configured with a target play start time determined according to the playing duration; and determine the target play start time corresponding to the length range to which the trigger position belongs as the play start time.
As can be seen from the above embodiment, the ratio of the trigger position to the length of the data bar may be determined, and the play start time may be determined according to the ratio and the playing duration, so that the play start time desired by the user can be determined accurately; or, the data bar may be divided into a plurality of length ranges according to its length, and the target play start time corresponding to the length range to which the trigger position belongs may be determined as the play start time.
As shown in Fig. 5, which is a block diagram of another voice playing apparatus according to an exemplary embodiment of the present disclosure, on the basis of the embodiment shown in Fig. 3 the interface providing module 22 includes an interface display submodule 221 configured to:
show, on the data bar, a movable interface that can be moved along the data bar.
It can be seen from the above embodiment that the play time trigger interface may be a movable interface that moves along the data bar; through this movable interface the user can see that the play time trigger interface has been enabled, and can move it as needed to input the desired play start time.
As shown in Fig. 6, which is a block diagram of another voice playing apparatus according to an exemplary embodiment of the present disclosure, on the basis of the embodiment shown in Fig. 2 the interface providing module 22 includes a detection submodule 222 configured to:
detect that the first playback of the voice data has finished.
It can be seen from the above embodiment that the play time trigger interface may be provided after the first playback of the voice data has finished, which reduces the data processing load on the electronic device.
Correspondingly, the embodiment of the present disclosure provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the aforementioned voice playing method.
Accordingly, the disclosed embodiments also provide a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the aforementioned voice playing method.
The implementation of the functions and roles of each module in the voice playing apparatus is described in detail in the implementation of the corresponding steps of the method above and is not repeated here.
Since the apparatus embodiments substantially correspond to the method embodiments, for relevant points reference may be made to the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the modules described as separate parts may or may not be physically separate, and the parts shown as modules may or may not be physical modules; they may be located in one place or distributed across a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the disclosed solution. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
Fig. 7 is a schematic structural diagram of a voice playback apparatus according to an exemplary embodiment.
As shown in Fig. 7, a voice playing apparatus 700 according to an exemplary embodiment may be a computer, a mobile phone, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 7, apparatus 700 may include one or more of the following components: processing components 701, memory 702, power components 703, multimedia components 704, audio components 705, input/output (I/O) interfaces 706, sensor components 707, and communication components 708.
The processing component 701 generally controls the overall operation of the device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 701 may include one or more processors 709 to execute instructions to perform all or part of the steps of the methods described above. Further, processing component 701 may include one or more modules that facilitate interaction between processing component 701 and other components. For example, the processing component 701 may include a multimedia module to facilitate interaction between the multimedia component 704 and the processing component 701.
The memory 702 is configured to store various types of data to support operations at the apparatus 700. Examples of such data include instructions for any application or method operating on device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 702 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 703 provides power to the various components of the device 700. The power components 703 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 700.
The multimedia component 704 includes a screen that provides an output interface between the device 700 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 704 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 700 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 705 is configured to output and/or input audio signals. For example, audio component 705 includes a Microphone (MIC) configured to receive external audio signals when apparatus 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 702 or transmitted via the communication component 708. In some embodiments, audio component 705 also includes a speaker for outputting audio signals.
The I/O interface 706 provides an interface between the processing component 701 and peripheral interface modules, which may be keyboards, click wheels, buttons, and so on. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 707 includes one or more sensors for providing various aspects of state assessment for the apparatus 700. For example, sensor assembly 707 may detect an open/closed state of apparatus 700, the relative positioning of components, such as a display and keypad of apparatus 700, the change in position of apparatus 700 or a component of apparatus 700, the presence or absence of user contact with apparatus 700, the orientation or acceleration/deceleration of apparatus 700, and the change in temperature of apparatus 700. The sensor assembly 707 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 707 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 707 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 708 is configured to facilitate communication between the apparatus 700 and other devices in a wired or wireless manner. The apparatus 700 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 708 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 708 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described method.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 702 comprising instructions, executable by the processor 709 of the apparatus 700 to perform the above method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Wherein the instructions in the storage medium, when executed by the processor, enable the apparatus 700 to perform one of the aforementioned voice playback methods.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The above description is only exemplary of the present disclosure and should not be taken as limiting the disclosure, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (5)

1. A voice playing method, the method comprising:
acquiring the playing duration of voice data, and displaying the voice data in a session interface as a data bar, wherein the length of the data bar matches the playing duration;
determining that the playing duration is greater than a preset duration threshold, and providing a play time trigger interface for the voice data in the session interface after detecting that a set interface start condition is met, wherein detecting that the set interface start condition is met comprises any one of the following: detecting that the first playback of the voice data has finished, determining that the voice data has been received, or detecting that a preset button, option, or key is triggered;
dividing the data bar into a plurality of length ranges according to its length, wherein each length range is configured with a target play start time determined according to the playing duration;
determining the target play start time corresponding to the length range to which the trigger position belongs as the play start time;
and playing the voice data from the play start time.
2. The method of claim 1, wherein providing the play time trigger interface for the voice data in the session interface comprises:
showing, on the data bar, a movable interface that can be moved along the data bar.
3. A voice playing apparatus, characterized in that the apparatus comprises:
a voice data processing module configured to: acquire the playing duration of voice data and display the voice data in a session interface;
an interface providing module configured to: determine that the playing duration is greater than a preset duration threshold, and provide a play time trigger interface for the voice data in the session interface after detecting that a set interface start condition is met, wherein detecting that the set interface start condition is met comprises any one of the following: detecting that the first playback of the voice data has finished, determining that the voice data has been received, or detecting that a preset button, option, or key is triggered;
a time acquisition module configured to: acquire, through the play time trigger interface, the play start time specified by the user for the voice data;
a play module configured to: play the voice data from the play start time;
wherein the voice data processing module comprises a presentation submodule configured to: display the voice data in the session interface as a data bar, wherein the length of the data bar matches the playing duration;
and the time acquisition module comprises a time determination submodule configured to: divide the data bar into a plurality of length ranges according to its length, wherein each length range is configured with a target play start time determined according to the playing duration; and determine the target play start time corresponding to the length range to which the trigger position belongs as the play start time.
4. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method of any one of claims 1 to 2.
5. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 2.
CN201711326476.4A 2017-12-13 2017-12-13 Voice playing method and device, electronic equipment and storage medium Active CN108133719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711326476.4A CN108133719B (en) 2017-12-13 2017-12-13 Voice playing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711326476.4A CN108133719B (en) 2017-12-13 2017-12-13 Voice playing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108133719A CN108133719A (en) 2018-06-08
CN108133719B true CN108133719B (en) 2020-11-27

Family

ID=62389455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711326476.4A Active CN108133719B (en) 2017-12-13 2017-12-13 Voice playing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108133719B (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8165562B2 (en) * 2006-03-20 2012-04-24 Rave Wireless, Inc. Personalized message escrow
CN101783160A (en) * 2009-01-16 2010-07-21 鸿富锦精密工业(深圳)有限公司 Audio-frequency playing device and playing method thereof
CN102685554B (en) * 2012-05-24 2015-09-30 北京国双科技有限公司 The processing method of video playback and device
CN104811812B (en) * 2014-01-24 2019-02-22 腾讯科技(北京)有限公司 Control method, device and the system of audio and video playing progress
CN106569701B (en) * 2016-10-31 2020-12-11 努比亚技术有限公司 Display terminal control device and method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104575548A (en) * 2013-10-17 2015-04-29 腾讯科技(深圳)有限公司 File playing positioning method and device as well as terminal
CN105049637A (en) * 2015-08-25 2015-11-11 努比亚技术有限公司 Device and method for controlling instant communication

Also Published As

Publication number Publication date
CN108133719A (en) 2018-06-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant