CN110062281B

CN110062281B - Play progress adjusting method and terminal equipment thereof

Info

Publication number: CN110062281B
Application number: CN201910455952.5A
Authority: CN
Inventors: 郑传亮
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2021-08-24
Anticipated expiration: 2039-05-29
Also published as: CN110062281A

Abstract

The invention discloses a playing progress adjusting method and a terminal device, which address the problem in the prior art that, when an image frame displays a large amount of text, the user must manually adjust the playing progress, which is cumbersome and time-consuming. The method comprises: acquiring the playing duration of a target image frame that includes target text content in a video to be played; determining the complexity of the target text content based on the playing duration of the target image frame and the number of characters in the target text content; and adjusting the playing duration of the target image frame and the playing duration of the target text content based on the complexity of the target text content and a preset complexity threshold, wherein the preset complexity threshold is determined based on the number of characters read by a preset number of users within a preset time period.

Description

Play progress adjusting method and terminal equipment thereof

Technical Field

The embodiment of the invention relates to the technical field of communication, in particular to a playing progress adjusting method and terminal equipment thereof.

Background

At present, with the continuous increase of network bandwidth, video content such as movies and short videos has gradually become an important way for people to obtain information and entertainment. Existing video content usually carries information in units of frames, and a frame usually combines images, sound, text and the like, so as to help the user receive more information when watching a video.

For example, when a person first appears in a video, the video usually displays explanatory text about that person. If other text is displayed on the video at the same time, the user may be unable to read all of the displayed text within the time it is shown.

In addition, during video playback the user takes in images and sound first, which reduces the time left for reading the text in the video. The slow reading speed of some users is a further reason why the text displayed in the video may not be read completely. In this case, the user can only manually adjust the video playing progress to read all of the text displayed in the video.

Disclosure of Invention

The embodiment of the invention provides a playing progress adjusting method and a terminal device, and aims to solve the problem in the prior art that, when an image frame displays a large amount of text, the user must manually adjust the playing progress, which is cumbersome and time-consuming.

In order to solve the technical problem, the invention is realized as follows:

In a first aspect, an embodiment of the present invention provides a method for adjusting a playing progress, including:

acquiring the playing duration of a target image frame including target text content in a video to be played;

determining the complexity of the target text content based on the playing duration of the target image frame and the number of the characters in the target text content;

based on the complexity of the target text content and a preset complexity threshold, adjusting the playing time of the target image frame and the playing time of the target text content;

the preset complexity threshold is determined based on the number of the words read by a preset number of users in a preset time period.

In a second aspect, an embodiment of the present invention provides a terminal device, including:

an acquisition unit, configured to acquire the playing duration of a target image frame including target text content in a video to be played;

a determining unit, configured to determine the complexity of the target text content based on the playing duration of the target image frame and the number of characters in the target text content;

an adjusting unit, configured to adjust the playing duration of the target image frame and the playing duration of the target text content based on the complexity of the target text content and a preset complexity threshold.

In a third aspect, an embodiment of the present invention provides a terminal device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method for adjusting play progress according to the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a readable storage medium, where a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the steps of the method for adjusting a playing progress according to the first aspect are implemented.

In the embodiment of the invention, the playing duration of the target image frame including the target text content in the video to be played can be obtained, the complexity of the target text content is determined based on the playing duration of the target image frame and the number of characters in the target text content, and finally the playing duration of the target image frame and of the target text content is adjusted based on the complexity of the target text content and a preset complexity threshold. Therefore, when certain image frames in the video watched by the user carry too much text, the playing duration of those image frames and of the corresponding target text content can be adjusted automatically according to the complexity of the text content and the preset complexity threshold, so that the user can clearly see and read the text content in the image frames. This gives the user a convenient way to efficiently obtain the text content in the video.

Drawings

Fig. 1 is a schematic flowchart illustrating a specific implementation process of a method for adjusting a play progress according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a terminal device according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of another terminal device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

In the prior art, if a frame of image displays a large amount of text and the user cannot read all of it while that frame is being played, the playing progress has to be adjusted manually, which is cumbersome and time-consuming. To solve this problem, the invention provides a playing progress adjusting method.

For convenience of description, the following description will be made of an embodiment of the method taking as an example that an execution subject of the method is a terminal device capable of executing the method. It is understood that the implementation of the method by the terminal device is only an exemplary illustration, and should not be construed as a limitation of the method.

Specifically, the method for adjusting the playing progress provided by the invention comprises the following steps: firstly, acquiring the playing duration of a target image frame including target text content in a video to be played; then, determining the complexity of the target text content based on the playing duration of the target image frame and the number of characters in the target text content; and finally, adjusting the playing duration of the target image frame and the playing duration of the target text content based on the complexity of the target text content and a preset complexity threshold, wherein the preset complexity threshold is determined based on the number of characters read by a preset number of users within a preset time period.

In the embodiment of the invention, the playing duration of the target image frame including the target text content in the video to be played can be obtained, the complexity of the target text content is determined based on the playing duration of the target image frame and the number of characters in the target text content, and finally the playing duration of the target image frame and of the target text content is adjusted based on the complexity of the target text content and a preset complexity threshold. Therefore, when certain image frames in the video watched by the user carry too much text, the playing duration of those image frames and of the corresponding target text content can be adjusted automatically according to the complexity of the text content and the preset complexity threshold, so that the user can clearly see and read the text content in the image frames. This gives the user a convenient way to efficiently obtain the text content in the video.

The following describes in detail an implementation process of the method with reference to a schematic flow chart of a specific implementation of the method for adjusting a play progress shown in fig. 1, including:

Step 101, acquiring the playing duration of a target image frame including target text content in a video to be played;

It should be understood that the target text content is text displayed in the video to be played, including subtitle text, object introduction text, and the like displayed on the image frames of the video to be played.

During video playback, the text displayed in the video is only one aid to understanding the content; the user can also understand the content through the image frames displayed and the sound played. For videos with simpler content, the user can therefore readily receive and understand what the video is intended to convey.

For videos with complex content, for example videos that introduce many persons early in playback, it is difficult for the user to take in everything conveyed while watching: some text may be missed, or the user may miss the dialogue between the persons while reading their introductions. If the user wants to read all of the text displayed in a certain image frame, the user has to manually drag the playing progress bar to find the text that was missed.

Obviously, this makes for a tedious user experience. To solve this problem, the embodiment of the present invention may determine the playing duration of the target image frame corresponding to the target text content in the video to be played, determine the playing speed of the target text content based on that playing duration, and then, when the playing speed of the target text content is high, reduce the playing speed of the target text content and the target image frame so that the user can obtain the complete target text content.

Optionally, the playing durations of the image frames in a video are often the same; for example, one image frame often corresponds to a preset number of milliseconds. On this basis, to determine the playing duration of the target image frame corresponding to the target text content in the video to be played, the number of target image frames may be determined, and the playing duration of the target image frame may then be determined from that number.

Optionally, when the video to be played is ready to be played after being loaded on the player, the start playing time and the end playing time of the video to be played may be obtained, and more specifically, the start playing time and the end playing time of each image frame of the video to be played may also be obtained.
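The two ways of obtaining the playing duration described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `FrameTiming`, `duration_from_frame_count` and `duration_from_timestamps` are hypothetical, and the millisecond unit and the 40 ms per-frame value in the usage note are assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FrameTiming:
    """Start and end playing time of one image frame, in milliseconds (assumed unit)."""
    start_ms: float
    end_ms: float


def duration_from_frame_count(frame_count: int, frame_ms: float) -> float:
    """Playing duration when every frame lasts the same preset number of milliseconds."""
    if frame_count < 0 or frame_ms <= 0:
        raise ValueError("frame_count must be >= 0 and frame_ms must be > 0")
    return frame_count * frame_ms


def duration_from_timestamps(frames: Sequence[FrameTiming]) -> float:
    """Playing duration from the per-frame start and end playing times."""
    if not frames:
        return 0.0
    return max(f.end_ms for f in frames) - min(f.start_ms for f in frames)
```

For instance, 50 target image frames at an assumed 40 ms each give a playing duration of 2000 ms.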

Step 102, determining the complexity of the target text content based on the playing duration of the target image frame and the number of characters in the target text content;

It should be understood that the target text content can be obtained by scanning each image frame of the video to be played with a deep-learning-based text recognition technology, or from a subtitle file of the video to be played. After the target text content is obtained, in order to determine the text playing speed of the target text content, the target text content may be segmented into words to obtain the number of characters in the target text content.

It should be understood that, when segmenting the target text content, the spaces and punctuation marks may be removed, with the text character as the unit, and the number of text characters remaining is taken as the number of characters in the target text content. When the target text content contains Chinese characters, the number of Chinese characters is taken as the number of characters in the target text content; when it contains English text, the number of English words is taken; when it contains both Chinese and English text, the number of characters in the system language of the device playing the video (for example, Chinese) is taken as the number of characters in the target text content.

Because different character quantity determining modes can be adopted based on the language types (such as Chinese characters and English characters) in the target character content, the character complexity in videos of different language types can be accurately determined, and the film watching experience of a user when watching videos of different language types is further ensured.
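The counting rule described above might be sketched as follows. The function name `count_text_units`, the language codes `"zh"` and `"en"`, and the regular expressions are assumptions made for illustration; in particular, restricting Chinese characters to the basic CJK Unified Ideographs block is a simplification.

```python
import re

# Basic CJK Unified Ideographs block only (a simplifying assumption).
_CHINESE_CHAR = re.compile(r"[\u4e00-\u9fff]")
# An English word, optionally with an apostrophe part such as "don't".
_ENGLISH_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_text_units(text: str, system_language: str = "zh") -> int:
    """Count Chinese characters or English words, ignoring spaces and punctuation.

    For bilingual text, only the units of the device's system language are counted.
    """
    chinese = len(_CHINESE_CHAR.findall(text))
    english = len(_ENGLISH_WORD.findall(text))
    if chinese and english:
        return chinese if system_language == "zh" else english
    return chinese or english
```

Because spaces and punctuation never match either pattern, they are dropped without a separate cleaning pass.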

Optionally, when a video is played, many objects appearing for the first time, such as people and unfamiliar animals, usually have descriptive text attached to them, such as the name of a person or the species of an animal. In this case, in order to accurately determine the amount of text displayed, the target text content in the embodiment of the present invention may further include the object introduction text information in the target image frame.

Optionally, the introduction text of objects such as people and unfamiliar animals is often displayed only when the object first appears in the video, and not when the object appears again. To prevent the user from forgetting who or what these objects are, the embodiment of the present invention may convert the object introduction text information into an audio file, establish a mapping relationship between the audio file and the corresponding object in the video to be played, and play the audio file when the user clicks or long-presses the object. Then, when the target text content includes at least one of subtitle text and object introduction text information, the method further includes:

identifying object introduction text information in the target text content;

identifying a target object matched with the object introduction character information from the target image frame;

converting the object introduction text information into an audio file, and establishing a first mapping relation between the audio file and the target object;

receiving a first input of a target object by a user;

in response to a first input, the audio file is played.

The first input comprises input modes such as clicking, long pressing, sliding and the like.

Optionally, after the target object matching the object introduction text information is identified from the target image frame, a mapping relationship between the object introduction text information and the target object may be established, so that the object introduction text information is displayed when the user clicks or long-presses the object. The method then further includes:

establishing a second mapping relation between the object introduction text information and the target object;

receiving a second input of the target object by the user;

in response to the second input, the object introduction text information is displayed.

The second input comprises input modes such as clicking, long pressing, sliding and the like.
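The first and second mapping relations above can be pictured as two lookups keyed by the target object. The sketch below is only an illustration under stated assumptions: `ObjectIntroRegistry`, the object identifiers, and `fake_tts` are hypothetical, and `fake_tts` merely stands in for whatever text-to-speech step a real terminal device would use, which the source does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def fake_tts(text: str) -> str:
    """Hypothetical stand-in for text-to-speech: returns a made-up audio file name."""
    return f"intro_{len(text)}_chars.wav"


@dataclass
class ObjectIntroRegistry:
    synthesize: Callable[[str], str]  # text -> audio file path (assumed interface)
    audio_by_object: Dict[str, str] = field(default_factory=dict)  # first mapping
    text_by_object: Dict[str, str] = field(default_factory=dict)   # second mapping

    def register(self, object_id: str, intro_text: str) -> None:
        """Convert the introduction text to audio and record both mappings."""
        self.audio_by_object[object_id] = self.synthesize(intro_text)
        self.text_by_object[object_id] = intro_text

    def on_first_input(self, object_id: str) -> Optional[str]:
        """Return the audio file to play for a click, long press or slide on the object."""
        return self.audio_by_object.get(object_id)

    def on_second_input(self, object_id: str) -> Optional[str]:
        """Return the introduction text to display for the object."""
        return self.text_by_object.get(object_id)
```

Keeping the two mappings separate mirrors the text, where displaying the introduction text is an optional addition to playing the audio file.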

Step 103, adjusting the playing duration of the target image frame and the playing duration of the target text content based on the complexity of the target text content and a preset complexity threshold.

The preset complexity threshold can be determined based on the number of characters read by a preset number of users within a preset time period. Adjusting the playing duration of the target image frame and of the target text content based on the complexity of the target text content and this threshold takes the users' reading ability into account, so that the adjusted playing durations better match the users' reading habits.
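The source does not say how the reading counts are turned into a threshold. One plausible reading, shown here purely as an assumption, is the average reading rate across the sampled users; `preset_complexity_threshold` and its parameters are hypothetical names.

```python
from statistics import mean
from typing import Sequence


def preset_complexity_threshold(chars_read: Sequence[int], period_seconds: float) -> float:
    """Average reading rate in characters per second across the sampled users.

    chars_read holds, for each sampled user, the number of characters read
    within the same preset time period (an assumed data layout).
    """
    if not chars_read or period_seconds <= 0:
        raise ValueError("need at least one sample and a positive period")
    return mean(chars_read) / period_seconds
```

Under this assumption, three users who read 90, 100 and 110 characters in 10 seconds yield a threshold of 10 characters/second, which matches the value used in the worked example later in the description.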

Optionally, the amount of text the brain can take in within a preset time period is limited. If the target image frame contains too much target text content, the user often cannot read it clearly, that is, cannot acquire all of the text content included in the image frame, which affects the user's understanding of the video. Take the case in which the target text content includes both a Chinese subtitle and an English subtitle: the user's ability to read both subtitles within a unit of time is limited. To ensure that the user can obtain all of the text information in the target text content, the preset complexity threshold in the embodiment of the present invention includes a first preset complexity threshold and a second preset complexity threshold, where the first preset complexity threshold is smaller than the second preset complexity threshold; for example, the second preset complexity threshold may be a preset multiple of the first preset complexity threshold.

Then, based on the complexity of the target text content and a preset complexity threshold, adjusting the playing time of the target image frame and the playing time of the target text content includes:

when the complexity of the target text content is greater than or equal to the first preset complexity threshold, extending the playing duration of the target image frame and the playing duration of the target text content based on the stop-playing time of the target image frame, the difference between the complexity of the target text content and the first preset complexity threshold, and a preset slow-down parameter;

and when the complexity of the target text content is greater than or equal to the second preset complexity threshold, pausing the playing of the video to be played.
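The two conditions above can be read as a three-way decision. The sketch below is one possible arrangement, not the claimed logic: `Action` and `choose_action` are hypothetical names, and checking the second threshold first is a design assumption, since a complexity at or above the second threshold also satisfies the first condition.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"      # play as usual
    EXTEND = "extend"  # prolong the frame and its text
    PAUSE = "pause"    # pause the video to be played


def choose_action(complexity: float, first_threshold: float, second_threshold: float) -> Action:
    """Pick an adjustment from the complexity and the two preset thresholds."""
    if first_threshold >= second_threshold:
        raise ValueError("the first threshold must be smaller than the second")
    if complexity >= second_threshold:
        return Action.PAUSE
    if complexity >= first_threshold:
        return Action.EXTEND
    return Action.NONE
```

With thresholds of 10 and 50 characters/second, a complexity of 12 selects `Action.EXTEND` and a complexity of 50 selects `Action.PAUSE`.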

When the complexity of the target text content is greater than or equal to the first preset complexity threshold, the playing duration of the target image frame and the playing duration of the target text content are extended based on the stop-playing time of the target image frame, the difference between the complexity of the target text content and the first preset complexity threshold, and the preset slow-down parameter. On the one hand, this ensures that the user can clearly see the target text content on the video picture while watching; on the other hand, because the extension depends on that difference and on the preset slow-down parameter, the viewing experience is improved compared with the original constant-speed playback.

Optionally, since the start playing time and the end playing time of the target image frame can be obtained, once the end playing time of the target image frame is reached, the next image frame is played, and the target text content corresponding to the target image frame disappears from the video picture. That is, the playing timeline of the target text content can be adjusted according to the stop-playing time of the target image frame, the difference between the complexity of the target text content and the first preset complexity threshold, and the preset slow-down parameter.

Then, extending the playing duration of the target image frame and the playing duration of the target text content based on the stop-playing time of the target image frame, the difference between the complexity of the target text content and the first preset complexity threshold, and the preset slow-down parameter includes:

delaying the stop-playing time of the target image frame and the stop-playing time of the target text content to E' according to the formula E' = E + A × β;

where E is the stop-playing time of the target image frame, A is the difference between the complexity of the target text content and the first preset complexity threshold, and β is the preset slow-down parameter.

Assuming that the brain can read 10 characters in 1 second, the first preset complexity threshold can be set to 10 characters/second, the second preset complexity threshold to 50 characters/second, and the preset slow-down parameter to β, where β ∈ (0, 1); assume β = 0.1. The process of determining the complexity of the target text content S then includes:

firstly, segmenting the target text content S;

if the text in the target text content is Chinese, each Chinese character can be treated as one word; if the text is English, each English word can be treated as one word;

then, obtaining the word segmentation result {A1, A2, A3, …, Ax}, and counting the number of words in the result as N;

and finally, determining the complexity L = N × α, where α = 1/10; if L is greater than or equal to 10 characters/second, delaying the stop-playing time of the target image frame and of the target text content to E' = E + A × β = E + 0.1A according to the preset slow-down parameter 0.1, where E is the stop-playing time of the target image frame and the target text content, and A is the difference between the complexity of the target text content and the first preset complexity threshold.
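The worked example above can be traced in a few lines. This is a sketch under explicit assumptions: the source gives only α = 1/10 without saying what it represents, so the code assumes α is the reciprocal of the playing duration in seconds (which makes L a rate in characters/second), and it assumes E is measured in seconds. The function names are hypothetical.

```python
def complexity(char_count: int, duration_s: float) -> float:
    """L = N * alpha, with alpha assumed to be 1 / playing duration in seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return char_count * (1.0 / duration_s)


def delayed_stop_time(stop_time_s: float, complexity_value: float,
                      first_threshold: float, beta: float) -> float:
    """E' = E + A * beta, applied only when L reaches the first threshold."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    a = complexity_value - first_threshold  # A in the formula
    if a < 0:
        return stop_time_s  # below the first threshold: no delay
    return stop_time_s + a * beta


# Assumed inputs: 150 characters shown for 10 s, frame stops at E = 30 s.
L = complexity(150, 10.0)                        # 15 characters/second
E_new = delayed_stop_time(30.0, L, 10.0, 0.1)    # A = 5, so E' = 30 + 0.5
```

With these assumed inputs, A = 15 − 10 = 5 and the stop-playing time moves from 30 s to 30.5 s.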

It should be understood that, after delaying the end playing time of the target image frame and the target text content according to the preset slow-down parameter, in order not to affect the normal playing of the image frame after the target image frame in the video to be played, the start playing time and the end playing time of each image frame after the target image frame in the video to be played should be correspondingly delayed.
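Shifting the later frames by the same delay might look like the following. It is a minimal sketch: `FrameSpan` and `shift_following` are hypothetical names, times are assumed to be in seconds, and frames are assumed to be held in playback order.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class FrameSpan:
    start_s: float  # start playing time, seconds (assumed unit)
    end_s: float    # end playing time, seconds


def shift_following(spans: Sequence[FrameSpan], target_index: int, delay_s: float) -> List[FrameSpan]:
    """Extend the target frame by delay_s and push every later frame back by delay_s."""
    if not 0 <= target_index < len(spans):
        raise IndexError("target_index out of range")
    shifted: List[FrameSpan] = []
    for i, span in enumerate(spans):
        if i < target_index:
            shifted.append(span)  # earlier frames are untouched
        elif i == target_index:
            shifted.append(replace(span, end_s=span.end_s + delay_s))
        else:
            shifted.append(replace(span, start_s=span.start_s + delay_s,
                                   end_s=span.end_s + delay_s))
    return shifted
```

Each later frame keeps its original duration; only its position on the timeline moves.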

Referring to fig. 2, a schematic structural diagram of a terminal device 200 according to an embodiment of the present invention is shown in fig. 2, which includes an obtaining unit 201, a determining unit 202, and an adjusting unit 203, where:

an obtaining unit 201, configured to obtain a playing duration of a target image frame including target text content in a video to be played;

a determining unit 202, configured to determine the complexity of the target text content based on the playing duration of the target image frame and the number of characters in the target text content;

an adjusting unit 203, configured to adjust the playing duration of the target image frame and the playing duration of the target text content based on the complexity of the target text content and a preset complexity threshold.

Optionally, in an embodiment, the preset complexity threshold includes a first preset complexity threshold and a second preset complexity threshold, and the adjusting unit 203 is configured to:

when the complexity of the target text content is greater than or equal to the second preset complexity threshold, pausing the playing of the video to be played;

wherein the first preset complexity threshold is smaller than the second preset complexity threshold.

Optionally, in an embodiment, the adjusting unit 203 is configured to:

delaying the stop-playing time of the target image frame and the stop-playing time of the target text content to E' according to the formula E' = E + A × β;

where E is the stop-playing time of the target image frame, A is the difference between the complexity of the target text content and the first preset complexity threshold, and β is the preset slow-down parameter.

Optionally, in an embodiment, the target text content includes at least one of subtitle text and object introduction text, and the terminal device further includes:

a first identification unit 204, configured to identify object introduction text information in the target text content;

a second identifying unit 205, configured to identify, from the target image frame, a target object matching the object introduction text information;

a first establishing unit 206, configured to convert the object introduction text information into an audio file, and establish a first mapping relationship between the audio file and the target object;

a first receiving unit 207, configured to receive a first input to the target object by a user;

a first response unit 208 for playing the audio file in response to the first input.

Optionally, in an embodiment, after the second identifying unit 205 identifies, from the target image frame, a target object matching the object introduction text information, the terminal device further includes:

a second establishing unit 209, configured to establish a second mapping relationship between the object introduction text information and the target object;

a second receiving unit 210, configured to receive a second input to the target object by the user;

a second response unit 211, configured to respond to the second input and display the object introduction text information.

The terminal device 200 provided in the embodiment of the present invention can implement each process of the method for adjusting the playing progress of the terminal device in the method embodiment of fig. 1, and is not described herein again to avoid repetition.

Fig. 3 is a schematic diagram of a hardware structure of a terminal device 300 for implementing various embodiments of the present invention.

The terminal device 300 includes but is not limited to: radio frequency unit 301, network module 302, audio output unit 303, input unit 304, sensor 305, display unit 306, user input unit 307, interface unit 308, memory 309, processor 310, and power supply 311. Those skilled in the art will appreciate that the terminal device configuration shown in Fig. 3 does not constitute a limitation of the terminal device, and that the terminal device may include more or fewer components than shown, or combine certain components, or use a different arrangement of components. In the embodiment of the present invention, the terminal device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.

The processor 310 is configured to obtain a playing duration of a target image frame including target text content in a video to be played; determining the complexity of the target text content based on the playing duration of the target image frame and the number of the characters in the target text content; based on the complexity of the target text content and a preset complexity threshold, adjusting the playing time of the target image frame and the playing time of the target text content; the preset complexity threshold is determined based on the number of the words read by a preset number of users in a preset time period.

It should be understood that, in the embodiment of the present invention, the radio frequency unit 301 may be used for receiving and sending signals during message transmission and reception or during a call. Specifically, it receives downlink data from a base station and passes the data to the processor 310 for processing; in addition, it transmits uplink data to the base station. In general, the radio frequency unit 301 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 301 can also communicate with a network and other devices through a wireless communication system.

The terminal device provides the user with wireless broadband internet access through the network module 302, such as helping the user send and receive e-mails, browse webpages, access streaming media, and the like.

The audio output unit 303 may convert audio data received by the radio frequency unit 301 or the network module 302 or stored in the memory 309 into an audio signal and output as sound. Also, the audio output unit 303 may also provide audio output related to a specific function performed by the terminal device 300 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 303 includes a speaker, a buzzer, a receiver, and the like.

The input unit 304 is used to receive audio or video signals. The input unit 304 may include a Graphics Processing Unit (GPU) 3041 and a microphone 3042. The graphics processor 3041 processes image data of still pictures or video obtained by an image capturing apparatus (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 306. The image frames processed by the graphics processor 3041 may be stored in the memory 309 (or other storage medium) or transmitted via the radio frequency unit 301 or the network module 302. The microphone 3042 may receive sound and process it into audio data. In the phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 301.

The terminal device 300 further comprises at least one sensor 305, such as light sensors, motion sensors and other sensors. Specifically, the light sensor includes an ambient light sensor that adjusts the brightness of the display panel 3061 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 3061 and/or a backlight when the terminal device 300 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the terminal device posture (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration identification related functions (such as pedometer, tapping), and the like; the sensors 305 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
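The posture identification mentioned above (horizontal and vertical screen switching from a stationary accelerometer reading) can be illustrated with a minimal sketch. The axis convention (x along the short edge of the screen, y along the long edge, z pointing out of the screen), the function name `screen_orientation`, and the 25-degree flat threshold are assumptions made for illustration only and are not taken from the embodiment.

```python
import math


def screen_orientation(ax: float, ay: float, az: float,
                       flat_threshold_deg: float = 25.0) -> str:
    """Classify the device posture from one stationary accelerometer sample.

    ax, ay, az are accelerations along the three axes (assumed convention:
    x = short screen edge, y = long screen edge, z = out of the screen).
    """
    # When the device is stationary, the measured vector is gravity.
    g = math.sqrt(ax * ax + ay * ay + az * az)
    if g == 0.0:
        return "unknown"

    # Angle between the gravity vector and the z axis: 0 degrees means the
    # screen lies horizontally, 90 degrees means the screen is upright.
    tilt_deg = math.degrees(math.acos(min(1.0, abs(az) / g)))
    if tilt_deg < flat_threshold_deg:
        return "flat"

    # Gravity mostly along the long edge means the device is held upright.
    return "portrait" if abs(ay) >= abs(ax) else "landscape"
```

A caller could sample the accelerometer periodically and switch the screen layout only when the returned label changes, which avoids redrawing on every sample.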

The display unit 306 is used to display information input by the user or information provided to the user. The Display unit 306 may include a Display panel 3061, and the Display panel 3061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The user input unit 307 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the terminal device. Specifically, the user input unit 307 includes a touch panel 3071 and other input devices 3072. The touch panel 3071, also referred to as a touch screen, may collect touch operations by a user on or near the touch panel 3071 (e.g., operations by a user on or near the touch panel 3071 using a finger, a stylus, or any other suitable object or accessory). The touch panel 3071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of a user, detects a signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 310, and receives and executes commands sent by the processor 310. In addition, the touch panel 3071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The user input unit 307 may include other input devices 3072 in addition to the touch panel 3071. Specifically, the other input devices 3072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described herein.
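The conversion performed by the touch controller, from the signal delivered by the touch detection device to touch point coordinates, can be sketched as follows. The embodiment does not specify the signal format or the conversion rule; the grid of per-cell signal strengths, the weighted-centroid calculation, the `threshold` parameter, and the function name `touch_point` are all assumptions chosen for illustration.

```python
from typing import Optional, Sequence, Tuple


def touch_point(signal: Sequence[Sequence[float]],
                threshold: float = 0.5) -> Optional[Tuple[float, float]]:
    """Convert a grid of touch signal strengths into one (x, y) coordinate.

    signal[row][col] is the strength reported for one sensing cell
    (hypothetical format). Cells below `threshold` are treated as noise.
    Returns None when no cell registers a touch.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for row_idx, row in enumerate(signal):
        for col_idx, value in enumerate(row):
            if value >= threshold:
                total += value
                sum_x += col_idx * value
                sum_y += row_idx * value
    if total == 0.0:
        return None
    # Weighted centroid of the active cells, in cell units.
    return (sum_x / total, sum_y / total)
```

In such a design the controller would pass the returned coordinates on to the processor, and a `None` result would simply mean that nothing is reported for that scan.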

Further, the touch panel 3071 may be overlaid on the display panel 3061, and when the touch panel 3071 detects a touch operation on or near the touch panel, the touch operation is transmitted to the processor 310 to determine the type of the touch event, and then the processor 310 provides a corresponding visual output on the display panel 3061 according to the type of the touch event. Although the touch panel 3071 and the display panel 3061 are shown as two separate components in fig. 3 to implement the input and output functions of the terminal device, in some embodiments, the touch panel 3071 and the display panel 3061 may be integrated to implement the input and output functions of the terminal device, which is not limited herein.

The interface unit 308 is an interface for connecting an external device to the terminal apparatus 300. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 308 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the terminal apparatus 300 or may be used to transmit data between the terminal apparatus 300 and an external device.

The memory 309 may be used to store software programs as well as various data. The memory 309 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the terminal device, and the like. Further, the memory 309 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.

The processor 310 is a control center of the terminal device, connects various parts of the entire terminal device by using various interfaces and lines, and performs various functions of the terminal device and processes data by running or executing software programs and/or modules stored in the memory 309 and calling data stored in the memory 309, thereby performing overall monitoring of the terminal device. Processor 310 may include one or more processing units; preferably, the processor 310 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 310.

The terminal device 300 may also include a power supply 311 (e.g., a battery) for supplying power to the various components. Preferably, the power supply 311 may be logically connected to the processor 310 via a power management system, so that charging, discharging, and power consumption are managed through the power management system.

In addition, the terminal device 300 includes some functional modules that are not shown, and are not described in detail here.

Preferably, an embodiment of the present invention further provides a terminal device, which includes a processor 310, a memory 309, and a computer program stored in the memory 309 and capable of running on the processor 310, where the computer program is executed by the processor 310 to implement each process of the foregoing method for adjusting a play progress, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the foregoing method for adjusting a play progress, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
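As an illustration of how such a computer program might carry out the adjustment, the following minimal sketch computes a complexity value from the number of characters and the playing duration, derives a threshold from reader statistics, and extends the stop playing time when the complexity exceeds the first preset complexity threshold. The exact formulas are assumptions: complexity is taken here as characters per second, the threshold as the mean reading rate of the sampled users, and the extension as the excess multiplied by the preset slowing parameter. The function names are hypothetical, and the handling of the second preset complexity threshold is not modeled.

```python
from typing import Sequence


def text_complexity(char_count: int, duration_s: float) -> float:
    """Assumed definition: characters shown per second of playing duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return char_count / duration_s


def threshold_from_readers(words_read: Sequence[int], period_s: float) -> float:
    """Assumed definition: mean number of words read per second per user.

    words_read holds, for each sampled user, the number of words read
    during the preset time period of period_s seconds.
    """
    if not words_read or period_s <= 0:
        raise ValueError("need at least one user and a positive period")
    return sum(words_read) / (len(words_read) * period_s)


def extended_stop_time(stop_time_s: float, complexity: float,
                       first_threshold: float,
                       slowing_parameter: float) -> float:
    """Return a later stop playing time when the text is too dense.

    Assumed rule: the extension equals the amount by which the complexity
    exceeds the first threshold, scaled by the slowing parameter.
    """
    excess = complexity - first_threshold
    if excess <= 0:
        return stop_time_s  # readable at normal speed, leave unchanged
    return stop_time_s + excess * slowing_parameter
```

For example, under these assumptions a frame showing 40 characters for 2 seconds has a complexity of 20; with a threshold of 6 and a slowing parameter of 0.5, a stop playing time of 12 seconds would be moved to 19 seconds.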

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for adjusting a playing progress, comprising:

the preset complexity threshold is determined based on the number of the words read by a preset number of users in a preset time period, and comprises a first preset complexity threshold and a second preset complexity threshold;

the adjusting the playing time of the target image frame and the playing time of the target text content based on the complexity of the target text content and a preset complexity threshold comprises:

2. The method of claim 1, wherein the extending the playing duration of the target image frame and the playing duration of the target text content based on the stop playing time of the target image frame, the difference between the complexity of the target text content and the first preset complexity threshold, and a preset slowing parameter comprises:

3. The method of claim 1, wherein the target text content includes at least one of subtitle text and object introduction text information, the method further comprising:

identifying object introduction text information in the target text content;

identifying a target object matched with the object introduction text information from the target image frame;

receiving a first input of the target object by a user;

in response to the first input, playing the audio file.

4. The method of claim 3, wherein after identifying a target object from the target image frame that matches the object introduction text information, the method further comprises:

receiving a second input of the target object by the user;

in response to the second input, displaying the object introduction text information.

5. A terminal device, comprising:

the preset complexity threshold is determined based on the number of the words read by the preset number of users in a preset time period, the preset complexity threshold comprises a first preset complexity threshold and a second preset complexity threshold, and the adjusting unit is used for:

6. The terminal device of claim 5, wherein the adjusting unit is configured to:

7. The terminal device of claim 5, wherein the target text content includes at least one of subtitle text and object introduction text information, the terminal device further comprising:

the first identification unit is used for identifying object introduction text information in the target text content;

the second identification unit is used for identifying a target object matched with the object introduction text information from the target image frame;

the first establishing unit is used for converting the object introduction text information into an audio file and establishing a first mapping relation between the audio file and the target object;

the first receiving unit is used for receiving a first input of a user to the target object;

the first response unit is used for playing the audio file in response to the first input.

8. The terminal device according to claim 7, wherein the second identifying unit, after identifying a target object matching the object introduction text information from the target image frame, further comprises:

the second establishing unit is used for establishing a second mapping relation between the object introduction text information and the target object;

the second receiving unit is used for receiving a second input of the target object by the user;

and the second response unit is used for displaying the object introduction text information in response to the second input.

9. A terminal device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method of adjusting a play progress according to any one of claims 1 to 4.

10. A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the play progress adjustment method according to any one of claims 1 to 4.