WO2020119455A1

WO2020119455A1 - Method for repeating word or sentence during video playback, and electronic device

Info

Publication number: WO2020119455A1
Application number: PCT/CN2019/121187
Authority: WO
Inventors: 王有俊; 祁毅; 郭志刚; 胡惠淳
Original assignee: 华为技术有限公司
Priority date: 2018-12-10
Filing date: 2019-11-27
Publication date: 2020-06-18
Also published as: CN109756770A

Abstract

The present application provides a video playback method and an electronic device. The method can be employed in a video playback process according to requirements of a user, and enables the user to click a text unit in subtitles at any time to perform learning of the same; or the method can be employed during the video playback process to identify a text unit input by the user, highlight and display the text unit, and automatically and repeatedly play, by means of configuring a current repeat mode, the text unit or a video clip corresponding to a sentence containing the text unit, so as to achieve repetition of the text unit. The method can facilitate effective English learning for users, simplify user operation, and improve the user experience.

Description

Method and electronic device for re-reading words or sentences during video playback

This application claims priority to Chinese Patent Application No. 201811502510.3, filed with the State Intellectual Property Office of China on December 10, 2018 and entitled "Method and electronic device for re-reading words or sentences during video playback", which is incorporated herein by reference in its entirety.

Technical field

The present application relates to the field of electronic technology, and in particular to a method and an electronic device for repeating words or sentences during video playback.

Background

Most existing English learning resources are limited to words, definitions, and example sentences, and English learning applications make little use of video resources.

In addition, English video resources are only loosely linked to English words, so it is not easy for users to learn English words or sentences from video resources. Moreover, in existing English videos, the user can replay an English word or sentence only by dragging the progress bar, and the time position reached by dragging is not precise enough. If the video content is instead edited to repeat the word or sentence, the duration of the video itself is changed, and the user experience is poor.

Summary of the invention

The present application provides a video playback method and electronic device, which can realize the repetition of words or sentences during video playback, improve the user's English learning effect, and improve the user experience.

In a first aspect, a video playback method is provided. The method includes: displaying a first interface, where the first interface displays a first video being played and subtitles of the first video, and the subtitles of the first video include a first text unit and a second text unit; when a first segment of the first video corresponding to the first text unit is played, automatically and repeatedly playing the first segment on the first interface; detecting a first operation of the user on the first interface; and in response to the first operation, displaying a second interface, where first information associated with the first text unit is displayed on the second interface.

It should be understood that a text unit that the user learns (for example, the first text unit or the second text unit) may be a single word, or may be a phrase, a sentence, or the like made up of multiple words, which is not limited in this application.

It should be noted that the first interface may take many possible forms. For example, in the full-screen playback mode of the electronic device, the first interface refers to the video playback display area, in which subtitles and the like are displayed. Alternatively, when the electronic device is in the non-full-screen mode, the first interface may include, in addition to the video playback display area, other display areas, such as an area showing the parsing details of the text unit and multiple other learning video resources associated with the text unit. This application does not limit this.

Here, the first information may refer to the parsing details of the first text unit, such as its British and American pronunciations, Chinese interpretation, Chinese and English example sentences, and related learning videos.

It should also be understood that the second interface is the interface displayed after the user clicks a text unit in the subtitles to show the parsing details of that text unit, for example, the video playback interface with a parsing window for the text unit popped up over it. Corresponding to the many possible forms of the first interface, the second interface may also take many forms: on top of the content of the first interface, the second interface additionally includes the parsing window that pops up after the user clicks the text unit in the subtitles. For example, the parsing window of the text unit may include the British and American pronunciations, Chinese interpretation, and Chinese and English example sentences of the text unit, and the like. This application does not limit this.

Optionally, the content presented in this application, such as the word analysis and parsing details, may come from an English dictionary built into the system, or from an associated online English dictionary or the like, which is not limited in this application.

It should also be understood that, in addition to clicking words in the video subtitles, users can also click phrases in the video subtitles. For example, some words mostly appear as part of a phrase, and when the user clicks such a word, the parsing may be presented for the phrase. For example, when the content presented in the word analysis interface comes from an English dictionary and the word appears in the dictionary mainly as part of a phrase, the analysis or interpretation of the phrase may pop up when the user clicks the word while watching the video. This application does not limit this.

With the above method for learning words, phrases, or sentences during video playback, users can learn English words while watching videos and can click to enter word learning at any time as needed. This simplifies the lookup operation for word learning, makes learning more convenient, and improves the user experience.

With reference to the first aspect, in some implementations of the first aspect, the first segment is a video segment corresponding to the first text unit, or the first segment is a video segment corresponding to the entire sentence in which the first text unit is located.

Exemplarily, after the current repeat mode is set to word repetition through the repeat setting box, the video segment from the start time to the end time corresponding to the keyword input by the user is played repeatedly on the first interface.

Alternatively, after the current repeat mode is set to sentence repetition through the repeat setting box, the entire sentence containing the keyword input by the user is played repeatedly on the first interface; that is, the video segment from the start time to the end time corresponding to that sentence is repeated.
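
The two repeat modes described above can be illustrated with a minimal sketch. It assumes that each subtitle sentence carries start and end times for itself and for each of its words; the names `RepeatMode`, `Span`, `Sentence`, and `segment_to_repeat` are hypothetical and do not come from this application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class RepeatMode(Enum):
    WORD = "word"          # repeat only the clip in which the keyword is spoken
    SENTENCE = "sentence"  # repeat the clip of the whole sentence


@dataclass(frozen=True)
class Span:
    text: str
    start: float  # seconds from the start of the video (assumed unit)
    end: float


@dataclass(frozen=True)
class Sentence:
    span: Span
    words: Tuple[Span, ...]


def segment_to_repeat(sentence: Sentence, keyword: str,
                      mode: RepeatMode) -> Optional[Tuple[float, float]]:
    """Return the (start, end) clip to loop, or None if the keyword is absent."""
    for word in sentence.words:
        if word.text.lower() == keyword.lower():
            if mode is RepeatMode.WORD:
                return (word.start, word.end)
            return (sentence.span.start, sentence.span.end)
    return None


# Illustrative data only; the timings are made up for the example.
sentence = Sentence(
    span=Span("Leave a message", 12.0, 13.5),
    words=(Span("Leave", 12.0, 12.4), Span("a", 12.4, 12.5),
           Span("message", 12.5, 13.5)),
)
print(segment_to_repeat(sentence, "message", RepeatMode.WORD))
print(segment_to_repeat(sentence, "message", RepeatMode.SENTENCE))
```

In word mode the sketch returns the time range of the keyword alone, and in sentence mode the time range of the sentence that contains it.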

With reference to the first aspect and the foregoing implementation manners, in some implementation manners of the first aspect, the number of times the first segment of the first video is repeatedly played is a number preset by the system or set by the user.

Exemplarily, the number of repeated playbacks may be set by the user in the background, or may be the system default. In the case that the user does not set the number of repeated playbacks, the number of repeated playbacks may be the system default 3 times. This application does not limit this.
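
As a small sketch of how the repeat count might be resolved, the function below falls back to the system default of 3 when the user has set nothing. The function name and the rejection of values below 1 are assumptions made for illustration, not part of this application.

```python
from typing import Optional

DEFAULT_REPEAT_COUNT = 3  # system default named in the text


def resolve_repeat_count(user_setting: Optional[int] = None) -> int:
    """Use the user's setting when present, otherwise the system default."""
    if user_setting is None:
        return DEFAULT_REPEAT_COUNT
    if user_setting < 1:
        # Assumed validation rule; the application does not specify one.
        raise ValueError("repeat count must be at least 1")
    return user_setting
```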

With reference to the first aspect and the foregoing implementation manners, in some implementation manners of the first aspect, the method further includes: before displaying the first interface, displaying a third interface, where the third interface displays the first text unit input by the user, and the third interface includes second information associated with the first text unit and a first video list, the first video list including the first video; detecting a second operation of the user on the third interface; and in response to the second operation, displaying the first interface.

With reference to the first aspect and the foregoing implementation manners, in some implementation manners of the first aspect, the first video list further includes a second video, and the second operation is used to select the first video.

With reference to the first aspect and the foregoing implementation manners, in some implementation manners of the first aspect, the method further includes: detecting a third operation of the user on the third interface; and in response to the third operation, displaying a fourth interface, where the fourth interface includes the second information of the first text unit and a second video list, the second video list includes at least one video, and the subtitles of each video in the second video list include the first text unit.
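
One way to obtain a video list in which every video's subtitles contain a given text unit is an inverted index from words to videos. The following is a minimal sketch under the assumptions that subtitles are available as plain text per video and that the text unit is a single word; `build_word_index` and `videos_for` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def build_word_index(subtitles_by_video: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the set of videos whose subtitles contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for video_id, text in subtitles_by_video.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(video_id)
    return index


def videos_for(index: Dict[str, Set[str]], text_unit: str) -> List[str]:
    """Return the videos whose subtitles include the text unit, sorted by id."""
    return sorted(index.get(text_unit.lower(), set()))


# Illustrative subtitles only.
index = build_word_index({
    "video-1": "Leave a message after the tone.",
    "video-2": "I got your message.",
    "video-3": "See you tomorrow.",
})
print(videos_for(index, "message"))
```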

With reference to the first aspect and the foregoing implementation manners, in some implementation manners of the first aspect, when the second interface is displayed, playback of the first video is paused.

During video playback, the user can click the repeat setting control to set the repeat mode and the number of repetitions, or click any word in the subtitles to enter the learning mode for that word. When the word analysis box for the word pops up, the video is paused.

In a possible implementation, when the user closes the repeat setting control or the subtitle setting control, or closes the word analysis box of the key word to exit the learning mode for the word, the video may automatically resume playing without the user having to click the play control again. Alternatively, the video remains paused, and the user can click the play control on the video display interface to resume playback. This application does not limit this.
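
The pause and resume behavior described above can be sketched as a small state holder. The class `PlayerState`, its method names, and the `auto_resume` flag are assumptions introduced for illustration; the flag selects between the two alternatives in the text (automatic resume on close, or staying paused until the user presses play).

```python
class PlayerState:
    """Tracks whether the video is playing while a panel is opened and closed."""

    def __init__(self, auto_resume: bool = True) -> None:
        self.playing = True
        self.panel_open = False
        self.auto_resume = auto_resume

    def open_panel(self) -> None:
        # Opening the word analysis box or a setting control pauses the video.
        self.panel_open = True
        self.playing = False

    def close_panel(self) -> None:
        # On close, either resume automatically or stay paused.
        self.panel_open = False
        if self.auto_resume:
            self.playing = True

    def press_play(self) -> None:
        # The play control resumes playback once no panel is open.
        if not self.panel_open:
            self.playing = True


player = PlayerState(auto_resume=False)
player.open_panel()
player.close_panel()
print(player.playing)  # still paused until the user presses play
player.press_play()
print(player.playing)
```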

The method for repeating words in a video provided above lets users learn English words during ordinary viewing. Users can click to enter word learning at any time as needed, which simplifies the lookup operation for word learning, makes learning more convenient, and improves the user experience.

When the user is watching a movie for leisure and wants to learn a certain word in the subtitles, the user can click the word in the subtitles through the above method to enter the learning mode for that word. In another scenario, the user may need to learn certain key words in a targeted manner. For example, the user needs to learn multiple words in a certain vocabulary set, and the vocabulary set may be the English Level 4 or Level 6 vocabulary or the IELTS vocabulary. In this scenario, the present application also provides a word learning method that lets users learn, in a concentrated manner, the multiple words included in a certain vocabulary set.

In a possible implementation manner, the subtitles of the learning video and the progress bar of the learning video are presented at different positions in the playback interface. If both were displayed in the same area of the playback interface, clicking a word in the subtitles could work poorly; for example, a click on a word might trigger the progress bar by mistake. This problem is particularly prominent when the display screen of the electronic device on which the user watches the learning video is small, or when the interface playing the learning video is small. Therefore, the subtitles and the progress bar of the learning video are displayed at different positions in the playback interface. For example, the progress bar is displayed at the top of the screen and the subtitles are displayed at the bottom of the screen; other positions in the playback interface are of course also possible. This can improve the sensitivity of user operations and improve the user experience. This application does not limit the display positions of the video playback progress bar and the subtitles.

With reference to the first aspect and the foregoing implementation manners, in some implementation manners of the first aspect, the display effect of the first text unit is different from the display effect of the second text unit.

Specifically, during video playback, the word that the user is learning is highlighted in the subtitles. For example, when the word "message" that the user wants to learn appears in the subtitles, "message" is displayed differently from the other words in the subtitles, to show the user where the word is and draw attention to its pronunciation.
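
A minimal sketch of how the target word might be told apart from the other words in a subtitle line is shown below. It splits the line into word and non-word tokens and flags the tokens that match the target; the function name `mark_target` and the simple letters-and-apostrophes tokenization are assumptions, and the actual rendering of the highlight is left to the display layer.

```python
import re
from typing import List, Tuple


def mark_target(subtitle: str, target: str) -> List[Tuple[str, bool]]:
    """Split a subtitle into tokens and flag the ones equal to the target word."""
    tokens = re.findall(r"[A-Za-z']+|[^A-Za-z']+", subtitle)
    return [(token, token.lower() == target.lower()) for token in tokens]


marked = mark_target("Leave a message, please.", "message")
highlighted = [token for token, is_target in marked if is_target]
print(highlighted)
```

Joining the tokens back together reproduces the original subtitle, so a renderer could apply a different display effect only to the flagged tokens.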

In addition, when the user is watching a movie for leisure and wants to learn a certain word in the subtitles, the user can click the word in the subtitles through the above method to enter the learning mode for that word. In another scenario, the user may need to learn certain key words in a targeted manner. For example, the user needs to learn multiple words in a certain vocabulary set, and the vocabulary set may be the English Level 4 or Level 6 vocabulary or the IELTS vocabulary.

In this scenario, the present application also provides a word learning method that lets users learn, in a concentrated manner, the multiple words included in a certain vocabulary set. That is, for a video resource, all words of the video resource that belong to a vocabulary set can be extracted in advance. Before selecting a movie, the user can view all the key words included in each movie and click to select the key words to be learned. Alternatively, the user can select a movie according to the number of key words, for example, select the movie that includes the most key words as the movie to watch, and click to enter the movie learning mode.
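
The pre-extraction and selection step can be sketched as follows. The example assumes that each movie's subtitles are available as plain text and that the vocabulary set is a collection of single words; `key_words_in` and `rank_videos` are hypothetical names, and ties are broken alphabetically by title purely for determinism.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def key_words_in(subtitle_text: str, vocabulary: Set[str]) -> Set[str]:
    """Return the vocabulary words that occur in the subtitle text."""
    words = set(re.findall(r"[a-z']+", subtitle_text.lower()))
    return words & vocabulary


def rank_videos(subtitles_by_title: Dict[str, str],
                vocabulary: Iterable[str]) -> List[Tuple[str, Set[str]]]:
    """Order movies by how many vocabulary words their subtitles contain."""
    vocab = {word.lower() for word in vocabulary}
    hits = {title: key_words_in(text, vocab)
            for title, text in subtitles_by_title.items()}
    return sorted(hits.items(), key=lambda item: (-len(item[1]), item[0]))


# Illustrative subtitles and vocabulary only.
ranking = rank_videos(
    {
        "Movie A": "Leave a message. The message is urgent.",
        "Movie B": "Please abandon the message and negotiate.",
    },
    ["message", "abandon", "negotiate"],
)
print(ranking[0][0])  # the movie with the most key words
```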

It should be understood that, during playback of the learning video described above, when playback reaches the picture in which the word appears, the word is highlighted and automatically replayed, and when the replay is completed, the video continues to play unaffected. Alternatively, when playback reaches the picture in which the word appears, the word parsing box of the key word pops up, and after the box has been displayed for a preset duration, the video continues to play unaffected.
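
The automatic replay followed by normal continuation can be sketched as a playback timeline built from seeks, without editing the video itself. The function name `playback_timeline` is hypothetical, and `replays` is assumed here to count the additional passes over the segment after the first one.

```python
from typing import List, Tuple


def playback_timeline(duration: float, segment: Tuple[float, float],
                      replays: int) -> List[Tuple[float, float]]:
    """List the (start, end) ranges played in order, with the segment replayed."""
    start, end = segment
    if not (0 <= start < end <= duration) or replays < 0:
        raise ValueError("invalid segment or replay count")
    timeline = [(0.0, end)]               # first pass, up to the end of the segment
    timeline += [(start, end)] * replays  # seek back and replay the segment
    if end < duration:
        timeline.append((end, duration))  # then continue with the rest of the video
    return timeline


timeline = playback_timeline(10.0, (2.0, 3.0), 2)
print(timeline)
watched = sum(end - start for start, end in timeline)
print(watched)  # viewing time grows, but the 10-second video itself is unchanged
```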

With the above method for repeating words or sentences in a video, the user can, based on the English subtitles of the video and while watching the English video, use the word index and the player's capabilities such as rewinding to repeat and read along with English words, which improves the user's English learning effect and the user experience.

In a second aspect, an electronic device is provided, including: one or more processors; a memory; multiple application programs; and one or more programs, where the one or more programs are stored in the memory, and when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: displaying a first interface, where the first interface displays a first video being played and subtitles of the first video, and the subtitles of the first video include a first text unit and a second text unit; when a first segment of the first video corresponding to the first text unit is played, automatically and repeatedly playing the first segment on the first interface; detecting a first operation of the user on the first interface; and in response to the first operation, displaying a second interface, where first information associated with the first text unit is displayed on the second interface.

With reference to the second aspect, in some implementations of the second aspect, the first segment is a video segment corresponding to the first text unit, or the first segment is a video segment corresponding to the entire sentence in which the first text unit is located.

With reference to the second aspect and the foregoing implementation manners, in some implementation manners of the second aspect, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: before displaying the first interface, displaying a third interface, where the third interface displays the first text unit input by the user, and the third interface includes second information associated with the first text unit and a first video list, the first video list including the first video; detecting a second operation of the user on the third interface; and in response to the second operation, displaying the first interface.

With reference to the second aspect and the foregoing implementation manners, in some implementation manners of the second aspect, the first video list further includes a second video, and the second operation is used to select the first video.

With reference to the second aspect and the foregoing implementation manners, in some implementation manners of the second aspect, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: detecting a third operation of the user on the third interface; and in response to the third operation, displaying a fourth interface, where the fourth interface includes the second information of the first text unit and a second video list, the second video list includes at least one video, and the subtitles of each video in the second video list include the first text unit.

With reference to the second aspect and the foregoing implementation manners, in some implementation manners of the second aspect, when the second interface is displayed, playback of the first video is paused.

With reference to the second aspect and the foregoing implementation manners, in some implementation manners of the second aspect, the number of times the first segment of the first video is repeatedly played is a number preset by the system or set by the user.

With reference to the second aspect and the foregoing implementation manners, in some implementation manners of the second aspect, the display effect of the first text unit is different from the display effect of the second text unit.

In a third aspect, the present application provides an apparatus, which is included in an electronic device and has the function of implementing the behavior of the electronic device in the above aspects and the possible implementation manners of the above aspects. The function can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above function, for example, a display module or unit, a detection module or unit, and a processing module or unit.

In a fourth aspect, the present application provides an electronic device, including: a touch display screen, where the touch display screen includes a touch-sensitive surface and a display; a camera; one or more processors; a memory; a plurality of application programs; and one or more computer programs. The one or more computer programs are stored in the memory and include instructions. When the instructions are executed by the electronic device, the electronic device is caused to execute the video playback method in any possible implementation of any one of the above aspects.

In a fifth aspect, the present application provides an electronic device, including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors and are used to store computer program code, and the computer program code includes computer instructions. When the one or more processors execute the computer instructions, the electronic device performs the video playback method in any possible implementation of any one of the above aspects.

In a sixth aspect, the present application provides a computer storage medium, including computer instructions that, when run on an electronic device, cause the electronic device to perform the video playback method in any possible implementation of any one of the above aspects.

In a seventh aspect, the present application provides a computer program product that, when run on an electronic device, causes the electronic device to perform the video playback method in any possible implementation of any one of the above aspects.

Brief description of the drawings

FIG. 1 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.

FIG. 2 is a schematic diagram of a software structure of an electronic device provided by an embodiment of the present application.

FIG. 3 is a schematic diagram of a user interface for realizing word repetition in a video provided by an embodiment of the present application.

FIG. 4 is a schematic diagram of another example of a user interface for learning words during movie watching provided by an embodiment of the present application.

FIG. 5 is a schematic diagram of another example of a user interface for learning words during movie viewing provided by an embodiment of the present application.

FIG. 6 is a schematic diagram of an example of an HMM model provided by an embodiment of the present application.

FIG. 7 is an implementation flowchart of an example of an acoustic model generation and forced alignment process provided by this application.

FIG. 8 is a flowchart of an example of generating a word time series provided by an embodiment of the present application.

FIG. 9 is a schematic diagram of an example of a content association index provided by an embodiment of the present application.

FIG. 10 is a flowchart of an implementation of a word or sentence repetition process provided by an embodiment of the present application.

FIG. 11 is a schematic diagram of an implementation process of a method for implementing word or sentence repetition in a video provided by an embodiment of the present application.

FIG. 12 is a schematic flowchart of a video playback method provided by an embodiment of the present application.

FIG. 13 is a schematic diagram of an example of an electronic device provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the embodiments of the present application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. The term "and/or" in this text merely describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. In addition, in the description of the embodiments of the present application, "plurality" refers to two or more.

In the following, the terms "first" and "second" are used for description purposes only, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, the features defined as "first" and "second" may explicitly or implicitly include one or more of the features. In the description of this embodiment, unless otherwise stated, the meaning of "plurality" is two or more.

The embodiments of the present application provide a method for repeating words or sentences in a video. The method can be applied to an electronic device, or can be implemented by a separate application program. Specifically, based on the English subtitles of the video, while watching the English video the user can use the word index and the player's capabilities such as rewinding to repeat and read along with English words, which improves the user's English learning effect and the user experience.

The method for repeating words or sentences in a video provided by the embodiments of the present application can be applied to electronic devices such as mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, and personal digital assistants (PDAs). The embodiments of the present application do not limit the specific type of the electronic device.

Exemplarily, FIG. 1 shows a schematic structural diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

It can be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange different components. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices, or may be integrated in one or more processors.

The controller may be the nerve center and command center of the electronic device 100. The controller can generate the operation control signal according to the instruction operation code and the timing signal to complete the control of fetching instructions and executing instructions.

The processor 110 may also be provided with a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from the memory. This avoids repeated access and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.

The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100.

The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 can transfer audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering the call through the Bluetooth headset.

The PCM interface can also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface to realize the function of answering the phone call through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.

The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, the UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.

The MIPI interface can be used to connect the processor 110 to peripheral devices such as the display screen 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and so on. In some embodiments, the processor 110 and the camera 193 communicate through the CSI interface to implement the shooting function of the electronic device 100. The processor 110 and the display screen 194 communicate through the DSI interface to realize the display function of the electronic device 100.

The GPIO interface can be configured via software. The GPIO interface can be configured as a control signal or a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 to the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and so on.

The USB interface 130 is an interface that conforms to the USB standard, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on. The USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones. The interface can also be used to connect other electronic devices, such as AR devices.

It can be understood that the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic description, and does not constitute a limitation on the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use interface connection methods different from those in the foregoing embodiments, or a combination of multiple interface connection methods.

The charging management module 140 is used to receive charging input from the charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of the wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.

The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, internal memory 121, external memory, display screen 194, camera 193, wireless communication module 160, and the like. The power management module 141 can also be used to monitor battery capacity, battery cycle times, battery health status (leakage, impedance) and other parameters. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may also be set in the same device.

The wireless communication function of the electronic device 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.

Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.

The mobile communication module 150 can provide wireless communication solutions, including 2G/3G/4G/5G, applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and so on. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.

The modem processor may include a modulator and a demodulator. Among them, the modulator is used to modulate the low-frequency baseband signal to be transmitted into a high-frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. The low-frequency baseband signal is processed by the baseband processor and then passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110, and may be set in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency-modulate and amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.

In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).

The electronic device 100 realizes a display function through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, connecting the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations, and is used for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diode (QLED), and so on. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.

The electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.

The ISP is used to process the data fed back by the camera 193. For example, when a picture is taken, the shutter opens, light is transmitted through the lens to the photosensitive element of the camera, and the optical signal is converted into an electrical signal. The photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin tone of the image, as well as parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be set in the camera 193.

The camera 193 is used to capture still images or video. An object generates an optical image through the lens, and the image is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.

The digital signal processor is used to process digital signals. In addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the energy at that frequency point.
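As a minimal sketch of evaluating the energy of a signal at a single frequency point, the following example computes one bin of a discrete Fourier transform. The function name `energy_at_frequency`, the 8 kHz sampling rate, and the test tone are assumptions for illustration only; the application does not specify how the digital signal processor performs this calculation.

```python
import cmath
import math

def energy_at_frequency(samples, sample_rate_hz, freq_hz):
    """Return the energy of `samples` at one frequency using a single-bin DFT."""
    acc = 0j
    for n, x in enumerate(samples):
        # Correlate the signal with a complex exponential at the target frequency.
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
    return abs(acc) ** 2 / len(samples)

# Hypothetical input: 64 samples of a 1 kHz sine wave sampled at 8 kHz.
tone_samples = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(64)]
on_tone = energy_at_frequency(tone_samples, 8000, 1000)
off_tone = energy_at_frequency(tone_samples, 8000, 2000)
```

In this sketch the energy measured at 1 kHz is large, while the energy measured at 2 kHz is close to zero, which is how a frequency point carrying a signal can be told apart from one that does not.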

The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in various encoding formats, for example, moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.

The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between neurons in the human brain, it can quickly process input information and can also continuously self-learn. The NPU can implement intelligent cognition applications of the electronic device 100, such as image recognition, face recognition, voice recognition, and text understanding.

The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function. For example, files such as music and videos are saved on the external memory card.

The internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions. The processor 110 executes the instructions stored in the internal memory 121 to perform various functional applications and data processing of the electronic device 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system and application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area may store data (such as audio data and a phone book) created during use of the electronic device 100. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, or a universal flash storage (UFS).

The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, and an application processor. For example, music playback, recording, etc.

The audio module 170 is used to convert digital audio information into analog audio signal output, and also used to convert analog audio input into digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call through the speaker 170A of the electronic device 100.

The receiver 170B, also known as an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or plays a voice message, the user can listen to the voice by bringing the receiver 170B close to the ear.

The microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input a sound signal to the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which, in addition to collecting sound signals, may also implement a noise reduction function. In other embodiments, the electronic device 100 may further include three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.

The headset interface 170D is used to connect wired headsets. The headset interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The pressure sensor 180A is used to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the strength of the pressure according to the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 100 detects the intensity of the touch operation by using the pressure sensor 180A. The electronic device 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different touch operation intensities may correspond to different operation instructions. For example, when a touch operation with an intensity less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation with an intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
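The threshold-based selection of an operation instruction described above can be sketched as follows. The threshold value, the normalized intensity scale, and the instruction names are hypothetical and are used only to make the example concrete.

```python
# Hypothetical first pressure threshold on a normalized 0.0-1.0 intensity scale.
FIRST_PRESSURE_THRESHOLD = 0.5

def instruction_for_sms_icon_touch(intensity):
    """Map the touch intensity on the short message icon to an operation instruction."""
    if intensity < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"       # light press: view the short message
    return "create_short_message"         # press at or above the threshold: create a new one

light_press = instruction_for_sms_icon_touch(0.2)
firm_press = instruction_for_sms_icon_touch(0.8)
```

Note that an intensity exactly equal to the threshold falls into the second branch, matching the "greater than or equal to" condition in the description.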

The gyro sensor 180B may be used to determine the movement posture of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake. The gyro sensor 180B can also be used for navigation and somatosensory game scenarios.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude using the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
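One common way to estimate altitude from an air pressure reading is the international barometric formula, sketched below. The application does not state which formula the electronic device 100 uses; the function name `altitude_m` and the default sea-level pressure of 1013.25 hPa are assumptions for illustration.

```python
def altitude_m(pressure_hpa, sea_level_hpa=1013.25):
    """Estimate altitude in meters from air pressure using the international barometric formula."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))

at_sea_level = altitude_m(1013.25)   # pressure equal to the reference gives zero altitude
at_900_hpa = altitude_m(900.0)       # a lower pressure gives a positive altitude, roughly 1 km here
```

Because the result depends on the reference sea-level pressure, which varies with the weather, such an estimate is better suited to assisting positioning than to replacing it.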

The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip holster. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip cover by using the magnetic sensor 180D. Then, according to the detected opening and closing state of the holster or of the flip cover, features such as automatic unlocking upon opening the flip cover are set.

The acceleration sensor 180E can detect the magnitude of acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of electronic devices, and be used in applications such as horizontal and vertical screen switching and pedometers.
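As a simplified sketch of horizontal and vertical screen switching, the orientation can be inferred from which device axis carries most of the gravity reading while the device is roughly stationary. The axis convention (x along the short edge of the screen, y along the long edge) and the function name `screen_orientation` are assumptions; a practical implementation would typically also filter the readings and apply hysteresis.

```python
def screen_orientation(ax, ay):
    """Classify the screen orientation from gravity components in m/s^2.

    ax: acceleration along the short edge of the screen (assumed x axis).
    ay: acceleration along the long edge of the screen (assumed y axis).
    """
    if abs(ay) >= abs(ax):
        return "portrait"     # gravity mostly along the long edge
    return "landscape"        # gravity mostly along the short edge

upright = screen_orientation(0.3, 9.7)
sideways = screen_orientation(9.6, 0.5)
```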

The distance sensor 180F is used to measure the distance. The electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting scenes, the electronic device 100 may use the distance sensor 180F to measure distance to achieve fast focusing.

The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light outward through the light emitting diode. The electronic device 100 uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device 100 may determine that there is an object nearby. When insufficient reflected light is detected, the electronic device 100 may determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear during a call, so as to automatically turn off the screen to save power. The proximity light sensor 180G can also be used for automatic unlocking and locking of the screen in leather case mode and pocket mode.
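The proximity decision described above can be sketched as a comparison of the reflected-light reading against a threshold. The threshold value, the raw reading scale, and the function names are hypothetical; the application only states that "sufficient" reflected light indicates a nearby object.

```python
# Hypothetical raw photodiode reading above which reflected light counts as "sufficient".
REFLECTION_THRESHOLD = 100

def object_nearby(reflected_light):
    """Return True when enough infrared light is reflected back to the photodiode."""
    return reflected_light >= REFLECTION_THRESHOLD

def should_turn_off_screen(reflected_light, in_call):
    """Turn the screen off only when the user is in a call and an object (the ear) is nearby."""
    return in_call and object_nearby(reflected_light)

held_to_ear = should_turn_off_screen(reflected_light=250, in_call=True)
on_table = should_turn_off_screen(reflected_light=10, in_call=True)
```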

The ambient light sensor 180L is used to sense the brightness of ambient light. The electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touch.

The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, access to application locks, fingerprint photographing, and fingerprint answering calls.

The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection. In some other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 caused by the low temperature. In some other embodiments, when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by the low temperature.
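The temperature processing strategy above can be sketched as a set of threshold checks. The three threshold values and the action names below are hypothetical; the application does not give concrete temperatures, and the three behaviors are described as belonging to different embodiments that are combined here only for illustration.

```python
def temperature_strategy(temp_c, throttle_above=45.0, heat_below=0.0, boost_below=-10.0):
    """Return the list of protective actions for a reported temperature in degrees Celsius."""
    actions = []
    if temp_c > throttle_above:
        # Too hot: reduce processor performance to cut power consumption.
        actions.append("reduce_processor_performance")
    if temp_c < heat_below:
        # Cold: heat the battery to avoid abnormal shutdown.
        actions.append("heat_battery")
    if temp_c < boost_below:
        # Very cold: also boost the battery output voltage.
        actions.append("boost_battery_output_voltage")
    return actions

hot = temperature_strategy(50.0)
normal = temperature_strategy(25.0)
very_cold = temperature_strategy(-20.0)
```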

The touch sensor 180K is also known as a "touch panel". The touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 together constitute a touchscreen. The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a location different from that of the display screen 194.

The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the bone that vibrates with the human vocal part. The bone conduction sensor 180M can also contact the human pulse and receive a blood pressure beating signal. In some embodiments, the bone conduction sensor 180M may also be provided in an earphone to form a bone conduction earphone. The audio module 170 may parse out a voice signal based on the vibration signal of the vocal-part bone acquired by the bone conduction sensor 180M to implement the voice function. The application processor may parse heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M to implement the heart rate detection function.

The key 190 includes a power-on key, a volume key, and the like. The key 190 may be a mechanical key. It can also be a touch button. The electronic device 100 can receive key input and generate key signal input related to user settings and function control of the electronic device 100.

The motor 191 may generate a vibration prompt. The motor 191 can be used for vibration notification of incoming calls and can also be used for touch vibration feedback. For example, touch operations applied to different applications (such as taking pictures or playing audio) may correspond to different vibration feedback effects. Touch operations in different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, receiving information, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

The indicator 192 may be an indicator light, which may be used to indicate a charging state, a power change, and may also be used to indicate a message, a missed call, a notification, and the like.

The SIM card interface 195 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to achieve contact with and separation from the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards may be the same or different. The SIM card interface 195 can also be compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.

The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiments of the present application take an Android system with a layered architecture as an example to describe the software structure of the electronic device 100.

FIG. 2 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application. The layered architecture divides the software into several layers, each of which has a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which from top to bottom are the application layer, the application framework layer, the Android runtime and system library, and the kernel layer. The application layer may include a series of application packages.

As shown in FIG. 2, the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.

The application framework layer provides an application programming interface (API) and a programming framework for applications at the application layer. The application framework layer includes some predefined functions.

As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.

The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and so on.

Content providers are used to store and retrieve data, and make these data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.

The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. The display interface can be composed of one or more views. For example, a display interface that includes an SMS notification icon may include a view that displays text and a view that displays pictures.

The phone manager is used to provide the communication function of the electronic device 100, for example, management of the call state (including connected, hung up, and so on).

The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.

The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify the user that a download is complete, to provide message reminders, and so on. A notification may also appear in the status bar at the top of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt sound is emitted, the electronic device vibrates, or the indicator light flashes.

The Android runtime includes a core library and a virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.

The core library contains two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.

The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.

The system library may include multiple functional modules, for example, a surface manager, a media library, a 3D graphics processing library (e.g., OpenGL ES), and a 2D graphics engine (e.g., SGL).

The surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.

The media library supports a variety of commonly used audio, video format playback and recording, and still image files. The media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.

The 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.

The 2D graphics engine is a drawing engine for 2D drawing.

The kernel layer is the layer between hardware and software. The kernel layer contains at least the display driver, camera driver, audio driver, and sensor driver.

For ease of understanding, the following embodiments of the present application will take the electronic device having the structure shown in FIG. 1 and FIG. 2 as an example, combined with the accompanying drawings and application scenarios, to specifically explain the method of implementing word repetition in the video provided by the embodiments of the present application.

Most existing English learning resources stop at the level of words, definitions, and example sentences, and English learning applications make relatively little use of video resources. Most English learning videos are produced by splitting existing videos, and the resulting clips have little relevance to the learning of English words, which makes it inconvenient for users to learn words or sentences from English video resources during the English learning process.

Therefore, this application proposes a method for repeating words or sentences in a video. Based on the English subtitles of the video, the method implements functions such as repetition and follow-up reading of English words while the user watches the English video, thereby improving the user's English learning effect and the user experience.

FIG. 3 is a schematic diagram of an example of a graphical user interface (GUI) for repeating words in a video according to an embodiment of the present application. This application uses a mobile phone as the electronic device to describe in detail the method for repeating words or sentences in a video provided by the present application. (a) in FIG. 3 shows that, in the unlocked mode of the mobile phone, the screen display system of the mobile phone displays the currently output interface content 301, which is the main interface of the mobile phone. The interface content 301 displays a variety of third-party applications (apps), such as Alipay, task card store, photo album, WeChat, card package, settings, and camera, as well as the English learning application provided in the embodiments of the present application, for example, Fun V English shown in (a) in FIG. 3. It should be understood that the interface content 301 may also include other applications, which is not limited in this application.

A user operation is input to the English learning application. The user operation may include a user's click operation on the icon of the English learning application displayed on the mobile phone. In response to the user's click operation, the main interface of the English learning shown in (b) in FIG. 3 is entered. The main interface may include multiple functional areas, such as a daily recommended English word learning area for listing some words and corresponding videos, and the user may receive daily pushes and click to learn the word. The top area on the main interface includes a search box 302, a browsing record control 303, and a message reminding control 304. Among them, the search box 302 is used by the user to enter a word and enter the learning mode of the word; the browsing record control 303 is used to record the user's search and learning record, which is convenient for the user to quickly find the learned word; the message reminder control 304 may include the system push News etc. The main interface may also include scene classification areas, such as different scene classifications such as restaurants, taxis, airplanes, conferences, airports, hotels, and shopping malls shown in (b) in FIG. 3. The user can click to select any scene, enter the scene category, and select English learning videos to learn.

As shown in (b) in FIG. 3, the user clicks on the search box 302, enters the word "message" to be learned, and enters the display interface shown in (c) in FIG. After the user inputs a word, the interface automatically displays the navigation bar 305 of the word, and the user clicks the navigation bar 305 of "message" to enter the word analysis interface shown in (d) of FIG. 3. In (d) of FIG. 3, the "message" parsing interface includes English, American pronunciation, Chinese interpretation, Chinese and English example sentences, and video example sentences. Optionally, the detailed content presented on the word analysis interface may come from the English dictionary built into the system itself, or may be associated with other English online dictionaries, etc., which is not limited in this application.

The word analysis interface also includes a word addition control 306. Clicking the word addition control 306 can add a word to the word book. The user can click the word in the word book to quickly enter the word analysis interface, simplifying the search operation.

As shown in (d) of FIG. 3, the video example sentence area may display all learning video resources related to the word. Optionally, the learning video resource may include video resources under different scene classifications. After the user inputs the words to be learned, the list of English learning videos in different scene classifications is obtained. The user can click to select any scene to obtain the English learning videos in the scene classification for learning. For example, the user can click on different scene categories such as restaurants, taxis, and airplanes to view the video resource of the word in different scenes. This application does not limit the classification of video resources.

In response to the click operation shown in (d) in FIG. 3, the user enters the video learning mode of "message". The video learning mode interface is shown in (e) in FIG. 3; for example, the video is a 36-second clip excerpted from the film "When Happiness Comes to Knock". The video learning mode interface includes a video playback area for playing learning videos related to "message". It also includes a word analysis interface that displays the definition of "message" in detail, so that the user can learn from the video scene and the Chinese definition at the same time, which improves the learning effect. In addition, sliding up on the video learning mode interface displays the related recommended videos shown in area 308 in (h) in FIG. 3. The related recommended videos may be learning videos related to the word currently being played, for example, learning videos for "message" in other scenes, or other videos in the same scene as the current one. For example, if the learning video currently playing for "message" belongs to the restaurant scene category, the related recommended videos may be other learning videos in the restaurant scene category. This application does not limit this.

Optionally, during the video playback process shown in (e) of FIG. 3, the user can see the position of the time slice containing the English word to be learned from a marker in the video playback progress bar. For example, in the currently playing 36-second "message" learning video, there is a user-visible marker at the 9th second of the progress bar, which marks the time slice where "message" appears in the video. It should be understood that the learning video may include multiple markers, and the number of markers matches the number of times the word appears in the video. In addition, during playback, the user can control the playback progress by dragging the progress bar. For example, when the learning video is long, the user can drag the progress bar to a marker near the word and play from there.

In a possible implementation, the word being learned is highlighted during video playback. In the video playback process shown in (e) in FIG. 3, when the "message" that the user wants to learn appears in the subtitles, "message" is displayed differently from the other words in the subtitles to remind the user of the word's position.
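The marker placement described above can be sketched as follows. This is a minimal illustration, not the application's implementation; the function name `marker_fractions` and the representation of a marker as a fraction of the bar length are assumptions.

```python
def marker_fractions(occurrence_starts_s, duration_s):
    """Map each occurrence start time (seconds) to a 0..1 position on the progress bar.

    Hypothetical helper: one marker is produced per occurrence of the word,
    so the number of markers equals the number of occurrences.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Clamp to the bar so a slightly out-of-range timestamp still draws a marker.
    return [min(max(t / duration_s, 0.0), 1.0) for t in sorted(occurrence_starts_s)]


# "message" appears once, at second 9 of a 36-second clip.
print(marker_fractions([9.0], 36.0))  # [0.25]
```

A player could multiply each fraction by the pixel width of the progress bar to place the marker.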

The video playback area of "message" includes a repeat setting control 306, as shown in (f) of FIG. 3. After the user clicks on the repeat setting control 306, a repeat setting box 307 shown in (g) in FIG. 3 may pop up in the video playback area. In the repeat setting box 307, the setting options of the number of loops and the setting of subtitles are included.

Exemplarily, the user can select the content to be repeated through the loop number setting option. For example, the user can click "None" to set the no-repeat mode, in which no word or sentence is repeated during playback of the current learning video. Alternatively, the user can click "Word" to set the word repeat mode: during playback of the current "message" learning video, when "message" is reached, the audio and video frames corresponding to "message" are played in a loop. Alternatively, the user can click "Sentence" to set the sentence repeat mode: during playback of the current "message" learning video, when a sentence containing "message" is reached, the audio and video frames corresponding to that sentence are played in a loop. For example, in (h) in FIG. 3, the sentence containing "message" is: "Yes, I'd like to leave message for Mr. Jay Twistle". In the sentence repeat mode, the audio and video frames corresponding to this sentence are played in a loop.

It should be understood that during the learning video playback described above, when the video is played to the time slot where the word is located, the word is highlighted to realize automatic replay, and when the replay is completed, the video continues to play without being affected.

In a possible implementation manner, the number of loop playbacks may be set by the user in the background, or may be the system default. In the case that the user does not set the number of looped playbacks, the number of looped playbacks may be the system default 3 times. This application does not limit this.
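The three repeat modes and the default loop count can be sketched as below. This is a minimal illustration under stated assumptions: `RepeatMode`, `Span`, and `playback_plan` are hypothetical names, and the time values in the usage lines are made up. Only the three modes and the default of 3 come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class RepeatMode(Enum):
    NONE = "none"
    WORD = "word"
    SENTENCE = "sentence"


DEFAULT_LOOPS = 3  # system default named in the text


@dataclass(frozen=True)
class Span:
    start: float  # seconds
    end: float    # seconds


def playback_plan(mode, word, sentence, loops=None):
    """Return the spans to replay when playback reaches the target word."""
    if mode is RepeatMode.NONE:
        return []  # nothing is repeated; the video just continues
    count = DEFAULT_LOOPS if loops is None else loops
    target = word if mode is RepeatMode.WORD else sentence
    return [target] * count


word = Span(9.0, 9.6)       # illustrative time slice of "message"
sentence = Span(7.5, 11.0)  # illustrative time slice of the containing sentence
print(len(playback_plan(RepeatMode.WORD, word, sentence)))  # 3
```

After the planned spans have been replayed, the player would resume from the end of the span, which matches the statement that playback continues unaffected.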

In addition, the user can set the presentation form of the subtitles through the subtitle setting options. For example, the user can click the first control of the subtitles in the repeat setting box 307, which corresponds to the no subtitles mode, that is, no subtitles in Chinese or English are displayed during the learning video playback process. Alternatively, the user can click the second control "A" of the subtitle in the repeat setting box 307, which corresponds to the English subtitle mode, that is, only English subtitles are displayed during the learning video playback process. Alternatively, the user can click the third control "A+" of the subtitle in the repeat setting box 307, which corresponds to the full subtitle mode, that is, English subtitles and Chinese subtitles are simultaneously displayed during the learning video playback process.

It should be understood that during video playback, whenever the user clicks the repeat setting control to set the repeat mode and the number of repetitions, or clicks any word in the subtitles to enter the learning mode of that word so that the word analysis box pops up, the video is paused.

It should also be understood that the above embodiments use a single word, such as "message", as the example of what the user learns. In actual use, the user can also input other text units such as phrases and sentences, which is not limited in this application.

In a possible implementation, when the user closes the repeat setting control, or closes the word analysis box of the key word to exit the learning mode of the word, the video can continue to play automatically; alternatively, the video stays paused and the user clicks the playback control on the video display interface to continue playing. This application does not limit this.

The method for re-reading words or sentences in the video provided by the present application as described above can provide users with an environment more conducive to learning English. Based on the method of this application, the user can select video resources in different scenarios according to the English words to be learned. During the video learning process, based on the English subtitles of the video, the user can watch the English video while realizing the functions of repeating and following the English words, improving the user's English learning effect and improving the user experience.

In another possible implementation, in addition to applications dedicated to learning English (such as the aforementioned Fun V English), the above method for repeating words or sentences in a video can also be applied to video playback applications, such as the existing Youku Video, Tencent Video, and YouTube, which is not limited in this application. FIG. 4 is a schematic diagram of another example of a user interface for learning words while watching a movie provided by the present application, and is described below.

Exemplarily, FIG. 4(a) shows that, with the mobile phone unlocked, the screen display system of the mobile phone displays the currently output interface content 401, which is the main interface of the mobile phone. The interface content 401 shows a variety of third-party applications, including applications for watching movies, such as Huawei's film and television application Huawei Movies. The user clicks Huawei Movies to enter its display interface shown in (b) of FIG. 4. As shown in the figure, the interface may include film and television resources in various categories as well as various recommended resources, such as the recommended movie "When Happiness Comes to Knock". The user clicks the recommended movie to enter the playback mode of the movie.

Exemplarily, during movie playback, when a word is spoken in the audio, that word is highlighted in the subtitles below the video. For example, in the video playback process shown in (c) of FIG. 4, when the audio of the movie reaches the word "close", "close" in the subtitle is displayed differently from the other words in the subtitle. This makes clear where each word is in the audio and in the subtitles, and makes it convenient for users to learn the pronunciation and meaning of the word.

Exemplarily, the movie playback interface may include a loop number setting control 402 and a subtitle setting control 403. In the same way as above, the user can select the content to be repeated through the loop number setting control. For example, the user can click "None" to set the no-repeat mode, in which no word or sentence is repeated during the current video playback. Alternatively, the user can click "Word" to set the word repeat mode. Alternatively, the user can click "Sentence" to set the sentence repeat mode: after the user makes the setting, the audio and video frames corresponding to the sentence before or after the moment of setting are played in a loop.

It should be understood that the repeat setting at this time may default to the audio and video frames of the word or sentence immediately before the time slice at which the user starts the setting, or to those of the word or sentence immediately after that time slice. The user can also change the relationship between the repeated word or sentence and the time slice at which the setting mode is turned on. This application does not limit this.

For example, while watching the movie, the user may not hear a sentence or word clearly, or may want to learn it. The user can directly click the repeat setting control to set the repeat mode and the number of repetitions; after the user exits the setting, the previous sentence or word is repeated directly.
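One way to pick the sentence to repeat relative to the moment the user opens the setting is sketched below. It is a minimal illustration only: the function name `pick_sentence`, the `(start, end, text)` tuple layout, and the `prefer` parameter are assumptions, and the sample subtitle lines are made up.

```python
import bisect


def pick_sentence(sentences, t, prefer="previous"):
    """Pick the sentence to repeat for a setting opened at time t (seconds).

    sentences: list of (start_s, end_s, text), sorted by start time.
    prefer="previous" returns the latest sentence that started at or before t;
    prefer="next" returns the first sentence that starts after t.
    Returns None when no such sentence exists.
    """
    starts = [s[0] for s in sentences]
    i = bisect.bisect_right(starts, t)  # count of sentences started at or before t
    if prefer == "previous":
        return sentences[i - 1] if i > 0 else None
    return sentences[i] if i < len(sentences) else None


subs = [(0.0, 2.0, "Hello."), (3.0, 5.0, "Please close the door."), (6.0, 8.0, "Thanks.")]
print(pick_sentence(subs, 5.5))          # (3.0, 5.0, 'Please close the door.')
print(pick_sentence(subs, 5.5, "next"))  # (6.0, 8.0, 'Thanks.')
```

Exposing `prefer` as a user setting corresponds to letting the user change the relationship between the repeated sentence and the time at which the setting was opened.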

While watching the movie, the user may encounter unfamiliar words. In a possible implementation, when the user wants to learn an unfamiliar word that appears in the video, the user can click that word in the video subtitles. As shown in (d) of FIG. 4, if the user wants to learn "close", the user can click "close" in the subtitles of the video playback interface to enter the interface shown in (e) of FIG. 4. That is, the user enters the learning mode of the word by clicking the word in the subtitles. As shown in (e) in FIG. 4, the analysis of "close" is displayed in the pop-up box 404, and the pop-up box 404 includes a "details" control 405 and an "add to word book" control 406. The "details" control 405 lets the user quickly enter the analysis interface of the word shown in (f) of FIG. 4. The analysis interface includes British and American pronunciations, Chinese definitions, Chinese and English example sentences, and related videos, so the user can view the learning content and the video resources related to the word on the word analysis interface. Similarly, the detailed content presented on the word analysis interface may come from the English dictionary built into the system itself, or from other associated online English dictionaries, which is not limited in this application. In addition, the user can click a video resource of the word to learn from it. For the specific operation process, refer to the related description of FIG. 3, which is not repeated here.

It should be understood that, in addition to clicking words in video subtitles, users can also click phrases in video subtitles. For example, some words mostly appear as part of a phrase, and when the user clicks such a word, the analysis may be presented for the phrase. For example, when the detailed content presented in the word analysis interface comes from an associated English dictionary and the word mainly appears in that dictionary as part of a phrase, clicking the word while watching the video may pop up the analysis or definition of the phrase. This application does not limit this.

It should also be understood that during video playback, whenever the user clicks the repeat setting control to set the repeat mode and the number of repetitions, or clicks any word in the subtitles to enter the learning mode of that word so that the word analysis box pops up, the video is paused.

In a possible implementation, the subtitles of the learning video and the progress bar of the learning video are presented at different positions in the playback interface. If both were displayed in the same area of the playback interface, clicking a word in the subtitles could work poorly; for example, the click could accidentally trigger the progress bar. This problem is particularly noticeable when the display screen of the electronic device on which the user watches the learning video is small, or when the interface playing the learning video is small. Therefore, the subtitles and the progress bar are presented at different positions in the playback interface. For example, as shown in FIG. 4(c) and FIG. 4(d), the playback progress bar is displayed at the top of the screen and the subtitles are displayed at the bottom of the screen. They can also be placed at other positions of the playback interface. This separation makes user operations more accurate and improves the user experience. This application does not limit the positions of the playback progress bar and the subtitles.

In a possible implementation, when the user closes the repeat setting control 402 or the subtitle setting control 403, or closes the word analysis box 404 of the key word to exit the learning mode of the word, the video can resume automatically without requiring the user to click the playback control again. Alternatively, the video stays paused, and the user clicks the playback control on the video display interface to continue playing. This application does not limit this.

The above method for learning words while watching movies makes it possible to learn English words during ordinary movie viewing. Users can click to start learning a word at any time according to their needs, which simplifies the search operation for word learning, makes learning more convenient, and enhances the user experience.

FIG. 5 is a schematic diagram of another example of a user interface for learning words during movie viewing provided by an embodiment of the present application. For a video resource, all words in the video resource that belong to a given vocabulary set can be extracted in advance. For example, for the movie "Pirates of the Caribbean" shown in (a) of FIG. 5, all six-level vocabulary words contained in the subtitles of the movie are extracted to form a key word set, shown as the word list 503 in the key word area in the figure.

Optionally, the word list 503 is displayed in the video interface selected by the user. The user can set the type of the word set. For example, the user can click the "key word" control to set the word list 503 to the six-level vocabulary or an IELTS vocabulary. Alternatively, the user can click a word in the word list 503 to mark the selected word as a key word for video learning. Alternatively, the user can click a word in the word list 503 to mark all words in the word list 503 except the selected word as key words for video learning. This application does not limit this.

In this way, before selecting a movie, the user can view all the key words contained in each movie and click to select the key words to be learned. The user can also select a movie according to the number of key words, for example, choosing the movie that contains the most key words as the movie to watch, and click to enter the movie learning mode.
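Extracting a key word set from subtitles and choosing a movie by key word count can be sketched as follows. This is a rough illustration: the function names, the simple regular-expression tokenizer, and the sample subtitle strings are all assumptions, and a real system would also handle inflected forms.

```python
import re


def key_words(subtitle_text, vocabulary):
    """Return the sorted vocabulary words that occur in the subtitle text."""
    tokens = re.findall(r"[a-z']+", subtitle_text.lower())
    return sorted(set(tokens) & set(vocabulary))


def movie_with_most_key_words(subtitles_by_movie, vocabulary):
    """Return the title whose subtitles contain the most distinct key words."""
    return max(
        subtitles_by_movie,
        key=lambda title: len(key_words(subtitles_by_movie[title], vocabulary)),
    )


vocabulary = {"abandon", "pirate", "message"}  # stand-in for a graded word list
subtitles = {
    "Movie A": "Abandon ship! Abandon the pirate vessel.",
    "Movie B": "I'd like to leave a message.",
}
print(key_words(subtitles["Movie A"], vocabulary))       # ['abandon', 'pirate']
print(movie_with_most_key_words(subtitles, vocabulary))  # Movie A
```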

In a possible implementation manner, the user can check the position of the time slice where the key word in the movie is located through the mark in the video playback progress bar. The user can find the position of the key word by dragging the video progress bar during the viewing process.

In a possible implementation manner, the key words in the movie are highlighted during video playback. In the video playback process shown in (b) of FIG. 5, when the key word "abandon" appears in the subtitles, "abandon" is displayed differently from the other words in the subtitles to remind the user of the position and pronunciation of the word. For example, the subtitles are displayed in black as a whole, and key words that appear in the subtitles are displayed in blue and highlighted to draw the user's attention to the position and pronunciation of the word.

In a possible implementation manner, the video playback area may include a loop number setting control 501 and a subtitle setting control 502, as shown in (b) of FIG. 5. The user can set the number of repetitions of the key word by clicking the cycle number setting control 501, and set the presentation form of the subtitle of the movie by clicking the subtitle setting control 502. The function is similar to that of the repeat setting control 306 described in the related description of FIG. 3 above. For the sake of simplicity, it will not be repeated here.

Optionally, the key words in the movie can be looped by default. For example, when the video plays to the key word, the word is looped three times by default, which can reduce the user's setting steps and improve the user's learning effect.

In a possible implementation, the user can click any word in the movie subtitles to enter the learning mode of that word; (b) of FIG. 5 shows the word analysis box that appears after "abandon" is clicked. Similarly, the word analysis box includes a details control and an "add to word book" control. The user can click the details control to enter the learning interface of "abandon", which is not repeated here.

Or, when the movie plays to the key word, a word analysis box 504 for the key word is automatically popped up, as shown in (b) of FIG. 5. Optionally, the time for popping up the word parsing box 504 can be set to a fixed duration, for example, the time for popping up the word parsing box 504 is 5 seconds, and the word parsing box 504 is automatically closed after 5 seconds. This application does not limit this.

It should be understood that during video playback, whenever the word analysis box 504 pops up and a word learning mode is entered, the video is paused. This applies both when the user clicks a word in the subtitles to enter the learning mode of that word and when the movie reaches a key word and the word analysis box 504 of that key word pops up automatically.

In a possible implementation, the word learning mode is exited when the user closes the word analysis box 504 of the key word, or when the display time of the word analysis box 504 reaches the set fixed duration and the box closes automatically. After the word analysis box 504 is closed, the video can continue to play automatically; alternatively, the video stays paused and the user clicks the playback control on the video display interface to continue playing. This application does not limit this.

With the above method for repeating words or sentences in a video, users can, based on the English subtitles of the video and while watching the English video, use the word index and the player's rewind capability to repeat and read along with English words, which improves the user's English learning effect and the user experience.
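The pause, timed auto-close, and resume behavior of the word analysis box can be sketched as a small state holder. This is a hypothetical illustration: the class `WordPopup`, its attributes, and the `auto_resume` flag are assumptions; only the 5-second example duration and the two resume options come from the text.

```python
class WordPopup:
    """Tracks whether the word analysis box is open and whether the video is paused."""

    def __init__(self, timeout_s=5.0, auto_resume=True):
        self.timeout_s = timeout_s      # fixed display duration of the box
        self.auto_resume = auto_resume  # False: user must press play after closing
        self.open_until = None          # None means the box is closed
        self.video_paused = False

    def show(self, now_s):
        # Opening the box always pauses the video.
        self.open_until = now_s + self.timeout_s
        self.video_paused = True

    def close(self):
        self.open_until = None
        if self.auto_resume:
            self.video_paused = False

    def tick(self, now_s):
        # Called periodically by the player; closes the box once the duration elapses.
        if self.open_until is not None and now_s >= self.open_until:
            self.close()


popup = WordPopup()
popup.show(now_s=10.0)
popup.tick(now_s=15.0)
print(popup.open_until, popup.video_paused)  # None False
```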

It should be understood that in the introduction of the above embodiments, a single word is taken as an example in the user's learning process. In the actual application process, the user can also input different text units such as phrases and sentences, which is not limited in this application.

It should also be understood that this application uses English learning as an example to describe how users can repeat words or sentences while learning a language. The method is also applicable to video-based learning of other languages, which is not limited in this application.

The human-computer interaction embodiments of the present application are described in detail above with reference to FIGS. 3 to 5. In order to better understand the method of implementing word or sentence repetition in the video provided by the present application, the specific implementation process and algorithm principle are introduced below.

In the specific implementation process, the method for repeating words or sentences provided by this application uses speech recognition technology to generate a word search function associated with videos. It generates a correspondence index from multiple words to multiple videos and a correspondence index from a single video to multiple words, so that users can search from words to related learning videos. In addition, this application reuses the word search function and the player's rewind capability, and locates the start and end times and key frames of words to realize the repeat function. Specifically, it includes the following implementation steps:

Step 1: Generate an acoustic model

It should be understood that the acoustic model is one of the most important parts of a speech recognition system. In speech recognition, the acoustic model represents the relationship between the sound signal and phonemes, or the other linguistic units that make up speech. A phoneme is the smallest unit of pronunciation. Most current mainstream systems use the hidden Markov model (HMM) for modeling, which makes it the most common acoustic model. A hidden Markov model is a discrete-time finite-state automaton. "Hidden" means that the internal state of the Markov model is not visible to the outside world; only the output value at each moment is visible. FIG. 6 is a schematic diagram of an HMM. This application uses the HMM as an example. In FIG. 6, 1 to 6 denote the phonemes of a word, and 1 and 6 are the head and tail of the word. Based on the probabilities, the HMM yields the most likely phoneme, word, and sentence sequence.

For example, to recognize the audio signal of "good morning", following the basic steps of speech recognition, the two words first need to be split into phonemes, for example:

good consists of 3 phonemes, expressed as: G IH0 D

morning consists of 6 phonemes, expressed as: M AO1 R N IH0 NG

Then, a model is trained for each phoneme, using a large number of speech signals. This application uses existing models, including the monophone model and the triphone model. The monophone model uses one HMM to represent one phoneme, and the triphone model uses one HMM to represent a group of three phonemes, that is, a phoneme together with its left and right neighbors. This is needed because pronunciation changes in connected speech; for example, two adjacent English words may be linked into a new sound. When the two words "can" and "I" are read consecutively, they sound like "cannai". Therefore, multiple phonemes are needed to represent the pronunciation of "can".
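The difference between monophone and triphone units can be illustrated by expanding a phoneme sequence into context-dependent labels. This is a sketch only: the function name `to_triphones`, the `left-center+right` label format, and the `sil` boundary symbol are assumptions chosen for illustration, not something the application specifies.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phoneme sequence into left-center+right context labels.

    Each phoneme yields one label, so the sequence length is unchanged; what
    changes is that the same phoneme gets a different label in each context.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


print(to_triphones(["G", "IH0", "D"]))
# ['sil-G+IH0', 'G-IH0+D', 'IH0-D+sil']
```

Because the number of distinct contexts grows roughly with the cube of the phoneme inventory, a triphone system needs far more models than a monophone system.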

Step 2: Forced alignment

Forced alignment is a technique that uses an audio file to obtain the correct spelling and pronunciation of dictionary vocabulary and to generate the corresponding points in time. Specifically, forced alignment uses the aforementioned acoustic model and the candidate words. It needs to determine how to place these words, generate phonemes from the obtained audio signal, and connect the acoustic models together. For example:

The phoneme generated by good morning is: G IH0 D M AO1 R N IH0 NG
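The phoneme sequence that forced alignment works against can be sketched as a lexicon lookup followed by concatenation. The two pronunciations are the ones given in the text; the names `LEXICON` and `transcript_to_phones` are hypothetical.

```python
# Pronunciations exactly as listed in the text above.
LEXICON = {
    "good": ["G", "IH0", "D"],
    "morning": ["M", "AO1", "R", "N", "IH0", "NG"],
}


def transcript_to_phones(transcript, lexicon):
    """Concatenate the pronunciations of each word in the transcript."""
    phones = []
    for word in transcript.lower().split():
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[word])
    return phones


print(" ".join(transcript_to_phones("good morning", LEXICON)))
# G IH0 D M AO1 R N IH0 NG
```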

The system used in this application to implement steps one and two is Kaldi, an open source speech recognition toolkit (see http://kaldi-asr.org/doc/index.html). FIG. 7 is a flowchart of the process of generating an acoustic model and performing forced alignment provided by the present application. The process includes:

701. Import a sample library to generate a single phoneme, and train a single phoneme model to obtain a monophone model.

Specifically, 701 includes a feature extraction process and an acoustic model building process. In the feature extraction process, context is not considered, and a large sample library covering different contexts is obtained. The files related to the language model prepared in advance are imported, the features of the samples are extracted, and an acoustic model based on Gaussian mixture models (GMM) is trained by maximum likelihood estimation. The training then iterates, repeatedly re-estimating the GMM and combining the results distributed across different processors.

702. Taking context into account, generate a triphone model based on the monophone model.

Specifically, for example, "good" is composed of 3 phonemes, so only 3 HMMs need to be established under the monophone model. However, context affects pronunciation: the neighboring phonemes influence how the central phoneme is pronounced, so that it differs from the pronunciation of the phoneme in isolation. To account for this effect, triphone models are used, which leads to thousands of HMMs and can improve the accuracy of speech recognition.

The traditional triphone method is model tying, that is, normalized triphones with a posterior smoothing method. Alternatively, if contexts have similar pronunciation types, their effect on the current phoneme is similar, so these data can be clustered. Kaldi can automatically generate a question set and automatically cluster phonemes into classes based on the similarity of the phonemes themselves.

703. Perform LDA+MLLT to learn the maximum difference between each phoneme and optimize feature extraction.

Specifically, the linear discriminant analysis (LDA) algorithm projects the feature vectors into a lower-dimensional space so that, in the projected space, points of the same class lie closer together and points of different classes are better separated. That is, the LDA algorithm uses a transformation matrix to reduce the dimensionality of the feature vectors so that the within-class distribution becomes compact and the between-class distribution becomes spread out. The extracted features are therefore more representative and classify better.

The maximum likelihood linear transformation (MLLT) uses a linear transformation matrix to decorrelate the feature vectors under the maximum likelihood (ML) criterion, so that in the new space the likelihood of the training set under the model increases and the feature extraction process is optimized.

In addition, 703 also introduces adaptation to speech from multiple speakers, which enhances the triphone model and improves the accuracy of the algorithm.
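The goal of compact classes that are well separated from each other is commonly written as the Fisher criterion. The formulation below is the standard textbook one and is given only as a sketch; the symbols (classes $c$ with $N_c$ feature vectors $x$, class means $\mu_c$, global mean $\mu$, projection matrix $W$) are not taken from the original text.

```latex
\begin{aligned}
S_W &= \sum_{c} \sum_{x \in c} (x - \mu_c)(x - \mu_c)^{\top}, \\
S_B &= \sum_{c} N_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}, \\
W^{*} &= \arg\max_{W} \frac{\left| W^{\top} S_B W \right|}{\left| W^{\top} S_W W \right|}.
\end{aligned}
```

Here $S_W$ measures the scatter within each class and $S_B$ the scatter between class means, so maximizing the ratio favors projections in which classes are tight and far apart.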

It should be understood that each triphone model finally corresponds to a sound signal, that is, the start and end time of a sound signal is determined, and this start and end time is the time when the phoneme levels are aligned.
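Once each phoneme has a start and end time, word-level times follow by grouping consecutive phonemes. The sketch below assumes a phone-level alignment is already available as `(phone, start_s, end_s)` tuples; the function name and all time values are made up for illustration.

```python
def words_from_phone_alignment(words_with_phones, phone_alignment):
    """Group a phone-level alignment into (word, start_s, end_s) entries.

    words_with_phones: list of (word, [phones]) in spoken order.
    phone_alignment: list of (phone, start_s, end_s) covering the same phones in order.
    """
    result = []
    i = 0
    for word, phones in words_with_phones:
        segment = phone_alignment[i:i + len(phones)]
        if [p for p, _, _ in segment] != list(phones):
            raise ValueError(f"alignment does not match pronunciation of {word!r}")
        # A word starts with its first phone and ends with its last phone.
        result.append((word, segment[0][1], segment[-1][2]))
        i += len(phones)
    return result


alignment = [
    ("G", 0.00, 0.08), ("IH0", 0.08, 0.20), ("D", 0.20, 0.30),
    ("M", 0.35, 0.42), ("AO1", 0.42, 0.55), ("R", 0.55, 0.62),
    ("N", 0.62, 0.70), ("IH0", 0.70, 0.80), ("NG", 0.80, 0.95),
]
words = [("good", ["G", "IH0", "D"]), ("morning", ["M", "AO1", "R", "N", "IH0", "NG"])]
print(words_from_phone_alignment(words, alignment))
# [('good', 0.0, 0.3), ('morning', 0.35, 0.95)]
```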

Step 3: Generate word time series

Through steps one and two described above, an acoustic model is established using the relevant algorithms of the Kaldi open source toolkit. In step three, the audio files that this application can use for English learning need to be pre-processed. Using the triphone model generated from the audio file, a "word + time" list associated with the audio file is output. FIG. 8 is a flowchart of an example of generating a time series of words provided by an embodiment of the present application. The generation process includes the following:

801. Import subtitle files, extract words one by one, and generate an acoustic model of each word.

802. Import an audio file, generate an acoustic model of the entire audio file, and judge possible words according to the acoustic model.

803. Compare the acoustic model of the audio file with the acoustic models of the words one by one and match the words. Output each matched word with its start and end time, discard the words that do not match, and obtain the word sequence corresponding to the audio file.
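The matching in steps 801 to 803 can be sketched as follows. This is a simplified illustration, not the acoustic-model comparison itself: it assumes a recognizer has already produced hypothetical `(word, start_ms, end_ms)` tuples, and it replaces the acoustic comparison with a plain string match. The function name and the times are invented.

```python
from typing import List, Tuple

TimedWord = Tuple[str, int, int]  # (word, start_ms, end_ms)


def build_word_time_series(subtitle_words: List[str],
                           recognized: List[TimedWord]) -> List[TimedWord]:
    """Keep subtitle words that match a recognized word, in playback order."""
    series: List[TimedWord] = []
    cursor = 0  # only look forward so that repeated words keep their order
    for word in subtitle_words:
        for i in range(cursor, len(recognized)):
            candidate, start_ms, end_ms = recognized[i]
            if candidate.lower() == word.lower():
                series.append((word, start_ms, end_ms))
                cursor = i + 1
                break
        # A word with no match is discarded, as in step 803.
    return series


# Hypothetical recognizer output; the times are invented for illustration.
recognized = [("yes", 0, 300), ("like", 520, 700), ("to", 700, 800),
              ("leave", 800, 1100), ("message", 1200, 1700)]
subtitle_words = ["Yes", "I'd", "like", "to", "leave", "message"]
series = build_word_time_series(subtitle_words, recognized)
```

Here "I'd" has no counterpart in the recognized list, so it is dropped, and the remaining words carry their start and end times.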

Through step 3, the administrator's background processing of an English video resource yields the start and end time of each word, so that the time slice of each word is accurately located, that is, a file mapping each word to the audio is obtained.

Step 4: Generate content association index

FIG. 9 is a schematic diagram of an example of a content association index provided by an embodiment of the present application. Taking the Level 4 vocabulary as an example, FIG. 9 shows the files mapping words to audio that are obtained after the foregoing steps.

Specifically, in step 4, for an English video resource, the index is built from the English subtitle file and the audio file of the video resource: the English subtitle file includes an index of each word and its time slice, and the audio file includes a time index. The time slice information is used to establish an index relationship between the Level 4 vocabulary among all words of the English video resource and the Level 4 vocabulary in the audio file. Multiple content association index tables are generated, either an index from a single word to multiple videos or an index from a single video to multiple words. For example, of Table 1 and Table 2 shown in FIG. 9, Table 1 represents the correspondence from a single word to videos and lets users find relevant English learning videos by word search; Table 2 represents the correspondence from a single video to words and is used to display the English words or sentences that the user wants to learn while the user watches an English video, and to implement the repeat function.
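The two index tables can be pictured with a small sketch. The data layout below is an assumption made for illustration: the function name `build_indexes`, the video identifiers, and the tuple shapes are invented, and FIG. 9 may organize the tables differently.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

TimedWord = Tuple[str, int, int]  # (word, start_ms, end_ms)


def build_indexes(series_by_video: Dict[str, List[TimedWord]],
                  vocabulary: Set[str]):
    word_to_videos = defaultdict(list)  # Table 1: word -> [(video, start, end)]
    video_to_words = defaultdict(list)  # Table 2: video -> [(word, start, end)]
    for video_id, series in series_by_video.items():
        for word, start_ms, end_ms in series:
            key = word.lower()
            if key not in vocabulary:
                continue  # index only words from the chosen word list
            word_to_videos[key].append((video_id, start_ms, end_ms))
            video_to_words[video_id].append((key, start_ms, end_ms))
    return dict(word_to_videos), dict(video_to_words)


# Invented word time series for two hypothetical videos.
series_by_video = {
    "video_a": [("Leave", 800, 1100), ("message", 1200, 1700)],
    "video_b": [("message", 45000, 45600), ("close", 51000, 51400)],
}
table1, table2 = build_indexes(series_by_video, {"message", "close"})
```

Looking up `table1["message"]` answers the word search case (which videos contain the word, and where), while `table2["video_b"]` answers the viewing case (which indexed words occur in the video being watched).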

Step 5: Generate content metadata

The content metadata may refer to the words input by the user and the start and end times of the words, or to the words included in the English video watched by the user and the start and end times of the words. When the client requests content metadata, the correspondence between the video and the words is queried, and the words and their start and end times are assembled into the metadata and returned. It should be understood that in this application, the start and end times are all at millisecond precision.

In summary, the generation of content metadata is completed through steps 1 to 5, that is, a speech recognition algorithm is used to extract word-level start and end times from the audio of the video, and the correspondence between the words and the video content is generated. After that, words or sentences can be repeated at the user's request.
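A metadata response of this kind might look like the following sketch. The JSON field names (`video_id`, `words`, `start_ms`, `end_ms`) and the function `content_metadata` are hypothetical; the source does not specify the wire format.

```python
import json
from typing import Dict, List, Optional, Tuple

TimedWord = Tuple[str, int, int]  # (word, start_ms, end_ms)


def content_metadata(video_to_words: Dict[str, List[TimedWord]],
                     video_id: str,
                     keyword: Optional[str] = None) -> str:
    """Query the video-to-word table and return the metadata as JSON."""
    entries = video_to_words.get(video_id, [])
    if keyword is not None:
        entries = [e for e in entries if e[0] == keyword.lower()]
    payload = {
        "video_id": video_id,
        # All times are in milliseconds.
        "words": [{"word": w, "start_ms": s, "end_ms": e}
                  for w, s, e in entries],
    }
    return json.dumps(payload)


# Invented table contents for a hypothetical video.
video_to_words = {
    "video_b": [("message", 45000, 45600), ("close", 51000, 51400)],
}
response = json.loads(content_metadata(video_to_words, "video_b", "Message"))
```

Omitting the keyword returns every indexed word of the video, which matches the case where the user browses the words contained in the video being watched.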

Step 6: Locate key frames based on word start and end time, start timed tasks, and enable repeat function

After the content metadata, which contains the keyword and the start and end time of the keyword, is obtained from the cloud, the key frame position of the keyword needs to be located, and the word or sentence is repeated according to the user's settings.

FIG. 10 is a flowchart of a word or sentence repetition process provided by an embodiment of the present application. According to FIG. 10, the entire process includes the following:

1001. Acquire content metadata, and locate the keyword according to the keywords contained in the content metadata and the start and end times of the keywords. Specifically, the user terminal receives the user instruction and requests the content metadata from the cloud. The time slice where the keyword is located is determined from the time information of the key sentence containing the keyword in the content metadata. It should be understood that the key sentence here refers to the sentence where the keyword is located, and that the time slice here and the foregoing start and end times are all millisecond-level times.

1002. Start playing.

In response to the user's click operation, the learning video is played. Specifically, the user can import a playback link to the player and start playing the video. In addition, the user terminal looks up the time of the key sentence by keyword, and checks the current repeat mode to determine whether it is word repeat or sentence repeat.

It should be understood that in this application, the video resource may be stored in the cloud, in which case the user obtains the video resource by sending a request to the cloud, or the video resource may be a local resource, which is not limited in this application.

1003. Start timing.

During video playback, when playback reaches the key sentence time, the scheduled task is started, and the scheduled task is triggered at the end time of the keyword.

1004. Locate the start frame of the keyword and retrieve the repeat count.

Determine the current repeat mode, call the playback engine, and use the forward frame positioning mode.

Specifically, when the start time point does not coincide with a key frame of the video, the key frame is retrieved using the fallback method: the key frame is searched for backward from the start time position of the keyword, and playback restarts from that key frame to realize the repetition.

When the current repeat mode is word repeat, the key frame is the video frame corresponding to the start time of the keyword; when the current repeat mode is sentence repeat, the key frame is the video frame corresponding to the start time of the key sentence where the keyword is located.

In addition, by retrieving the repeat count, the number of times to seek back to the key frame is determined.
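One common way to realize such a fallback is to pick the last key frame at or before the requested start time. The sketch below assumes the key frame timestamps are available as a sorted list in milliseconds; the function name `fallback_keyframe` and the sample timestamps are invented, and a real playback engine would expose this through its own seek interface.

```python
from bisect import bisect_right
from typing import List


def fallback_keyframe(keyframes_ms: List[int], start_ms: int) -> int:
    """Return the last key frame at or before start_ms.

    keyframes_ms must be sorted in ascending order.
    """
    if not keyframes_ms:
        raise ValueError("no key frames available")
    i = bisect_right(keyframes_ms, start_ms)
    # If start_ms precedes every key frame, fall back to the first one.
    return keyframes_ms[max(i - 1, 0)]


# Invented key frame positions, one every two seconds.
keyframes_ms = [0, 2000, 4000, 6000]
seek_target = fallback_keyframe(keyframes_ms, 4350)  # word starts at 4350 ms
```

In word repeat mode the start time passed in would be that of the keyword, and in sentence repeat mode that of the key sentence.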

1005. Start the repetition and seek-back playback.

Read the current repeat mode and number of repetitions. Start forward playback and add one to the playback count. For example, when the number of repetitions is 3 by default and the accumulated count is less than 3, the key frame is located and playback restarts from the key frame.

1006. End the repetition and continue playing.

When the accumulated count reaches the set number of times, for example, when the number of repetitions is 3 by default and the accumulated count is greater than or equal to 3, the repetition stops and the video continues to play forward.
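The counting logic of steps 1003 to 1006 can be sketched as a small controller plus a simulated playback loop. This is an illustration under assumptions: the class and function names are invented, every pass over the segment (including the first) is counted toward the limit, and playback is simulated with a fixed tick instead of a real timer.

```python
class RepeatController:
    """Counts passes over a segment and decides whether to seek back."""

    def __init__(self, repeat_times: int = 3):
        self.repeat_times = repeat_times
        self.played = 0

    def on_segment_end(self) -> str:
        # Called by the timed task when playback reaches the segment end.
        self.played += 1
        if self.played < self.repeat_times:
            return "seek_back"
        self.played = 0  # reset for the next keyword
        return "continue"


def simulate(start_ms: int, end_ms: int,
             repeat_times: int = 3, tick_ms: int = 100):
    """Return (passes, seeks) for one keyword segment."""
    controller = RepeatController(repeat_times)
    position, passes, seeks = start_ms, 0, 0
    while True:
        position += tick_ms  # forward playback
        if position >= end_ms:
            passes += 1
            if controller.on_segment_end() == "seek_back":
                position = start_ms  # jump back to the located key frame
                seeks += 1
            else:
                return passes, seeks


result = simulate(1200, 1700)
```

With the default of 3, the segment is played three times in total and the player seeks back twice before normal playback resumes.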

In summary, the six steps introduced above (establishing the acoustic model, processing audio and video files with speech algorithms, generating the word time series, generating the content metadata, locating key frames, and enabling the repeat function) describe in detail the implementation process of the method for repeating words or sentences in a video provided by the embodiments of the present application. While watching an English video, the user can use the word index of the video and the player's seek-back capability to repeat and follow along with English words, which improves the user's English learning effect and the user experience.

In addition, from the perspective of the administrator of the video resource and the user, the implementation process of the method for repetition of words or sentences in the video provided by the embodiments of the present application is shown in FIG. 11.

Specifically, from the perspective of the video resource administrator, the process includes the following operations: 1101, the administrator operates the management console to extract video resources; 1102, the algorithm is called to preprocess the video, and the speech algorithm is called to segment it automatically; 1103, a timestamped word sequence is output; 1104, video content metadata is generated, that is, a word sequence search index is generated.

Correspondingly, on the user side, depending on the scenario, the process may include the following operations: 1105, the user enters a keyword and searches for short video clips through scene search or word search, and the client may display the video keywords and content; 1106, the user goes to the video details page; 1107, the progress bar marks the keyword, so the user can view the location of the keyword through the player; 1108, the user selects vocabulary repeat, can set the repeat mode to word repeat or sentence repeat, and can set the number of repetitions through the setting interface, with a default of 3; 1109, keywords are highlighted and repetition is enabled. When the video plays to the time slice where the keyword is located, the word is highlighted and automatically repeated; when the accumulated count after repetition is greater than or equal to 3, repetition stops and the video continues to play unaffected.
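As an illustration of operation 1107, the keyword markers on a progress bar can be derived from the word start times and the video duration. The function name `marker_positions` and the sample values are assumptions; the source does not describe how the markers are computed.

```python
from typing import List, Tuple


def marker_positions(keyword_times_ms: List[Tuple[int, int]],
                     duration_ms: int) -> List[float]:
    """Map each keyword start time to a fraction of the progress bar."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    # Clamp to [0, 1] in case a time lies outside the video.
    return [min(max(start / duration_ms, 0.0), 1.0)
            for start, _end in keyword_times_ms]


# Invented (start_ms, end_ms) pairs for a keyword in a two-minute video.
positions = marker_positions([(30000, 30500), (90000, 90400)], 120000)
```

The resulting fractions can be multiplied by the width of the progress bar to place each marker.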

In a possible scenario, such as the scenario described in FIG. 4, while watching a movie, a user can click a word contained in the subtitles to extract it and display the word card, and manually repeat that single word in the movie. This has been described in detail above and is not repeated here.

In a possible scenario, such as the scenario described in FIG. 5, the user can learn professional words while watching the movie in the video playback application. Specifically, when a user opens an English movie, the user can check which professional vocabulary is included in the current movie, such as Level 4, TOEIC, or TOEFL vocabulary. During viewing, when playback reaches the position of a professional word, the repeat function is enabled.

In summary, the method for repetition of words or sentences provided by this application is based on speech recognition technology. It generates a word search function associated with videos, a correspondence index from multiple words to multiple videos, and a correspondence index from a single video to multiple words, so that users can search from words to related learning videos. In addition, the word search function and the player's seek-back capability are used to realize the repeat function by locating the start and end time of the word and the key frames. In the implementation process, the user requests content metadata from the cloud, and the acquired content metadata contains words and timeline information. During repetition in the player, the repeated content does not affect the duration of the video resource itself, even though the playback time changes frequently. This avoids the existing approach of editing the video content, in which repeating words lengthens the content, and thus improves the user experience.

With reference to the above embodiments and related drawings, embodiments of the present application provide a video playback method, which can be implemented in electronic devices (such as mobile phones and tablet computers) with a touch screen and a camera as shown in FIGS. 1 and 2. FIG. 12 is a schematic flowchart of a video playback method provided by an embodiment of the present application. As shown in FIG. 12, the method may include the following steps:

1201. A first interface is displayed, where the first interface displays the first video being played and the subtitles of the first video. The subtitles of the first video include a first text unit and a second text unit.

Exemplarily, the first interface is the interface shown in (e) or (f) in FIG. 3. The first interface includes the first video being played and the subtitles of the first video. The subtitles of the first video include the first text unit "message" to be learned by the user, and the words in the subtitles other than "message" are called the second text unit.

Optionally, the first interface may also include parsing details of the first text unit, such as the British and American pronunciations, Chinese interpretation, Chinese and English example sentences, and video example sentences of "message". Optionally, the detailed content presented in the parsing details of the word may come from the English dictionary built into the system itself, or may be associated with other online English dictionaries, which is not limited in this application.

1202: When the first segment of the first video corresponding to the first text unit is played, the first segment is automatically played repeatedly on the first interface.

Optionally, the first segment is a video segment corresponding to the first text unit, or the first segment is a video segment corresponding to the entire sentence where the first text unit is located.

Exemplarily, after word repetition is set through the repetition setting box 307 in (g) of FIG. 3, the first interface repeatedly plays the video clip from the start time to the end time corresponding to "message" shown in (h) of FIG. 3.

Alternatively, after sentence repetition is set through the repetition setting box 307 in (g) of FIG. 3, the first interface repeatedly plays the video clip from the start time to the end time corresponding to the entire sentence containing "message" shown in (h) of FIG. 3, such as "Yes, I'd like to leave message for Mr. Jay Twistle".

Optionally, the number of times the first segment of the first video is repeatedly played is a number of times preset by the system or set by the user.
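The choice of the first segment in step 1202 can be sketched as a selection between the time range of the text unit and the time range of its sentence. The `TimedWord` type, the `first_segment` function, the mode strings, and all times below are hypothetical and serve only to illustrate the two modes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimedWord:
    text: str
    start_ms: int
    end_ms: int


def first_segment(sentence: List[TimedWord], unit: str,
                  mode: str) -> Tuple[int, int]:
    """Return (start_ms, end_ms) of the clip to repeat."""
    if not sentence:
        raise ValueError("empty sentence")
    if mode == "sentence":
        # Whole sentence: from the first word's start to the last word's end.
        return sentence[0].start_ms, sentence[-1].end_ms
    if mode == "word":
        for word in sentence:
            if word.text.lower() == unit.lower():
                return word.start_ms, word.end_ms
        raise ValueError(f"{unit!r} not found in sentence")
    raise ValueError(f"unknown repeat mode: {mode!r}")


# Invented timings for the example sentence used in the description.
sentence = [
    TimedWord("Yes", 0, 300), TimedWord("I'd", 320, 500),
    TimedWord("like", 520, 700), TimedWord("to", 700, 800),
    TimedWord("leave", 800, 1100), TimedWord("message", 1200, 1700),
    TimedWord("for", 1750, 1900), TimedWord("Mr.", 1900, 2100),
    TimedWord("Jay", 2100, 2400), TimedWord("Twistle", 2400, 3000),
]
word_clip = first_segment(sentence, "message", "word")
sentence_clip = first_segment(sentence, "message", "sentence")
```

The returned range would then be handed to the player, which repeats it the configured number of times.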

1203: Detect the user's first operation on the first interface.

Exemplarily, as shown in (d) of FIG. 4, the first operation may be an operation in which the user clicks the first text unit on the subtitle of the first video. For example, the user clicks on "close" in the subtitles.

1204. In response to the first operation, display a second interface, and display first information associated with the first text unit on the second interface.

When the mobile phone detects that the user clicks on the first text unit in the subtitle, the mobile phone enters the second interface shown in (e) of FIG. 4. Among them, the second interface displays information such as analysis details associated with the first text unit clicked by the user.

In a possible implementation, before displaying the first interface, the method 1200 further includes:

Displaying a third interface, where the third interface displays the first text unit input by the user and includes second information and a first video list associated with the first text unit, and the first video list includes the first video.

Exemplarily, the third interface is the interface shown in (d) of FIG. 3. The third interface is an interface displayed after the user performs the operation shown in (c) in FIG. 3, enters the "message" to be learned, and clicks the navigation box 305. As shown in (d) of FIG. 3, the third interface includes word resolution details of "message" and a video list associated with "message".

Detect the second operation of the user on the third interface.

Optionally, the first video list further includes a second video, and the second operation is used to select the first video.

Exemplarily, the video list may include multiple videos, and the user performs an operation of swiping upward on the third interface similar to (h) in FIG. 3 to see more selectable videos.

In response to the second operation, the first interface is displayed.

Exemplarily, as shown in (d) of FIG. 3, the second operation may be a user's click operation on the first video, and the user may click the first video to enter the first interface.

In a possible implementation manner, the method 1200 further includes:

The third operation of the user is detected on the third interface.

Exemplarily, as shown in (e) of FIG. 4, the third operation may be that the user clicks the detail control in the word analysis popup box 404.

In response to the third operation, a fourth interface is displayed. The fourth interface includes second information of the first text unit and a second video list, the second video list includes at least one video, and the subtitle of each video in the second video list includes the first text unit.

Exemplarily, as shown in (f) in FIG. 4, the fourth interface is the parsing interface of the text unit that is entered when the user clicks the detail control, and it includes word parsing details and a video list associated with the word.

In a possible implementation, when the second interface is displayed, the first video is paused.

Exemplarily, during video playback, the video is paused when the user clicks the repeat setting control to set the repeat mode and number of repetitions, as shown in (f) to (g) in FIG. 3, or when the user clicks any word in the subtitle to enter the learning mode of the word and the word analysis box of the word pops up, as shown in (d) to (e) in FIG. 4.

In a possible implementation manner, the display effect of the first text unit is different from the display effect of the second text unit.

Exemplarily, as shown in (e) to (h) in FIG. 3, "message" in the subtitle is displayed differently from the other words, for example with a highlight effect; or, as shown in (c) and (d) in FIG. 4, "close" in the subtitle is displayed differently from the other words; or, as shown in (b) in FIG. 5, "abandon" in the subtitle is displayed differently from the other words.
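A renderer could decide which subtitle tokens receive the different display effect as in the following sketch. The function name `mark_subtitle` and the boolean flag representation are assumptions; the actual highlighting is a matter of the user interface.

```python
import string
from typing import List, Tuple


def mark_subtitle(subtitle: str, first_unit: str) -> List[Tuple[str, bool]]:
    """Pair each subtitle token with a flag: True means highlight it."""
    target = first_unit.lower()
    marked = []
    for token in subtitle.split():
        # Ignore surrounding punctuation when comparing, e.g. "Yes," -> "yes".
        core = token.strip(string.punctuation).lower()
        marked.append((token, core == target))
    return marked


marked = mark_subtitle(
    "Yes, I'd like to leave message for Mr. Jay Twistle", "message")
highlighted = [token for token, flag in marked if flag]
```

Tokens flagged `True` form the first text unit, and all other tokens form the second text unit and keep the normal subtitle style.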

It can be understood that, in order to realize the above-mentioned functions, the electronic device includes hardware and/or software modules corresponding to performing each function. With reference to the example algorithm steps described in the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or computer software driven hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementation should not be considered beyond the scope of the present application.

In this embodiment, the electronic device may be divided into function modules according to the above method example. For example, each function module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated module can be implemented in the form of hardware. It should be noted that the division of the modules in this embodiment is schematic, and is only a division of logical functions. In actual implementation, there may be another division manner.

In the case where each functional module is divided corresponding to each function, FIG. 13 shows a schematic diagram of a possible composition of the electronic device 1300 involved in the above embodiment. As shown in FIG. 13, the electronic device 1300 may include: a display unit 1301, a detection unit 1302, and a processing unit 1303.

The display unit 1301 may be used to support the electronic device 1300 to perform the above steps 1201 and 1204, and/or other processes used in the technology described herein.

The detection unit 1302 may be used to support the electronic device 1300 to perform the above steps 1203, etc., and/or other processes for the technology described herein.

The processing unit 1303 may be used to support the electronic device 1300 to perform the above steps 1202, etc., and/or other processes for the technology described herein.
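The division of steps 1201 to 1204 among the three units can be pictured with a toy sketch. The class and method names are invented for illustration and the methods only return descriptive strings; this is one possible logical division, not the structure of an actual device.

```python
from typing import List


class DisplayUnit:
    def show_first_interface(self) -> str:  # step 1201
        return "1201: first interface with video and subtitles"

    def show_second_interface(self, text_unit: str) -> str:  # step 1204
        return f"1204: second interface with details of '{text_unit}'"


class DetectionUnit:
    def detect_first_operation(self, text_unit: str) -> str:  # step 1203
        return f"1203: click on '{text_unit}' detected"


class ProcessingUnit:
    def repeat_first_segment(self, times: int = 3) -> str:  # step 1202
        return f"1202: first segment repeated {times} times"


class ElectronicDevice:
    """Routes each step to the unit that supports it."""

    def __init__(self) -> None:
        self.display = DisplayUnit()
        self.detection = DetectionUnit()
        self.processing = ProcessingUnit()

    def run(self, text_unit: str) -> List[str]:
        return [
            self.display.show_first_interface(),
            self.processing.repeat_first_segment(),
            self.detection.detect_first_operation(text_unit),
            self.display.show_second_interface(text_unit),
        ]


steps = ElectronicDevice().run("close")
```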

It should be noted that all relevant content of the steps involved in the above method embodiments can be referred to the function description of the corresponding function module, which will not be repeated here.

The electronic device provided in this embodiment is used to execute the above-mentioned video playback method, and therefore can achieve the same effect as the above-mentioned implementation method.

In the case of using an integrated unit, the electronic device may include a processing module, a storage module, and a communication module. The processing module may be used to control and manage the actions of the electronic device. For example, it may be used to support the electronic device in executing the steps performed by the display unit 1301, the detection unit 1302, and the processing unit 1303. The storage module can be used to support the electronic device in storing program code and data. The communication module can be used to support communication between the electronic device and other devices.

The processing module may be a processor or a controller. It can implement or execute various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of the present application. The processor may also be a combination of computing functions, such as a combination of one or more microprocessors, a combination of digital signal processing (DSP) and a microprocessor, and so on. The storage module may be a memory. The communication module may specifically be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip, or a Wi-Fi chip.

In one embodiment, when the processing module is a processor and the storage module is a memory, the electronic device involved in this embodiment may be a device having the structure shown in FIG. 1.

This embodiment also provides a computer storage medium that stores computer instructions. When the computer instructions run on the electronic device, the electronic device is caused to perform the above-mentioned related method steps to implement the video playback method in the above embodiment.

This embodiment also provides a computer program product, which, when the computer program product runs on a computer, causes the computer to perform the above-mentioned relevant steps to implement the video playback method in the above embodiment.

In addition, the embodiments of the present application also provide an apparatus. The apparatus may specifically be a chip, a component, or a module. The apparatus may include a connected processor and a memory, where the memory is used to store computer-executable instructions. When the apparatus is running, the processor may execute the computer-executable instructions stored in the memory, so that the chip executes the video playback method in each of the above method embodiments.

The electronic device, computer storage medium, computer program product, and chip provided in this embodiment are all used to perform the corresponding methods provided above. Therefore, for the beneficial effects that can be achieved, refer to the beneficial effects of the corresponding methods, which are not repeated here.

Through the description of the above embodiments, those skilled in the art can understand that, for convenience and conciseness of description, only the above-mentioned division of functional modules is used as an example for illustration. In actual applications, the above-mentioned functions can be assigned to different functional modules as required, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.

In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of modules or units is only a division of logical functions. In actual implementation, there may be other divisions; for example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place, or may be distributed in multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The foregoing storage media include various media that can store program codes, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above content is only the specific implementation of this application, but the scope of protection of this application is not limited to this. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in this application shall be covered within the scope of protection of this application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

A video playback method, characterized in that the method includes:

Displaying a first interface displaying the first video being played and the subtitles of the first video, the subtitles of the first video including a first text unit and a second text unit;

When playing the first segment of the first video corresponding to the first text unit, automatically playing the first segment repeatedly on the first interface;

Detecting the user's first operation on the first interface;

In response to the first operation, a second interface is displayed, and the first information associated with the first text unit is displayed on the second interface.
The method according to claim 1, wherein the first segment is a video segment corresponding to the first text unit, or the first segment is a video segment corresponding to the entire sentence where the first text unit is located.
The method according to claim 1 or 2, wherein the method further comprises:

Before displaying the first interface, a third interface is displayed, the third interface displays the first text unit input by the user, and the third interface includes second information and a first video list associated with the first text unit, the first video list including the first video;

Detecting the second operation of the user on the third interface;

In response to the second operation, the first interface is displayed.
The method according to claim 3, wherein the first video list further includes a second video, and the second operation is used to select the first video.
The method according to claim 4, wherein the method further comprises:

Detecting the third operation of the user on the third interface;

In response to the third operation, a fourth interface is displayed, the fourth interface includes second information of the first text unit and a second video list, the second video list includes at least one video, and the subtitle of each video in the second video list includes the first text unit.
The method according to any one of claims 1 to 5, wherein the first video is paused when the second interface is displayed.
The method according to claim 6, wherein the number of times of repeatedly playing the first segment of the first video is a preset number of times preset by the system or set by a user.
The method according to any one of claims 1 to 7, wherein the display effect of the first text unit is different from the display effect of the second text unit.
An electronic device, comprising: one or more processors; a memory; a plurality of application programs; and one or more programs, wherein the one or more programs are stored in the memory, and when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:

Displaying a first interface displaying the first video being played and the subtitles of the first video, the subtitles of the first video including a first text unit and a second text unit;

When playing the first segment of the first video corresponding to the first text unit, automatically playing the first segment repeatedly on the first interface;

Detecting the user's first operation on the first interface;

In response to the first operation, a second interface is displayed, and the first information associated with the first text unit is displayed on the second interface.
The electronic device according to claim 9, wherein the first segment is a video segment corresponding to the first text unit, or the first segment is a video segment corresponding to the entire sentence where the first text unit is located.
The electronic device according to claim 9 or 10, wherein when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:

Before displaying the first interface, a third interface is displayed, the third interface displays the first text unit input by the user, and the third interface includes second information and a first video list associated with the first text unit, the first video list including the first video;

Detecting the second operation of the user on the third interface;

In response to the second operation, the first interface is displayed.
The electronic device according to claim 11, wherein the first video list further includes a second video, and the second operation is used to select the first video.
The electronic device according to claim 12, wherein when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:

Detecting the third operation of the user on the third interface;

In response to the third operation, a fourth interface is displayed, the fourth interface includes second information of the first text unit and a second video list, the second video list includes at least one video, and the subtitle of each video in the second video list includes the first text unit.
The electronic device according to any one of claims 9 to 13, wherein the first video is paused when the second interface is displayed.
The electronic device according to claim 14, wherein the number of times of repeatedly playing the first segment of the first video is a preset number of times preset by the system or set by a user.
The electronic device according to any one of claims 9 to 15, wherein the display effect of the first text unit is different from the display effect of the second text unit.
A computer storage medium, characterized by comprising computer instructions, when the computer instructions run on an electronic device, the electronic device is caused to perform the video playback method according to any one of claims 1 to 8.
A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to perform the video playback method according to any one of claims 1 to 8.