WO2023097446A1 - Video processing method, smart terminal, and storage medium - Google Patents
- Publication number
- WO2023097446A1 · PCT/CN2021/134410
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- target
- video
- content
- preset
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
Definitions
- the present application relates to the technical field of video processing, and in particular to a video processing method, an intelligent terminal and a storage medium.
- in order to facilitate the understanding of video content, videos generally support a subtitle display function, and some also support a subtitle translation function to help users better understand the video content.
- however, preset information (such as text information) in a different language may appear in the video image and be difficult for the user to understand, which affects the user's viewing experience.
- the main purpose of this application is to provide a video processing method, an intelligent terminal and a storage medium, so that users can quickly understand the preset information in the video according to the converted target information.
- the application provides a video processing method, which includes the following steps: acquiring first information corresponding to at least one video image frame in a target video; and processing the first information according to a preset rule to determine or generate target information corresponding to the video image frame.
- optionally, the step of processing the first information according to preset rules to determine or generate target information corresponding to the video image frame includes: identifying scene information of the video image frame and determining or generating feature information according to the scene information; and determining or generating the target information according to the feature information and the first information.
- optionally, the method also includes: acquiring second information corresponding to the voice information in the target video, and performing a first preset process on the first information according to the second information, so as to determine or generate the target information.
- optionally, the step of acquiring the second information corresponding to the voice information in the target video includes: acquiring initial information converted from the voice information, and performing preset processing on the initial information according to the first information to obtain the second information.
- the first information includes first content and/or a first location
- optionally, the step of processing the first information according to preset rules includes: when the language type corresponding to the first content does not match a preset type, converting the first content into target content corresponding to the preset type, and/or determining or generating a target location corresponding to the first location according to preset location rules.
- optionally, the preset location rules include at least one of the following: determining the first location as the target location; determining a location separated from the first location by a preset distance as the target location; and receiving a preset operation and determining or generating the target location based on the preset operation.
- the preset operation may be a drag operation, or other operations.
- optionally, the method further includes: during playback of the target video, acquiring the target information and displaying the target content at the target position corresponding to the video image frame.
- optionally, the step of acquiring the target information includes: acquiring the target information according to the association relationship between the target information and the video image frame.
- optionally, the step of displaying the target content at the position corresponding to the video image frame during playback of the target video includes: acquiring target display parameters of the target information, and displaying the target content at the target position corresponding to the video image frame with the target display parameters.
- optionally, the step of acquiring target display parameters of the target information includes: receiving a selection operation for the display parameters, and determining the target display parameters according to the selection operation.
- the target display parameters include at least one of text display duration, text display mode, and text target display position.
- the text display manner includes displaying the target content, and/or simultaneously displaying the target content and the first content.
- optionally, the step of acquiring first information corresponding to at least one video image frame in the target video includes: judging whether preset information exists in the video image frame, and if so, acquiring the first information from the video image frame.
- the preset information may be text information, subtitle information, content text in an image frame, or the like.
- the present application also provides a video processing method, the method comprising: acquiring first information corresponding to at least one video image frame in a target video; acquiring second information corresponding to voice information in the target video; and determining or generating target information corresponding to the target video according to the first information and the second information.
- optionally, the target information includes image text information and/or voice text information.
- optionally, the step of determining or generating target information corresponding to the target video according to the first information and the second information includes: performing third preset processing on the first information according to the second information to obtain image text information, and/or performing fourth preset processing on the second information according to the first information to obtain voice text information; and determining or generating the target information according to the image text information and/or the voice text information.
- the third preset processing may be calibration processing or other processing; the fourth preset processing may be calibration processing or other processing.
- optionally, the step of performing third preset processing on the first information according to the second information to obtain processed image text information includes: performing calibration processing on the first information according to the second information.
- optionally, the step of performing third preset processing on the first information according to the second information to obtain image text information includes: obtaining description information and/or associated information corresponding to the first information according to the second information, and generating identification information according to the description information and/or the associated information to identify the first information.
- optionally, the step of performing fourth preset processing on the second information according to the first information to obtain voice text information includes: performing calibration processing on the second information according to the first information.
- the present application also provides an intelligent terminal, including: a memory and a processor, wherein a video processing program is stored in the memory, and when the video processing program is executed by the processor, the steps of any one of the methods described above are implemented.
- the present application also provides a storage medium, the storage medium stores a computer program, and when the computer program is executed by a processor, the steps of any one of the above methods are implemented.
- the storage medium may be a computer-readable storage medium.
- the first information corresponding to at least one video image frame in the target video is obtained; and the first information is processed according to preset rules to determine or generate the target information corresponding to the video image frame.
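As a concrete illustration of the claimed flow, the following is a minimal, hypothetical Python sketch of the two basic steps (acquiring first information from a video image frame, then processing it according to a preset rule); the function names, data structure, and rule form are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FirstInformation:
    content: str            # first content: text recognized in the video image frame
    position: tuple         # first location: (x, y, w, h) of the text in the frame

# A "preset rule" is modeled as a callable mapping first information to target
# information (e.g. translated content and/or an adjusted display position).
PresetRule = Callable[[FirstInformation], FirstInformation]

def acquire_first_information(frame) -> Optional[FirstInformation]:
    """Placeholder: detect preset information (text) in a video image frame."""
    # A real implementation would run OCR here; see the later OCR sketch.
    return None

def process_frame(frame, rule: PresetRule) -> Optional[FirstInformation]:
    """Acquire first information from a frame, then apply the preset rule to it."""
    first_info = acquire_first_information(frame)
    if first_info is None:          # the frame contains no preset information
        return None
    return rule(first_info)         # target information for this video image frame
```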
- FIG. 1 is a schematic diagram of the hardware structure of an intelligent terminal implementing various embodiments of the present application;
- FIG. 2 is a system architecture diagram of a communication network provided by an embodiment of the present application;
- FIG. 3 is a schematic diagram of the hardware structure of the controller 140 according to the first embodiment;
- FIG. 4 is a schematic diagram of the hardware structure of the network node 150 according to the first embodiment;
- FIG. 5 is a schematic diagram of the hardware structure of the network node 160 according to the first embodiment;
- FIG. 6 is a schematic diagram of the hardware structure of the controller 170 according to the second embodiment;
- FIG. 7 is a schematic diagram of the hardware structure of the network node 180 according to the second embodiment;
- FIG. 8 is a schematic flowchart of a video processing method according to the first embodiment;
- FIG. 9 is a schematic flowchart of step S20 of the video processing method according to the first embodiment;
- FIG. 10 is a diagram of the playback interface of the video processing method according to the first embodiment;
- FIG. 11 is a schematic flowchart of step S20 of the video processing method according to the second embodiment;
- FIG. 12 is a schematic flowchart of a video processing method according to the third embodiment;
- FIG. 13 is a schematic flowchart of a video processing method according to the fourth embodiment;
- FIG. 14 is a schematic flowchart of step S20 of the video processing method according to the fifth embodiment;
- FIG. 15 is a schematic flowchart of a video processing method according to the sixth embodiment;
- FIG. 16 is a schematic diagram of an image frame according to the sixth embodiment;
- FIG. 17 is a schematic flowchart of a video processing method according to the seventh embodiment;
- FIG. 18 is a schematic flowchart of a video processing method according to the seventh embodiment;
- FIG. 19 is a schematic flowchart of step S70 of the video processing method according to the eighth embodiment;
- FIG. 20 is a schematic flowchart of a video processing method according to the ninth embodiment.
- although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms; these terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this document, first information may also be called second information, and similarly, second information may also be called first information.
- depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".
- the singular forms "a”, “an” and “the” are intended to include the plural forms as well, unless the context indicates otherwise.
- "A, B, C", "A, B or C", or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". Exceptions to this definition will only arise when combinations of elements, functions, steps or operations are inherently mutually exclusive in some way.
- depending on the context, the word "if" as used herein may also be interpreted as "when", "upon", "in response to determining", or "in response to detecting".
- similarly, depending on the context, the phrases "if determined" or "if detected (the stated condition or event)" may be interpreted as "when determined", "in response to the determination", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
- step codes such as S10 and S20 are used herein to express the corresponding content more clearly and concisely, and do not constitute a substantive limitation on the order of execution; for example, an implementer may execute S20 first and then S10, and such variations remain within the scope of protection of this application.
- Smart terminals can be implemented in various forms.
- the smart terminals described in this application may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
- a mobile terminal will be taken as an example, and those skilled in the art will understand that, in addition to elements specially used for mobile purposes, the configurations according to the embodiments of the present application can also be applied to fixed-type terminals.
- FIG. 1 is a schematic diagram of the hardware structure of a mobile terminal implementing various embodiments of the present application.
- the mobile terminal 100 may include components such as an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111.
- the radio frequency unit 101 can be used for sending and receiving information, or for receiving and sending signals during a call; specifically, downlink information received from a base station is passed to the processor 110 for processing, and uplink data is sent to the base station.
- the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
- the radio frequency unit 101 can also communicate with the network and other devices through wireless communication.
- the above wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), TDD-LTE (Time Division Duplexing-Long Term Evolution), 5G, and the like.
- WiFi is a short-distance wireless transmission technology.
- the mobile terminal can help users send and receive emails, browse web pages, and access streaming media through the WiFi module 102, which provides users with wireless broadband Internet access.
- although Fig. 1 shows the WiFi module 102, it can be understood that it is not an essential component of the mobile terminal and can be omitted as required without changing the essence of the invention.
- when the mobile terminal 100 is in a call signal receiving mode, a call mode, a recording mode, a voice recognition mode, a broadcast receiving mode, or the like, the audio output unit 103 can convert audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound.
- the audio output unit 103 can also provide audio output related to a specific function performed by the mobile terminal 100 (eg, call signal reception sound, message reception sound, etc.).
- the audio output unit 103 may include a speaker, a buzzer, and the like.
- the A/V input unit 104 is used to receive audio or video signals.
- the A/V input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042; the graphics processing unit 1041 processes image data of still pictures or video.
- the processed image frames may be displayed on the display unit 106 .
- the image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage media) or sent via the radio frequency unit 101 or the WiFi module 102 .
- the microphone 1042 can receive sound (audio data) in operating modes such as a phone call mode, a recording mode, and a voice recognition mode, and can process such sound into audio data.
- the processed audio (voice) data can be converted into a format transmittable to a mobile communication base station via the radio frequency unit 101 for output in case of a phone call mode.
- the microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the process of receiving and transmitting audio signals.
- the mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors.
- the light sensor includes an ambient light sensor and a proximity sensor.
- the ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the mobile terminal 100 moves to the ear.
- as a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (generally three axes), and can detect the magnitude and direction of gravity when stationary; it can be used for applications that recognize the posture of the mobile phone (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration) and for vibration-recognition-related functions (such as pedometer and tap detection). Other sensors that may also be configured on the mobile phone, such as fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, will not be described in detail here.
- the display unit 106 is used to display information input by the user or information provided to the user.
- the display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like.
- the user input unit 107 can be used to receive input numbers or character information, and generate key signal input related to user settings and function control of the mobile terminal.
- the user input unit 107 may include a touch panel 1071 and other input devices 1072 .
- the touch panel 1071, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed by the user on or near the touch panel 1071 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program.
- the touch panel 1071 may include two parts, a touch detection device and a touch controller.
- the touch detection device detects the user's touch orientation and the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 110; it can also receive commands sent by the processor 110 and execute them.
- the touch panel 1071 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave.
- the user input unit 107 may also include other input devices 1072 .
- other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons and switch buttons), a trackball, a mouse, a joystick, and the like, which are not specifically limited here.
- the touch panel 1071 may cover the display panel 1061.
- when the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display panel 1061 according to the type of the touch event.
- although in Fig. 1 the touch panel 1071 and the display panel 1061 are shown as two independent components to realize the input and output functions of the mobile terminal, in some embodiments the touch panel 1071 and the display panel 1061 can be integrated to realize the input and output functions of the mobile terminal; the implementation is not specifically limited here.
- the interface unit 108 serves as an interface through which at least one external device can be connected with the mobile terminal 100 .
- an external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, audio input/output (I/O) ports, video I/O ports, headphone ports, and more.
- the interface unit 108 can be used to receive input (for example, data information, power, etc.) from an external device and to transfer data between the mobile terminal 100 and the external device.
- the memory 109 can be used to store software programs as well as various data.
- the memory 109 can mainly include a storage program area and a storage data area.
- the storage program area can store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the storage data area can store data (such as audio data and a phone book) created according to the use of the mobile phone.
- the memory 109 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
- the processor 110 is the control center of the mobile terminal; it uses various interfaces and lines to connect the various parts of the entire mobile terminal, and executes the various functions of the mobile terminal and processes data by running or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, so as to monitor the mobile terminal as a whole.
- the processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor.
- the application processor mainly processes operating systems, user interfaces, and application programs, etc.
- the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 110.
- the mobile terminal 100 can also include a power supply 111 (such as a battery) for supplying power to various components.
- preferably, the power supply 111 can be logically connected to the processor 110 through a power management system, so as to manage functions such as charging, discharging, and power consumption through the power management system.
- the mobile terminal 100 may also include a Bluetooth module, etc., which will not be repeated here.
- the following describes the communication network system on which the mobile terminal of the present application is based.
- FIG. 2 is a structure diagram of a communication network system provided by an embodiment of the present application.
- the communication network system is an LTE system of general mobile communication technology.
- the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and the operator's IP service 204.
- the UE 201 may be the above-mentioned terminal 100, which will not be repeated here.
- E-UTRAN 202 includes eNodeB 2021 and other eNodeB 2022 and so on.
- the eNodeB 2021 can be connected to other eNodeBs 2022 through a backhaul (for example, an X2 interface); the eNodeB 2021 is connected to the EPC 203 and can provide access for the UE 201 to the EPC 203.
- the EPC 203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and the like.
- the MME 2031 is a control node that processes signaling between the UE 201 and the EPC 203, and provides bearer and connection management.
- the HSS 2032 is used to provide registers to manage functions such as the home location register (not shown in the figure), and to store user-specific information about service characteristics, data rates, etc.
- the PCRF 2036 is the policy and charging control policy decision point for service data flows and IP bearer resources; it selects and provides available policy and charging control decisions for the policy and charging enforcement function unit (not shown).
- the IP service 204 may include the Internet, an intranet, an IMS (IP Multimedia Subsystem), or other IP services.
- although the LTE system is used as an example above, those skilled in the art should know that this application is not only applicable to the LTE system, but also to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems (such as 5G), which is not limited here.
- FIG. 3 is a schematic diagram of a hardware structure of a controller 140 provided in the present application.
- the controller 140 includes: a memory 1401 and a processor 1402, the memory 1401 is used to store program instructions, and the processor 1402 is used to call the program instructions in the memory 1401 to execute the steps performed by the controller in the first method embodiment above, and its implementation principle and beneficial effects are similar, and will not be repeated here.
- the foregoing controller further includes a communication interface 1403 , and the communication interface 1403 may be connected to the processor 1402 through a bus 1404 .
- the processor 1402 can control the communication interface 1403 to implement the receiving and sending functions of the controller 140 .
- FIG. 4 is a schematic diagram of a hardware structure of a network node 150 provided in the present application.
- the network node 150 includes: a memory 1501 and a processor 1502, the memory 1501 is used to store program instructions, and the processor 1502 is used to call the program instructions in the memory 1501 to execute the steps performed by the first node in the first method embodiment above, and its implementation principle and beneficial effects are similar, and will not be repeated here.
- the foregoing network node further includes a communication interface 1503, and the communication interface 1503 may be connected to the processor 1502 through a bus 1504.
- the processor 1502 can control the communication interface 1503 to realize the functions of receiving and sending of the network node 150 .
- FIG. 5 is a schematic diagram of a hardware structure of a network node 160 provided in the present application.
- the network node 160 includes: a memory 1601 and a processor 1602, the memory 1601 is used to store program instructions, and the processor 1602 is used to call the program instructions in the memory 1601 to execute the steps performed by the intermediate node and the tail node in the first method embodiment above, The implementation principles and beneficial effects are similar, and will not be repeated here.
- the foregoing network node further includes a communication interface 1603, and the communication interface 1603 may be connected to the processor 1602 through a bus 1604.
- the processor 1602 can control the communication interface 1603 to realize the functions of receiving and sending of the network node 160 .
- FIG. 6 is a schematic diagram of a hardware structure of a controller 170 provided in the present application.
- the controller 170 includes: a memory 1701 and a processor 1702, the memory 1701 is used to store program instructions, and the processor 1702 is used to call the program instructions in the memory 1701 to execute the steps performed by the controller in the second method embodiment above, and its implementation principle and beneficial effects are similar, and will not be repeated here.
- FIG. 7 is a schematic diagram of a hardware structure of a network node 180 provided in the present application.
- the network node 180 includes: a memory 1801 and a processor 1802, the memory 1801 is used to store program instructions, and the processor 1802 is used to invoke the program instructions in the memory 1801 to execute the steps performed by the head node in the second method embodiment above, and its implementation principle and beneficial effects are similar, and will not be repeated here.
- the above-mentioned integrated modules implemented in the form of software function modules can be stored in a computer-readable storage medium.
- the above-mentioned software function modules are stored in a storage medium and include several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute some of the steps of the methods of the various embodiments of the present application.
- a computer program product includes one or more computer instructions.
- computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or by wireless means (such as infrared, radio, or microwave).
- the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, a data center, etc. integrated with one or more available media.
- Available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state disk, SSD), etc.
- FIG. 8 is a schematic flowchart of a video processing method according to a first embodiment.
- the method includes the following steps: S10: acquire first information corresponding to at least one video image frame in a target video; S20: process the first information according to a preset rule to determine or generate target information corresponding to the video image frame.
- the execution body of the video processing method of this embodiment may be a smart terminal and/or a server, and the smart terminal may be the smart terminal in the above embodiment.
- the smart terminal may be a smart phone, tablet, computer etc.
- the target video may be stored in the smart terminal in advance, or may be a video loaded in real time through a video playing application.
- video processing of the target video may be triggered when the user actively triggers a playback request for the target video: the user actively triggers a video processing instruction, and the smart terminal calls the target video and performs video processing on it; optionally, video processing of the target video may also be triggered by a video processing instruction issued when the target video is stored in the smart terminal, after which video processing is performed on the target video.
- the target video is composed of at least one video image frame; the video image frames include video image frames containing preset information and video image frames not containing preset information.
- a video image frame containing preset information may be determined from the video image frames of the target video, and the first information is then acquired from that video image frame.
- the step S10 includes: judging whether preset information exists in the video image frame; if yes, acquiring the first information from the video image frame.
- the preset information may be text information, may also be subtitle information, and may also be content text in an image frame, and the like.
- the specific manner of determining the video image frames containing preset information from the image frames of the target video may be as follows: the smart terminal starts from the initial image frame of the target video, identifies the content of each video image frame, and judges whether the video image frame has preset information according to its content; if so, the video image frame is determined as a video image frame containing preset information, the information of that video image frame is obtained, and that information is used as the first information.
- to avoid repeatedly processing video image frames that contain the same information, the present application performs a similarity analysis on the video image frames containing preset information to obtain the similarity between the video image frames, determines similar image frames based on the similarity, and then screens the similar image frames to obtain the video image frames to be processed.
- the higher the similarity between video image frames, the higher the probability that the information they contain is consistent; the lower the similarity, the more likely the frames contain different information.
- optionally, the video image frames to be processed may also be obtained by first extracting the first image frames containing preset information from the video frames, obtaining the similarity between the extracted first image frames, determining similar image frames corresponding to the first image frames based on the similarity, and then performing screening based on the similar image frames to obtain the video image frames.
- this embodiment first determines the first image frames containing preset information and then filters them again, so that fewer video image frames need to be processed, which reduces video processing resources and further improves the efficiency of video processing.
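The screening idea above can be sketched as follows, assuming frames are available as grayscale NumPy arrays; the similarity measure (mean absolute pixel difference) and the threshold value are illustrative assumptions, not the patent's method.

```python
import numpy as np

def frame_similarity(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Similarity in [0, 1] based on mean absolute pixel difference of grayscale frames."""
    diff = np.abs(frame_a.astype(np.float32) - frame_b.astype(np.float32))
    return 1.0 - float(diff.mean()) / 255.0

def screen_similar_frames(frames: list, threshold: float = 0.95) -> list:
    """Keep one representative frame per run of highly similar consecutive frames."""
    kept = []
    for frame in frames:
        if kept and frame_similarity(kept[-1], frame) >= threshold:
            continue            # very similar to the last kept frame: likely the same text
        kept.append(frame)      # dissimilar enough: treat as new information to process
    return kept
```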
- the manner of acquiring the first information from the video image frame may be: recognizing the first information from the video image frame based on an optical character recognition (OCR) algorithm.
- the OCR algorithm translates character shapes into computer text through character recognition methods, that is, it recognizes the information in the image frame and generates a corresponding text recognition result.
- OCR recognition is performed on each of the video image frames to obtain an image text recognition result containing the image text information of the video image frame.
- for example, if the text displayed on the video image frame is "the sun is shining today", the obtained image text recognition result is "the sun is shining today".
- the first information includes first content and/or a first location.
- the first content is text content in a video image frame, such as "it is sunny today.”
- the first location is the display position of the text in the video image frame.
- the display position of the text in the video image frame is obtained and used as the character position of the image text recognition result, and the character position is then determined as the first position.
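A minimal sketch of extracting the first content and the first location from a frame with an off-the-shelf OCR engine follows; pytesseract is used purely as an illustrative stand-in for "an OCR algorithm", and the returned structure is an assumption.

```python
import pytesseract
from PIL import Image

def extract_first_information(frame_path: str) -> list:
    """Run OCR on one frame and return recognized words with their bounding boxes."""
    image = Image.open(frame_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    results = []
    for text, x, y, w, h, conf in zip(data["text"], data["left"], data["top"],
                                      data["width"], data["height"], data["conf"]):
        if text.strip() and float(conf) > 0:            # skip empty or rejected detections
            results.append({"content": text,            # first content
                            "position": (x, y, w, h)})  # first location
    return results
```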
- the language type of the first content or the first location of the first information may not meet the needs of the user.
- for example, the language type of the first content may be German while the user can read only Chinese, in which case the first content does not meet the user's needs; or the first location may be at the edge of the video image frame where the user cannot see it clearly, in which case the first position does not meet the user's needs.
- the first information is processed according to a preset rule, so as to process the first information into target information meeting user needs.
- the target information includes target content and/or target location
- the preset rules include preset language type conversion rules and/or preset location rules
- the preset language type conversion rules are used to convert the language type corresponding to the first content
- the preset position rule is used to adjust the first position.
- optionally, S20 includes: S21, when the language type corresponding to the first content does not match the preset type, converting the first content into target content corresponding to the preset type, and/or determining or generating a target location corresponding to the first location according to preset location rules.
- first, it is judged whether the language type corresponding to the first content is the preset language type.
- the preset language type may be a system language, and/or the preset language type may be a set language.
- the set language can be configured by the user: the user can modify the system language of the smart terminal, modify the language on the language setting page of the video playback page, or input a language control instruction to the smart terminal.
- the smart terminal determines a preset language type according to the language control instruction.
- one way for the user to input the language control instruction to the smart terminal may be voice input; optionally, the preset language types may include at least one language type.
- after acquiring the language type corresponding to the first content, it is judged whether the language type matches the preset language type; when it matches, there is no need to convert the language type of the first content; when it does not match, the first content is converted into the target content corresponding to the preset language type.
- for example, if the first content is "London Road", its language type is English, and the preset language type is Chinese, then the target content obtained by converting the first content into the preset language type is the Chinese translation of "London Road".
- the first content is converted into the target content, so that when the target video is played later, the target content is displayed at the first text position corresponding to the first information in the video image frame.
- for example, the left side of Figure 10 is the image frame before video processing and the right side of Figure 10 is the image frame after video processing: the image frame on the left side of Figure 10 shows a road sign "London Road", and after video processing, the converted target content is displayed at the original position of the road sign in the image frame on the right side of Figure 10.
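To illustrate the language-type conversion step, here is a hedged sketch: language detection and translation are delegated to hypothetical helper callables (`detect_language`, `translate`), since the patent does not specify a particular translation engine.

```python
def convert_content(first_content: str, preset_language: str,
                    detect_language, translate) -> str:
    """Convert the first content into the preset language type if it does not match.

    `detect_language(text)` and `translate(text, target=...)` are assumed to be
    supplied by the caller, e.g. wrappers around a translation service.
    """
    if detect_language(first_content) == preset_language:
        return first_content                          # already matches: no conversion
    return translate(first_content, target=preset_language)
```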
- the embodiment of the present application also proposes a method of determining or generating a target position corresponding to the first position according to a preset position rule, where the preset position rule includes at least one of the following:
- the target location is determined or generated based on the preset operation.
- the target position may be the same as the first position, or may be different from the first position.
- the target location may be the same as the first location, and optionally, the preset location rule includes determining the first location as the target location.
- the target position may be different from the first position
- the preset position rule includes taking the first position as an origin, obtaining a position separated from the first position by a preset distance, and setting the location as the target location.
- the preset distance may be a user-defined setting
- the target position may be directly above, to the left of, directly below, or to the right of the first position.
- the preset position rule may further include receiving a user's preset operation on the first content, and determining the target location according to the preset operation.
- the user may perform a preset operation within or outside the first location based on the first content.
- the preset position rule may also include acquiring the distance between the first position and important information of the video image frame; the important information may be a person in the video image frame or the subtitle information of the video image frame, and when the target information is displayed too close to the important information, the important information may be blocked or confused with the target information.
- when the distance is less than a preset distance, a position whose distance from the important information is greater than or equal to the preset distance is determined and used as the target position.
- the preset position rule may also include obtaining the distance between the first position and the edge of the video image frame; when that distance is less than the preset distance, a position whose distance from the edge is greater than or equal to the preset distance is used as the target position.
- preset location rules include but are not limited to the above-mentioned several ways.
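A small sketch of two of the position rules listed above (offsetting the target position by a preset distance, and pushing it away from the frame edge); the coordinate convention and default distances are illustrative assumptions.

```python
def offset_position(first_pos, dx: int = 0, dy: int = -30):
    """Rule: take the first position as origin and move it by a preset offset (e.g. above)."""
    x, y, w, h = first_pos
    return (x + dx, y + dy, w, h)

def keep_away_from_edge(pos, frame_w: int, frame_h: int, margin: int = 20):
    """Rule: if the position is closer than `margin` to any edge, clamp it inward."""
    x, y, w, h = pos
    x = min(max(x, margin), frame_w - w - margin)
    y = min(max(y, margin), frame_h - h - margin)
    return (x, y, w, h)
```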
- in this embodiment, after the first information corresponding to the video image frame is acquired, where the first information includes the first content and/or the first location, the first content is converted into target content corresponding to the preset language type, and/or the target position corresponding to the first position is determined or generated according to the preset position rules. When the user subsequently plays the target video, the target information can be obtained directly and displayed in the video image frames of the target video, so that the user can quickly understand the text according to the target information, the video content becomes easier to understand, and the user experience is improved; moreover, adjusting the first position makes it convenient for the user to view the target information, which meets the user's needs.
- optionally, S20 includes:
- Step S22 identifying the scene information of the video image frame, and determining or generating feature information according to the scene information
- Step S23 determining or generating the target information according to the feature information and the first information.
- in order to enable the user to quickly obtain part of the content displayed by a video image frame while watching a video, the embodiment of the present application also proposes a method for automatically marking the first information according to the scene information of the video image frame.
- the video image frame is a video image frame corresponding to the first information
- the scene information includes the scene type of the video image frame; the scene type can be indoor, traffic, natural scenery, cultural landscape, city, village, field, etc.
- the specific implementation of identifying the scene information of the video image frame includes: acquiring the scene feature information of the video image frame according to a preset feature extraction algorithm, where the scene feature information includes one of image display parameter information, object feature information, and environment feature information; comparing the scene feature information one by one with the preset scene feature information of the preset scenes; determining the target preset scene feature information corresponding to the scene feature information; determining the target preset scene corresponding to the target preset scene feature information; determining the scene type of the video image frame according to the target preset scene; and determining the scene information according to the scene type.
- there are several preset scenes, and the preset scene feature information corresponding to different preset scenes is different.
- the feature information may be the scene information itself; for example, if the scene information is a hospital, "hospital" is used as the feature information. The feature information may also include a scene icon converted from the scene information; for example, if the scene information is a hospital scene, the hospital icon corresponding to the hospital scene is acquired. The feature information may also include the target object corresponding to the scene information, where the target object is a target object in the video image frame; for example, if the scene information is a group photo scene and the video image frame displays person A and person B, then person A and person B are the target objects, and "person A and person B" is determined as the feature information.
- the manner of determining the feature information includes but not limited to the above three manners.
- the feature information may also include sequence numbers of the video image frames and the like.
- the target information is determined or generated according to the characteristic information and the first information.
- the manner of determining or generating the target information according to the feature information and the first information may be to use the feature information as the tag content of the first information, and then jointly determine or generate the target information according to the first information and the tag content.
- in this embodiment, the scene feature information of the video image frame is extracted by a preset feature extraction algorithm, the scene type corresponding to the video image frame is determined according to the scene feature information, the scene information is determined according to the scene type, the feature information is determined or generated according to the scene information, and the feature information is then combined with the first information to generate the target information, so that an association between the first information and the scene type of the video image frame is quickly established and the user can quickly understand the scene type of the video image frame according to the target information.
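A hedged sketch of how scene information could be turned into feature information and combined with the first information as tag content; the scene classifier is left abstract (`classify_scene`) and the icon lookup table is an illustrative assumption.

```python
# Assumed mapping from scene types to simple icon labels; a real system might use images.
SCENE_ICONS = {"hospital": "[hospital icon]", "traffic": "[traffic icon]"}

def build_target_info(first_content: str, frame, classify_scene) -> str:
    """Tag the first content with feature information derived from the frame's scene."""
    scene_type = classify_scene(frame)            # e.g. "hospital", "traffic", ...
    feature_info = SCENE_ICONS.get(scene_type, scene_type)
    # Target information = first information combined with the feature information as a tag.
    return f"{feature_info} {first_content}" if feature_info else first_content
```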
- the method further includes:
- Step S30 acquiring second information corresponding to the voice information in the target video
- Step S40 performing a first preset process on the first information according to the second information, so as to determine or generate target information.
- the first preset processing may be calibration processing or other processing.
- the calibration processing may be based on voice calibration processing, and may also be text calibration processing.
- the voice information corresponding to each video image frame of the target video is obtained, and the first preset processing is performed on the first information according to the voice information so as to determine or generate the target information; the voice information is in one-to-one correspondence with the video image frames.
- the target information includes first content in the first information after first preset processing.
- the voice information is converted into corresponding second information through voice-to-text technology, and the first preset processing is then performed on the first information based on the second information.
- the second information is in one-to-one correspondence with the video image frames.
- the manner of performing the first preset processing on the first information according to the second information may be: acquiring the content to be processed in the first content according to a preset acquisition rule; determining, in the second information, the voice processing content corresponding to the content to be processed; and performing the first preset processing on the content to be processed according to the voice processing content, so as to determine or generate the target content.
- the preset acquisition rule may be to receive a user's processing instruction for the first content, where the processing instruction includes the content to be processed; the preset acquisition rule may also be that the smart terminal intelligently acquires the content to be processed from the first content.
- the way for the smart terminal to intelligently acquire the content to be processed may be to use the first content as the content to be processed, or to obtain a probability coefficient corresponding to the first content.
- when the probability coefficient is less than or equal to a preset probability coefficient, the first content is used as the content to be processed.
- the voice processing content corresponding to the content to be processed is determined in the second information.
- the second information is in one-to-one correspondence with the time stamps of the video image frames.
- the time stamp of the video image frame where the content to be processed is located is acquired, the second information corresponding to the time stamp is determined according to the time stamp, and that second information is used as the voice processing content.
- the voice processing content is voice-converted content.
- after the voice processing content is acquired, the first preset processing (such as calibration) is performed on the content to be processed according to the voice processing content, so as to obtain the content to be processed after the first preset processing; the first content after the first preset processing is then determined according to the processed content to be processed, and the target content is determined or generated accordingly.
- for example, if the content to be processed includes "Londo Avenue" and the speech processing content includes "London Avenue", the first content is calibrated as "London Avenue".
- optionally, the video image frame corresponding to the content to be processed may be acquired, the voice information corresponding to that video image frame may be acquired, and the first preset processing (such as calibration) may be performed on the content to be processed according to the voice information; for example, if the content to be processed includes "Londo Avenue" and the voice information includes "London Avenue", the content to be processed is calibrated as "London Avenue".
- the first information is determined according to the first content after the first preset processing, and the first information is then processed according to a preset rule to determine or generate the target information corresponding to the video image frame.
- the first preset processing improves the accuracy of the image-recognized text and thus the accuracy of the target information corresponding to the video image frame, so that the user can correctly understand the text according to the target information, the video content becomes easier to understand, and the user experience is improved.
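A minimal sketch of this first preset processing (calibrating OCR-recognized content against the speech-derived text for the same timestamp); the use of difflib for fuzzy matching and the 0.8 cutoff are illustrative assumptions rather than the patent's method.

```python
import difflib

def calibrate_with_speech(content_to_process: str, speech_text: str,
                          cutoff: float = 0.8) -> str:
    """Replace OCR tokens with close matches from the speech-derived text, if any."""
    speech_tokens = speech_text.split()
    calibrated = []
    for token in content_to_process.split():
        match = difflib.get_close_matches(token, speech_tokens, n=1, cutoff=cutoff)
        calibrated.append(match[0] if match else token)   # e.g. "Londo" -> "London"
    return " ".join(calibrated)
```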
- optionally, S30 includes: acquiring initial information converted from the voice information, and performing preset processing on the initial information according to the first information to obtain the second information.
- the preset processing may be calibration processing or other processing.
- when the user is playing the target video, in order to facilitate viewing, the smart terminal generates corresponding subtitles according to the voice information of the target video and displays the subtitles on the video image frames.
- this makes it easy for users to understand the voice information of the target video; however, when the voice information contains dialects or professional terms, the generated subtitles are prone to be inaccurate.
- therefore, a method for preset processing of subtitles is proposed.
- the initial information is information converted from the voice information through speech-to-text technology; after the initial information is acquired, preset processing is performed on it according to the first information to obtain the processed initial information, and the second information is then determined according to the processed initial information.
- the specific implementation of performing preset processing on the initial information according to the first information includes: acquiring the text to be processed corresponding to the initial information; matching the text to be processed with the first content of the first information to obtain the target information in the first content that matches the text to be processed, where the similarity between the target information and the text to be processed is greater than or equal to a preset threshold; and replacing the text to be processed with the target information.
- for example, if the text to be processed includes "P to P", the first content includes "P2P", the similarity between "P to P" and "P2P" is 98%, and the preset threshold is 95%, then "P to P" is replaced with "P2P", and the second information is generated according to the replaced text to be processed.
- in this embodiment, the initial information corresponding to the voice information is preset-processed according to the first information to obtain accurate second information, which improves the accuracy of voice recognition and improves the user experience.
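A sketch of the described subtitle correction: a fragment of the text to be processed is compared against the OCR-recognized first content and replaced when the similarity reaches the preset threshold; the SequenceMatcher similarity is an illustrative stand-in for the unspecified similarity measure.

```python
from difflib import SequenceMatcher

def correct_subtitle(text_to_process: str, first_contents: list,
                     threshold: float = 0.95) -> str:
    """Replace a subtitle fragment with the best-matching OCR text above the threshold."""
    best, best_score = text_to_process, 0.0
    for candidate in first_contents:
        score = SequenceMatcher(None, text_to_process.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else text_to_process
```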
- optionally, S20 includes: obtaining, according to the second information, description information for the first content and/or associated information corresponding to the first content; generating identification information according to the description information and/or the associated information; and identifying the first content according to the identification information.
- the description information is information used to describe the first content.
- the description information may include attribute information corresponding to the first content, and may also include status information corresponding to the first content.
- the description information includes but not limited to the attribute information and status information, and the information may also include a description associated with the first content.
- For example, if the first content is "London Street" and the second information includes attribute information for "London Street" such as "London Street is 1 km long", then "length is 1 km" is used as the description information.
- Optionally, if the second information includes a descriptive phrase associated with "London Street", for example "London Street is a food street", then "a food street" is used as the description information.
- the description information is associated with the first content.
- the identification information is generated according to the description information.
- When the description information is "a food street", the identification information may be a food identification; when the description information is "the weather in this area is rainy", the identification information may be a rain identification.
- the associated information includes attribute information corresponding to the first information, and the attribute information may include language type information, customs and culture, religious culture information, location information, text interpretation information, and the like.
- the generated identification information may be: "language type is A, religious culture is Christianity, rainy day".
- Optionally, when both the description information and the association information are acquired, the identification information may also be determined according to the description information together with the association information.
- the description information for the first content and the associated information corresponding to the first content are obtained through the second information, the identification information is then generated according to the description information and/or the associated information, and the first content is identified with the identification information, so that the user can view the identification information while viewing the first content, better understand the first content, and quickly obtain information related to the first content, which improves user experience.
- In this embodiment, the description information for the first content and the associated information corresponding to the first content are obtained according to the second information, the identification information corresponding to the first content is then generated according to the description information and the associated information, and the first content is identified according to the identification information, so that when the target video is subsequently played, the identification information is displayed while the first content is displayed; the user can thus better understand the first content and quickly acquire information related to the first content, which improves user experience.
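A minimal sketch of how identification information might be derived from the description information and associated information follows; the keyword-to-identifier mapping, function name and dictionary layout are illustrative assumptions rather than the application's own rules.

```python
def generate_identification(description: str, associated: dict = None) -> list:
    """Map fragments of the description information to display identifiers
    (e.g. a food or rain identification) and append associated information."""
    keyword_to_id = {          # illustrative mapping only
        "food": "food identification",
        "rainy": "rain identification",
    }
    identifiers = [tag for kw, tag in keyword_to_id.items()
                   if kw in description.lower()]
    if associated:
        # Associated information (language type, religious culture, ...) is
        # appended as textual identification, as in the example above.
        identifiers.extend(f"{k} is {v}" for k, v in associated.items())
    return identifiers

print(generate_identification(
    "London Street is a food street and the weather in this area is rainy",
    {"language type": "A", "religious culture": "Christianity"}))
```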
- Based on the foregoing embodiments, a sixth embodiment of the video processing method of this application is proposed.
- the S20 also includes:
- S50: Determine or generate target information according to the target content and/or the target position, and associate the target information with the video image frame; and/or, determine or generate the target information according to the target content, the time stamp of the video image frame and/or the target position.
- the target content and/or the target position is obtained, target information is determined or generated according to the target content and/or the target position, and the target information is associated with the video image frame, so that in the process of playing the target video, the target information associated with the video image frame is acquired directly, and the target content corresponding to the target information is then displayed at the corresponding target position in the video image frame.
- the video image frame includes a corresponding time stamp, and the time stamp is used to represent a playback time stamp of the video image frame in the target video.
- the target video is composed of several video image frames, and each video image frame corresponds to a different time stamp.
- In Figure 16, the left side is a schematic diagram of the video image frames corresponding to the target video, including "F1, F2, F3, F4, F5, F6, F7, F8, F9, ...", and each video image frame corresponds to its own time stamp, "T1, T2, T3, T4, T5, T6, T7, T8, T9, ...".
- the right side of Figure 16 is a schematic diagram of a corresponding video image frame of the target video.
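For a constant-frame-rate video, the time stamps T1, T2, ... of the frames F1, F2, ... in Figure 16 can be derived directly from the frame index; the sketch below assumes a fixed frame rate, which is a simplification (variable-frame-rate containers carry per-frame timestamps instead).

```python
def frame_timestamps(frame_count: int, fps: float = 25.0) -> list:
    """Time stamp (in seconds) of each video image frame F1, F2, ... at a
    constant frame rate: Tn = (n - 1) / fps."""
    return [(n - 1) / fps for n in range(1, frame_count + 1)]

print(frame_timestamps(9))  # T1..T9 for the frames shown in Figure 16
```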
- the method of determining or generating target information may also be to obtain the time stamp of the video image frame, and then determine or generate the target information according to the target content, the time stamp of the video image frame and/or the target position, so that later, in the process of playing the target video, when the current playback time point is consistent with the time stamp, the target information is acquired, the target content corresponding to the time stamp is determined according to the target information, and the target content is then displayed at the corresponding target position in the video image frame.
- For example, when the playback time point reaches "01:30", the target information is obtained directly, the target content and/or target position corresponding to "01:30" is determined according to the target information, and the target content is then displayed at the corresponding target position in the video image frame.
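A minimal sketch of associating target information with the time stamp of a video image frame, so that it can later be fetched when the playback time point matches; the dictionary layout and the example timestamp (90 s for "01:30") are assumptions.

```python
# Target information keyed by the frame time stamp (seconds), so playback
# code can fetch it when the current playback time point matches.
target_info = {}

def associate_target_info(timestamp: float, target_content: str,
                          target_position: tuple) -> None:
    target_info[timestamp] = {"content": target_content,
                              "position": target_position}

associate_target_info(90.0, "London Avenue", (120, 430))  # "01:30"
print(target_info)
```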
- Optionally, the method of determining or generating the target information may also be to generate subtitles according to the target content and the time stamp of the video image frame, so that in subsequent video playback, when the current playback time point reaches the time stamp, the target information is obtained and the target content is displayed at a preset position of the video image frame, where the preset position may be set by the user.
- Optionally, the method of determining or generating the target information may also be to determine or generate the target information according to the target content and/or the preset position of the target information, and then associate the target information with the video image frame, so that the target information associated with the video image frame is acquired directly in subsequent playback of the video, and the target content corresponding to the target information is then displayed at the preset position in the video image frame.
- each video image frame corresponding to the target video includes its own frame number
- the way of determining or generating the target information may also be to generate the target information based on the target content, the frame number of the video image frame and/or the target position.
- the manners of determining or generating the target information include but are not limited to the above several manners.
- After the target information corresponding to the target video is generated, the target information is saved so that it can be acquired when the user subsequently needs to play the target video.
- the way of saving the target information may be to save the target information and the target video together in the same folder, with the target information named according to preset rules, to ensure that when the target video is played, the target information can be loaded normally.
- the way of saving the target information may also be to compress the video stream and voice stream corresponding to the target video into an audio-video file, and then package the audio-video file together with the target information.
- the way of saving the target information may also be to embed the target information in the video image frames; in the specific implementation process, the target information is obtained, and the target information, the video stream and the audio stream jointly generate a video file of the target video.
- the target information is thus integrated into the coded stream of the video file, and when the target video is played subsequently, it is not necessary to acquire the target information separately, and the video file can be played directly.
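A minimal sketch of the "same folder, preset naming rule" saving strategy mentioned above; the naming rule (same base name plus a .target.json suffix) and the JSON format are assumptions chosen for illustration.

```python
import json
from pathlib import Path

def save_target_info(video_path: str, target_info: dict) -> Path:
    """Save the target information next to the target video, named by a
    preset rule so it can be located and loaded when the video is played."""
    video = Path(video_path)
    sidecar = video.with_name(video.stem + ".target.json")  # assumed rule
    sidecar.write_text(json.dumps(target_info, ensure_ascii=False, indent=2),
                       encoding="utf-8")
    return sidecar

def load_target_info(video_path: str):
    video = Path(video_path)
    sidecar = video.with_name(video.stem + ".target.json")
    return (json.loads(sidecar.read_text(encoding="utf-8"))
            if sidecar.exists() else None)

save_target_info("target_video.mp4", {"90.0": {"content": "London Avenue",
                                               "position": [120, 430]}})
print(load_target_info("target_video.mp4"))
```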
- In this way, the target information can be obtained directly, and the target content is then displayed at the corresponding target position in the video image frame of the target video, so that the user can quickly understand the text according to the target content, understand the video content more easily, and enjoy an improved experience.
- Based on the foregoing embodiments, the present application proposes a seventh embodiment; the method further includes:
- the target information of the target video is obtained according to the playback request, the target video is then played, and the target content is displayed in the video image frame of the target video according to the target information.
- the target video may be directly acquired, and then the target video may be played. At this point, the target content of the target information has been presented in the video image frame.
- Optionally, when the target information and the target video are saved in the same folder, the target information in that folder is automatically loaded when the target video is acquired, and the target content of the target information is then displayed in the video image frame.
- Optionally, when the target information is saved by compressing the video stream and voice stream corresponding to the target video into an audio-video file and packaging the audio-video file together with the target information, the audio-video file and the target information are loaded simultaneously, and the target content of the target information is then displayed in the video image frame.
- Optionally, the target information can also be acquired only when the target video is played to the video image frame, so as to reduce the amount of data processed when playing the target video.
- the step of acquiring the target information includes:
- Step S61: Obtain the target information according to the association relationship between the video image frame in the target video and the target information, and/or, when a playback request of the target video is detected, obtain the target information according to the correspondence between the playback time point and the time stamp in the target information.
- the present application associates the target information with the video image frame, and thereby generates the association relationship between the target information and the video image frame.
- the association relationship includes the video image frame and target content appearing in the video image frame.
- In the present application, when playback reaches the video image frame, the target content corresponding to the video image frame is determined according to the association relationship, and the target content is then displayed at the target position in the video image frame.
- the target information includes the target content, the time stamp of the video image frame and/or the target position
- Optionally, when a playback request of the target video is detected, during the process of playing the target video, the playback time point is obtained, the time stamp corresponding to the playback time point is determined according to the correspondence between the playback time point and the time stamp in the target information, and the target information is then obtained according to the time stamp.
- the video image frame corresponding to the time stamp and the corresponding target content are then determined according to the target information, and the target content is displayed at the target position in the video image frame.
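A minimal sketch of the playback-side lookup: during playback, the current playback time point is compared with the time stamps in the target information, and the matching target content is drawn at the target position. OpenCV is assumed to be available; the window display, font and tolerance choices are illustrative.

```python
import cv2

def play_with_target_info(video_path: str, target_info: dict) -> None:
    """Decode the target video frame by frame; when the playback time point
    matches a time stamp in the target information, draw the target content
    at the target position of that video image frame."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    tolerance = 0.5 / fps                      # within half a frame period
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        playback_time = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0
        for ts, info in target_info.items():
            if abs(ts - playback_time) <= tolerance:
                x, y = info["position"]
                cv2.putText(frame, info["content"], (x, y),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
        cv2.imshow("target video", frame)
        if cv2.waitKey(max(int(1000 / fps), 1)) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()

play_with_target_info("target_video.mp4", {90.0: {"content": "London Avenue",
                                                  "position": (120, 430)}})
```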
- the target information is acquired, and the target video is then played, so that when the corresponding video frame is played, the target content is displayed at the position of the text on the screen; the user can thus quickly understand the text according to the target content, understand the video content more easily, and enjoy an improved experience.
- Based on the foregoing embodiments, an eighth embodiment of the present application is proposed, and the S70 includes:
- the target display parameters include at least one of text display duration, text display mode, and text target display position
- the text display mode includes displaying the target content, and/or simultaneously displaying the target content and the first content.
- the text display time corresponds to the time stamp of the video image frame. It is understandable that the same information may appear in consecutive image frames.
- the text display duration may be the duration of displaying for only a preset number of image frames, that is, a preset number of initial image frames is selected from the consecutive image frames, the target content is displayed in those initial image frames, and the target content is no longer displayed when the image frame following the initial image frames is played.
- the preset number can be 1 or 2.
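A minimal sketch of the "display for only a preset number of initial image frames" rule: given the frame numbers of consecutive frames containing the same information, keep only the first few of each run. The function name and input format are assumptions.

```python
def frames_to_display(frames_with_info: list, preset_number: int = 2) -> set:
    """Given frame numbers in which the same information appears, keep only
    the first `preset_number` frames of each consecutive run."""
    display, run_start, prev = set(), None, None
    for f in sorted(frames_with_info):
        if prev is None or f != prev + 1:
            run_start = f                 # a new run of consecutive frames
        if f - run_start < preset_number:
            display.add(f)
        prev = f
    return display

# Information appears in frames 10-15 and 40-41; only the first two frames
# of each run keep displaying the target content.
print(frames_to_display([10, 11, 12, 13, 14, 15, 40, 41]))
```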
- the text display manner includes displaying the target content alone or superimposing the target content and/or the first content.
- the target content is overlaid on the target position in a preset manner.
- When the text display method is to superimpose the target content and the first content, the target content may be displayed above, below, to the left or to the right of the first content.
- the text display method further includes highlighting the target content separately, that is, highlighting the target content after the target content is overlaid on the target position in a preset manner; or, the text display method further includes superimposing and displaying the target content and/or the first content, and highlighting the target content.
- the text target display position is the target display position of the target content in the video image frame, and the target display position may be the same as the target position or different from it.
- When the target display position is different from the target position, during the process of playing the target video, the target content is displayed at the target display position of the video image frame instead of at the target position, so as to realize the function of letting the user adaptively adjust the display position of the target content.
- the target display parameter may be a default display parameter, or may be set by the user through a selection interface, and the selection interface is used to provide the user with a function of adjusting the display parameter.
- the output mode of the selection interface may be that when the user inputs a play request of the target video, the smart terminal pops up the selection interface by itself.
- the output mode of the selection interface may also be that the user manually opens the selection interface during the process of watching the target video.
- an embodiment of the present application proposes a method for acquiring the target display parameters, and the S71 includes: outputting a selection interface; and, in response to a trigger selection operation on the selection interface, determining the target display parameters according to the selection operation.
- the selection interface may display at least one display parameter and the adjustment range corresponding to each display parameter; the display parameters include at least one of text display duration, text display mode and text display position, and may also include language, display color, display area, etc.
- the user may trigger a selection operation on the selection interface according to his own needs, and the smart terminal determines the target display parameter according to the selection operation when detecting the selection operation.
- For example, when the display parameter is text display duration, the adjustment range is "1 image frame, 2 image frames, 3 image frames"; when the display parameter is text display mode, the adjustment range is "display the target content alone, superimpose the target content and/or text, highlight the target content alone"; when the display parameter is language, the adjustment range is "Chinese, Japanese, German, Thai".
- the selection interface may also display various combined display parameters, and each combined display parameter includes several display parameters.
- For example, "Combined display parameter 1, combined display parameter 2, combined display parameter 3" is displayed in the selection interface, and combined display parameter 1 includes "Text display duration: 1 image frame; text display method: display the target content alone; text target display position: below the text; language: Chinese".
- the user can select any combination of display parameters based on their own needs, and then trigger the selection operation of the selection interface.
- When the smart terminal detects the selection operation, it determines the target combined display parameter based on the selection operation, and determines the target combined display parameter as the target display parameter.
- Optionally, when the smart terminal detects the user's selection operation, it can save the target display parameters corresponding to the selection operation, and the next time the user triggers playing of the video, the video can be played directly according to the target display parameters.
- Displaying the target content for the user with the target display parameters not only helps the user quickly understand the video content, but also satisfies the user's viewing needs, thereby improving the viewing experience.
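A minimal sketch of how the target display parameters could be represented and merged with the user's selection from the selection interface; the field names, default values and merge logic are illustrative assumptions.

```python
from dataclasses import dataclass, asdict

@dataclass
class DisplayParameters:
    # Default display parameters; any field can be overridden by the user's
    # selection operation on the selection interface.
    text_display_duration_frames: int = 1
    text_display_mode: str = "display the target content alone"
    text_target_position: str = "below the first content"
    language: str = "Chinese"

def apply_selection(defaults: DisplayParameters, selection: dict) -> DisplayParameters:
    """Merge the user's selection into the defaults to obtain the target
    display parameters."""
    merged = asdict(defaults)
    merged.update({k: v for k, v in selection.items() if k in merged})
    return DisplayParameters(**merged)

# The user picks a combined display parameter from the selection interface.
target = apply_selection(DisplayParameters(),
                         {"text_display_duration_frames": 2,
                          "language": "Chinese"})
print(target)
```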
- the present application also provides a video processing method.
- the video processing method includes:
- S90 Determine or generate target information corresponding to the target video according to the first information and the second information.
- the first information is preset information existing in the video image frame, and the preset information may be text information on the video image frame. Specifically, it is determined whether preset information exists in the video image frame, and if so, the first information is acquired from the video image frame.
- the method of obtaining the first information from the video image frame is to obtain the first information from the video image frame based on an optical character recognition (OCR) algorithm; optionally, the first information includes first content and/or a first position.
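A minimal sketch of obtaining the first information (first content and first position) from a video image frame with an OCR library; pytesseract and OpenCV are assumed to be installed, and the confidence filtering is an illustrative choice rather than part of this application.

```python
import cv2
import pytesseract
from pytesseract import Output

def extract_first_information(frame) -> list:
    """Run OCR on a video image frame and return (first content, first
    position) pairs together with the recognition confidence."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    data = pytesseract.image_to_data(gray, output_type=Output.DICT)
    first_info = []
    for text, conf, x, y, w, h in zip(data["text"], data["conf"],
                                      data["left"], data["top"],
                                      data["width"], data["height"]):
        if text.strip() and float(conf) > 0:
            first_info.append({"first_content": text,
                               "first_position": (x, y, w, h),
                               "confidence": float(conf) / 100.0})
    return first_info

cap = cv2.VideoCapture("target_video.mp4")
ok, frame = cap.read()
cap.release()
if ok:
    print(extract_first_information(frame))
```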
- the second information is a speech recognition result obtained by recognizing corresponding speech information in the target video based on speech-to-text technology.
- the voice information is converted into corresponding second information through speech-to-text technology, and the second information is in one-to-one correspondence with the video image frames.
- the target information is determined or generated according to the first information and the second information.
- the target information may include the processed image text information of the first information, the target information may also include the processed voice text information of the second information, and the target information may also include both the image text information and the voice text information.
- the step of determining or generating target information corresponding to the target video according to the first information and the second information includes: performing a third preset processing on the first information according to the second information to obtain processed image text information, and/or performing a fourth preset processing on the second information according to the first information to obtain processed voice text information.
- the third preset processing may be calibration processing or other processing; the fourth preset processing may be calibration processing or other processing; the calibration processing may be voice calibration processing, and may also be text calibration processing.
- the first information includes first content.
- Since the first content of the first information may be rough, when the first content is recognized based on OCR, the first information may not be recognized accurately.
- the voice information is in one-to-one correspondence with the video image frames.
- the voice information is converted into corresponding second information through speech-to-text technology, and the third preset processing is then performed on the first information according to the second information; the second information is in one-to-one correspondence with the video image frames.
- the step of performing the third preset processing on the first information according to the second information includes: acquiring the content to be processed in the first information according to preset acquisition rules; determining, in the second information, the voice processing content corresponding to the content to be processed; and performing preset processing on the content to be processed according to the voice processing content to determine or generate the target information.
- the preset acquisition rule may be to receive a user's processing instruction for the first content, where the processing instruction includes the content to be processed; the preset acquisition rule may also be that the smart terminal intelligently acquires the content to be processed from the first content.
- the way for the smart terminal to intelligently acquire the to-be-processed content may be to use the first content as the to-be-processed content, or to acquire a probability coefficient corresponding to the first content.
- When the probability coefficient is less than or equal to a preset probability coefficient, the first content is used as the content to be processed.
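A minimal sketch of the probability-coefficient rule above: any recognized first content whose probability coefficient is at or below the preset coefficient is treated as content to be processed. The 0.85 preset value and the dictionary layout (matching the OCR sketch earlier) are assumptions.

```python
def select_content_to_process(first_info: list,
                              preset_probability: float = 0.85) -> list:
    """Treat first content whose probability coefficient is less than or
    equal to the preset probability coefficient as content to be processed."""
    return [item for item in first_info
            if item.get("confidence", 1.0) <= preset_probability]

first_info = [{"first_content": "London Avenue", "confidence": 0.97},
              {"first_content": "Londo Avenue",  "confidence": 0.62}]
print(select_content_to_process(first_info))  # only the low-confidence item
```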
- the voice processing content corresponding to the content to be processed is determined in the second information.
- the second information is in one-to-one correspondence with the time stamps of the video image frames.
- the time stamp of the video image frame where the content to be processed is located is acquired, the second information corresponding to the time stamp is determined according to the time stamp, and the second information corresponding to the time stamp is used as the voice processing content.
- the voice processing content is voice-converted second information.
- preset processing (such as calibration) is performed on the content to be processed according to the voice processing content to obtain the content to be processed after preset processing, the first content after preset processing is then determined according to the content to be processed after preset processing, and the target content is then determined or generated according to the first information after preset processing.
- For example, if the content to be processed includes "Londo Avenue" and the voice processing content includes "London Avenue", the first content is calibrated as "London Avenue".
- Optionally, the video image frame corresponding to the content to be processed is acquired, the voice information corresponding to the video image frame is then acquired, and preset processing is performed on the content to be processed according to the voice information.
- the content to be processed includes "Londo Avenue”, the voice information includes “London Avenue”, and then the to-be-processed content is calibrated as "London Avenue”.
- When the user is playing the target video, in order to facilitate viewing, the smart terminal generates corresponding subtitles according to the dialogue content of the target video and displays the subtitles below the video image frame, making it easy for users to understand the voice information of the target video; however, the voice information may contain dialects or professional terms, which easily causes the generated subtitles to be inaccurate. Based on this, the embodiment of this application proposes a method for preset processing of subtitles.
- the second information is information converted from the voice information through speech-to-text technology; after the second information is acquired, preset processing is performed on the second information according to the first information to obtain the processed second information, and the voice text information is then determined according to the processed second information.
- the specific implementation of performing the fourth preset processing on the second information according to the first information includes: acquiring the text to be processed corresponding to the second information, and matching the text to be processed with the first content of the first information to obtain the target text content in the first content that matches the text to be processed.
- the similarity between the target text content and the text to be processed is greater than or equal to a preset threshold, and the text to be processed is replaced by the target text content.
- For example, if the text to be processed includes "P to P", the first content includes "P2P", the similarity between "P to P" and "P2P" is 98%, and the preset threshold is 95%, it is determined that "P2P" is the target text content, and "P to P" is then replaced with "P2P".
- the voice text information is determined or generated according to the replaced text to be processed.
- the step of performing fourth preset processing on the second information according to the first information to obtain the processed voice-text information includes:
- the manner of judging whether the voice information corresponds to the first information may be to obtain the voice feature parameters corresponding to the voice information and the voice recognition parameters corresponding to the first information, compare the voice feature parameters with the voice recognition parameters, and obtain the target voice recognition parameters matching the voice feature parameters. It can be understood that when the voice recognition parameters corresponding to the first information include target voice recognition parameters matching the voice feature parameters, it is determined that the target content corresponding to the voice information exists in the first information.
- the manner of obtaining the voice feature parameters corresponding to the voice information may be to perform word segmentation processing on the voice information to obtain several pieces of sub-voice information, obtain the voice feature parameters corresponding to each piece of sub-voice information, and then determine the voice feature parameters corresponding to each piece of sub-voice information as the voice feature parameters corresponding to the voice information.
- the manner of obtaining the voice recognition parameters corresponding to the first information may be to perform word segmentation processing on the first content to obtain several sub-contents, obtain the voice recognition parameters corresponding to each sub-content, and then determine the voice recognition parameters corresponding to the first information according to the voice recognition parameters corresponding to each sub-content.
- When there are target recognition parameters matching the voice feature parameters, it is determined that the voice information corresponds to the first information; the text to be processed corresponding to the second information is then obtained, the target text content corresponding to the text to be processed is determined according to the target recognition parameters, and the text to be processed is replaced with the target text content, so as to complete the fourth preset processing of the second information and obtain the voice text information.
- In this way, the target information can be obtained directly, and the image text information and/or voice text information is displayed in the video image frame of the target video, so that the user can quickly understand the text on the video image frame according to the image text information and accurately understand the dialogue content in the target video according to the voice text information, which makes it easier to understand the video content and improves user experience.
- the embodiment of the present application also provides an intelligent terminal; the intelligent terminal includes a memory and a processor, a video processing program is stored in the memory, and when the video processing program is executed by the processor, the steps of the video processing method in any of the foregoing embodiments are implemented.
- An embodiment of the present application further provides a storage medium, on which a video processing program is stored, and when the video processing program is executed by a processor, the steps of the video processing method in any of the foregoing embodiments are implemented.
- An embodiment of the present application further provides a computer program product, the computer program product includes computer program code, and when the computer program code is run on the computer, the computer is made to execute the methods in the above various possible implementation manners.
- the embodiment of the present application also provides a chip, including a memory and a processor.
- the memory is used to store a computer program
- the processor is used to call and run the computer program from the memory, so that the device in which the chip is installed executes the methods in the above various possible implementation manners.
- Units in the device in the embodiment of the present application may be combined, divided and deleted according to actual needs.
- the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
- In essence, or in other words in the part that contributes to the prior art, the technical solution of the present application can be embodied in the form of a software product. The computer software product is stored in one of the above storage media (such as ROM/RAM, a magnetic disk or an optical disc), and includes several instructions to make a terminal device (which may be a mobile phone, computer, server, controlled terminal, or network device, etc.) execute the method of each embodiment of the present application.
- In the above embodiments, all or part of the steps may be implemented by software, hardware, firmware or any combination thereof.
- When implemented using software, they may be implemented in whole or in part in the form of a computer program product.
- a computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
- the computer can be a general purpose computer, special purpose computer, a computer network, or other programmable apparatus.
- Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, from one website, computer, server or data center to another website, computer, server or data center in a wired (such as coaxial cable, optical fiber or digital subscriber line) or wireless (such as infrared, radio or microwave) manner.
- the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, a data center, etc. integrated with one or more available media.
- Usable media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)), among others.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The present application provides a video processing method, a smart terminal, and a storage medium. The video processing method comprises: obtaining first information corresponding to at least one video image frame in a target video; and processing the first information according to a preset rule to determine or generate target information corresponding to the video image frame. In the present application, the target information may be obtained by processing the first information in the video image frame, thereby improving the video resource utilization rate and improving user experience.
Description
本申请涉及视频处理技术领域,尤其涉及一种视频处理方法、智能终端及存储介质。The present application relates to the technical field of video processing, and in particular to a video processing method, an intelligent terminal and a storage medium.
一些实现中,为了方便理解视频内容,一般在视频中,支持字幕显示功能,有些还支持字幕翻译功能,以协助用户更好地理解视频内容。视频图像中可能会出现不同语言的预设信息(如文字信息等),用户可能难以理解,进而影响用户的观感体验。In some implementations, in order to facilitate the understanding of video content, generally in the video, a subtitle display function is supported, and some also support a subtitle translation function to help users better understand the video content. Preset information (such as text information, etc.) in different languages may appear in the video image, which may be difficult for the user to understand, thereby affecting the user's viewing experience.
本申请的主要目的在于提供一种视频处理方法、智能终端及存储介质,旨在使用户可以根据转换后的目标信息快速理解视频中的预设信息。The main purpose of this application is to provide a video processing method, an intelligent terminal and a storage medium, so that users can quickly understand the preset information in the video according to the converted target information.
为实现上述目的,本申请提供的一种视频处理方法,所述视频处理方法包括以下步骤:In order to achieve the above object, the application provides a video processing method, the video processing method includes the following steps:
获取目标视频中至少一个视频图像帧对应的第一信息;Obtain first information corresponding to at least one video image frame in the target video;
根据预设规则对所述第一信息进行处理,以确定或生成与所述视频图像帧对应的目标信息。The first information is processed according to a preset rule to determine or generate target information corresponding to the video image frame.
可选地,所述根据预设规则对所述第一信息进行处理,以确定或生成与所述视频图像帧对应的目标信息的步骤,包括:Optionally, the step of processing the first information according to preset rules to determine or generate target information corresponding to the video image frame includes:
识别所述视频图像帧的场景信息,根据所述场景信息确定或生成特征信息;identifying the scene information of the video image frame, and determining or generating feature information according to the scene information;
根据所述特征信息和所述第一信息确定或生成所述目标信息。The target information is determined or generated according to the characteristic information and the first information.
可选地,所述方法还包括:Optionally, the method also includes:
获取所述目标视频中语音信息对应的第二信息;Obtaining second information corresponding to the voice information in the target video;
根据所述第二信息对所述第一信息进行第一预设处理,以确定或生成所述目标信息。Performing a first preset process on the first information according to the second information to determine or generate the target information.
可选地,所述获取所述目标视频中语音信息对应的第二信息的步骤,包括:Optionally, the step of acquiring the second information corresponding to the voice information in the target video includes:
获取所述语音信息对应的初始信息;Acquiring initial information corresponding to the voice information;
根据所述第一信息对所述初始信息进行第二预设处理,以获得所述第二信息。performing a second preset process on the initial information according to the first information to obtain the second information.
可选地,第一信息包括第一内容和/或第一位置,所述根据预设规则对所述第一信息进行处理的步骤,包括:Optionally, the first information includes first content and/or a first location, and the step of processing the first information according to preset rules includes:
获取所述第二信息中针对第一内容的描述信息和/或关联信息;Acquiring descriptive information and/or associated information for the first content in the second information;
根据所述描述信息和/或所述关联信息确定或生成所述第一内容对应的标识信息;Determine or generate identification information corresponding to the first content according to the description information and/or the association information;
根据所述标识信息标识所述第一内容,以确定或生成所述目标内容。Identifying the first content according to the identification information to determine or generate the target content.
可选地,所述根据预设规则对所述第一信息进行处理的步骤,包括:Optionally, the step of processing the first information according to preset rules includes:
在第一内容对应的语言类型与预设语言类型不匹配时,将所述第一内容转换为与所述预设语言类型对应的目标内容;和/或,根据预设位置规则确定或生成与第一位置对应的目标位置。When the language type corresponding to the first content does not match the preset language type, convert the first content into the target content corresponding to the preset language type; and/or determine or generate the target content according to the preset position rule The target location corresponding to the first location.
可选地,所述预设位置规则包括以下至少一种:Optionally, the preset location rules include at least one of the following:
将与所述第一位置间隔预设距离的位置作为目标位置;taking a position separated by a preset distance from the first position as a target position;
响应在第一位置之内或之外的预设操作,根据所述预设操作确定或生成所述目标位置。In response to a preset operation within or outside of a first location, the target location is determined or generated based on the preset operation.
可选地,所述预设操作可以是拖动操作,也可以是其他操作等。Optionally, the preset operation may be a drag operation, or other operations.
可选地,所述根据预设规则对所述第一信息进行处理的步骤之后,所述方法还包括:Optionally, after the step of processing the first information according to preset rules, the method further includes:
根据目标内容和/或目标位置确定或生成目标信息,关联所述目标信息和所述视频图像帧;和/或,根据所述目标内容,所述视频图像帧的时间戳和/或所述目标位置确定或生成所述目标信息。Determine or generate target information according to the target content and/or target position, and associate the target information with the video image frame; and/or, according to the target content, the time stamp of the video image frame and/or the target The location is determined or the target information is generated.
可选地,所述方法还包括:Optionally, the method also includes:
检测到所述目标视频的播放请求时,获取所述目标信息;When detecting the playback request of the target video, acquiring the target information;
在所述目标视频播放过程中,在所述视频图像帧对应的所述目标位置处显示所述目标内容。During the playing of the target video, the target content is displayed at the target position corresponding to the video image frame.
可选地,所述检测到所述目标视频的播放请求时,获取所述目标信息的步骤,包括:Optionally, when the play request of the target video is detected, the step of acquiring the target information includes:
根据所述目标视频中的视频图像帧与目标信息的关联关系获取所述目标信息,和/或,检测到所述目标视频的播放请求时,根据播放时间点与目标信息中的时间戳的对应关系获取所述目标信息。Obtain the target information according to the association relationship between the video image frame in the target video and the target information, and/or, when a playback request of the target video is detected, according to the correspondence between the playback time point and the timestamp in the target information The relationship acquires the target information.
可选地,所述在所述目标视频播放过程中,在所述目标图像帧对应的所述位置处显示所述目标内容的步骤,包括:Optionally, the step of displaying the target content at the position corresponding to the target image frame during the playback of the target video includes:
获取所述目标信息的目标显示参数;Acquiring target display parameters of the target information;
在所述目标视频播放过程中,在所述视频图像帧对应的所述目标位置处以所述目标显示参数显示所述目标内容。During the playing process of the target video, the target content is displayed at the target position corresponding to the video image frame with the target display parameters.
可选地,所述获取所述目标信息的目标显示参数的步骤,包括:Optionally, the step of acquiring target display parameters of the target information includes:
输出一选择界面;Output a selection interface;
响应针对所述选择界面的触发选择操作,根据所述选择操作确定所述目标显示参数。In response to a trigger selection operation on the selection interface, the target display parameter is determined according to the selection operation.
可选地,所述目标显示参数包括文字显示时长、文字显示方式以及文字目标显示位置的至少一个。 可选地,所述文字显示方式包括显示所述目标内容,和/或同时显示所述目标内容和所述内容。Optionally, the target display parameters include at least one of text display duration, text display mode, and text target display position. Optionally, the text display manner includes displaying the target content, and/or simultaneously displaying the target content and the content.
可选地,所述获取目标视频中至少一个视频图像帧对应的第一信息的步骤,包括:Optionally, the step of acquiring first information corresponding to at least one video image frame in the target video includes:
判断所述视频图像帧中是否存在预设信息;judging whether preset information exists in the video image frame;
若是,从所述视频图像帧中获取所述第一信息。If yes, acquire the first information from the video image frame.
可选地,所述预设信息可以是文本信息,也可以是字幕信息,还可以是图像帧中的内容文本等。Optionally, the preset information may be text information, may also be subtitle information, and may also be content text in an image frame, and the like.
本申请还提供一种视频处理方法,所述方法包括:The present application also provides a video processing method, the method comprising:
获取目标视频中至少一个视频图像帧对应的第一信息以及获取所述目标视频中语音信息对应的第二信息;Acquiring first information corresponding to at least one video image frame in the target video and acquiring second information corresponding to voice information in the target video;
根据所述第一信息和所述第二信息确定或生成所述目标视频对应的目标信息。Determine or generate target information corresponding to the target video according to the first information and the second information.
可选地,所述目标信息包括图像文本信息和/或语音文本信息。Optionally, the target information includes image text information and/or voice text information.
可选地,所述根据所述第一信息和所述第二信息确定或生成所述目标视频对应的目标信息的步骤,包括:Optionally, the step of determining or generating target information corresponding to the target video according to the first information and the second information includes:
根据所述第二信息对所述第一信息进行第三预设处理,以获取图像文本信息,和/或,根据所述第一信息对所述第二信息进行第四预设处理,以获取语音文本信息;Performing third preset processing on the first information according to the second information to obtain image text information, and/or performing fourth preset processing on the second information according to the first information to obtain voice text messages;
根据所述图像文本信息和/或所述语音文本信息确定或生成所述目标信息。The target information is determined or generated according to the image text information and/or the voice text information.
可选地,所述第三预设处理,可以是校准处理,也可以是其他处理等;所述第四预设处理,可以是校准处理,也可以是其他处理等。Optionally, the third preset processing may be calibration processing or other processing; the fourth preset processing may be calibration processing or other processing.
可选地,所述根据所述第二信息对所述第一信息进行第三预设处理,以获取处理后的图像文字信息的步骤,包括:Optionally, the step of performing third preset processing on the first information according to the second information to obtain processed image text information includes:
根据预设获取规则获取所述第一信息中的待处理内容;Acquiring the content to be processed in the first information according to a preset acquisition rule;
在所述第二信息确定与待处理内容对应的语音处理内容;Determine the voice processing content corresponding to the content to be processed in the second information;
根据所述语音处理内容对所述待处理内容进行第三预设处理,以确定或生成所述图像文本信息。Performing a third preset process on the content to be processed according to the voice processing content, so as to determine or generate the image text information.
可选地,所述根据所述第二信息对第一信息进行第三预设处理,以获取图像文本信息的步骤,包括:获取所述第二信息中针对第一内容的描述信息和/或关联信息;Optionally, the step of performing third preset processing on the first information according to the second information to obtain image text information includes: obtaining descriptive information and/or related information;
根据所述描述信息和/或关联信息确定或生成所述第一内容对应的标识信息;Determine or generate identification information corresponding to the first content according to the description information and/or associated information;
根据所述标识信息标识所述第一内容,以确定或生成所述图像文本信息。Identifying the first content according to the identification information to determine or generate the image text information.
可选地,所述根据所述第一信息对所述第二信息进行第四预设处理,以获取语音文本信息的步骤,包括:Optionally, the step of performing fourth preset processing on the second information according to the first information to obtain voice text information includes:
判断所述语音信息是否与所述第一信息相对应;judging whether the voice information corresponds to the first information;
若是,根据所述第一信息对所述第二信息进行第四预设处理,以获取语音文本信息。If yes, perform a fourth preset process on the second information according to the first information, so as to obtain voice-to-text information.
本申请还提供一种智能终端,包括:存储器、处理器,其中,所述存储器上存储有视频处理程序,所述视频处理程序被所述处理器执行时实现如上任一所述方法的步骤。The present application also provides an intelligent terminal, including: a memory and a processor, wherein a video processing program is stored in the memory, and when the video processing program is executed by the processor, the steps of any one of the methods described above are implemented.
本申请还提供一种存储介质,所述存储介质存储有计算机程序,所述计算机程序被处理器执行时实现如上任一所述方法的步骤。The present application also provides a storage medium, the storage medium stores a computer program, and when the computer program is executed by a processor, the steps of any one of the above methods are implemented.
可选地,所述存储介质可以为计算机可读存储介质。Optionally, the storage medium may be a computer-readable storage medium.
本申请通过获取目标视频中至少一个视频图像帧对应的第一信息;根据预设规则对所述第一信息进行处理,以确定或生成与所述视频图像帧对应的目标信息。通过上述技术方案,可以实现在播放视频时,将视频图像帧对应的第一信息处理为目标信息,用户可根据目标信息快速理解原预设信息(如原文字),解决了用户难以理解原预设信息的问题,进而提升了用户体验。In the present application, the first information corresponding to at least one video image frame in the target video is obtained; and the first information is processed according to preset rules to determine or generate the target information corresponding to the video image frame. Through the above technical solution, it can be realized that when playing a video, the first information corresponding to the video image frame is processed as the target information, and the user can quickly understand the original preset information (such as the original text) according to the target information, which solves the problem that the user is difficult to understand the original preset information. Set information problems, thereby improving the user experience.
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本申请的实施例,并与说明书一起用于解释本申请的原理。为了更清楚地说明本申请实施例的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,对于本领域普通技术人员而言,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the accompanying drawings that need to be used in the description of the embodiments will be briefly introduced below. Obviously, for those of ordinary skill in the art, the Under the premise, other drawings can also be obtained based on these drawings.
图1为实现本申请各个实施例的一种智能终端的硬件结构示意图;FIG. 1 is a schematic diagram of a hardware structure of an intelligent terminal implementing various embodiments of the present application;
图2为本申请实施例提供的一种通信网络系统架构图;FIG. 2 is a system architecture diagram of a communication network provided by an embodiment of the present application;
图3是根据第一实施例示出的控制器140的硬件结构示意图;Fig. 3 is a schematic diagram of the hardware structure of the controller 140 shown according to the first embodiment;
图4是根据第一实施例示出的网络节点150的硬件结构示意图;Fig. 4 is a schematic diagram of a hardware structure of a network node 150 shown according to the first embodiment;
图5是根据第一实施例示出的网络节点160的硬件结构示意图;FIG. 5 is a schematic diagram of a hardware structure of a network node 160 shown according to the first embodiment;
图6是根据第二实施例示出的控制器170的硬件结构示意图;FIG. 6 is a schematic diagram of the hardware structure of the controller 170 shown according to the second embodiment;
图7是根据第二实施例示出的网络节点180的硬件结构示意图;FIG. 7 is a schematic diagram of a hardware structure of a network node 180 according to a second embodiment;
图8是根据第一实施例示出的视频处理方法的流程示意图;Fig. 8 is a schematic flowchart of a video processing method according to the first embodiment;
图9是根据第一实施例示出的视频处理方法步骤S20的具体流程示意图;FIG. 9 is a schematic flowchart of step S20 of the video processing method according to the first embodiment;
图10是根据第一实施例示出的视频处理方法的播放界面图;Fig. 10 is a diagram of the playback interface of the video processing method shown according to the first embodiment;
图11是根据第二实施例示出的视频处理方法步骤S20的具体流程示意图;FIG. 11 is a schematic flowchart of step S20 of the video processing method according to the second embodiment;
图12是根据第三实施例示出的视频处理方法的流程示意图;Fig. 12 is a schematic flowchart of a video processing method according to a third embodiment;
图13是根据第四实施例示出的视频处理方法的流程示意图;Fig. 13 is a schematic flowchart of a video processing method according to a fourth embodiment;
图14是根据第五实施例示出的视频处理方法步骤S20的具体流程示意图。Fig. 14 is a schematic flowchart of step S20 of the video processing method according to the fifth embodiment.
图15是根据第六实施例示出的视频处理方法的流程示意图;Fig. 15 is a schematic flowchart of a video processing method according to a sixth embodiment;
图16是根据第六实施例示出的图像帧的示意图;Fig. 16 is a schematic diagram of an image frame according to a sixth embodiment;
图17是根据第七实施例示出的视频处理方法的流程示意图;Fig. 17 is a schematic flowchart of a video processing method according to a seventh embodiment;
图18是根据第七实施例示出的视频处理方法的流程示意图;Fig. 18 is a schematic flowchart of a video processing method according to a seventh embodiment;
图19是根据第八实施例示出的视频处理方法S70的具体流程示意图;FIG. 19 is a schematic flowchart of a video processing method S70 according to the eighth embodiment;
图20是根据第九实施例示出的视频处理方法的流程示意图。Fig. 20 is a schematic flowchart of a video processing method according to a ninth embodiment.
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。通过上述附图,已示出本申请明确的实施例,后文中将有更详细的描述。这些附图和文字描述并不是为了通过任何方式限制本申请构思的范围,而是通过参考特定实施例为本领域技术人员说明本申请的概念。The realization, functional features and advantages of the present application will be further described in conjunction with the embodiments and with reference to the accompanying drawings. By means of the above drawings, specific embodiments of the present application have been shown, which will be described in more detail hereinafter. These drawings and text descriptions are not intended to limit the scope of the concept of the application in any way, but to illustrate the concept of the application for those skilled in the art by referring to specific embodiments.
本申请的实施方式Embodiment of this application
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的装置和方法的例子。Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present application as recited in the appended claims.
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素,可选地,本申请不同实施例中具有同样命名的部件、特征、要素可能具有相同含义,也可能具有不同含义,其具体含义需以其在该具体实施例中的解释或者进一步结合该具体实施例中上下文进行确定。It should be noted that, in this document, the term "comprising", "comprising" or any other variation thereof is intended to cover a non-exclusive inclusion such that a process, method, article or apparatus comprising a set of elements includes not only those elements, It also includes other elements not expressly listed, or elements inherent in the process, method, article, or device. Without further limitations, an element defined by the statement "comprising a..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element. Optionally, the present application Components, features, and elements with the same name in different embodiments may have the same meaning, or may have different meanings, and the specific meaning shall be determined based on the explanation in the specific embodiment or further combined with the context in the specific embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this document, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining". Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprising" and "including" indicate the presence of the stated features, steps, operations, elements, components, items, species, and/or groups, but do not exclude the presence, occurrence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups. The terms "or", "and/or", and "including at least one of" as used in this application may be interpreted as inclusive, meaning any one or any combination. For example, "including at least one of A, B, C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C"; likewise, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition arises only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
It should be understood that although the steps in the flowcharts of the embodiments of the present application are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential; instead, they may be performed in turn or alternately with at least a part of other steps or of the sub-steps or stages of other steps.
Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
It should be noted that step codes such as S10 and S20 are used herein only to express the corresponding content more clearly and concisely and do not constitute a substantive limitation on the order of execution; those skilled in the art may, in specific implementations, execute S20 before S10, and such variations fall within the scope of protection of this application.
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
In the following description, suffixes such as "module", "part" or "unit" used to denote elements are only intended to facilitate the description of the present application and have no specific meaning in themselves. Therefore, "module", "part" and "unit" may be used interchangeably.
Smart terminals may be implemented in various forms. For example, the smart terminals described in this application may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, wearable devices, smart bracelets and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
In the following description, a mobile terminal is taken as an example. Those skilled in the art will understand that, apart from elements specifically used for mobile purposes, the configurations according to the embodiments of the present application can also be applied to fixed-type terminals.
Referring to FIG. 1, which is a schematic diagram of the hardware structure of a mobile terminal implementing various embodiments of the present application, the mobile terminal 100 may include: an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, a power supply 111, and other components. Those skilled in the art can understand that the structure of the mobile terminal shown in FIG. 1 does not constitute a limitation on the mobile terminal; the mobile terminal may include more or fewer components than shown, combine certain components, or arrange the components differently. The components of the mobile terminal are described below with reference to FIG. 1:
The radio frequency unit 101 may be used for receiving and sending signals during information transmission or a call. Specifically, it receives downlink information from the base station and delivers it to the processor 110 for processing, and sends uplink data to the base station. Generally, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communication), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), TDD-LTE (Time Division Duplexing-Long Term Evolution), 5G, and so on.
WiFi is a short-range wireless transmission technology. Through the WiFi module 102, the mobile terminal can help users send and receive e-mails, browse web pages, access streaming media, and so on, providing users with wireless broadband Internet access. Although FIG. 1 shows the WiFi module 102, it can be understood that it is not an essential component of the mobile terminal and may be omitted as required without changing the essence of the invention.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound when the mobile terminal 100 is in a call signal receiving mode, a call mode, a recording mode, a voice recognition mode, a broadcast receiving mode, or the like. Moreover, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (for example, a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.
The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processing unit 1041 processes the image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processing unit 1041 may be stored in the memory 109 (or another storage medium) or sent via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 can receive sound (audio data) in operating modes such as a phone call mode, a recording mode, and a voice recognition mode, and can process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 101 for output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the process of receiving and transmitting audio signals.
The mobile terminal 100 further includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor and a proximity sensor; the ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the mobile terminal 100 is moved to the ear. As a kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in various directions (generally three axes) and can detect the magnitude and direction of gravity when stationary; it can be used for applications that recognize the posture of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). Other sensors that may also be configured on the mobile phone, such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described in detail here. The display unit 106 is used to display information input by the user or information provided to the user. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
The user input unit 107 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile terminal. Optionally, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. The touch panel 1071 may include two parts: a touch detection device and a touch controller. Optionally, the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 110, and can receive and execute commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 1071, the user input unit 107 may also include other input devices 1072. Optionally, the other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not specifically limited here.
Optionally, the touch panel 1071 may cover the display panel 1061. When the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although in FIG. 1 the touch panel 1071 and the display panel 1061 are shown as two independent components implementing the input and output functions of the mobile terminal, in some embodiments the touch panel 1071 and the display panel 1061 may be integrated to implement these functions, which is not specifically limited here.
The interface unit 108 serves as an interface through which at least one external device can be connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and so on. The interface unit 108 may be used to receive input (for example, data information, power, etc.) from an external device and transmit the received input to one or more elements within the mobile terminal 100, or may be used to transfer data between the mobile terminal 100 and the external device.
The memory 109 may be used to store software programs and various data. The memory 109 may mainly include a program storage area and a data storage area; optionally, the program storage area may store an operating system, application programs required by at least one function (such as a sound playback function and an image playback function), and the like, while the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book). In addition, the memory 109 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
The processor 110 is the control center of the mobile terminal. It connects all parts of the entire mobile terminal through various interfaces and lines, and performs the various functions of the mobile terminal and processes data by running or executing the software programs and/or modules stored in the memory 109 and calling the data stored in the memory 109, thereby monitoring the mobile terminal as a whole. The processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and so on, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 110.
The mobile terminal 100 may further include a power supply 111 (such as a battery) for supplying power to the components. Preferably, the power supply 111 may be logically connected to the processor 110 through a power management system, so as to implement functions such as charging, discharging, and power consumption management through the power management system. Although not shown in FIG. 1, the mobile terminal 100 may also include a Bluetooth module and the like, which will not be described here.
To facilitate understanding of the embodiments of the present application, the communication network system on which the mobile terminal of the present application is based is described below.
Referring to FIG. 2, FIG. 2 is an architecture diagram of a communication network system provided by an embodiment of the present application. The communication network system is an LTE system of universal mobile communication technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an operator's IP services 204, which are communicatively connected in sequence.
Optionally, the UE 201 may be the above-mentioned terminal 100, which will not be repeated here.
The E-UTRAN 202 includes an eNodeB 2021, other eNodeBs 2022, and the like. Optionally, the eNodeB 2021 may be connected to the other eNodeBs 2022 through a backhaul (for example, an X2 interface), the eNodeB 2021 is connected to the EPC 203, and the eNodeB 2021 may provide access from the UE 201 to the EPC 203.
The EPC 203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and the like. Optionally, the MME 2031 is a control node that processes signaling between the UE 201 and the EPC 203 and provides bearer and connection management. The HSS 2032 provides registers to manage functions such as the home location register (not shown) and stores user-specific information about service characteristics, data rates, and so on. All user data may be sent through the SGW 2034, the PGW 2035 may provide IP address allocation for the UE 201 as well as other functions, and the PCRF 2036 is the policy and charging control decision point for service data flows and IP bearer resources, which selects and provides available policy and charging control decisions for the policy and charging enforcement function unit (not shown).
The IP services 204 may include the Internet, an intranet, an IMS (IP Multimedia Subsystem), or other IP services.
Although the LTE system is used as an example above, those skilled in the art should know that the present application is applicable not only to the LTE system but also to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems (such as 5G), which are not limited here.
FIG. 3 is a schematic diagram of a hardware structure of a controller 140 provided in the present application. The controller 140 includes a memory 1401 and a processor 1402. The memory 1401 is used to store program instructions, and the processor 1402 is used to call the program instructions in the memory 1401 to execute the steps performed by the controller in the first method embodiment above; the implementation principle and beneficial effects are similar and will not be repeated here.
Optionally, the above controller further includes a communication interface 1403, which may be connected to the processor 1402 through a bus 1404. The processor 1402 may control the communication interface 1403 to implement the receiving and sending functions of the controller 140.
FIG. 4 is a schematic diagram of a hardware structure of a network node 150 provided in the present application. The network node 150 includes a memory 1501 and a processor 1502. The memory 1501 is used to store program instructions, and the processor 1502 is used to call the program instructions in the memory 1501 to execute the steps performed by the first node in the first method embodiment above; the implementation principle and beneficial effects are similar and will not be repeated here.
Optionally, the above network node further includes a communication interface 1503, which may be connected to the processor 1502 through a bus 1504. The processor 1502 may control the communication interface 1503 to implement the receiving and sending functions of the network node 150.
FIG. 5 is a schematic diagram of a hardware structure of a network node 160 provided in the present application. The network node 160 includes a memory 1601 and a processor 1602. The memory 1601 is used to store program instructions, and the processor 1602 is used to call the program instructions in the memory 1601 to execute the steps performed by the intermediate node and the tail node in the first method embodiment above; the implementation principle and beneficial effects are similar and will not be repeated here. Optionally, the above network node further includes a communication interface 1603, which may be connected to the processor 1602 through a bus 1604. The processor 1602 may control the communication interface 1603 to implement the receiving and sending functions of the network node 160.
FIG. 6 is a schematic diagram of a hardware structure of a controller 170 provided in the present application. The controller 170 includes a memory 1701 and a processor 1702. The memory 1701 is used to store program instructions, and the processor 1702 is used to call the program instructions in the memory 1701 to execute the steps performed by the controller in the second method embodiment above; the implementation principle and beneficial effects are similar and will not be repeated here.
FIG. 7 is a schematic diagram of a hardware structure of a network node 180 provided in the present application. The network node 180 includes a memory 1801 and a processor 1802. The memory 1801 is used to store program instructions, and the processor 1802 is used to call the program instructions in the memory 1801 to execute the steps performed by the head node in the second method embodiment above; the implementation principle and beneficial effects are similar and will not be repeated here.
The integrated modules implemented in the form of software functional modules described above may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods in the embodiments of the present application.
In the above embodiments, the implementation may be achieved in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (for example, infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
Based on the above mobile terminal hardware structure and communication network system, various embodiments of the present application are proposed.
First Embodiment
Referring to FIG. 8, FIG. 8 is a schematic flowchart of a video processing method according to the first embodiment. The method includes:
S10, obtaining first information corresponding to at least one video image frame in a target video;
S20, processing the first information according to a preset rule to determine or generate target information corresponding to the video image frame.
In this embodiment of the application, the execution subject of the video processing method may be a smart terminal and/or a server. The smart terminal may be the smart terminal in the above embodiment; optionally, the smart terminal may be a smart phone, a tablet, a computer, or the like.
Optionally, the target video may be saved in the smart terminal in advance, or may be a video loaded in real time through a video playing application. Optionally, video processing of the target video may be triggered when the user actively initiates a playback request for the target video, which actively triggers a video processing instruction, whereupon the smart terminal retrieves the target video and performs video processing on it; optionally, video processing may also be triggered when the target video is saved on the smart terminal, whereupon a video processing instruction is triggered and video processing is performed on the target video.
Optionally, the target video is composed of at least one video image frame. The video image frames include frames containing preset information and frames not containing preset information. Optionally, before identifying the first information in the video image frames of the target video, the video image frames containing preset information may be determined from the frames of the target video, and the first information may then be obtained from the frames containing preset information.
Optionally, S10 includes: judging whether preset information exists in the video image frame; if so, obtaining the first information from the video image frame.
Optionally, the preset information may be text information, subtitle information, content text in the image frame, or the like.
Optionally, the video image frames containing preset information may be determined from the image frames of the target video as follows: the smart terminal, starting from the initial image frame, identifies the content of each video image frame and then judges, according to that content, whether preset information exists in the frame. Optionally, when preset information is recognized in a video image frame, that frame is determined to be a video image frame containing preset information, its information is obtained, and this information is taken as the first information. By identifying the information of the video image frames containing preset information in the target video, video processing is performed only on those frames, which improves the efficiency of video processing.
Optionally, after the video image frames containing preset information are obtained, other frames may contain the same information. Optionally, in this application, after the video image frames containing preset information are obtained, a similarity analysis is performed on these frames to obtain the similarity between the frames; similar image frames are then determined based on the similarity, and the similar frames are filtered to obtain the video image frames to be processed. Optionally, the higher the similarity between video image frames, the higher the probability that the information they contain is the same; the lower the similarity, the more likely the frames contain different information.
Optionally, the video image frames to be processed may also be obtained by first extracting the first image frames containing preset information from the video frames, then obtaining the similarity between the extracted first image frames, determining the similar image frames corresponding to the first image frames based on the similarity, and filtering based on the similar image frames to obtain the video image frames, as illustrated in the sketch below. Optionally, by first determining the first image frames containing preset information and then filtering them again, fewer video image frames are obtained, which reduces the resources consumed by video processing and improves its efficiency.
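As an illustration of the frame-selection steps described above, the following is a minimal Python sketch. The helpers `contains_preset_info` and `frame_similarity` (an edge-density check and a histogram correlation based on OpenCV) are assumptions chosen for illustration only; they stand in for whatever text-detection and similarity measures an implementation actually uses.

```python
import cv2

def contains_preset_info(frame) -> bool:
    # Hypothetical check: treat a frame as containing preset information
    # (e.g. overlaid text) when its edge density exceeds a crude threshold.
    edges = cv2.Canny(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 100, 200)
    return edges.mean() > 10  # threshold is an illustrative assumption

def frame_similarity(a, b) -> float:
    # Histogram correlation as a stand-in similarity measure (assumption).
    ha = cv2.calcHist([a], [0], None, [64], [0, 256])
    hb = cv2.calcHist([b], [0], None, [64], [0, 256])
    return cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL)

def select_frames_to_process(video_path: str, sim_threshold: float = 0.95):
    """Keep only frames that contain preset information and are not
    near-duplicates of frames already selected."""
    cap = cv2.VideoCapture(video_path)
    selected = []
    ok, frame = cap.read()
    while ok:
        if contains_preset_info(frame):
            if all(frame_similarity(frame, s) < sim_threshold for s in selected):
                selected.append(frame)
        ok, frame = cap.read()
    cap.release()
    return selected
```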
Optionally, after the video image frames containing preset information are obtained, the first information may be obtained from the frames based on an optical character recognition (OCR) algorithm.
Optionally, the OCR algorithm is used to translate shapes into computer text through character recognition, that is, to recognize the information in the image frame and generate a corresponding text recognition result.
Optionally, after the video image frames are obtained, OCR recognition is performed on each frame to obtain the image text recognition result of the image text information of that frame. Optionally, if the text displayed on a video image frame is "the sun is shining today", performing OCR recognition on the frame yields the image text recognition result "the sun is shining today".
Optionally, the first information includes first content and/or a first position. Optionally, the first content is the text content in the video image frame, such as "the sun is shining today", and the first position is the display position of the text in the video image frame.
Optionally, in the process of performing OCR recognition on the video image frame, while the image text recognition result (the text) is obtained, the display position of the text in the video image frame is also obtained; this display position is taken as the text position of the image text recognition result, and the text position is then determined as the first position.
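The first content and first position can be obtained with an off-the-shelf OCR engine. The sketch below uses pytesseract purely as an example engine (an assumption, not the engine mandated by this application); it returns each recognized text run together with its bounding box, i.e. the first content and first position for one frame.

```python
import pytesseract
from pytesseract import Output

def extract_first_information(frame_rgb):
    """Return a list of (first_content, first_position) pairs for one frame."""
    data = pytesseract.image_to_data(frame_rgb, output_type=Output.DICT)
    results = []
    for text, conf, x, y, w, h in zip(
        data["text"], data["conf"], data["left"], data["top"],
        data["width"], data["height"],
    ):
        if text.strip() and float(conf) > 0:
            # first_content: the recognized text
            # first_position: its display position (bounding box) in the frame
            results.append((text, (x, y, w, h)))
    return results
```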
Optionally, after the first information of the video image frame is obtained, the language type of the first content or the first position of the first information may not meet the user's needs. Optionally, the language type of the first content may be German while the user only understands Chinese, in which case the first content does not meet the user's needs; or the first position may lie at the edge of the video image frame where the user cannot see it clearly, in which case the first position does not meet the user's needs. Optionally, in this embodiment of the application, after the first information is obtained, it is processed according to a preset rule so as to turn the first information into target information that meets the user's needs.
Optionally, the target information includes target content and/or a target position, and the preset rule includes a preset language type conversion rule and/or a preset position rule. The preset language type conversion rule is used to convert the language type corresponding to the first content, and the preset position rule is used to adjust the first position.
Optionally, referring to FIG. 9, S20 includes: S21, when the language type corresponding to the first content does not match a preset type, converting the first content into target content corresponding to the preset type, and/or determining or generating a target position corresponding to the first position according to a preset position rule.
Optionally, it is judged whether the language type corresponding to the first content is a preset language type. Optionally, the preset language type may be the system language and/or a set language. Optionally, the set language may be configured by the user, for example by modifying the system language of the smart terminal, by modifying it on the language setting page of the video playback page, or by inputting a language control instruction to the smart terminal. Optionally, after receiving the language control instruction, the smart terminal determines the preset language type according to the instruction. Optionally, the user may input the language control instruction to the smart terminal by voice; optionally, the preset language type may include at least one language type.
Optionally, after the language type corresponding to the first content is obtained, it is judged whether this language type matches the preset language type. When they match, there is no need to convert the language type of the first content; when they do not match, the first content is converted into target content corresponding to the preset language type. Optionally, if the first content is "London Road", its language type is English, and the preset language type is Chinese, the target content obtained by converting the first content into the preset language type is "伦敦道".
Optionally, the first text content is converted into the target content, so that when the target video is subsequently played, the target content is displayed at the first text position corresponding to the first information in the video image frame. Optionally, referring to FIG. 10, the left side of FIG. 10 shows an image frame before video processing and the right side shows the frame after video processing; the frame on the left contains the road sign "London Road", and after video processing "伦敦道" is displayed at the position where the road sign originally appeared.
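A minimal sketch of the language-type check and conversion in S21 follows. The language detector (the third-party langdetect package) and the tiny lookup table standing in for a translation service are illustrative assumptions; any detection or machine-translation component could play these roles.

```python
from langdetect import detect  # assumed third-party language detector

# Tiny illustrative lookup standing in for a real translation service (assumption).
DEMO_TRANSLATIONS = {("London Road", "zh-cn"): "伦敦道"}

def translate_text(text: str, target_lang: str) -> str:
    return DEMO_TRANSLATIONS.get((text, target_lang), text)

def convert_content(first_content: str, preset_lang: str) -> str:
    """Convert the first content only when its language type does not match the preset type."""
    if detect(first_content) == preset_lang:
        return first_content                   # language already matches; no conversion
    return translate_text(first_content, preset_lang)

# Example: convert_content("London Road", "zh-cn") -> "伦敦道"
```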
Optionally, this embodiment of the application further proposes determining or generating the target position corresponding to the first position according to a preset position rule, where the preset position rule includes at least one of the following:
taking a position at a preset distance from the first position as the target position;
in response to a preset operation performed inside or outside the first position, determining or generating the target position according to the preset operation.
Optionally, the target position may be the same as or different from the first position.
Optionally, the target position may be the same as the first position; optionally, the preset position rule includes determining the first position as the target position.
Optionally, the target position may be different from the first position, and the preset position rule includes taking the first position as the origin, obtaining a position at a preset distance from the first position, and taking that position as the target position. Optionally, the preset distance may be a user-defined setting, and the target position may be directly above, to the left of, directly below, or to the right of the first position.
Optionally, the preset position rule may also include receiving a preset operation performed by the user on the first content and determining the target position according to the preset operation. Optionally, the user may perform the preset operation inside or outside the first position based on the first content. Optionally, the preset position rule also includes obtaining the distance between the first position and important information of the video image frame; the important information may be a person in the frame or the subtitle information of the frame. When the first position is too close to the important information, the important information is easily blocked or confused with the target information. Optionally, when this distance is smaller than a preset distance, a position whose distance from the important information is greater than or equal to the preset distance is determined and taken as the target position.
Optionally, the preset position rule may also include obtaining the distance between the first position and an edge of the video image frame; when this distance is smaller than a preset distance, a position whose distance from the edge is greater than or equal to the preset distance is obtained and taken as the target position.
It can be understood that the preset position rules include, but are not limited to, the above ways.
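The following sketch illustrates two of the preset position rules described above: offsetting the target position by a preset distance from the first position, and keeping it away from the frame edge. The concrete offsets, margins, and the (x, y, w, h) box representation are assumptions chosen for illustration.

```python
def offset_position(first_pos, dx=0, dy=-40):
    """Rule: a position at a preset distance from the first position
    (here 40 px directly above, an illustrative default)."""
    x, y, w, h = first_pos
    return (x + dx, y + dy, w, h)

def keep_away_from_edge(pos, frame_w, frame_h, margin=20):
    """Rule: if the position is closer than `margin` to an edge of the frame,
    move it so that its distance to every edge is at least `margin`."""
    x, y, w, h = pos
    x = min(max(x, margin), frame_w - w - margin)
    y = min(max(y, margin), frame_h - h - margin)
    return (x, y, w, h)

def target_position(first_pos, frame_w, frame_h):
    return keep_away_from_edge(offset_position(first_pos), frame_w, frame_h)
```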
In this embodiment of the application, after the first information corresponding to the video image frame is obtained, the first information including the first content and/or the first position, the first content is converted into target content corresponding to the preset language type, and/or the target position corresponding to the first position is determined or generated according to the preset position rule. When the user subsequently plays the target video, the target information can be obtained directly and displayed in the video image frames of the target video, so that the user can quickly understand the text according to the target information, understand the video content more easily, and enjoy an improved experience; moreover, by adjusting the first position, the user can view the target information conveniently, which satisfies the user's needs.
Second Embodiment
Optionally, referring to FIG. 11, based on the first embodiment, S20 includes:
Step S22, identifying scene information of the video image frame, and determining or generating feature information according to the scene information;
Step S23, determining or generating the target information according to the feature information and the first information.
Optionally, in order to enable the user to quickly grasp part of the content displayed in a video image frame while watching the video, this embodiment of the application further proposes a method for automatically annotating the first information according to the scene information of the video image frame.
Optionally, the video image frame is the video image frame corresponding to the first information, and the scene information includes the scene type of the video image frame; the scene type may be indoor, traffic, natural scenery, cultural scenery, city, village, field, and so on.
Optionally, identifying the scene information of the video image frame may specifically include: obtaining scene feature information of the video image frame according to a preset feature extraction algorithm, where the scene feature information includes one of image display parameter information, object feature information, and environment feature information; comparing the scene feature information with the preset scene feature information of the preset scenes one by one to determine the target preset scene feature information corresponding to the scene feature information; determining the target preset scene corresponding to the target preset scene feature information; determining the scene type of the video image frame according to the target preset scene; and determining the scene information according to the scene type. Optionally, there are several preset scenes, and different preset scenes correspond to different preset scene feature information.
Optionally, after the scene information is obtained, the feature information corresponding to the video image frame is generated according to the scene information. The feature information may be the scene information itself; for example, if the scene information is a hospital, "hospital" is taken as the feature information. The feature information may include a scene icon converted from the scene information; for example, if the scene information is a hospital scene, the hospital icon corresponding to the hospital scene is obtained. The feature information may also include the target object corresponding to the scene information, where the target object is the target object in the video image frame; for example, if the scene information is a group photo scene and the video image frame shows person A and person B, then person A and person B are the target objects, and "person A and person B" is determined as the feature information.
It can be understood that the ways of determining the feature information include, but are not limited to, the above three ways. The feature information may also include the sequence number of the video image frame and the like.
Optionally, after the feature information is obtained, the target information is determined or generated according to the feature information and the first information.
Optionally, the target information may be determined or generated according to the feature information and the first information by taking the feature information as the annotation content of the first information and then determining or generating the target information jointly from the first information and the annotation content. In this embodiment of the application, the scene feature information of the video image frame is extracted by the preset feature extraction algorithm, the scene type of the frame is determined according to the scene feature information, the scene information is determined according to the scene type, the feature information is determined or generated according to the scene information, and the feature information is then combined with the first information to generate the target information, so that the first information is quickly associated with the scene type of the video image frame and the user can quickly understand the scene type of the frame from the target information.
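A minimal sketch of steps S22–S23, combining a scene label with the first information to form the annotated target information, is given below. The nearest-scene matching over small feature vectors is a simplified assumption for illustration; any scene classifier could supply the scene type in its place.

```python
import numpy as np

# Preset scenes and their (illustrative) feature vectors; a real system would derive
# these from image display parameters, object features, environment features, etc.
PRESET_SCENES = {
    "hospital": np.array([0.9, 0.1, 0.2]),
    "traffic":  np.array([0.2, 0.8, 0.5]),
    "field":    np.array([0.1, 0.3, 0.9]),
}

def identify_scene(scene_features: np.ndarray) -> str:
    """S22 (part 1): pick the preset scene whose feature vector is closest."""
    return min(PRESET_SCENES,
               key=lambda s: np.linalg.norm(scene_features - PRESET_SCENES[s]))

def build_target_info(first_content: str, first_pos, scene_features: np.ndarray):
    """S22 (part 2) + S23: use the scene label as annotation content and
    combine it with the first information to form the target information."""
    feature_info = identify_scene(scene_features)       # e.g. "hospital"
    return {"content": first_content, "position": first_pos, "annotation": feature_info}
```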
Third Embodiment
Optionally, referring to FIG. 12, based on the first embodiment, the method further includes:
Step S30, obtaining second information corresponding to voice information in the target video;
Step S40, performing a first preset process on the first information according to the second information, so as to determine or generate the target information.
Optionally, the first preset process may be a calibration process or another process; the calibration process may be a voice-based calibration process or a text-based calibration process.
Optionally, when the first content of the first information is written sloppily, the first information recognized by OCR may be inaccurate. Optionally, in this embodiment of the application, the voice information corresponding to each video image frame of the target video is obtained, and the first preset process is performed on the first information according to the voice information to determine or generate the target information; the voice information corresponds one-to-one to the video image frames.
Optionally, the target information includes the first content of the first information after the first preset process.
Optionally, after the voice information is obtained, it is converted into corresponding second information through voice-to-text technology, and the first preset process is then performed on the first information according to the second information; the second information corresponds one-to-one to the video image frames.
Optionally, the first preset process may be performed on the first information according to the second information by obtaining the content to be processed in the first content according to a preset obtaining rule, determining in the second information the voice-processed content corresponding to the content to be processed, and performing the first preset process on the content to be processed according to the voice-processed content to determine or generate the target content.
Optionally, the preset obtaining rule may be to receive a processing instruction from the user for the first content, where the processing instruction includes the content to be processed; the preset obtaining rule may also be that the smart terminal intelligently obtains the content to be processed. Optionally, the smart terminal may intelligently obtain the content to be processed by taking the first content as the content to be processed, or by obtaining the probability coefficient corresponding to the first content. Optionally, when the probability coefficient is less than or equal to a preset probability coefficient, the first content is taken as the content to be processed.
Optionally, after the content to be processed is obtained, the voice-processed content corresponding to the content to be processed is determined in the second information.
Optionally, the second information corresponds one-to-one to the timestamps of the video image frames. Optionally, after the content to be processed is obtained, the timestamp of the video image frame where the content to be processed is located is obtained, the second information corresponding to the timestamp is determined, and that second information is taken as the voice-processed content.
Optionally, the voice-processed content is the content obtained after voice conversion.
Optionally, after the voice-processed content is obtained, the first preset process (such as calibration) is performed on the content to be processed according to the voice-processed content to obtain the processed content to be processed; the first content after the first preset process is then determined according to the processed content to be processed, and the target content is determined or generated according to the first content after the first preset process. Optionally, if the content to be processed includes "Londo Avenue" and the voice-processed content includes "London Avenue", the first content is calibrated to "London Avenue".
Optionally, in yet another embodiment, after the content to be processed is obtained, the video image frame corresponding to the content to be processed is obtained, the voice information corresponding to that frame is obtained, and the first preset process (such as calibration) is performed on the content to be processed according to that voice information. Optionally, if the content to be processed includes "Londo Avenue" and the voice information includes "London Avenue", the content to be processed is calibrated to "London Avenue".
Optionally, in yet another embodiment, after the first content after the first preset process is obtained, the first information is determined according to that first content, and the first information is then processed according to the preset rule to determine or generate the target information corresponding to the video image frame.
In this embodiment of the application, the second information corresponding to the voice information in the target video is obtained, the first preset process is performed on the first content of the first information according to the second information, the processed first information is determined according to the processed first content, and the processed first content is then processed according to the preset rule to determine or generate the target information corresponding to the video image frame. This improves the accuracy of the text recognized from the image and hence the accuracy of the target information, so that the user can correctly understand the text according to the target information, understand the video content more easily, and enjoy an improved experience.
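A sketch of the calibration step in the third embodiment follows: low-confidence OCR content is looked up, by timestamp, in the speech-to-text result for the same moment of the video and replaced when the transcript contains a close match. The confidence threshold, the difflib-based similarity, and the (start, end, text) structure of the transcript segments are illustrative assumptions.

```python
import difflib

def find_segment(transcript_segments, timestamp):
    """transcript_segments: list of (start, end, text) tuples from speech-to-text (assumed format)."""
    for start, end, text in transcript_segments:
        if start <= timestamp <= end:
            return text
    return ""

def calibrate_content(ocr_text, ocr_conf, timestamp, transcript_segments,
                      conf_threshold=0.6, sim_threshold=0.8):
    """Calibrate the first content (OCR text) against the second information (speech text)."""
    if ocr_conf > conf_threshold:
        return ocr_text                          # confident enough; keep as is
    words = find_segment(transcript_segments, timestamp).split()
    n = max(len(ocr_text.split()), 1)
    best, best_sim = ocr_text, 0.0
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        sim = difflib.SequenceMatcher(None, ocr_text.lower(), candidate.lower()).ratio()
        if sim > best_sim:
            best, best_sim = candidate, sim
    return best if best_sim >= sim_threshold else ocr_text

# Example: calibrate_content("Londo Avenue", 0.4, 12.3,
#                            [(10.0, 15.0, "turn left onto London Avenue")])
```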
Fourth Embodiment
Optionally, referring to FIG. 13 and based on the second embodiment, S30 includes:
S31: acquiring initial information corresponding to the voice information;
S32: performing second preset processing on the initial information according to the first information, so as to obtain the second information.
Optionally, the preset processing here may be calibration processing or other processing.
In this embodiment of the present application, when the user plays the target video, in order to make the video easier to watch, the smart terminal generates corresponding subtitles according to the voice information of the target video and displays the subtitles below the video image frames, so that the user can easily understand the voice information of the target video. However, when the voice information contains a dialect or some technical terms, the generated subtitles are prone to be inaccurate. On this basis, this embodiment of the present application proposes a method for performing preset processing on the subtitles.
Optionally, the initial information is information converted from the voice information by a speech-to-text technology. After the initial information is acquired, preset processing is performed on the initial information according to the first information to obtain the processed initial information, and the second information is then determined according to the processed initial information.
Optionally, a specific implementation of performing the preset processing on the initial information according to the first information includes: acquiring the text to be processed corresponding to the initial information, and matching the text to be processed against the first content of the first information, so as to obtain target information in the first content that matches the text to be processed, where the similarity between the target information and the text to be processed is greater than or equal to a preset threshold; and replacing the text to be processed with the target information. Optionally, the text to be processed includes "P to P", the first content includes "P2P", the similarity between "P to P" and "P2P" is 98%, and the preset threshold is 95%; "P2P" is therefore the target information, and "P to P" is replaced with it.
Optionally, after the text to be processed is replaced with the target information, the second information is generated according to the replaced text to be processed.
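A rough sketch of this second preset processing is given below: phrases heard by the speech recognizer are rewritten with the on-screen term they resemble. The disclosure does not specify the similarity measure, so a crude spoken-form normalization (mapping the spoken word "to" to the digit "2" and dropping spaces) is assumed here purely so that the "P to P" / "P2P" example goes through; the 0.95 threshold and the phrase windowing are likewise illustrative assumptions.

```python
import difflib

def normalize(text: str) -> str:
    # Crude spoken-form normalization, assumed only so the "P to P" / "P2P" example matches;
    # the disclosure leaves the actual similarity measure open.
    return text.lower().replace(" to ", "2").replace(" ", "")

def calibrate_subtitle_with_ocr(subtitle: str, ocr_terms: list, threshold: float = 0.95) -> str:
    """Replace phrases in an ASR subtitle with on-screen terms they closely resemble."""
    words = subtitle.split()
    out, i = [], 0
    while i < len(words):
        replacement = None
        for term in ocr_terms:
            for n in (3, 2, 1):  # try candidate phrases of up to three words
                phrase = " ".join(words[i:i + n])
                sim = difflib.SequenceMatcher(None, normalize(phrase), normalize(term)).ratio()
                if sim >= threshold:
                    replacement, step = term, len(words[i:i + n])
                    break
            if replacement:
                break
        if replacement:
            out.append(replacement)  # keep the on-screen spelling
            i += step
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)

print(calibrate_subtitle_with_ocr("the p to p transfer finished", ["P2P"]))
# -> "the P2P transfer finished"
```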
In this embodiment of the present application, preset processing is performed on the initial information corresponding to the voice information by using the first information, so that accurate second information is obtained; this improves the accuracy of speech recognition and thus the user experience.
Fifth Embodiment
Optionally, referring to FIG. 14 and based on the foregoing embodiments, S20 includes:
S24: acquiring description information and/or association information for the first content from the second information;
S25: determining or generating identification information corresponding to the first content according to the description information and/or the association information;
S26: identifying the first content according to the identification information, so as to determine or generate the target content.
In this embodiment of the present application, the description information is information used to describe the first content. The description information may include attribute information corresponding to the first content, and may also include status information corresponding to the first content. Optionally, the description information includes but is not limited to the attribute information and the status information, and may further include a description associated with the first content. Optionally, the first content is "London Street", the second information includes attribute information for "London Street" such as "London Street is 1 km long", and "1 km long" is taken as the description information. Optionally, the second information includes an adjective phrase associated with "London Street", for example "London Street is a food street", and "a food street" is then taken as the description information.
Optionally, the description information is associated with the first content.
Optionally, after the description information is acquired, the corresponding identification information is generated according to the description information. Optionally, when the description information is "a food street", the identification information may be a food identifier; when the description information is "the weather in this area is rainy", the identification information may be a rainy-weather identifier.
Optionally, the association information includes attribute information corresponding to the first information, and this attribute information may include language type information, customs and culture information, religious culture information, location information, text explanation information, and the like.
Optionally, after the association information is acquired, the corresponding identification information is generated according to the association information. Optionally, the first information is "London Street", the association information is "the language type of London Street is A, and the religious culture information is Christianity", and the description information is "the weather of London Street is rainy"; the generated identification information may then be "language type A, religious culture Christianity, rainy".
Optionally, when the description information and/or the association information is acquired, the identification information may also be determined according to both the description information and the association information.
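A minimal sketch of turning description and association information into identification information might look as follows. The keyword-to-identifier table, the dictionary shapes and the function name are all assumptions made for illustration; the disclosure only requires that some identification information be derived from the description and/or association information.

```python
# Hypothetical keyword-to-identifier table; the disclosure does not fix any particular mapping.
ICON_RULES = {
    "food street": "food",
    "rainy": "rain",
}

def build_identification(description=None, association=None) -> dict:
    """Derive identification information from the description and/or association information."""
    ident = {"icons": [], "tags": []}
    if description:
        for keyword, icon in ICON_RULES.items():
            if keyword in description.lower():
                ident["icons"].append(icon)
    if association:
        # Carry attributes such as language type or religious culture over as plain tags.
        ident["tags"] = [f"{key}: {value}" for key, value in association.items()]
    return ident

print(build_identification(
    "London Street is a food street and the weather there is rainy",
    {"language type": "A", "religious culture": "Christianity"},
))
# {'icons': ['food', 'rain'], 'tags': ['language type: A', 'religious culture: Christianity']}
```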
It can be understood that, when the first information appears in a video image frame, if the first information is merely translated or the first position corresponding to the first information is merely adjusted, a user who sees the content of the first information for the first time may know nothing about it; when watching the video image frame, the user can then only view the content and cannot obtain any information related to it. In this embodiment of the present application, the description information for the first content and the association information corresponding to the first content are acquired through the second information, the identification information is generated according to the description information and the association information, and the first content is then identified by the identification information, so that the user can view the identification information while viewing the first content, better understand the first content, and quickly obtain information related to the first content, which improves the user experience.
In this embodiment of the present application, the description information for the first content and the association information corresponding to the first content are acquired according to the second information, the identification information corresponding to the first content is generated according to the description information and the association information, and the first content is then identified according to the identification information, so that when the target video is subsequently played, the identification information is displayed together with the first content; the user can thus better understand the first content and quickly obtain information related to it, which improves the user experience.
Sixth Embodiment
Optionally, referring to FIG. 15 and based on the first embodiment, a sixth embodiment of the video processing method of the present application is proposed. After S20, the method further includes:
S50: determining or generating target information according to the target content and/or the target position, and associating the target information with the video image frame; and/or determining or generating target information according to the target content, the time stamp of the video image frame and/or the target position. In this embodiment of the present application, after the first information is processed according to the preset rule, the target content and/or the target position is acquired, the target information is determined or generated according to the target content and the target position, and the target information is associated with the video image frame, so that during subsequent playback of the target video, the target information associated with the video image frame can be acquired directly and the target content corresponding to the target information can be displayed at the corresponding target position in the video image frame.
Optionally, the video image frame includes a corresponding time stamp, and the time stamp is used to indicate the playback time of the video image frame in the target video. Optionally, the target video is composed of a number of video image frames, and each video image frame corresponds to a different time stamp; when the target video is played, the video image frames are played in ascending order of their time stamps. Referring to FIG. 16, the left side of FIG. 16 is a schematic diagram of the video image frames of the target video, including "F1, F2, F3, F4, F5, F6, F7, F8, F9, ...", each corresponding to its own time stamp "T1, T2, T3, T4, T5, T6, T7, T8, T9, ..."; the right side of FIG. 16 is a schematic diagram of the corresponding video image frames of the target video.
In this embodiment of the present application, the target information may also be determined or generated by acquiring the time stamp of the video image frame and then determining or generating the target information according to the target content, the time stamp of the video image frame and/or the target position, so that during subsequent playback of the target video, when the current playback time point coincides with the time stamp, the target information is acquired, the target content corresponding to the time stamp is determined according to the target information, and the target content is displayed at the corresponding target position in the video image frame. Optionally, when the playback time point is "01:30" and the time stamp of the video image frame in the target video is also "01:30", the target information is acquired directly, the target content and/or the target position corresponding to "01:30" is determined according to the target information, and the target content is then displayed at the target position corresponding to the video image frame.
Optionally, the target information may also be determined or generated by generating subtitles according to the target content and the time stamp of the video image frame, so that during subsequent playback, when the current playback time point reaches the time stamp, the target information is acquired and the target content is displayed at a preset position of the video image frame; the preset position may be set by the user.
Optionally, the target information may also be determined or generated according to the target content and/or a preset position of the target information, and the target information is then associated with the video image frame, so that during subsequent playback the target information associated with the video image frame is acquired directly and the target content corresponding to the target information is displayed at the preset position in the video image frame.
Optionally, each video image frame of the target video has its own frame sequence number, and the target information may also be generated according to the target content, the frame sequence number of the video image frame and the target position.
Optionally, the manner of determining or generating the target information includes but is not limited to the above manners.
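One simple way to realize the time-stamp variant described above is a table that keys the target information by the frame's time stamp, so the player can look it up when the playback time point coincides with that stamp. The class names, the millisecond unit and the exact-match lookup in the sketch below are illustrative assumptions rather than the disclosed data layout.

```python
from dataclasses import dataclass, field

@dataclass
class TargetInfo:
    timestamp_ms: int       # time stamp of the frame this entry belongs to
    target_content: str     # processed text to display
    target_position: tuple  # (x, y) position inside the frame

@dataclass
class TargetInfoTrack:
    """Hypothetical container associating target information with video image frames by time stamp."""
    entries: dict = field(default_factory=dict)

    def add(self, info: TargetInfo) -> None:
        self.entries.setdefault(info.timestamp_ms, []).append(info)

    def at(self, playback_ms: int) -> list:
        # Exact time-stamp match, mirroring the "01:30 == 01:30" example above.
        return self.entries.get(playback_ms, [])

track = TargetInfoTrack()
track.add(TargetInfo(timestamp_ms=90_000, target_content="London Avenue", target_position=(120, 40)))
print(track.at(90_000))  # entries to render when the playback time point reaches 01:30
```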
Optionally, after the target information corresponding to the target video is generated, the target information is saved so that it can be acquired when a user subsequently needs to play the target video.
Optionally, the target information may be saved together with the target video in the same folder, and the target information is named according to a preset rule to ensure that the target information can be loaded normally when the target video is played.
Optionally, the target information may also be saved by pressing the video stream and the audio stream corresponding to the target video into an audio file and then packaging the audio file together with the target information.
Optionally, the target information may also be saved by embedding the target information in the video image frames. In a specific implementation, the target information is acquired, and the target information, the video stream and the audio stream together generate the video file of the target video. In this case, the target information is integrated into the encoded stream of the video file; when the target video is subsequently played, the video file can be played directly without separately acquiring the target information.
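For the first storage option (a separate file kept next to the video and found by a naming rule), a minimal sketch could look like the following. The ".target.json" suffix and the JSON encoding are assumptions; the disclosure only requires that the target information be named according to a preset rule so the player can load it.

```python
import json
from pathlib import Path

def save_target_info(video_path: str, target_info: list) -> Path:
    """Store the target information next to the video so the player can find it by name."""
    sidecar = Path(video_path).with_suffix(".target.json")  # preset naming rule (assumed)
    sidecar.write_text(json.dumps(target_info, ensure_ascii=False, indent=2), encoding="utf-8")
    return sidecar

def load_target_info(video_path: str) -> list:
    sidecar = Path(video_path).with_suffix(".target.json")
    return json.loads(sidecar.read_text(encoding="utf-8")) if sidecar.exists() else []

save_target_info("trip.mp4", [{"timestamp_ms": 90_000, "content": "London Avenue", "position": [120, 40]}])
print(load_target_info("trip.mp4"))
```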
In this embodiment of the present application, the target content and/or the target position is used to generate the corresponding target information, so that when the user subsequently plays the target video, the target information can be acquired directly and the target content is displayed at the corresponding target position in the video image frames of the target video; the user can thus quickly understand the text according to the target content, understand the video content more easily, and enjoy a better user experience.
Seventh Embodiment
Referring to FIG. 17 and based on all the above embodiments, the present application proposes a seventh embodiment. The method further includes:
S60: acquiring the target information when a playback request for the target video is detected;
S70: during playback of the target video, displaying the target content at the target position corresponding to the video image frame.
In this embodiment of the present application, when the user initiates a playback request for the target video to the smart terminal, the target information of the target video is acquired according to the playback request, the target video is then played, and the target content is displayed in the video image frames of the target video according to the target information.
Optionally, when the target information is saved by embedding it in the video image frames, the target video can be acquired and played directly; in this case, the target content of the target information is already presented in the video image frames.
Optionally, when the target information is saved together with the target video in the same folder, the target information located in the same folder as the target video is loaded automatically while the target video is acquired, and the target content of the target information is then displayed in the video image frames.
Optionally, when the target information is saved by pressing the video stream and the audio stream corresponding to the target video into an audio file and packaging the audio file together with the target information, the audio file and the target information are loaded at the same time, and the target content of the target information is then displayed in the video image frames.
Optionally, since the target information corresponds to specific video image frames, after the target information is loaded, the target information may be acquired only when the target video is played to the corresponding video image frame, so as to reduce the amount of data the smart terminal has to process while playing the target video.
Optionally, referring to FIG. 18, the step of acquiring the target information when the playback request for the target video is detected includes:
Step S61: acquiring the target information according to the association relationship between the video image frames of the target video and the target information; and/or, when the playback request for the target video is detected, acquiring the target information according to the correspondence between the playback time point and the time stamps in the target information.
Optionally, when the target information includes the target content and/or the target position, the present application associates the target information with the video image frame and thus generates an association relationship between the target information and the video image frame. Optionally, the association relationship includes the video image frame and the target content appearing in that video image frame. Optionally, when playback reaches the video image frame, the target content corresponding to the video image frame is determined according to the association relationship, and the target content is then displayed at the target position in the video image frame.
Optionally, when the target information includes the target content, the time stamp of the video image frame and/or the target position, and the playback request for the target video is detected, the playback time point is acquired while the target video is being played, the time stamp corresponding to the playback time point is determined according to the correspondence between the playback time point and the time stamps in the target information, the target information is acquired according to that time stamp, the video image frame corresponding to the time stamp and the corresponding target content are then determined according to the target information, and the target content is displayed at the target position in the video image frame.
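During playback, the time-stamp variant reduces to a lookup per displayed frame. The sketch below assumes a table keyed by the time stamp in milliseconds and a stand-in drawing primitive; both names are hypothetical and only illustrate the lookup described above.

```python
# Hypothetical target-information table keyed by time stamp (milliseconds).
TARGET_INFO = {
    90_000: [("London Avenue", (120, 40))],  # content to show at 01:30 and where to show it
}

def draw_text(frame, text: str, x: int, y: int) -> None:
    # Stand-in for the player's real overlay primitive; here it only logs what would be drawn.
    print(f"draw '{text}' at ({x}, {y})")

def render_overlay(frame, playback_ms: int) -> None:
    """Look up the target information for the current playback time point and draw it on the frame."""
    for content, (x, y) in TARGET_INFO.get(playback_ms, []):
        draw_text(frame, content, x, y)

render_overlay(frame=None, playback_ms=90_000)  # -> draw 'London Avenue' at (120, 40)
```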
In this embodiment of the present application, when the playback request for the target video is detected, the target information is acquired and the target video is played, so that when playback reaches the corresponding video image frame, the target content is displayed at the text position in the video image frame; the user can thus quickly understand the text according to the target content, understand the video content more easily, and enjoy a better user experience.
Eighth Embodiment
Referring to FIG. 19 and based on the seventh embodiment, an embodiment of the present invention proposes an eighth embodiment, in which S70 includes:
S71: acquiring target display parameters of the target information;
S72: during playback of the target video, displaying the target content at the target position corresponding to the video image frame with the target display parameters.
In this embodiment of the present application, the target display parameters include at least one of a text display duration, a text display mode and a text target display position, and the text display mode includes displaying the target content and/or displaying the target content and the first content at the same time. Optionally, the text display duration corresponds to the time stamps of the video image frames. It can be understood that the same information may appear in consecutive image frames. Optionally, the text display duration may be the duration of playing only a preset number of image frames, that is, a preset number of initial image frames are selected from the consecutive image frames, the target content is displayed in those initial image frames, and the target content is no longer displayed when playback reaches the image frame following the initial image frames. The preset number may be 1 or 2. Optionally, when the preset number is 3 and the duration of the 3 image frames is 5 s, the text display duration is 5 s. Optionally, the text display mode includes displaying the target content alone or displaying the target content and/or the first content in a superimposed manner. When the text display mode is displaying the target content alone, the target content is overlaid on the target position in a preset manner. Optionally, when the text display mode is displaying the target content and/or the first content in a superimposed manner, the target content may be displayed above, below, to the left of or to the right of the first content.
Optionally, the text display mode further includes highlighting the target content alone, that is, highlighting the target content after it has been overlaid on the target position in a preset manner; or the text display mode further includes displaying the target content and/or the first content in a superimposed manner and highlighting the target content.
Optionally, the text target display position is the display position of the target content in the video image frame, and this target display position may be the same as or different from the target position. Optionally, when the target display position is different from the target position, the target content is displayed at the target display position of the video image frame during playback of the target video instead of being displayed at the target position, which allows the user to adaptively adjust the display position of the target content.
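The target display parameters described above can be grouped into a small structure such as the sketch below; the field names, the default of three frames and the enum values are assumptions chosen to mirror the examples in the text rather than a disclosed format.

```python
from dataclasses import dataclass
from enum import Enum

class DisplayMode(Enum):
    TARGET_ONLY = "display the target content alone"
    SUPERIMPOSED = "display the target content together with the first content"
    TARGET_HIGHLIGHTED = "display the target content alone, highlighted"

@dataclass
class TargetDisplayParams:
    display_frames: int = 3      # text display duration, counted in consecutive image frames
    mode: DisplayMode = DisplayMode.TARGET_ONLY
    position: tuple = None       # None -> fall back to the recognized target position

def should_display(params: TargetDisplayParams, frames_already_shown: int) -> bool:
    """Stop drawing once the configured number of consecutive frames has been shown."""
    return frames_already_shown < params.display_frames

params = TargetDisplayParams(display_frames=3, mode=DisplayMode.SUPERIMPOSED, position=(200, 600))
print(should_display(params, frames_already_shown=2))  # True: still within the 3-frame duration
```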
Optionally, the target display parameters may be default display parameters, or may be set by the user through a selection interface, where the selection interface is used to provide the user with a function of adjusting the display parameters. Optionally, the selection interface may be output by the smart terminal popping it up automatically when the user inputs the playback request for the target video. Optionally, the selection interface may also be opened manually by the user while watching the target video.
Optionally, this embodiment of the present application proposes a method for acquiring the target display parameters, in which S71 includes: outputting a selection interface; and in response to a selection operation triggered on the selection interface, determining the target display parameters according to the selection operation.
Optionally, the selection interface may display at least one display parameter and the adjustment range corresponding to each display parameter. The display parameters include at least one of the text display duration, the text display mode and the text display position, and may further include language, display color, display area, and the like. Optionally, after the selection interface is output, the user may trigger a selection operation on the selection interface according to the user's own needs, and when detecting the selection operation, the smart terminal determines the target display parameters according to the selection operation.
Optionally, when the display parameter is the text display duration, the adjustment range is "1 image frame, 2 image frames, 3 image frames, ..."; when the display parameter is the text display mode, the adjustment range is "display the target content alone, display the target content and/or the text in a superimposed manner, highlight the target content alone, ..."; when the display parameter is the language, the adjustment range is "Chinese, Japanese, German, Thai, ...".
Optionally, the selection interface may also display combined display parameters, each of which includes several display parameters. For example, the selection interface displays "combined display parameters 1, combined display parameters 2, combined display parameters 3, ...", where combined display parameters 1 include "text display duration: 1 image frame; text display mode: display the target content alone; text target display position: below the text; language: Chinese". Optionally, the user may select any combination of display parameters based on the user's own needs, thereby triggering a selection operation on the selection interface; when detecting the selection operation, the smart terminal determines the target combined display parameters based on the selection operation and takes them as the target display parameters.
Optionally, when detecting the user's selection operation, the smart terminal may save the target display parameters corresponding to the selection operation, so that the next time the user triggers video playback, the video can be played directly according to these target display parameters.
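A selection interface of this kind boils down to a set of presets plus a handler that resolves the user's choice and persists it for the next playback. The preset names and values below copy the examples in the text, and the JSON settings file is an assumed persistence mechanism, not part of the disclosure.

```python
import json
from pathlib import Path

# Hypothetical presets offered on the selection interface; the values mirror the examples above.
COMBINED_PRESETS = {
    "combined display parameters 1": {"display_frames": 1, "mode": "target_only",
                                      "position": "below text", "language": "Chinese"},
    "combined display parameters 2": {"display_frames": 3, "mode": "superimposed",
                                      "position": "above text", "language": "Japanese"},
}

SETTINGS_FILE = Path("display_params.json")  # assumed persistence location

def on_selection(preset_name: str) -> dict:
    """Handle a selection operation: resolve the chosen preset and remember it for next time."""
    params = COMBINED_PRESETS[preset_name]
    SETTINGS_FILE.write_text(json.dumps(params), encoding="utf-8")
    return params

def load_saved_params(default: str = "combined display parameters 1") -> dict:
    if SETTINGS_FILE.exists():
        return json.loads(SETTINGS_FILE.read_text(encoding="utf-8"))
    return COMBINED_PRESETS[default]

print(on_selection("combined display parameters 2"))
```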
In this embodiment of the present application, a selection interface is provided so that the user can adaptively adjust the display parameters of the target content based on the selection interface according to the user's own viewing needs; when the target video is played, the target content is displayed according to the target display parameters, which helps the user quickly understand the video content while also meeting the user's viewing needs, thereby improving the viewing experience.
Ninth Embodiment
The present application further provides a video processing method. Referring to FIG. 20, the video processing method includes:
S80: acquiring first information corresponding to at least one video image frame in a target video, and acquiring second information corresponding to voice information in the target video;
S90: determining or generating target information corresponding to the target video according to the first information and the second information.
In this embodiment of the present application, the first information is preset information present in the video image frame, and the preset information may be text information on the video image frame. Specifically, it is determined whether preset information exists in the video image frame, and if so, the first information is acquired from the video image frame. Optionally, the first information is acquired from the video image frame based on an optical character recognition (OCR) algorithm. Optionally, the first information includes first content and/or a first position.
Optionally, the second information is a speech recognition result obtained by recognizing the corresponding voice information in the target video based on a speech-to-text technology. Optionally, after the voice information is acquired, the voice information is converted into the corresponding second information through the speech-to-text technology, and the second information is in one-to-one correspondence with the video image frames.
Optionally, after the first information and the second information are acquired, the target information is determined or generated according to the first information and the second information. Optionally, the target information may include image text information obtained by processing the first information, may include voice text information obtained by processing the second information, and may also include both the image text information and the voice text information.
Optionally, the step of determining or generating the target information corresponding to the target video according to the first information and the second information includes: performing third preset processing on the first information according to the second information to obtain processed image text information, and/or performing fourth preset processing on the second information according to the first information to obtain processed voice text information.
Optionally, the third preset processing may be calibration processing or other processing; the fourth preset processing may be calibration processing or other processing; and the calibration processing may be voice calibration processing or text calibration processing.
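Taken together, S80 and S90 amount to a two-way cross-check between what is read from the frame and what is heard on the audio track. The sketch below only wires the two directions up: run_ocr and run_asr are placeholders for whatever OCR and speech-to-text engines the terminal actually uses, and the two calibration helpers are stubs standing in for similarity-based substitutions of the kind sketched earlier; none of these names come from the disclosure.

```python
def run_ocr(frame) -> str:
    return "Londo Avenue"  # placeholder for the terminal's OCR engine

def run_asr(audio_segment) -> str:
    return "turn onto London Avenue at the p to p kiosk"  # placeholder for speech-to-text

def calibrate_text_with_speech(ocr_text: str, transcript: str) -> str:
    # Third preset processing: correct the on-screen reading using the transcript (stubbed here).
    return "London Avenue"

def calibrate_speech_with_text(transcript: str, ocr_text: str) -> str:
    # Fourth preset processing: correct the subtitle using the on-screen spelling (stubbed here).
    return transcript.replace("p to p", "P2P")

def build_target_info(frame, audio_segment) -> dict:
    first_information = run_ocr(frame)           # S80: first information from the video image frame
    second_information = run_asr(audio_segment)  # S80: second information from the voice information
    return {                                     # S90: target information derived from both
        "image_text": calibrate_text_with_speech(first_information, second_information),
        "voice_text": calibrate_speech_with_text(second_information, first_information),
    }

print(build_target_info(frame=None, audio_segment=None))
```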
Optionally, the first information includes the first content. When the first content of the first information is scrawled, the first content recognized based on OCR may be inaccurate. In this embodiment of the present application, the voice information corresponding to each video image frame of the target video is acquired, and preset processing is performed on the first information according to the voice information so as to determine or generate the target content, where the voice information is in one-to-one correspondence with the video image frames.
Optionally, after the voice information is acquired, the voice information is converted into the corresponding second information through the speech-to-text technology, and third preset processing is then performed on the first information according to the second information, where the second information is in one-to-one correspondence with the video image frames.
Optionally, the step of performing the third preset processing on the first information according to the second information includes: acquiring the content to be processed in the first information according to a preset acquisition rule; determining, in the second information, the voice processing content corresponding to the content to be processed; and performing preset processing on the content to be processed according to the voice processing content, so as to determine or generate the target information.
Optionally, the preset acquisition rule may be receiving a processing instruction from the user for the first content, where the processing instruction includes the content to be processed; the preset acquisition rule may also be the smart terminal intelligently acquiring the content to be processed. Optionally, the smart terminal may intelligently acquire the content to be processed by taking the first content as the content to be processed, or by acquiring a probability coefficient corresponding to the first content. Optionally, when the probability coefficient is less than or equal to a preset probability coefficient, the first content is taken as the content to be processed.
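One way to read the "probability coefficient" above is as the per-detection confidence that most OCR engines report. Under that assumption (the result structure and the 0.8 cut-off are invented for illustration), selecting the content to be processed might look like this:

```python
from dataclasses import dataclass

@dataclass
class OcrResult:
    text: str
    confidence: float  # "probability coefficient" reported by the OCR engine (assumed)
    box: tuple         # bounding box of the detected text in the frame

def select_content_to_process(results: list, preset_coefficient: float = 0.8) -> list:
    """Keep only detections whose confidence is at or below the preset probability coefficient."""
    return [r for r in results if r.confidence <= preset_coefficient]

detections = [
    OcrResult("London Street", 0.97, (10, 10, 200, 40)),
    OcrResult("Londo Avenue", 0.62, (10, 60, 200, 90)),  # low confidence -> content to be processed
]
print([r.text for r in select_content_to_process(detections)])  # ['Londo Avenue']
```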
Optionally, after the content to be processed is acquired, the voice processing content corresponding to the content to be processed is determined in the second information.
Optionally, the second information is in one-to-one correspondence with the time stamps of the video image frames. Optionally, after the content to be processed is acquired, the time stamp of the video image frame in which the content to be processed is located is acquired, the second information corresponding to that time stamp is determined according to the time stamp, and the second information corresponding to the time stamp is taken as the voice processing content.
Optionally, the voice processing content is the second information obtained by speech-to-text conversion.
Optionally, after the voice processing content is acquired, preset processing (such as calibration) is performed on the content to be processed according to the voice processing content, so as to obtain the content to be processed after the preset processing; the first content after the preset processing is then determined according to the content to be processed after the preset processing, and the target content is determined or generated according to the first information after the preset processing. Optionally, the content to be processed includes "Londo Avenue", the voice processing content includes "London Avenue", and the first content is accordingly calibrated to "London Avenue".
Optionally, in yet another embodiment, after the content to be processed is acquired, the video image frame corresponding to the content to be processed is acquired, the voice information corresponding to that video image frame is then acquired, and preset processing is performed on the content to be processed according to the voice information. Optionally, the content to be processed includes "Londo Avenue", the voice information includes "London Avenue", and the content to be processed is accordingly calibrated to "London Avenue".
Optionally, when the user plays the target video, in order to make the video easier to watch, the smart terminal generates corresponding subtitles according to the dialogue content of the target video and displays the subtitles below the video image frames, so that the user can easily understand the voice information of the target video. However, when the voice information contains a dialect or some technical terms, the generated subtitles are prone to be inaccurate. On this basis, this embodiment of the present application proposes a method for performing preset processing on the subtitles.
Optionally, the second information is information converted from the voice information by the speech-to-text technology. After the second information is acquired, preset processing is performed on the second information according to the first information to obtain the processed second information, and the voice text information is then determined according to the processed second information.
Optionally, a specific implementation of performing the fourth preset processing on the second information according to the first information includes: acquiring the text to be processed corresponding to the second information, and matching the text to be processed against the first content of the first information, so as to obtain target text content in the first content that matches the text to be processed. Optionally, when the similarity between the target text content and the text to be processed is greater than or equal to a preset threshold, the text to be processed is replaced with the target text content. Optionally, the text to be processed includes "P to P", the first content includes "P2P", the similarity between "P to P" and "P2P" is 98%, and the preset threshold is 95%; "P2P" is therefore the target text content, and "P to P" is replaced with "P2P".
Optionally, after the text to be processed is replaced with the target text content, the voice text information is determined or generated according to the replaced text to be processed.
Optionally, before the fourth preset processing is performed on the second information according to the first information, it is necessary to determine in advance whether target text content matching the text to be processed exists in the first content of the first information; if no target text content matching the text to be processed exists in the first content, the fourth preset processing does not need to be performed on the second information according to the first information. On this basis, this embodiment of the present application further proposes a method for determining whether the fourth preset processing needs to be performed on the second information.
Optionally, the step of performing the fourth preset processing on the second information according to the first information, so as to obtain the processed voice text information, includes:
determining whether the voice information corresponds to the first information;
and if so, performing the fourth preset processing on the second information according to the first information, so as to obtain the processed voice text information.
Optionally, whether the voice information corresponds to the first information may be determined by acquiring voice feature parameters corresponding to the voice information and speech recognition parameters corresponding to the first information, comparing the voice feature parameters with the speech recognition parameters, and thereby acquiring target speech recognition parameters that match the voice feature parameters. It can be understood that, when the speech recognition parameters corresponding to the first information include target speech recognition parameters matching the voice feature parameters, it is determined that target content corresponding to the voice information exists in the first information.
Optionally, the voice feature parameters corresponding to the voice information may be acquired by performing word segmentation on the voice information to obtain several pieces of sub-voice information, acquiring the voice feature parameters corresponding to each piece of sub-voice information, and determining the voice feature parameters corresponding to each piece of sub-voice information as the voice feature parameters corresponding to the voice information.
Optionally, the speech recognition parameters corresponding to the first information may be acquired by performing word segmentation on the first content to obtain several pieces of sub-content, acquiring the speech recognition parameters corresponding to each piece of sub-content, and determining the speech recognition parameters corresponding to the first information according to the speech recognition parameters corresponding to each piece of sub-content.
Optionally, after the target recognition parameters matching the voice feature parameters are acquired, it is determined that the voice information corresponds to the first information; the text to be processed corresponding to the second information is then acquired, the target text content corresponding to the text to be processed is determined according to the target recognition parameters, and the text to be processed is replaced with the target text content, so as to complete the fourth preset processing of the second information and obtain the voice text information.
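The correspondence check can be illustrated by segmenting both sides into words and comparing them through a shared parameter. In the sketch below a crude phonetic key (lower-cased, vowels stripped) plays the role of the voice feature / speech recognition parameters; this key, like the function names, is purely an assumption, since the disclosure does not say how those parameters are computed.

```python
import re

def phonetic_key(word: str) -> str:
    # Crude stand-in for a voice feature / speech recognition parameter.
    return re.sub(r"[aeiou]", "", word.lower())

def corresponds(voice_text: str, first_content: str) -> bool:
    """Decide whether the voice information corresponds to the first information:
    segment both sides into words and look for at least one matching parameter."""
    voice_params = {phonetic_key(word) for word in voice_text.split()}
    recognition_params = {phonetic_key(word) for word in first_content.split()}
    return bool(voice_params & recognition_params)

# The fourth preset processing would only be applied when this check passes.
print(corresponds("welcome to london avenue", "London Avenue"))     # True
print(corresponds("please fasten your seatbelt", "London Avenue"))  # False
```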
In this embodiment of the present application, the first information and/or the second information corresponding to the target video is acquired, preset processing is performed on the second information according to the first information to obtain the voice text information, and/or preset processing is performed on the first information according to the second information to obtain the image text information, and the target information is then determined or generated according to the image text information and/or the voice text information. When the target video is subsequently played, the target information can be acquired directly, and the image text information and/or the voice text information is displayed in the video image frames of the target video, so that the user can quickly understand the text on the video image frames according to the image text information and accurately understand the dialogue content of the target video according to the voice text information; the video content thus becomes easier to understand, and the user experience is improved.
An embodiment of the present application further provides a smart terminal. The smart terminal includes a memory and a processor, a video processing program is stored in the memory, and when executed by the processor, the video processing program implements the steps of the video processing method in any of the foregoing embodiments.
An embodiment of the present application further provides a storage medium. A video processing program is stored on the storage medium, and when executed by a processor, the video processing program implements the steps of the video processing method in any of the foregoing embodiments.
The embodiments of the smart terminal and the computer-readable storage medium provided in the embodiments of the present application may include all the technical features of any of the above method embodiments; the expansions and explanations of the description are substantially the same as those of the method embodiments and are not repeated here.
An embodiment of the present application further provides a computer program product. The computer program product includes computer program code, and when the computer program code runs on a computer, the computer is caused to execute the methods in the various possible implementations above.
An embodiment of the present application further provides a chip, including a memory and a processor. The memory is configured to store a computer program, and the processor is configured to call and run the computer program from the memory, so that a device equipped with the chip executes the methods in the various possible implementations above.
It can be understood that the above scenarios are only examples and do not limit the application scenarios of the technical solutions provided in the embodiments of the present application; the technical solutions of the present application may also be applied to other scenarios. For example, those of ordinary skill in the art know that, with the evolution of system architectures and the emergence of new service scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。The serial numbers of the above embodiments of the present application are for description only, and do not represent the advantages and disadvantages of the embodiments.
本申请实施例方法中的步骤可以根据实际需要进行顺序调整、合并和删减。The steps in the methods of the embodiments of the present application can be adjusted, combined and deleted according to actual needs.
本申请实施例设备中的单元可以根据实际需要进行合并、划分和删减。Units in the device in the embodiment of the present application may be combined, divided and deleted according to actual needs.
在本申请中,对于相同或相似的术语概念、技术方案和/或应用场景描述,一般只在第一次出现时进行详细描述,后面再重复出现时,为了简洁,一般未再重复阐述,在理解本申请技术方案等内容时,对于在后未详细描述的相同或相似的术语概念、技术方案和/或应用场景描述等,可以参考其之前的相关详细描述。In this application, descriptions of the same or similar terms, concepts, technical solutions and/or application scenarios are generally only described in detail when they appear for the first time, and when they appear repeatedly later, for the sake of brevity, they are generally not repeated. When understanding the technical solutions and other contents of the present application, for the same or similar term concepts, technical solutions and/or application scenario descriptions that are not described in detail later, you can refer to the previous relevant detailed descriptions.
在本申请中,对各个实施例的描述都各有侧重,某个实施例中没有详述或记载的部分,可以参见其它实施例的相关描述。In this application, the description of each embodiment has its own emphasis. For the parts that are not detailed or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.
本申请技术方案的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本申请记载的范围。The various technical features of the technical solution of the present application can be combined arbitrarily. For the sake of concise description, all possible combinations of the various technical features in the above-mentioned embodiments are not described. However, as long as there is no contradiction in the combination of these technical features, all It should be regarded as the scope described in this application.
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需 的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在如上的一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,被控终端,或者网络设备等)执行本申请每个实施例的方法。Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is better implementation. Based on this understanding, the technical solution of the present application can be embodied in the form of a software product in essence or in other words, the part that contributes to the prior art, and the computer software product is stored in one of the above storage media (such as ROM/RAM, magnetic CD, CD), including several instructions to make a terminal device (which may be a mobile phone, computer, server, controlled terminal, or network device, etc.) execute the method of each embodiment of the present application.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络,或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质,(例如,软盘、存储盘、磁带)、光介质(例如,DVD),或者半导体介质(例如固态存储盘Solid State Disk(SSD))等。In the above embodiments, all or part of them may be implemented by software, hardware, firmware or any combination thereof. When implemented using software, it may be implemented in whole or in part in the form of a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part. The computer can be a general purpose computer, special purpose computer, a computer network, or other programmable apparatus. Computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, e.g. Coaxial cable, optical fiber, digital subscriber line) or wireless (such as infrared, wireless, microwave, etc.) to another website site, computer, server or data center. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, a data center, etc. integrated with one or more available media. Usable media may be magnetic media, (eg, floppy disk, memory disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), among others.
The above are only preferred embodiments of this application and are not intended to limit its patent scope. Any equivalent structural or process transformation made using the content of the description and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of this application.
Claims (20)
- A video processing method, wherein the method comprises: obtaining first information corresponding to at least one video image frame in a target video; and processing the first information according to a preset rule to determine or generate target information corresponding to the video image frame.
- The method according to claim 1, wherein the step of processing the first information according to a preset rule to determine or generate the target information corresponding to the video image frame comprises: recognizing scene information of the video image frame, and determining or generating feature information according to the scene information; and determining or generating the target information according to the feature information and the first information.
- The method according to claim 1, wherein the method further comprises: obtaining second information corresponding to speech information in the target video; and performing first preset processing on the first information according to the second information to determine or generate the target information.
- The method according to claim 3, wherein the step of obtaining the second information corresponding to the speech information in the target video comprises: obtaining initial information corresponding to the speech information; and performing second preset processing on the initial information according to the first information to obtain the second information.
- The method according to claim 3, wherein the step of processing the first information according to a preset rule comprises: obtaining, from the second information, description information and/or association information for first content; determining or generating identification information corresponding to the first content according to the description information and/or the association information; and identifying the first content according to the identification information to determine or generate target content.
- The method according to any one of claims 1 to 5, wherein the step of processing the first information according to a preset rule comprises: when a language type corresponding to first content does not match a preset language type, converting the first content into target content corresponding to the preset language type; and/or determining or generating a target position corresponding to a first position according to a preset position rule.
- The method according to claim 6, wherein the preset position rule comprises at least one of the following: taking a position spaced a preset distance from the first position as the target position; and, in response to a preset operation inside or outside the first position, determining or generating the target position according to the preset operation.
- The method according to any one of claims 1 to 5, wherein, after the step of processing the first information according to a preset rule, the method further comprises: determining or generating target information according to target content and/or a target position, and associating the target information with the video image frame; and/or determining or generating the target information according to the target content, a timestamp of the video image frame and/or the target position.
- The method according to any one of claims 1 to 5, wherein the method further comprises: obtaining the target information when a playback request for the target video is detected; and displaying the target content at the target position corresponding to the video image frame while the target video is being played.
- The method according to claim 9, wherein the step of obtaining the target information when the playback request for the target video is detected comprises: obtaining the target information according to an association between the video image frame in the target video and the target information; and/or, when the playback request for the target video is detected, obtaining the target information according to a correspondence between a playback time point and a timestamp in the target information.
- The method according to claim 9, wherein the step of displaying the target content at the position corresponding to the target image frame while the target video is being played comprises: obtaining target display parameters of the target information; and displaying the target content with the target display parameters at the target position corresponding to the video image frame while the target video is being played.
- The method according to claim 11, wherein the step of obtaining the target display parameters of the target information comprises: outputting a selection interface; and, in response to a selection operation triggered on the selection interface, determining the target display parameters according to the selection operation.
- The method according to any one of claims 1 to 5, wherein the step of obtaining the first information corresponding to at least one video image frame in the target video comprises: determining whether preset information exists in the video image frame; and, if so, obtaining the first information from the video image frame.
- A video processing method, wherein the method comprises: obtaining first information corresponding to at least one video image frame in a target video, and obtaining second information corresponding to speech information in the target video; and determining or generating target information corresponding to the target video according to the first information and the second information.
- The method according to claim 14, wherein the step of determining or generating the target information corresponding to the target video according to the first information and the second information comprises: performing third preset processing on the first information according to the second information to obtain image text information, and/or performing fourth preset processing on the second information according to the first information to obtain speech text information; and determining or generating the target information according to the image text information and/or the speech text information.
- The method according to claim 15, wherein the step of performing the third preset processing on the first information according to the second information to obtain the processed image text information comprises: obtaining content to be processed in the first information according to a preset acquisition rule; determining, in the second information, speech processing content corresponding to the content to be processed; and performing the third preset processing on the content to be processed according to the speech processing content to determine or generate the image text information.
- The method according to claim 15, wherein the step of performing the third preset processing on the first information according to the second information to obtain the image text information comprises: obtaining, from the second information, description information and/or association information for first content; determining or generating identification information corresponding to the first content according to the description information and/or the association information; and identifying the first content according to the identification information to determine or generate the image text information.
- The method according to claim 15, wherein the step of performing the fourth preset processing on the second information according to the first information to obtain the speech text information comprises: determining whether the speech information corresponds to the first information; and, if so, performing the fourth preset processing on the second information according to the first information to obtain the speech text information.
- A smart terminal, wherein the smart terminal comprises a memory and a processor, wherein a video processing program is stored in the memory, and when the video processing program is executed by the processor, the steps of the video processing method according to claim 1 are implemented.
- A storage medium, wherein a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of the video processing method according to claim 1 are implemented.
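The claims above leave the concrete realization open. The sketches that follow illustrate, under explicitly stated assumptions, one way a few of the claimed steps might be implemented; every library, function name, and parameter is illustrative and is not taken from the specification. This first sketch covers claims 1 and 13: frames of a target video are sampled, a check for preset information (here assumed to be any OCR-detectable on-screen text, which the claims do not require) gates the extraction, and a simple preset rule turns the recovered text into target information.

```python
# Sketch for claims 1 and 13 (assumptions: OpenCV and pytesseract are installed;
# "first information" is taken to be on-screen text recovered by OCR).
import cv2
import pytesseract

def extract_first_information(video_path: str, every_n_frames: int = 30):
    """Yield (frame_index, timestamp_seconds, ocr_text) for sampled frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(gray).strip()
            if text:  # claim 13: continue only if preset information exists in the frame
                yield index, index / fps, text
        index += 1
    cap.release()

def build_target_information(video_path: str):
    """Preset rule (illustrative): keep each non-empty text with its frame and timestamp."""
    return [
        {"frame": idx, "timestamp": ts, "content": text}
        for idx, ts, text in extract_first_information(video_path)
    ]
```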
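For claims 3 and 5, the second information is assumed to be a speech transcript. A minimal sketch of one possible processing step: transcript sentences that mention a piece of first content are used as its identification information, which then labels that content. The substring match stands in for real semantic matching and is purely an assumption.

```python
# Sketch for claims 3 and 5 (assumption: the transcript is already available as
# a list of sentences; substring matching is a placeholder for semantic matching).
def label_first_content(first_content: str, transcript_sentences: list[str]) -> dict:
    """Derive identification information for first_content from the transcript."""
    related = [s for s in transcript_sentences if first_content.lower() in s.lower()]
    identification = related[0] if related else ""
    return {"content": first_content, "identification": identification}
```

For example, labeling the on-screen text "Eiffel Tower" against a transcript containing "The Eiffel Tower appears at dusk" would attach that sentence as the identification information.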
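Claim 6's language check and claim 7's distance-based position rule might look like the following. The langdetect package, the pixel-coordinate positions, and the fixed offset are assumptions, and translate() is only a stub for whatever translation backend an implementation would actually use.

```python
# Sketch for claims 6 and 7 (assumptions: langdetect installed; positions are
# pixel coordinates; the preset distance is a fixed vertical offset).
from langdetect import detect

PRESET_LANGUAGE = "en"

def translate(text: str, target_lang: str) -> str:
    """Stub: a real implementation would call a translation service here."""
    return text

def to_target_content(first_content: str) -> str:
    """Claim 6: convert content whose language does not match the preset language type."""
    try:
        lang = detect(first_content)
    except Exception:
        return first_content  # undetectable content is left unchanged
    return translate(first_content, PRESET_LANGUAGE) if lang != PRESET_LANGUAGE else first_content

def target_position(first_position: tuple[int, int], offset: tuple[int, int] = (0, 40)) -> tuple[int, int]:
    """Claim 7: take a position spaced a preset distance from the first position."""
    return (first_position[0] + offset[0], first_position[1] + offset[1])
```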
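One way to realize the association of claim 8 and the timestamp-based retrieval of claim 10 is to sort the target-information entries by timestamp and, on a playback request, fetch the entry whose timestamp window covers the current playback time. The entry layout and the window length are assumptions.

```python
# Sketch for claims 8 and 10 (assumption: each entry carries a "timestamp" key in seconds).
from bisect import bisect_right

def associate_by_timestamp(entries: list[dict]) -> list[dict]:
    """Claim 8: tie target information to the video image frame via its timestamp."""
    return sorted(entries, key=lambda e: e["timestamp"])

def lookup_target_information(entries: list[dict], playback_time_s: float, window_s: float = 2.0):
    """Claim 10: retrieve the entry matching the current playback time point, if any."""
    stamps = [e["timestamp"] for e in entries]
    i = bisect_right(stamps, playback_time_s) - 1
    if i >= 0 and playback_time_s - stamps[i] <= window_s:
        return entries[i]
    return None
```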
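Displaying the target content at the target position during playback (claims 9 and 11) could be approximated with an OpenCV overlay. The display parameters shown (font scale, colour, thickness), the two-second display window, and the default position are illustrative choices, not requirements of the claims.

```python
# Sketch for claims 9 and 11 (assumptions: OpenCV rendering; entries as produced above,
# each optionally carrying a "position" key with pixel coordinates).
import cv2

def play_with_overlay(video_path: str, entries: list[dict], display_params=None):
    params = {"font_scale": 1.0, "color": (255, 255, 255), "thickness": 2}
    if display_params:
        params.update(display_params)  # claim 11: apply the target display parameters
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = index / fps
        active = next((e for e in entries
                       if e["timestamp"] <= t <= e["timestamp"] + 2.0), None)
        if active:
            x, y = active.get("position", (40, 60))
            cv2.putText(frame, active["content"], (x, y), cv2.FONT_HERSHEY_SIMPLEX,
                        params["font_scale"], params["color"], params["thickness"])
        cv2.imshow("target video", frame)
        if cv2.waitKey(max(1, int(1000 / fps))) & 0xFF == ord("q"):
            break
        index += 1
    cap.release()
    cv2.destroyAllWindows()
```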
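Claims 15 and 16 combine the two modalities. A minimal sketch of a third-preset-processing step snaps each token of the OCR text to the closest token found in the speech transcript, on the assumption that the transcript is the more reliable source; real systems would use alignment and confidence scores rather than plain string similarity.

```python
# Sketch for claims 15 and 16 (assumption: whitespace tokenization is adequate).
from difflib import get_close_matches

def third_preset_processing(ocr_text: str, transcript: str, cutoff: float = 0.8) -> str:
    """Correct image text (first information) using the speech transcript (second information)."""
    vocab = list(set(transcript.split()))
    corrected = []
    for token in ocr_text.split():
        match = get_close_matches(token, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)
```

For instance, an OCR result reading "Eiffei" would be snapped to "Eiffel" when the transcript contains the correct spelling.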
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/134410 WO2023097446A1 (en) | 2021-11-30 | 2021-11-30 | Video processing method, smart terminal, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/134410 WO2023097446A1 (en) | 2021-11-30 | 2021-11-30 | Video processing method, smart terminal, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023097446A1 (en) | 2023-06-08 |
Family
ID=86611288
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/134410 WO2023097446A1 (en) | 2021-11-30 | 2021-11-30 | Video processing method, smart terminal, and storage medium |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023097446A1 (en) |
- 2021-11-30 WO PCT/CN2021/134410 patent/WO2023097446A1/en unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104219459A (en) * | 2014-09-30 | 2014-12-17 | 上海摩软通讯技术有限公司 | Video language translation method and system and intelligent display device |
CN106604125A (en) * | 2016-12-29 | 2017-04-26 | 北京奇艺世纪科技有限公司 | Video subtitle determining method and video subtitle determining device |
CN107273895A (en) * | 2017-06-15 | 2017-10-20 | 幻视互动(北京)科技有限公司 | Method for the identification of video flowing real-time text and translation of head-wearing type intelligent equipment |
CN111416950A (en) * | 2020-03-26 | 2020-07-14 | 腾讯科技(深圳)有限公司 | Video processing method and device, storage medium and electronic equipment |
CN112995749A (en) * | 2021-02-07 | 2021-06-18 | 北京字节跳动网络技术有限公司 | Method, device and equipment for processing video subtitles and storage medium |
Similar Documents
Publication | Title
---|---
CN108289244B (en) | Video subtitle processing method, mobile terminal and computer readable storage medium
CN108572764B (en) | Character input control method and device and computer readable storage medium
CN110033769B (en) | Recorded voice processing method, terminal and computer readable storage medium
CN113556492B (en) | Thumbnail generation method, mobile terminal and readable storage medium
CN109302528B (en) | Photographing method, mobile terminal and computer readable storage medium
CN112732134A (en) | Information identification method, mobile terminal and storage medium
CN113126844A (en) | Display method, terminal and storage medium
CN112700783A (en) | Communication sound changing method, terminal equipment and storage medium
CN112163148A (en) | Information display method, mobile terminal and storage medium
WO2023108444A1 (en) | Image processing method, intelligent terminal, and storage medium
WO2023097446A1 (en) | Video processing method, smart terminal, and storage medium
WO2024055333A1 (en) | Image processing method, smart device, and storage medium
CN112532786B (en) | Image display method, terminal device, and storage medium
CN113555002A (en) | Data processing method, mobile terminal and storage medium
CN114092366A (en) | Image processing method, mobile terminal and storage medium
CN114442886A (en) | Data processing method, intelligent terminal and storage medium
CN113286106A (en) | Video recording method, mobile terminal and storage medium
CN113901245A (en) | Picture searching method, intelligent terminal and storage medium
CN109656658B (en) | Editing object processing method and device and computer readable storage medium
CN112199964A (en) | Text translation method, electronic device and readable storage medium
CN112672213A (en) | Video information processing method and device and computer readable storage medium
CN115809670A (en) | Content translation method, intelligent terminal and storage medium
CN107623780B (en) | Editing method, terminal and computer readable storage medium
CN114020190A (en) | Terminal control method, intelligent terminal and storage medium
CN116343748A (en) | Control method, intelligent terminal and storage medium
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21965890; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE