CN113824899B - Video processing method, video processing device, electronic equipment and medium - Google Patents

Video processing method, video processing device, electronic equipment and medium

Info

Publication number
CN113824899B
Authority
CN
China
Prior art keywords
text information
video
segment
processing
information segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111101321.7A
Other languages
Chinese (zh)
Other versions
CN113824899A (en)
Inventor
单文睿
陈进生
王正宜
吴悦
郑程
李晋芳
曹溪语
孙晓萌
郭永惠
张晶
秦志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111101321.7A priority Critical patent/CN113824899B/en
Publication of CN113824899A publication Critical patent/CN113824899A/en
Application granted granted Critical
Publication of CN113824899B publication Critical patent/CN113824899B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides a video processing method, a video processing apparatus, an electronic device, and a medium, relating to the field of video technology and in particular to human-computer interaction. The scheme is implemented as follows: acquire at least one text information segment of a video to be processed, where each text information segment is a part of the text information corresponding to the voice information of the video to be processed; for any one text information segment, in response to determining that a first processing instruction for the text information segment is received, determine a related video segment of the video to be processed that corresponds to the text information segment, where the text information segment corresponds to the voice information in the related video segment; and execute the first processing for the related video segment corresponding to the text information segment.

Description

Video processing method, video processing device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of video technology, in particular to the field of human-computer interaction, and more particularly to a video processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
With the development of video technology, a large number of video works are produced on video platforms every day. A recorded original video must go through a series of post-production steps before it becomes a finished work, and this post-production accounts for a large share of the workload. How to improve video processing efficiency is therefore an urgent problem in the field.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for video processing.
According to an aspect of the present disclosure, there is provided a video processing method including: acquiring at least one text information segment of a video to be processed, where each text information segment of the at least one text information segment is a part of the text information corresponding to the voice information of the video to be processed; and for any one text information segment of the at least one text information segment: in response to determining that a first processing instruction for the text information segment is received, determining a related video segment of the video to be processed that corresponds to the text information segment, where the text information segment corresponds to the voice information in the related video segment; and executing the first processing for the related video segment corresponding to the text information segment.
According to an aspect of the present disclosure, there is provided a video processing apparatus including: a first acquisition unit configured to acquire at least one text information segment of a video to be processed, where each text information segment of the at least one text information segment is a part of the text information corresponding to the voice information of the video to be processed; and a processing unit configured to perform processing for any one of the at least one text information segment, the processing unit including: a determining subunit configured to, in response to determining that a first processing instruction for the text information segment is received, determine a related video segment of the video to be processed that corresponds to the text information segment, where the text information segment corresponds to the voice information in the related video segment; and an execution subunit configured to execute the first processing for the related video segment corresponding to the text information segment.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above-described method.
According to another aspect of the disclosure, a computer program product is provided, comprising a computer program, wherein the computer program realizes the above-described method when executed by a processor.
According to one or more embodiments of the present disclosure, video processing efficiency can be improved, and thus user experience can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the disclosure and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;
fig. 2A shows a flow diagram of a method of video processing according to an embodiment of the present disclosure;
fig. 2B shows a flow diagram of another method of video processing according to an embodiment of the present disclosure;
fig. 3 shows a schematic diagram of a video processing method according to an embodiment of the present disclosure;
fig. 4 shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
In the related art, processing a video often requires the user to drag the video progress bar and locate the video segment to be processed by browsing the video content, and only then execute the processing on that segment. This approach is cumbersome and time consuming. It is especially limiting when the user processes video on a mobile device such as a mobile phone: the small display screen and constrained interaction methods mean the user sometimes has to drag the progress bar back and forth repeatedly to locate the target video segment, which obstructs video processing.
Based on this, the present disclosure provides a video processing method that establishes a binding relationship between video segments and text information segments of a video to be processed and, for any text information segment, executes the first processing for the related video segment corresponding to that text information segment when a first processing instruction for the text information segment is received.
Because text information intuitively reflects the content shown in the video, the user can quickly locate the video segment to be processed by browsing the text information segments, instead of having to drag the video progress bar to learn what each part of the video contains. This improves video processing efficiency and user experience.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an example system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the method of video processing to be performed.
In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may use the client device 101, 102, 103, 104, 105, and/or 106 to issue a first processing instruction for a segment of textual information or a global processing instruction for the video to be processed. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or a smart cloud computing server or smart cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that remedies the defects of high management difficulty and weak service expansibility in traditional physical host and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The database 130 may reside in various locations. For example, the database used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with the server 120 via a network-based or dedicated connection. The database 130 may be of different types. In certain embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to commands.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information of the users involved all comply with relevant laws and regulations and do not violate public order and good customs. The head model in this embodiment is not a head model of any specific user and cannot reflect the personal information of any specific user.
Fig. 2A and 2B show flowcharts of a video processing method according to an embodiment of the present disclosure. As shown in fig. 2A and 2B, the video processing method includes: step S201, acquiring at least one text information segment of a video to be processed, where each text information segment of the at least one text information segment is a part of the text information corresponding to the voice information of the video to be processed; and step S202, for any one text information segment of the at least one text information segment, executing the following processing: step S202-1, in response to determining that a first processing instruction for the text information segment is received, determining a related video segment corresponding to the text information segment in the video to be processed, where the text information segment corresponds to the voice information in the related video segment; and step S202-2, executing the first processing for the related video segment corresponding to the text information segment.
Because text information intuitively reflects the content shown in the video, the user can quickly locate the video segment to be processed by browsing the text information segments, instead of having to drag the video progress bar to learn what each part of the video contains. This improves video processing efficiency and user experience.
For videos in which the voice content is the primary element, such as live television programs, what users care about most is that the voice content is accurate, clear, and fluent. The present scheme uses the text information corresponding to the voice information of the video to be processed as a clue and establishes a correspondence between text information segments and video segments, so that processing applied by the user to a text information segment is automatically and synchronously applied to the corresponding video segment, ultimately yielding a processed video with fluent voice information.
With respect to step S201, according to some embodiments, acquiring at least one text information segment of the video to be processed may include: acquiring text information corresponding to voice information in a video to be processed; and dividing the text information to obtain at least one text information segment. Thereby, at least one text information segment for video processing can be conveniently acquired.
The text information corresponding to the voice information of the video to be processed may be acquired through speech recognition technology or entered manually, which is not limited herein.
According to some embodiments, dividing the text information may include: identifying at least one pause point in the text information; and dividing the text information based on the at least one pause point. The text information can thus be divided conveniently.
In one embodiment, the pause point may be a semantic break point. Specifically, semantic analysis is performed on the text information to identify at least one semantic break point, and the text information is divided at the position corresponding to each semantic break point. Each text information segment obtained in this way has complete semantic content; for example, each segment may be a complete sentence. The user can thus process the video to be processed sentence by sentence.
In another embodiment, the pause point may be a time pause point. Each character in the text information has its own timestamp, and identifying at least one pause point in the text information may include: in response to the time difference between the timestamp of the former character and the timestamp of the latter character of any two adjacent characters being greater than a preset threshold, taking at least one of the two timestamps as a time pause point. The text information can then be divided at the position corresponding to each time pause point, so that each resulting text information segment carries temporally coherent voice information. This also allows silent passages to be identified in the video to be processed, facilitating subsequent targeted processing.
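For illustration, the sketch below divides a recognized character stream at time pause points. It is a minimal sketch, not the disclosed implementation: the (character, start, end) tuple layout, the one-second default threshold, and the function name split_by_time_pauses are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed representation: each recognized character carries a start and an
# end timestamp in seconds, e.g. taken from a speech recognition result.
Char = Tuple[str, float, float]  # (character, start_ts, end_ts)

def split_by_time_pauses(chars: List[Char], gap_threshold: float = 1.0) -> List[List[Char]]:
    """Split the character stream at time pause points: positions where the
    gap between one character's end timestamp and the next character's
    start timestamp exceeds gap_threshold."""
    segments: List[List[Char]] = []
    current: List[Char] = []
    for ch in chars:
        if current and ch[1] - current[-1][2] > gap_threshold:
            segments.append(current)  # a time pause point closes the current segment
            current = []
        current.append(ch)
    if current:
        segments.append(current)
    return segments
```

For example, split_by_time_pauses([("a", 0.0, 0.2), ("b", 0.3, 0.5), ("c", 2.0, 2.2)]) yields two segments, because the 1.5-second gap between "b" and "c" exceeds the threshold and is treated as a time pause point.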
According to some embodiments, dividing the text information may further include dividing the text information according to the maximum number of characters that each text information segment can contain. This limits the length of each text information segment, making the segments convenient to display.
In one embodiment, the maximum number of characters that can be contained in each segment of text information may be determined based on one or more factors of the size of the display screen, font size, font type, and the like.
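A possible sketch of this length limit, under the assumption that a segment is simply a sequence of characters (or of the timed tuples from the previous sketch) and that max_chars has already been derived from the display metrics:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def split_by_max_chars(segment: Sequence[T], max_chars: int) -> List[Sequence[T]]:
    """Cut a segment into pieces of at most max_chars items so that each
    piece fits within the display constraint."""
    return [segment[i:i + max_chars] for i in range(0, len(segment), max_chars)]
```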
According to some embodiments, the acquired at least one text information segment of the video to be processed may be arranged in sequence in chronological order on the display screen, and the user can conveniently browse the content of each part of the video to be processed by browsing the sequence formed by the at least one text information segment.
Further, the user may execute step S202 to perform corresponding processing for each of the at least one text information segment of the acquired video to be processed.
With respect to step S202-1, according to some embodiments, the first processing instruction may be issued by the user directly for the text information segment. For example, if the at least one text information segment contains two repeated text information segments, the user can directly issue a first processing instruction to remove one of them.
According to some embodiments, the method may further include: after the at least one text information segment of the video to be processed is acquired, determining the type corresponding to each text information segment of the at least one text information segment; acquiring a global processing instruction for the video to be processed, where the global processing instruction includes a processing type; and for each text information segment of the at least one text information segment, determining that a first processing instruction for the text information segment is received in response to the processing type including the type corresponding to that segment. Batch processing of multiple video segments can thereby be achieved.
According to some embodiments, the processing type includes one or more of silent sentences, verbal sentences, or repeated sentences.
According to some embodiments, for any one of the at least one text information segment, in response to there being no characters in the text information segment, the type corresponding to the text information segment is determined to be a silent sentence. Unlike approaches that detect silent video segments from audio amplitude, locating silent video segments in the video to be processed by identifying silent sentences in the text information avoids interference from environmental noise and improves the accuracy of the judgment.
According to some embodiments, for any one of the at least one text information segment, the text information segment is determined to be a verbal sentence in response to the segment containing only one or more spoken filler words (e.g., "um", "oh", "uh").
According to some embodiments, for any two temporally adjacent text information segments of the at least one text information segment, in response to the content similarity of the two segments being above a preset threshold (for example, a character repetition rate greater than 60%), at least one of the two segments is determined to be a repeated sentence.
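The sketch below ties the three type checks together. It is illustrative only: the disclosure does not fix a particular similarity metric or filler-word list, so char_repetition_rate is an assumed multiset-overlap measure and the English filler words stand in for the interjections mentioned above.

```python
from typing import Optional

FILLER_WORDS = {"um", "uh", "oh", "hmm"}  # assumed filler-word set

def char_repetition_rate(a: str, b: str) -> float:
    """Assumed similarity measure: fraction of characters shared between two
    adjacent segments (multiset intersection over the longer length)."""
    if not a or not b:
        return 0.0
    common = sum(min(a.count(c), b.count(c)) for c in set(a))
    return common / max(len(a), len(b))

def classify_segment(text: str, prev_text: Optional[str] = None) -> str:
    stripped = text.strip()
    if not stripped:
        return "silent sentence"    # no characters at all
    if all(tok.lower() in FILLER_WORDS for tok in stripped.split()):
        return "verbal sentence"    # nothing but filler words
    if prev_text is not None and char_repetition_rate(stripped, prev_text.strip()) > 0.6:
        return "repeated sentence"  # >60% character overlap with the adjacent segment
    return "normal"
```

Segments classified this way can then be selected in bulk when a global processing instruction whose processing type matches the segment type is received.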
According to some embodiments, on the basis of the type determined for each text information segment, a shortcut key is provided for selecting one or more text information segments of the same type, so that the user can select all segments of a given type at once and conveniently apply uniform processing to them.
According to some embodiments, determining the related video segment corresponding to the text information segment in the video to be processed includes: determining a first timestamp corresponding to the first character of the text information segment and a second timestamp corresponding to the last character of the text information segment; and determining the related video segment corresponding to the text information segment in the video to be processed based on the first timestamp and the second timestamp. This makes it easy to determine the related video segment such that the text information segment corresponds to the voice information in that segment.
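Continuing the earlier sketch, mapping a text information segment back to its related video segment then amounts to reading the first and last character timestamps; the tuple layout below is the same assumed one as before.

```python
from typing import List, Tuple

Char = Tuple[str, float, float]  # (character, start_ts, end_ts), as in the earlier sketch

def related_video_range(segment: List[Char]) -> Tuple[float, float]:
    """The related video segment runs from the first character's start
    timestamp (the first timestamp) to the last character's end timestamp
    (the second timestamp)."""
    return segment[0][1], segment[-1][2]
```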
According to some embodiments, the first processing comprises one or more of a removal operation, an insertion operation, a position transformation operation, a muting operation, or an editing operation.
The removal operation deletes the related video segment corresponding to the text information segment; the insertion operation inserts another video segment before or after it; the position transformation operation adjusts its position within the video; the muting operation sets it to mute; and the editing operation applies a filter, animation, or the like to it.
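As a sketch of how a removal operation could be synchronized to the video, the helper below computes the time intervals that survive once the related video segments of the removed text information segments are cut out. The actual trimming and re-encoding depend on the video toolchain and are not part of this sketch.

```python
from typing import List, Tuple

def keep_intervals(duration: float, removed: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Complement of the removed (start, end) intervals within [0, duration]:
    the parts of the video kept after the removal operation."""
    kept: List[Tuple[float, float]] = []
    cursor = 0.0
    for start, end in sorted(removed):
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept
```

For example, keep_intervals(10.0, [(2.0, 3.5), (7.0, 8.0)]) returns [(0.0, 2.0), (3.5, 7.0), (8.0, 10.0)], and the kept intervals can then be trimmed and concatenated by any video editing backend.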
On this basis, step S202-2 can be executed: in response to the first processing instruction for the text information segment, the first processing for the related video segment corresponding to the text information segment is executed.
Fig. 3 shows a schematic diagram of a video processing method according to an embodiment of the present disclosure. As shown in fig. 3, in display pages 310 to 330, a plurality of text information segments are arranged in rows in chronological order below the video to be processed 340 and its progress bar.
In display page 310, the user selects the first text information segment, which causes the video to be processed 340 and its progress bar to jump to the position corresponding to that segment (i.e., the beginning of the video 340). An icon representing a removal-type first processing instruction is displayed to the right of the selected segment, and the user can issue a removal instruction for the segment by operating this icon.
The user may enter display page 320 by selecting "multiple choice" in display page 310; in display page 320, several text information segments can be selected and processed at once. The user may also enter display page 320 by selecting "one-key optimization" in display page 310, which automatically selects all text information segments belonging to the preset types to be processed, such as silent sentences, verbal sentences, and repeated sentences.
By selecting "remove segment" in display page 320 and entering display page 330, the processed video and the text information segments corresponding to it are obtained.
Fig. 4 shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 4, a video processing apparatus 400 includes: a first obtaining unit 410, configured to obtain at least one text information segment of a video to be processed, where each text information segment of the at least one text information segment is a part of text information corresponding to voice information of the video to be processed; and a processing unit 420 configured to perform processing for any one of the at least one piece of text information, the processing unit 420 comprising: a determining subunit 421, configured to, in response to determining that the first processing instruction for the text information segment is received, determine a relevant video segment corresponding to the text information segment in the video to be processed, where the text information segment corresponds to the speech information in the relevant video segment; and an execution subunit 422 configured to execute the first processing for the relevant video segment corresponding to the text information segment.
According to some embodiments, the first obtaining unit includes: an acquisition subunit configured to acquire the text information corresponding to the voice information in the video to be processed; and a dividing subunit configured to divide the text information to obtain the at least one text information segment.
According to some embodiments, the dividing subunit includes: a module for identifying at least one pause point in the text information; and a module for dividing the text information based on the at least one pause point.
According to some embodiments, the dividing subunit further includes a module for dividing the text information according to the maximum number of characters that each text information segment can contain.
According to some embodiments, the determining subunit includes: a module for determining a first timestamp corresponding to the first character of the text information segment and a second timestamp corresponding to the last character of the text information segment; and a module for determining the related video segment corresponding to the text information segment in the video to be processed based on the first timestamp and the second timestamp.
According to some embodiments, the apparatus further includes: a first determining unit configured to determine, after the at least one text information segment of the video to be processed is acquired, the type corresponding to each text information segment of the at least one text information segment; a second acquisition unit configured to acquire a global processing instruction for the video to be processed, where the global processing instruction includes a processing type; and a second determining unit configured to determine, for each text information segment of the at least one text information segment, that the first processing instruction for the text information segment is received in response to the processing type including the type corresponding to that segment.
According to some embodiments, the processing type includes one or more of silent sentences, verbal sentences, or repeated sentences.
According to some embodiments, the first processing includes one or more of a removal operation, an insertion operation, a position transformation operation, a muting operation, or an editing operation.
According to another aspect of the present disclosure, there is also disclosed an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform any one of the methods described above.
According to another aspect of the present disclosure, a non-transitory computer readable storage medium having computer instructions stored thereon for causing a computer to perform any one of the above methods is also disclosed.
According to another aspect of the disclosure, a computer program product is also disclosed, comprising a computer program, wherein the computer program realizes any of the above methods when executed by a processor.
According to an embodiment of the present disclosure, there is also provided an electronic device, a readable storage medium, and a computer program product.
Referring to fig. 5, a block diagram of an electronic device 500, which may be a server or a client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure may be applied, will now be described. The term electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506, an output unit 507, a storage unit 508, and a communication unit 509. The input unit 506 may be any type of device capable of inputting information to the device 500; it can receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote controller. The output unit 507 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 508 may include, but is not limited to, a magnetic disk or an optical disk. The communication unit 509 allows the device 500 to exchange information/data with other devices over a computer network, such as the internet, and/or various telecommunication networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capability. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 501 performs the methods and processes described above, such as the video processing method. For example, in some embodiments, the video processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the video processing method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the video processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or removed. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present disclosure is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced by equivalents. Further, the steps may be performed in an order different from that described in the present disclosure. Furthermore, various elements in the embodiments or examples may be combined in various ways. It should be noted that as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (18)

1. A video processing method, comprising:
acquiring at least one text information segment of a video to be processed, wherein each text information segment in the at least one text information segment is a part of text information corresponding to voice information of the video to be processed and corresponds to a sentence with complete semantic content in the voice information; and
for any one text information segment of the at least one text information segment, performing the following processes:
in response to determining that a first processing instruction for the text information segment is received, performing first processing on the text information segment;
determining a related video segment corresponding to the text information segment in the video to be processed, wherein the text information segment corresponds to the voice information in the related video segment; and
synchronizing the first processing for the related video segment corresponding to the text information segment.
2. The method of claim 1, wherein said obtaining at least one text information segment of the video to be processed comprises:
acquiring the text information corresponding to the voice information in the video to be processed; and
dividing the text information to obtain the at least one text information segment.
3. The method of claim 2, wherein the dividing the text information comprises:
identifying at least one pause point in the text information; and
dividing the text information based on the at least one pause point.
4. The method of claim 2 or 3, wherein the dividing the text information further comprises:
dividing the text information according to the maximum number of characters that each text information segment can contain.
5. The method of claim 1, wherein the determining the related video segment corresponding to the text information segment in the video to be processed comprises:
determining a first time stamp corresponding to the first character of the text information segment and a second time stamp corresponding to the last character of the text information segment; and
determining the related video segment corresponding to the text information segment in the video to be processed based on the first time stamp and the second time stamp.
6. The method of claim 1, further comprising:
after the at least one text information segment of the video to be processed is obtained, determining the type corresponding to each text information segment in the at least one text information segment;
acquiring a global processing instruction for the video to be processed, wherein the global processing instruction comprises a processing type; and
for each text information segment of the at least one text information segment, determining that the first processing instruction for the text information segment is received in response to the processing type including the type corresponding to the text information segment.
7. The method of claim 6, wherein the processing type includes one or more of silent sentences, verbal sentences, or repeated sentences.
8. The method of claim 1, wherein the first processing comprises one or more of a removal operation, an insertion operation, a position transformation operation, a muting operation, or an editing operation.
9. A video processing apparatus comprising:
the video processing device comprises a first acquisition unit, a second acquisition unit and a processing unit, wherein the first acquisition unit is configured to acquire at least one text information segment of a video to be processed, and each text information segment in the at least one text information segment is a part of text information corresponding to voice information of the video to be processed and corresponds to a sentence with complete semantic content in the voice information; and
a processing unit configured to perform processing for any of the at least one piece of text information, the processing unit comprising:
a determining subunit, configured to, in response to determining that the first processing instruction for the piece of text information is received, perform first processing for the piece of text information; determining a related video segment corresponding to the text information segment in the video to be processed, wherein the text information segment corresponds to the voice information in the related video segment; and
and the execution subunit is configured to synchronize the first processing with respect to the relevant video segment corresponding to the text information segment.
10. The apparatus of claim 9, wherein the first acquisition unit comprises:
an acquisition subunit configured to acquire the text information corresponding to the voice information in the video to be processed; and
a dividing subunit configured to divide the text information to obtain the at least one text information segment.
11. The apparatus of claim 10, wherein the dividing subunit comprises:
a module for identifying at least one pause point in the text information; and
a module for dividing the text information based on the at least one pause point.
12. The apparatus of claim 10 or 11, wherein the dividing subunit further comprises:
a module for dividing the text information according to the maximum number of characters that each text information segment can contain.
13. The apparatus of claim 9, wherein the determining subunit comprises:
a module for determining a first time stamp corresponding to the first character of the text information segment and a second time stamp corresponding to the last character of the text information segment; and
a module for determining the related video segment corresponding to the text information segment in the video to be processed based on the first time stamp and the second time stamp.
14. The apparatus of claim 9, further comprising:
a first determining unit configured to determine, after the at least one text information segment of the video to be processed is acquired, a type corresponding to each text information segment of the at least one text information segment;
a second acquisition unit configured to acquire a global processing instruction for the video to be processed, wherein the global processing instruction includes a processing type; and
a second determining unit configured to determine, for each text information segment of the at least one text information segment, that the first processing instruction for the text information segment is received in response to the processing type including the type corresponding to the text information segment.
15. The apparatus of claim 14, wherein the type of processing comprises one or more of silent sentences, verbal sentences, or repeated sentences.
16. The apparatus of claim 9, wherein the first processing comprises one or more of a remove operation, an insert operation, a position transform operation, a mute operation, or an edit operation.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
CN202111101321.7A 2021-09-18 2021-09-18 Video processing method, video processing device, electronic equipment and medium Active CN113824899B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111101321.7A CN113824899B (en) 2021-09-18 2021-09-18 Video processing method, video processing device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111101321.7A CN113824899B (en) 2021-09-18 2021-09-18 Video processing method, video processing device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN113824899A CN113824899A (en) 2021-12-21
CN113824899B true CN113824899B (en) 2022-11-04

Family

ID=78922541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111101321.7A Active CN113824899B (en) 2021-09-18 2021-09-18 Video processing method, video processing device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113824899B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114613355B (en) * 2022-04-07 2023-07-14 抖音视界有限公司 Video processing method and device, readable medium and electronic equipment
CN115150660B (en) * 2022-06-09 2024-05-10 深圳市闪剪智能科技有限公司 Video editing method based on subtitles and related equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261157A (en) * 2020-01-03 2020-06-09 苏州思必驰信息科技有限公司 Control method, device and equipment for short video and storage medium
CN111652002A (en) * 2020-06-16 2020-09-11 北京字节跳动网络技术有限公司 Text division method, device, equipment and computer readable medium
CN111770375A (en) * 2020-06-05 2020-10-13 百度在线网络技术(北京)有限公司 Video processing method and device, electronic equipment and storage medium
CN112929744A (en) * 2021-01-22 2021-06-08 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for segmenting video clips
CN113301444A (en) * 2021-05-20 2021-08-24 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9973459B2 (en) * 2014-08-18 2018-05-15 Nightlight Systems Llc Digital media message generation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261157A (en) * 2020-01-03 2020-06-09 苏州思必驰信息科技有限公司 Control method, device and equipment for short video and storage medium
CN111770375A (en) * 2020-06-05 2020-10-13 百度在线网络技术(北京)有限公司 Video processing method and device, electronic equipment and storage medium
CN111652002A (en) * 2020-06-16 2020-09-11 北京字节跳动网络技术有限公司 Text division method, device, equipment and computer readable medium
CN112929744A (en) * 2021-01-22 2021-06-08 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for segmenting video clips
CN113301444A (en) * 2021-05-20 2021-08-24 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113824899A (en) 2021-12-21

Similar Documents

Publication Publication Date Title
CN113824899B (en) Video processing method, video processing device, electronic equipment and medium
CN116521841B (en) Method, device, equipment and medium for generating reply information
CN114443989B (en) Ranking method, training method and device of ranking model, electronic equipment and medium
CN116152607A (en) Target detection method, method and device for training target detection model
CN113436604B (en) Method and device for broadcasting content, electronic equipment and storage medium
CN114238745A (en) Method and device for providing search result, electronic equipment and medium
CN115050396A (en) Test method and device, electronic device and medium
CN114999449A (en) Data processing method and device
CN114547252A (en) Text recognition method and device, electronic equipment and medium
CN116028750B (en) Webpage text auditing method and device, electronic equipment and medium
CN113126865B (en) Note generation method and device in video learning process, electronic equipment and medium
CN114218516B (en) Webpage processing method and device, electronic equipment and storage medium
CN118102050A (en) Video abstract generation method, device, equipment and medium
CN113946498A (en) Interest point identification method and device, recommendation method and device, equipment and medium
CN113609370A (en) Data processing method and device, electronic equipment and storage medium
CN114187924A (en) Data processing method, device, electronic equipment and medium
CN114898387A (en) Table image processing method and device
CN117056460A (en) Document retrieval method, device, electronic equipment and medium
CN115952416A (en) Method, apparatus, electronic device, and medium for generating training data
CN114758114A (en) Model updating method, image processing method, device, electronic device and medium
CN114139549A (en) Text data preprocessing method and device, electronic equipment and medium
CN115906762A (en) Text labeling method and device, electronic equipment and storage medium
CN113139095A (en) Video retrieval method and device, computer equipment and medium
CN113378001A (en) Video playing progress adjusting method and device, electronic equipment and medium
CN115617968A (en) Dialogue method and device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant