CN111601154A - Video processing method and related equipment - Google Patents

Video processing method and related equipment

Info

Publication number
CN111601154A
CN111601154A (application number CN202010381164.9A)
Authority
CN
China
Prior art keywords
information
media file
audio information
voice
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010381164.9A
Other languages
Chinese (zh)
Other versions
CN111601154B (en)
Inventor
王鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd filed Critical Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN202010381164.9A priority Critical patent/CN111601154B/en
Publication of CN111601154A publication Critical patent/CN111601154A/en
Application granted granted Critical
Publication of CN111601154B publication Critical patent/CN111601154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N 21/4345 Extraction or processing of SI, e.g. extracting service information from an MPEG stream
    • H04N 21/439 Processing of audio elementary streams
    • H04N 21/47 End-user applications
    • H04N 21/488 Data services, e.g. news ticker
    • H04N 21/4884 Data services, e.g. news ticker, for displaying subtitles

Abstract

The embodiment of the application discloses a video processing method and related equipment. The method is applied to an electronic device and comprises the following steps: acquiring audio information in a media file when recording the media file for live broadcasting; identifying the audio information to obtain text information corresponding to the audio information; marking a time point for each character in the text information, wherein the time point is determined according to the playing time of the audio information and is used for playing the audio information and the text information synchronously; and adding the text information to the media file, and playing the media file with the text information added. By adopting the embodiment of the application, video processing efficiency can be improved.

Description

Video processing method and related equipment
Technical Field
The present application relates to the field of electronic technologies, and in particular, to a video processing method and related device.
Background
When video content (e.g., short videos, movies, or television) is played back, subtitles are typically presented to help the user understand it. However, subtitles are conventionally written manually in post-production and then attached to the video file with software, so video processing efficiency is low. Moreover, during live broadcasting the subtitles cannot be written manually in time, so no subtitles can be displayed in the live video.
Disclosure of Invention
The embodiment of the application provides a video processing method and related equipment, which can improve video processing efficiency and display subtitles in time during live broadcasting.
In a first aspect, an embodiment of the present application provides a video processing method, including:
acquiring audio information in a media file when recording the media file for live broadcasting;
identifying the audio information to obtain text information corresponding to the audio information;
marking a time point for each character in the text information, wherein the time point is determined according to the playing time of the audio information, and the time point is used for playing the audio information and the text information synchronously;
and adding the text information to the media file, and playing the media file with the text information added.
Wherein the identifying the audio information to obtain the text information corresponding to the audio information comprises:
sending the audio information to a voice server, so that the voice server recognizes the audio information and generates the text information;
and receiving the text information returned by the voice server.
After the audio information is identified and the text information corresponding to the audio information is obtained, the method further includes:
translating the text information to obtain translated text;
and displaying the translated text and the text information when the media file with the text information added is played.
Wherein the playing the media file with the text information added comprises:
and adjusting the display format of the text information according to an operation instruction input by a user.
Wherein the method further comprises:
starting a voice control mode, and acquiring voice information input by a user in the voice control mode, wherein the voice control mode is a mode in which the media file is recorded through voice control;
carrying out voice recognition on the voice information to obtain a control command;
and recording the media file according to the control command.
In a second aspect, an embodiment of the present application provides a video processing apparatus, including:
the acquisition module is used for acquiring audio information in a media file when the media file for live broadcasting is recorded;
the processing module is used for identifying the audio information to obtain text information corresponding to the audio information, and for marking a time point for each character in the text information, wherein the time point is determined according to the playing time of the audio information and is used for playing the audio information and the text information synchronously;
the processing module is further configured to add the text information to the media file, and play the media file with the text information added.
Wherein the apparatus further comprises:
the sending module is used for sending the audio information to a voice server, so that the voice server recognizes the audio information and generates the text information;
and the receiving module is used for receiving the text information returned by the voice server.
The processing module is further used for translating the text information to obtain translated text, and for displaying the translated text and the text information when the media file with the text information added is played.
The processing module is further used for adjusting the display format of the text information according to an operation instruction input by a user.
The processing module is further configured to start a voice control mode and acquire voice information input by a user in the voice control mode, where the voice control mode is a mode in which the media file is recorded through voice control; perform voice recognition on the voice information to obtain a control command; and record the media file according to the control command.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory, a communication interface, and a bus; the processor, the memory and the communication interface are connected through the bus and complete mutual communication; the memory stores a computer program; the processor implements the method of the first aspect or any one of the possible designs of the first aspect by executing a computer program stored in the memory.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program for performing the method according to the first aspect or any one of the possible designs of the first aspect when the computer program runs on one or more processors.
By implementing the embodiment of the application, audio information in a media file is acquired when the media file for live broadcasting is recorded; the audio information is identified to obtain text information corresponding to the audio information; a time point is marked for each character in the text information, the time point being determined according to the playing time of the audio information; finally, the text information is added to the media file, and the media file with the text information added is played. Because the subtitles are obtained by performing voice recognition on the audio information while the media file is being recorded, subtitles are added to the media file during live broadcasting without being written manually, subtitles are displayed in time during live broadcasting, and video processing efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
Fig. 1 is a schematic architecture diagram of a video processing system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another video processing method provided in the embodiment of the present application;
fig. 4 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
The system architecture and service scenarios of the embodiments of the present application are described below. It should be noted that the system architecture and service scenarios described in the present application are intended to illustrate the technical solution of the present application more clearly and do not limit it; as those of ordinary skill in the art know, the technical solution provided in the present application is equally applicable to similar technical problems as the system architecture evolves and new service scenarios appear.
Referring to fig. 1, fig. 1 is a schematic architecture diagram of a video processing system according to an embodiment of the present application. The video processing system comprises a video recorder 101, a microphone 102, an electronic device 103 and a server 104, and is configured to process video, for example, to add subtitles. The video recorder 101 can be used to record video. The microphone 102 may capture voice information while the video is being recorded. The electronic device 103 may be any of various types of user device, such as a mobile phone, a computer, or a tablet computer, or a wearable device such as a smart watch or smart glasses, which is not limited in this application. Optionally, the system may further include other electronic devices; the description here is only an example and does not limit the number of electronic devices in the embodiments of the present application. The server 104 may be a single server or a server cluster composed of a plurality of servers. The electronic device 103 may send request information to the server 104; accordingly, after receiving the request information, the server returns a processing result to the electronic device 103.
As shown in fig. 2, fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application. The steps in the embodiments of the present application include at least:
s201, when recording a media file for live broadcast, acquiring audio information in the media file.
In a specific implementation, the media file includes video information and audio information; the video information may be recorded by a video recorder, and the audio information related to the video information may be captured by a microphone. The video information and the audio information are then transmitted to the electronic device through a wireless connection (such as Wi-Fi or Bluetooth) or a wired connection. The media file may also be recorded directly by the electronic device.
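By way of illustration only (this sketch is not part of the original disclosure), the audio information could be demultiplexed from an already recorded media file with the ffmpeg command-line tool; the file names here are hypothetical:

```python
import subprocess

def extract_audio(media_path: str, wav_path: str) -> None:
    """Demultiplex the audio stream of a media file into a 16 kHz mono WAV,
    a common input format for speech recognizers."""
    subprocess.run(
        [
            "ffmpeg",
            "-y",              # overwrite the output file if it already exists
            "-i", media_path,  # input media file containing video + audio
            "-vn",             # drop the video stream
            "-ac", "1",        # mix down to a single channel
            "-ar", "16000",    # resample to 16 kHz
            wav_path,
        ],
        check=True,
    )

extract_audio("live_recording.mp4", "live_audio.wav")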
S202, identifying the audio information and obtaining character information corresponding to the audio information.
Specifically, the electronic device may use an automatic caption tool locally and perform speech recognition on the audio information with a speech recognition algorithm to obtain the text information.
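As a minimal sketch of this step, the open-source SpeechRecognition package for Python could stand in for the automatic caption tool; the patent does not name a specific tool, so the package choice and the language setting are assumptions:

```python
import speech_recognition as sr  # pip install SpeechRecognition

def transcribe(wav_path: str, language: str = "zh-CN") -> str:
    """Perform speech recognition on a WAV file and return the recognized text."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    # recognize_google() calls a free web recognizer; any local engine could
    # be substituted here to play the role of the "automatic caption tool".
    return recognizer.recognize_google(audio, language=language)

text_information = transcribe("live_audio.wav")
print(text_information)
```

Any on-device or cloud recognizer could be substituted; the disclosure leaves the choice of engine open.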
S203, marking a time point for each character in the text information, wherein the time point is determined according to the playing time of the audio information, and the time point is used for playing the audio information and the text information synchronously.
In a specific implementation, a time point can be marked for each character in the text information so as to form a time axis. The time axis is determined according to the playing time of the audio information corresponding to the text information, and each time point on the time axis corresponds to the display time of one character. In this way, the audio information and the text information are kept synchronized during video playing.
For example, the voice information related to the video information is acquired as "大家好" (hello everyone). Relative to the start time of video playback, "大" is spoken at 3s, "家" at 3.1s, and "好" at 3.2s. Thus, for the text information, the time point marked for "大" is 3s, for "家" is 3.1s, and for "好" is 3.2s. In this way, the audio information and the text information can be played synchronously during video playing.
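A sketch of how the marked time points might be represented and emitted as subtitle cues follows; the TimedChar structure and the SRT output format are illustrative assumptions, with the character timings taken from the example above:

```python
from dataclasses import dataclass

@dataclass
class TimedChar:
    char: str       # one character of the recognized text
    seconds: float  # time point, relative to the start of playback

# Time points from the example above: "大家好" at 3.0 s, 3.1 s and 3.2 s.
timeline = [TimedChar("大", 3.0), TimedChar("家", 3.1), TimedChar("好", 3.2)]

def to_srt_time(seconds: float) -> str:
    """Format a playback offset as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

# Emit one growing cue per character, so each character appears exactly at
# its marked time point and the text stays synchronized with the audio.
for i, tc in enumerate(timeline):
    end = timeline[i + 1].seconds if i + 1 < len(timeline) else tc.seconds + 1.0
    print(i + 1)
    print(f"{to_srt_time(tc.seconds)} --> {to_srt_time(end)}")
    print("".join(t.char for t in timeline[: i + 1]))
    print()
```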
Optionally, the voice volume of the audio information may be acquired, and when the voice volume is greater than a preset threshold, voice recognition is performed on the audio information to obtain the text information. When the voice volume is not greater than the preset threshold, prompt information may be displayed to prompt the user to increase the volume so that the voice information can be reacquired. Constraining the voice volume in this way keeps the voice information clear and thus keeps voice recognition accurate.
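The volume gate could be realized, for example, by comparing the RMS amplitude of the captured audio against the preset threshold; this sketch assumes 16-bit PCM audio, the threshold value is an assumption, and transcribe() is the sketch from step S202:

```python
import wave
import numpy as np

def rms_volume(wav_path: str) -> float:
    """Root-mean-square amplitude of a 16-bit PCM WAV file, scaled to [0, 1]."""
    with wave.open(wav_path, "rb") as wf:
        samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)
    return float(np.sqrt(np.mean(samples.astype(np.float64) ** 2))) / 32768.0

VOLUME_THRESHOLD = 0.02  # assumed preset threshold; tune for the microphone

if rms_volume("live_audio.wav") > VOLUME_THRESHOLD:
    text_information = transcribe("live_audio.wav")  # sketch from step S202
else:
    print("Please speak louder so that the voice information can be reacquired.")
```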
S204, adding the text information to the media file, and playing the media file with the text information added.
In a specific implementation, the text information may be added to the media file, and the media file with the text information added is then played according to the time axis. Optionally, the media file with the text information added may be broadcast live on the electronic device, or sent to other electronic devices so as to be broadcast live on them. The media file with the text information added may also be saved for subsequent playing.
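One possible way to add the text information to the media file is to burn the timed cues into the video frames with ffmpeg's subtitles filter (which requires an ffmpeg build with libass); this is an illustrative realization under those assumptions, not necessarily the one the disclosure intends:

```python
import subprocess

def add_subtitles(media_path: str, srt_path: str, out_path: str) -> None:
    """Render the timed text onto the video frames ("hard" subtitles)."""
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", media_path,
            "-vf", f"subtitles={srt_path}",  # burn the SRT cues into the picture
            out_path,
        ],
        check=True,
    )

add_subtitles("live_recording.mp4", "live_subs.srt", "live_with_subs.mp4")
```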
Optionally, the text information may be translated to obtain translated text, and the translated text and the text information are displayed together (for example, Chinese and English) when the media file with the text information added is played. Furthermore, the text information can be displayed at a preset display position on the display interface of the media file in a preset display mode. The preset display position may be the middle, top or bottom of the display interface of the video information, and the preset display mode may be gradient display, jump display, or the like, which is not limited here.
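A small sketch of assembling such a bilingual cue follows; translate() here is a hypothetical stand-in for whatever translation service is actually used:

```python
def translate(text: str, target: str = "en") -> str:
    """Stand-in for a real translation service; returns a canned result here."""
    return {"大家好": "Hello, everyone"}.get(text, text)

def bilingual_cue(text: str) -> str:
    """Stack the original text above its translation as a single subtitle cue."""
    return f"{text}\n{translate(text)}"

print(bilingual_cue("大家好"))
# 大家好
# Hello, everyone
```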
Optionally, when the media file with the text information added is played, the display format of the text information may be adjusted according to an operation instruction input by the user. For example, the displayed text information may be enlarged or reduced, or its font, display position, and the like may be adjusted.
Optionally, if the user does not need to acquire audio information in the media file while recording it, a voice control mode may be started, and voice information input by the user is acquired in the voice control mode, where the voice control mode is a mode in which the media file is recorded through voice control. Voice recognition is then performed on the voice information to obtain a control command, and the media file is recorded according to the control command. For example, in the voice control mode, when the user utters the voice "stop", recording of the media file stops; when the user utters the voice "start", recording of the media file starts. If the user needs to acquire audio information in the media file while recording it, the voice control mode can be switched to a voice recording mode, that is, a mode in which voice recognition is performed on the audio information in the media file.
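A minimal sketch of mapping recognized utterances onto recording-control commands in the voice control mode; the command words and command names are assumptions:

```python
from typing import Optional

COMMANDS = {"start": "START_RECORDING", "stop": "STOP_RECORDING"}

def handle_voice_command(utterance: str) -> Optional[str]:
    """Map a recognized utterance onto a recorder control command."""
    return COMMANDS.get(utterance.strip().lower())

recording = False
for utterance in ["Start", "stop"]:  # e.g. outputs of the transcribe() sketch
    command = handle_voice_command(utterance)
    if command == "START_RECORDING":
        recording = True
    elif command == "STOP_RECORDING":
        recording = False
    print(utterance, "->", command, "| recording:", recording)
```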
In the embodiment of the application, audio information in a media file is acquired when the media file for live broadcasting is recorded; the audio information is identified to obtain text information corresponding to the audio information; a time point is marked for each character in the text information, the time point being determined according to the playing time of the audio information; finally, the text information is added to the media file, and the media file with the text information added is played. Because the subtitles are obtained by performing voice recognition on the audio information while the media file is being recorded, subtitles are added to the media file during live broadcasting without being written manually, subtitles are displayed in time during live broadcasting, and video processing efficiency is improved.
As shown in fig. 3, fig. 3 is a schematic flowchart of another video processing method according to an embodiment of the present application. The steps in the embodiments of the present application include at least:
s301, when recording a media file for live broadcast, acquiring audio information in the media file.
In a specific implementation, the media file includes video information and audio information; the video information may be recorded by a video recorder, and the audio information related to the video information may be captured by a microphone. The video information and the audio information are then transmitted to the electronic device through a wireless connection (such as Wi-Fi or Bluetooth) or a wired connection. The media file may also be recorded directly by the electronic device.
S302, the electronic device sends the audio information to a voice server.
S303, after receiving the audio information, the voice server recognizes the audio information to generate the text information, and then sends the text information to the electronic device.
Optionally, the electronic device may also perform voice recognition on the audio information locally, using an automatic caption tool, to obtain text information, while the voice server performs voice recognition on the same audio information. The text information obtained by local recognition is then compared with the text information obtained by the voice server: if the locally obtained text information is more accurate, local recognition is used for subsequent voice information; if the text information obtained by the voice server is more accurate, the voice server recognizes subsequent voice information.
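By way of illustration, the round trip to the voice server could look like the following sketch; the server URL and the response schema are assumptions, and transcribe() is the local-recognition sketch from the fig. 2 embodiment:

```python
import requests  # pip install requests

SPEECH_SERVER_URL = "https://speech.example.com/recognize"  # hypothetical endpoint

def transcribe_remote(wav_path: str) -> str:
    """Send the audio information to a voice server and return its transcript."""
    with open(wav_path, "rb") as f:
        response = requests.post(SPEECH_SERVER_URL, files={"audio": f}, timeout=30)
    response.raise_for_status()
    return response.json()["text"]  # assumed response schema

# Compare server-side and local results, as this embodiment describes.
local_text = transcribe("live_audio.wav")      # local sketch from fig. 2
remote_text = transcribe_remote("live_audio.wav")
print("local:", local_text, "| server:", remote_text)
```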
S304, marking a time point for each character in the text information, wherein the time point is determined according to the playing time of the audio information, and the time point is used for playing the audio information and the text information synchronously.
In a specific implementation, a time point can be marked for each character in the text information so as to form a time axis. The time axis is determined according to the playing time of the audio information corresponding to the text information, and each time point on the time axis corresponds to the display time of one character. In this way, the audio information and the text information are kept synchronized during video playing.
For example, the voice information related to the video information is acquired as "大家好" (hello everyone). Relative to the start time of video playback, "大" is spoken at 3s, "家" at 3.1s, and "好" at 3.2s. Thus, for the text information, the time point marked for "大" is 3s, for "家" is 3.1s, and for "好" is 3.2s. In this way, the audio information and the text information can be played synchronously during video playing.
Optionally, the voice volume of the audio information may be acquired, and when the voice volume is greater than a preset threshold, voice recognition is performed on the audio information to obtain the text information. When the voice volume is not greater than the preset threshold, prompt information may be displayed to prompt the user to increase the volume so that the voice information can be reacquired. Constraining the voice volume in this way keeps the voice information clear and thus keeps voice recognition accurate.
S305, adding the text information to the media file, and playing the media file with the text information added.
In a specific implementation, the text information may be added to the media file, and the media file with the text information added is then played according to the time axis. Optionally, the media file with the text information added may be broadcast live on the electronic device, or sent to other electronic devices so as to be broadcast live on them. The media file with the text information added may also be saved for subsequent playing.
Optionally, the text information may be translated to obtain translated text, and the translated text and the text information are displayed together (for example, Chinese and English) when the media file with the text information added is played. Furthermore, the text information can be displayed at a preset display position on the display interface of the media file in a preset display mode. The preset display position may be the middle, top or bottom of the display interface of the video information, and the preset display mode may be gradient display, jump display, or the like, which is not limited here.
Optionally, when the media file with the text information added is played, the display format of the text information may be adjusted according to an operation instruction input by the user. For example, the displayed text information may be enlarged or reduced, or its font, display position, and the like may be adjusted.
Optionally, if the user does not need to acquire audio information in the media file while recording it, a voice control mode may be started, and voice information input by the user is acquired in the voice control mode, where the voice control mode is a mode in which the media file is recorded through voice control. Voice recognition is then performed on the voice information to obtain a control command, and the media file is recorded according to the control command. For example, in the voice control mode, when the user utters the voice "stop", recording of the media file stops; when the user utters the voice "start", recording of the media file starts. If the user needs to acquire audio information in the media file while recording it, the voice control mode can be switched to a voice recording mode, that is, a mode in which voice recognition is performed on the audio information in the media file.
In the embodiment of the application, audio information in a media file is acquired when the media file for live broadcasting is recorded; the audio information is identified to obtain text information corresponding to the audio information; a time point is marked for each character in the text information, the time point being determined according to the playing time of the audio information; finally, the text information is added to the media file, and the media file with the text information added is played. Because the subtitles are obtained by performing voice recognition on the audio information while the media file is being recorded, subtitles are added to the media file during live broadcasting without being written manually, subtitles are displayed in time during live broadcasting, and video processing efficiency is improved.
As shown in fig. 4, fig. 4 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. The video processing apparatus may include an acquisition module 401, a processing module 402, a sending module 403 and a receiving module 404, whose functions are as follows.
The acquiring module 401 is configured to acquire audio information in a media file when recording the media file for live broadcasting.
In a specific implementation, the media file includes video information and audio information; the video information may be recorded by a video recorder, and the audio information related to the video information may be captured by a microphone. The video information and the audio information are then transmitted to the electronic device through a wireless connection (such as Wi-Fi or Bluetooth) or a wired connection. The media file may also be recorded directly by the electronic device.
The processing module 402 is configured to identify the audio information and obtain text information corresponding to the audio information.
Specifically, the electronic device may use an automatic caption tool locally and perform speech recognition on the audio information with a speech recognition algorithm to obtain the text information.
Optionally, the sending module 403 is configured to send the audio information to a voice server, so that the voice server recognizes the audio information and generates the text information; the receiving module 404 is configured to receive the text information returned by the voice server. Performing voice recognition in these different ways helps guarantee its accuracy.
Optionally, the text information obtained by local recognition may be compared with the text information obtained by the voice server: if the locally obtained text information is more accurate, local recognition is used to recognize the voice information; if the text information obtained by the voice server is more accurate, the voice server recognizes the voice information.
Optionally, the processing module 402 is further configured to mark a time point for each character in the text information, wherein the time point is determined according to the playing time of the audio information, and the time point is used for playing the audio information and the text information synchronously.
In a specific implementation, a time point can be marked for each character in the text information so as to form a time axis. The time axis is determined according to the playing time of the audio information corresponding to the text information, and each time point on the time axis corresponds to the display time of one character. In this way, the audio information and the text information are kept synchronized during video playing.
For example, the voice information related to the video information is acquired as "大家好" (hello everyone). Relative to the start time of video playback, "大" is spoken at 3s, "家" at 3.1s, and "好" at 3.2s. Thus, for the text information, the time point marked for "大" is 3s, for "家" is 3.1s, and for "好" is 3.2s. In this way, the audio information and the text information can be played synchronously during video playing.
Optionally, the voice volume of the audio information may be acquired, and when the voice volume is greater than a preset threshold, voice recognition is performed on the audio information to obtain the text information. When the voice volume is not greater than the preset threshold, prompt information may be displayed to prompt the user to increase the volume so that the voice information can be reacquired. Constraining the voice volume in this way keeps the voice information clear and thus keeps voice recognition accurate.
The processing module 402 is further configured to add the text information to the media file and play the media file with the text information added.
In a specific implementation, the text information may be added to the media file, and the media file with the text information added is then played according to the time axis. Optionally, the media file with the text information added may be broadcast live on the electronic device, or sent to other electronic devices so as to be broadcast live on them. The media file with the text information added may also be saved for subsequent playing.
Optionally, the text information may be translated to obtain translated text, and the translated text and the text information are displayed together (for example, Chinese and English) when the media file with the text information added is played. Furthermore, the text information can be displayed at a preset display position on the display interface of the media file in a preset display mode. The preset display position may be the middle, top or bottom of the display interface of the video information, and the preset display mode may be gradient display, jump display, or the like, which is not limited here.
Optionally, when the media file with the text information added is played, the display format of the text information may be adjusted according to an operation instruction input by the user. For example, the displayed text information may be enlarged or reduced, or its font, display position, and the like may be adjusted.
Optionally, if the user does not need to acquire audio information in the media file while recording it, a voice control mode may be started, and voice information input by the user is acquired in the voice control mode, where the voice control mode is a mode in which the media file is recorded through voice control. Voice recognition is then performed on the voice information to obtain a control command, and the media file is recorded according to the control command. For example, in the voice control mode, when the user utters the voice "stop", recording of the media file stops; when the user utters the voice "start", recording of the media file starts. If the user needs to acquire audio information in the media file while recording it, the voice control mode can be switched to a voice recording mode, that is, a mode in which voice recognition is performed on the audio information in the media file.
In the embodiment of the application, audio information in a media file is acquired when the media file for live broadcasting is recorded; the audio information is identified to obtain text information corresponding to the audio information; a time point is marked for each character in the text information, the time point being determined according to the playing time of the audio information; finally, the text information is added to the media file, and the media file with the text information added is played. Because the subtitles are obtained by performing voice recognition on the audio information while the media file is being recorded, subtitles are added to the media file during live broadcasting without being written manually, subtitles are displayed in time during live broadcasting, and video processing efficiency is improved.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown, the electronic device may include: at least one processor 501 (e.g., a CPU), at least one receiver 503, at least one memory 504, at least one transmitter 505, and at least one communication bus 502, where the communication bus 502 is used to realize connection and communication between these components. In this embodiment, the receiver 503 and the transmitter 505 of the electronic device may be wired transmission ports, or may be wireless devices (for example, including an antenna apparatus) for signaling or data communication with other node devices. The memory 504 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory, and may optionally be at least one storage device located remotely from the processor 501. A set of program code is stored in the memory 504, and the processor 501 calls the program code stored in the memory to perform the following operations:
acquiring audio information in a media file when recording the media file for live broadcasting;
identifying the audio information to obtain text information corresponding to the audio information;
marking a time point for each character in the text information, wherein the time point is determined according to the playing time of the audio information, and the time point is used for playing the audio information and the text information synchronously;
and adding the text information to the media file, and playing the media file with the text information added.
The processor 501 is further configured to perform the following operation steps:
sending the audio information to a voice server so that the voice server recognizes the audio information and generates the text information;
and receiving the text information returned by the voice server.
The processor 501 is further configured to perform the following operation steps:
translating the text information to obtain translated text;
and displaying the translated text and the text information when the media file with the text information added is played.
The processor 501 is further configured to perform the following operation steps:
and adjusting the display format of the text information according to an operation instruction input by a user.
The processor 501 is further configured to perform the following operation steps:
starting a voice control mode, and acquiring voice information input by a user in the voice control mode, wherein the voice control mode is a mode in which the media file is recorded through voice control;
carrying out voice recognition on the voice information to obtain a control command;
and recording the media file according to the control command.
Further, the processor may cooperate with the memory and the communication bus to perform the operations of the electronic device in the above embodiments of the application.
It should be noted that the present application also provides a storage medium for storing an application program which, when run, performs the operations performed by the electronic device in the video processing methods shown in fig. 2 and fig. 3.
It should be noted that the embodiment of the present application also provides an application program which, when run, performs the operations performed by the electronic device in the video processing methods shown in fig. 2 and fig. 3.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of actions, but those skilled in the art should understand that the present application is not limited by the order of the actions described, as some steps may be performed in other orders or simultaneously. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily all required by the present application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash memory disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The video processing method, related device and system provided by the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is only intended to help understand the method and core idea of the present application; meanwhile, for those of ordinary skill in the art, the specific embodiments and the application scope may vary according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.
In the description herein, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and features of different embodiments or examples described in this specification can be combined by those skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method description in the flowcharts, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A video processing method, applied to an electronic device, the method comprising:
acquiring audio information in a media file when recording the media file for live broadcasting;
identifying the audio information to obtain text information corresponding to the audio information;
marking a time point for each character in the text information, wherein the time point is determined according to the playing time of the audio information, and the time point is used for playing the audio information and the text information synchronously;
and adding the text information to the media file, and playing the media file with the text information added.
2. The method of claim 1, wherein the identifying the audio information and obtaining text information corresponding to the audio information comprises:
sending the audio information to a voice server so that the voice server recognizes the audio information and generates the text information;
and receiving the text information returned by the voice server.
3. The method of claim 1, wherein after identifying the audio information and obtaining text information corresponding to the audio information, the method further comprises:
translating the text information to obtain translated text;
and displaying the translated text and the text information when the media file with the text information added is played.
4. The method of any one of claims 1-3, wherein the playing the media file with the text information added comprises:
and adjusting the display format of the text information according to an operation instruction input by a user.
5. The method of any one of claims 1-4, further comprising:
starting a voice control mode, and acquiring voice information input by a user in the voice control mode, wherein the voice control mode is a mode in which the media file is recorded through voice control;
carrying out voice recognition on the voice information to obtain a control command;
and recording the media file according to the control command.
6. A video processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring audio information in a media file when the media file for live broadcasting is recorded;
the processing module is used for identifying the audio information to obtain text information corresponding to the audio information, and for marking a time point for each character in the text information, wherein the time point is determined according to the playing time of the audio information, and the time point is used for playing the audio information and the text information synchronously;
the processing module is further configured to add the text information to the media file, and play the media file with the text information added.
7. The apparatus of claim 6, wherein the apparatus further comprises:
the sending module is used for sending the audio information to a voice server, so that the voice server recognizes the audio information and generates the text information;
and the receiving module is used for receiving the text information returned by the voice server.
8. The apparatus of claim 6,
the processing module is further used for translating the text information to obtain translated text, and for displaying the translated text and the text information when the media file with the text information added is played.
9. The apparatus of claim 6, wherein the apparatus further comprises:
the processing module is further used for adjusting the display format of the text information according to an operation instruction input by a user.
10. An electronic device, comprising: a processor, a memory, a communication interface, and a bus;
the processor, the memory and the communication interface are connected through the bus and complete mutual communication;
the memory stores executable program code;
the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory for performing the method of any one of claims 1-5.
CN202010381164.9A 2020-05-08 2020-05-08 Video processing method and related equipment Active CN111601154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010381164.9A CN111601154B (en) 2020-05-08 2020-05-08 Video processing method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010381164.9A CN111601154B (en) 2020-05-08 2020-05-08 Video processing method and related equipment

Publications (2)

Publication Number Publication Date
CN111601154A 2020-08-28
CN111601154B CN111601154B (en) 2022-04-29

Family

ID=72185248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010381164.9A Active CN111601154B (en) 2020-05-08 2020-05-08 Video processing method and related equipment

Country Status (1)

Country Link
CN (1) CN111601154B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112423078A (en) * 2020-10-28 2021-02-26 卡莱特(深圳)云科技有限公司 Advertisement playing method and device applied to LED display screen
CN113365100A (en) * 2021-06-02 2021-09-07 中国邮政储蓄银行股份有限公司 Video processing method and device
CN113643691A (en) * 2021-08-16 2021-11-12 思必驰科技股份有限公司 Far-field voice message interaction method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1932976A (en) * 2006-09-18 2007-03-21 北京北大方正电子有限公司 Method and system for realizing caption and speech synchronization in video-audio frequency processing
JP2007149163A (en) * 2005-11-24 2007-06-14 Yamaha Corp Contents reproduction device
US20120116748A1 (en) * 2010-11-08 2012-05-10 Sling Media Pvt Ltd Voice Recognition and Feedback System
CN104731476A (en) * 2015-03-21 2015-06-24 苏州乐聚一堂电子科技有限公司 Handheld intelligent electronic equipment electronic information expressing method
CN106873937A (en) * 2017-02-16 2017-06-20 北京百度网讯科技有限公司 Pronunciation inputting method and device
WO2017107578A1 (en) * 2015-12-22 2017-06-29 合一网络技术(北京)有限公司 Streaming media and caption instant synchronization displaying and matching processing method, device and system
CN108401192A (en) * 2018-04-25 2018-08-14 腾讯科技(深圳)有限公司 Video stream processing method, device, computer equipment and storage medium
CN110610699A (en) * 2019-09-03 2019-12-24 北京达佳互联信息技术有限公司 Voice signal processing method, device, terminal, server and storage medium
CN110933485A (en) * 2019-10-21 2020-03-27 天脉聚源(杭州)传媒科技有限公司 Video subtitle generating method, system, device and storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007149163A (en) * 2005-11-24 2007-06-14 Yamaha Corp Contents reproduction device
CN1932976A (en) * 2006-09-18 2007-03-21 北京北大方正电子有限公司 Method and system for realizing caption and speech synchronization in video-audio frequency processing
US20120116748A1 (en) * 2010-11-08 2012-05-10 Sling Media Pvt Ltd Voice Recognition and Feedback System
CN104731476A (en) * 2015-03-21 2015-06-24 苏州乐聚一堂电子科技有限公司 Handheld intelligent electronic equipment electronic information expressing method
WO2017107578A1 (en) * 2015-12-22 2017-06-29 合一网络技术(北京)有限公司 Streaming media and caption instant synchronization displaying and matching processing method, device and system
CN106873937A (en) * 2017-02-16 2017-06-20 北京百度网讯科技有限公司 Pronunciation inputting method and device
CN108401192A (en) * 2018-04-25 2018-08-14 腾讯科技(深圳)有限公司 Video stream processing method, device, computer equipment and storage medium
CN110610699A (en) * 2019-09-03 2019-12-24 北京达佳互联信息技术有限公司 Voice signal processing method, device, terminal, server and storage medium
CN110933485A (en) * 2019-10-21 2020-03-27 天脉聚源(杭州)传媒科技有限公司 Video subtitle generating method, system, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Xin et al.: 《影视编辑理论与实务 第一版》 [Theory and Practice of Film and Television Editing, First Edition], 30 September 2018 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112423078A (en) * 2020-10-28 2021-02-26 卡莱特(深圳)云科技有限公司 Advertisement playing method and device applied to LED display screen
CN113365100A (en) * 2021-06-02 2021-09-07 中国邮政储蓄银行股份有限公司 Video processing method and device
CN113365100B (en) * 2021-06-02 2022-11-22 中国邮政储蓄银行股份有限公司 Video processing method and device
CN113643691A (en) * 2021-08-16 2021-11-12 思必驰科技股份有限公司 Far-field voice message interaction method and system

Also Published As

Publication number Publication date
CN111601154B (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN111601154B (en) Video processing method and related equipment
US9799375B2 (en) Method and device for adjusting playback progress of video file
RU2612362C1 (en) Method of recording, method of playback, device, terminal and system
EP4027238A1 (en) Card rendering method and electronic device
US20210398527A1 (en) Terminal screen projection control method and terminal
EP2242043A1 (en) Information processing apparatus with text display function, and data acquisition method
CN107734353B (en) Method and device for recording barrage video, readable storage medium and equipment
US11200899B2 (en) Voice processing method, apparatus and device
US11205431B2 (en) Method, apparatus and device for presenting state of voice interaction device, and storage medium
CN107885483B (en) Audio information verification method and device, storage medium and electronic equipment
CN107318038B (en) Method for synchronizing video playing and comment, terminal equipment and storage medium
EP2682931A1 (en) Method and apparatus for recording and playing user voice in mobile terminal
US10848835B2 (en) Video summary information playback device and method and video summary information providing server and method
US9807360B2 (en) Method and apparatus for reproducing content
CN104349173A (en) Video repeating method and device
CN111899859A (en) Surgical instrument counting method and device
CN111506747B (en) File analysis method, device, electronic equipment and storage medium
CN107483993B (en) Voice input method of television, television and computer readable storage medium
AU2018432003A1 (en) Video processing method and device, and terminal and storage medium
CN109147091A (en) Processing method, device, equipment and the storage medium of unmanned car data
CN112860361A (en) Method, device and storage medium for automatically selecting audio track and subtitle
US20140297285A1 (en) Automatic page content reading-aloud method and device thereof
US20200349190A1 (en) Interactive music on-demand method, device and terminal
CN114339325A (en) Multi-engine dynamic wallpaper playing method and device based on android system
CN109361940A (en) A kind of video playing control method, system and VR equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant