CN112653919B - Subtitle adding method and device - Google Patents

Subtitle adding method and device

Info

Publication number
CN112653919B
Authority
CN
China
Prior art keywords
audio
video
information
subtitle
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011536498.5A
Other languages
Chinese (zh)
Other versions
CN112653919A (en
Inventor
刁弘锦
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202011536498.5A priority Critical patent/CN112653919B/en
Publication of CN112653919A publication Critical patent/CN112653919A/en
Application granted granted Critical
Publication of CN112653919B publication Critical patent/CN112653919B/en

Classifications

    • H — ELECTRICITY; H04 — ELECTRIC COMMUNICATION TECHNIQUE; H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/4316 — Generation of visual interfaces for content selection or interaction, involving specific graphical features, for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N21/4394 — Processing of audio elementary streams, involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44012 — Processing of video elementary streams, involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N21/4884 — Data services, e.g. news ticker, for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Studio Circuits (AREA)

Abstract

The application discloses a subtitle adding method and device, and belongs to the field of mobile communication. The method comprises the following steps: acquiring audio and video information of a target audio and video, wherein the audio and video information comprises at least one of scene information and sound information; determining a target subtitle format corresponding to the audio and video information; and adding subtitles in the target audio and video according to the target subtitle format, so that no manual operation by the user is needed. Moreover, the subtitles added according to the target subtitle format are matched with the audio and video information, which improves the subtitle adding effect and meets the personalized requirements of users. The method and device thereby solve the problem in the prior art that adding subtitles to audio and video through an electronic device involves complex operations.

Description

Subtitle adding method and device
Technical Field
The application belongs to the field of mobile communication, and particularly relates to a subtitle adding method and device.
Background
With the rapid development of mobile communication technology, both mobile and non-mobile electronic devices have become indispensable tools in many aspects of people's lives. The functions of the various applications (APPs) on electronic devices have gradually improved: they no longer serve communication alone, but also provide users with a variety of intelligent services, bringing great convenience to users' work and life.
Electronic devices, chiefly smartphones, have become the main devices for playing or recording audio and video files. During the recording or playing of an audio/video file, scenes requiring subtitles often appear. Taking recording as an example, to add subtitles the user generally has to add and edit them with another APP after recording is finished, and those operations are relatively complicated and time-consuming.
Disclosure of Invention
The embodiments of the present application aim to provide a subtitle adding method and device, which can solve the problem in the prior art that adding subtitles to audio and video through an electronic device involves complex operations.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a subtitle adding method, where the method includes:
acquiring audio and video information of a target audio and video; the audio and video information comprises at least one of scene information and sound information;
determining a target subtitle format corresponding to the audio and video information;
and adding subtitles in the target audio and video according to the target subtitle format.
In a second aspect, an embodiment of the present application further provides a subtitle adding apparatus, where the subtitle adding apparatus includes:
the information acquisition module is used for acquiring the audio and video information of the target audio and video; the audio and video information comprises at least one of scene information and sound information;
the format determining module is used for determining a target subtitle format corresponding to the audio and video information;
and the caption adding module is used for adding captions in the target audio and video according to the target caption format.
In a third aspect, an embodiment of the present application further provides an electronic device, which includes a memory, a processor, and a program or an instruction stored in the memory and executable on the processor, where the processor implements the steps in the subtitle adding method as described above when executing the program or the instruction.
In a fourth aspect, embodiments of the present application further provide a readable storage medium, where a program or instructions are stored, and when the program or instructions are executed by a processor, the program or instructions implement the steps in the subtitle adding method as described above.
In a fifth aspect, the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method described above.
In the embodiment of the application, audio and video information of a target audio and video is acquired; the audio and video information comprises at least one of scene information and sound information; determining a target subtitle format corresponding to the audio and video information; according to the target subtitle format, subtitles are added in the target audio and video, so that the subtitles can be added quickly without manual operation of a user; and the subtitles added according to the target subtitle format are matched with the audio and video information, so that the subtitle adding effect is improved, and the personalized requirements of users are met.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flowchart illustrating a subtitle adding method according to an embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a first example provided by an embodiment of the present application;
FIG. 3 shows a flow chart of a second example provided by an embodiment of the present application;
fig. 4 is a flowchart illustrating a third example provided by an embodiment of the present application;
FIG. 5 illustrates a schematic diagram of a third example provided by an embodiment of the present application;
fig. 6 is a block diagram of a subtitle adding apparatus according to an embodiment of the present application;
fig. 7 shows a block diagram of an electronic device provided by an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar elements and are not necessarily used to describe a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates that the objects before and after it are in an "or" relationship.
The subtitle adding method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Referring to fig. 1, an embodiment of the present application provides a caption adding method, which is optionally applicable to electronic devices including various handheld devices, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and various forms of Mobile Stations (MSs), terminal devices (Terminal devices), and the like.
The method comprises the following steps:
101, acquiring audio and video information of a target audio and video; the audio and video information comprises at least one of scene information and sound information.
Optionally, the target audio and video comprises at least one of a first audio and video and a second audio and video; the first audio and video is the audio and video recorded by the electronic equipment; the second audio and video is the audio and video received or played by the electronic equipment, the electronic equipment being the one to which the subtitle adding method provided by the embodiment of the present application is applied. For convenience of description, in the embodiment of the present application, an audio/video recorded by the electronic device is taken as the target audio and video. In the process of recording the target audio and video, the electronic device acquires the audio and video information of the target audio and video, where the audio and video information comprises at least one of scene information and sound information: scene information, i.e. the scenes in the target audio and video, such as cities, mountains and waters, gourmet food, stages and the like; and sound information, such as timbre, pitch, volume and the like. For example, when the electronic device records audio and video, scene information is extracted from the video pictures or the audio, and sound information is extracted from the audio.
And 102, determining a target subtitle format corresponding to the audio and video information.
After the audio and video information of the target audio and video is obtained, the target subtitle format corresponding to the audio and video information is further determined. Optionally, if the audio/video information includes one parameter, for example only the scene information, the target subtitle format is the subtitle format matched with the scene information; if the audio/video information includes at least two parameters, for example the scene information, the pitch in the sound information and the timbre in the sound information, the target subtitle format is the subtitle format with the highest degree of matching with all the parameters.
Optionally, the target subtitle format may be a subtitle format in a preset database, for example, a matching condition is set for each subtitle format in the preset database, and the matching condition is used for matching with the audio and video information; the target caption format can also be a user-defined caption format, such as a caption format preset by a user for certain scenes and certain sound information.
The subtitle format includes font, size, style, special effect display and other formats.
And 103, adding subtitles in the target audio and video according to the target subtitle format.
After the target subtitle format is determined, the electronic equipment adds subtitles to the audio and video according to the subtitle format; for example, when the electronic equipment records a target audio/video, the sound in the target audio/video is recognized, the sound is converted into characters, and the characters are added into the target audio/video according to a target subtitle format; or when the electronic equipment receives the target audio and video or plays the target audio and video, recognizing the sound in the target audio and video, converting the sound into characters, and adding the characters into the target audio and video according to a target subtitle format; therefore, the user does not need to add the subtitles manually, and the subtitles are automatically added.
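The three steps above can be sketched in code. The following is a minimal illustration only; the field names, preset structure and matching rule are assumptions made for the sketch, not taken from the embodiment:

```python
# Sketch of steps 101-103: extract audio/video information, pick the
# best-matching preset subtitle format, then attach recognized text.
# All field names ("scene", "pitch", "volume", "style") are hypothetical.

def acquire_av_info(recording):
    """Step 101: extract scene and sound information from a recording."""
    return {"scene": recording.get("scene"),
            "pitch": recording.get("pitch"),
            "volume": recording.get("volume")}

def determine_format(av_info, presets):
    """Step 102: pick the preset whose conditions match av_info best
    (here: the count of exactly matching parameters, an assumed rule)."""
    def matches(preset):
        return sum(1 for k, v in preset["conditions"].items()
                   if av_info.get(k) == v)
    return max(presets, key=matches)

def add_subtitles(recording, recognized_text, fmt):
    """Step 103: attach the recognized speech as a styled subtitle."""
    recording.setdefault("subtitles", []).append(
        {"text": recognized_text, "style": fmt["style"]})
    return recording
```

For example, a recording whose detected scene is "stage" with a high pitch would select a preset whose conditions name that scene and pitch, and the recognized speech would be appended in that preset's style.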
In the embodiment of the application, audio and video information of a target audio and video is acquired; the audio and video information comprises at least one of scene information and sound information; determining a target subtitle format corresponding to the audio and video information; according to the target subtitle format, subtitles are added in the target audio and video, so that the subtitles can be added quickly without manual operation of a user; and the subtitles added according to the target subtitle format are matched with the audio and video information, so that the subtitle adding effect is improved, and the personalized requirements of users are met. The embodiment of the application solves the problem that in the prior art, the operation is complex in a mode of adding subtitles to audio and video through electronic equipment.
In an optional embodiment, if the audio/video information includes the scene information or the sound information, the determining a target subtitle format corresponding to the audio/video information includes:
and determining a target subtitle format corresponding to the audio and video information according to a preset corresponding relation.
The preset corresponding relation comprises a target subtitle format corresponding to each audio and video information; as a first example, referring to fig. 2, for example, the audio/video information only includes scene information, several types of scenes are predefined, as shown in a dashed line box S1, the scenes include cities, mountains and waters, gourmets, stages, and the like, and the user may preset subtitle formats, such as fonts, sizes, and styles, in different scenes; for example, for a certain scene, several photos corresponding to the scene are imported, and then the personalized font of the scene is set. The user may also customize the scenario as shown within the dashed box S2.
In addition, the corresponding relation can also be determined according to a big data algorithm: a target caption format corresponding to the audio and video information is determined according to a machine learning or deep learning algorithm and recommended to the user, as shown in dashed box S3, where the recommended scenes include, for example, cute pets, painting and the like.
And determining a target subtitle format corresponding to the audio and video information according to a preset corresponding relation so that the electronic equipment can quickly match the subtitle format for the target audio and video according to the audio and video information.
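Such a preset correspondence could, as a rough sketch, be kept in a simple scene-to-format dictionary, with user-defined scenes taking priority over predefined ones; the scene names and style values below are illustrative assumptions:

```python
# Hypothetical predefined scene-to-format table (names and values assumed).
PREDEFINED = {
    "city":      {"font": "sans-serif", "size": 24},
    "landscape": {"font": "serif", "size": 22},
    "food":      {"font": "rounded", "size": 26},
    "stage":     {"font": "bold", "size": 28},
}

def lookup_subtitle_format(scene, user_defined=None):
    """User-defined scene formats take priority over predefined ones;
    an unknown scene falls back to a plain default format."""
    if user_defined and scene in user_defined:
        return user_defined[scene]
    return PREDEFINED.get(scene, {"font": "default", "size": 24})
```

This mirrors the quick matching described above: given the detected scene, the electronic device can fetch the corresponding format in a single lookup.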
In an optional embodiment, in a case that the audio/video information includes the sound information, the correspondence includes a correspondence between a fourth parameter and a fifth parameter;
the fourth parameter is a parameter of the sound information, and the fifth parameter is a subtitle format.
For example, if the fourth parameter is the pitch, the fifth parameter may be the font size of the subtitle; or if the fourth parameter is the timbre, the fifth parameter may be the font color of the subtitle. For example, the user records a sentence and then sets the font, size and special effects of the subtitle for cases where the timbre, pitch and volume are similar to that sentence. The font size can also be controlled by the volume or the length of a sound: for example, if the user says "Wa…" with a drawn-out tone, this "Wa" character can be rendered larger than other characters of the same pitch and volume, to highlight the effect.
The fifth parameter can also be the font or special effect of the subtitle, and the fourth parameter can be the pitch or the timbre. For example, if the timbre is a relatively deep male voice, the font may be a solemn, formal one; if the timbre is a childlike voice, the font may be a rounder one, with some cute special effects added, such as hearts or stars. Likewise, a higher-pitched voice may correspond to a thinner font, and a lower-pitched voice to a thicker font.
Therefore, the subtitle format can be set in a personalized mode according to each specific parameter of the sound information, and the display effect of the subtitles is improved.
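The per-parameter rules above (louder or drawn-out sounds enlarging characters, higher pitch thinning the font) could be sketched as follows; the numeric thresholds and units are assumptions made purely for illustration:

```python
def style_for_sound(pitch_hz, volume_db, duration_s=0.0, base_size=24):
    """Map sound parameters to subtitle style attributes.
    Thresholds (60 dB baseline, 250 Hz pitch split, 1 s drawn-out
    cutoff) are illustrative assumptions, not from the embodiment."""
    # enlarge characters that are loud or drawn out, as in the "Wa…" example
    size = base_size + 2 * max(0, int(volume_db - 60) // 5)
    if duration_s > 1.0:
        size += 4
    # a higher-pitched voice gets a thinner weight, a deeper voice a thicker one
    weight = "light" if pitch_hz > 250 else "bold"
    return {"size": size, "weight": weight}
```

A drawn-out, loud syllable thus comes out several points larger than its neighbors, while the font weight tracks the speaker's pitch.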
As a second example, referring to fig. 3, fig. 3 shows an example of applying the subtitle adding method provided by the embodiment of the present application, and mainly includes the following steps:
in step 301, a correspondence relationship is set in advance.
For example, the font style, size and special effects of the subtitles are preset by the user before recording, and the user can perform personalized settings according to the scene or the sound (such as pitch and volume).
Specifically, taking the personalized setting of a subtitle format according to scenes as an example: predefined scenes are divided into several large classes, such as cities, mountains and waters, gourmet food, stages and the like, and the user can set the fonts, sizes and styles they like for the subtitles in each scene. Alternatively, the user can define a custom scene by importing several photos corresponding to that scene and then setting personalized fonts for it;
and scenes frequently set by other users can be obtained according to big data, recommended to the user, and the user selects whether to set personalized fonts for them. For example, many users like to record videos of their cute pets; if many users have defined a custom cute-pet scene with personalized fonts, and this user has also watched or recorded cute-pet videos, the scene setting interface can recommend that the user set up a cute-pet scene as well.
In addition, the personalized setting of the subtitle format is carried out according to the sound information.
When the user inputs a voice for personalized setting, the priority of the timbre, the pitch and the volume runs from high to low. When a user records a sentence, the user can first set the display form corresponding to the timbre, then further set the display form corresponding to the pitch, and finally set the display form corresponding to the volume. During matching, matching is likewise performed in this priority order of timbre, pitch and volume.
And step 302, recording a target audio and video and acquiring audio and video information.
When a user records a video, the electronic device detects the video scene, as well as the timbre, pitch, volume and other sound information of the user's speech.
Step 303, the electronic device determines a target subtitle format corresponding to the audio and video information.
The electronic device calculates, from the detection result of the audio and video information, the similarity with the preset conditions of each subtitle format, determines which preset setting to use, and then applies that setting to the subtitles generated on the fly.
And step 304, after the audio and video recording is finished, the user can edit and adjust the subtitles.
The user can modify wrongly written characters, adjust punctuation marks, further switch styles, adjust font sizes, add more special effects and the like.
And 305, saving the target audio and video.
In an optional embodiment, if the audio/video information includes the scene information and the sound information, the determining a target subtitle format corresponding to the audio/video information includes:
and acquiring a target subtitle format with the highest matching degree with the audio and video information in a preset database.
The preset database comprises subtitle formats and application condition information corresponding to each subtitle format, wherein the application condition information is a condition for applying the subtitle formats; and the electronic equipment acquires a subtitle format with the highest matching degree with the audio and video information from a preset database as a target subtitle format, so that the matching degree of the subtitle and the target audio and video is improved.
In an optional embodiment, if the audio/video information includes at least three first parameters, the first parameters are the scene information or the sound information; for example, the scene information includes a first parameter, and the sound information includes two first parameters;
the acquiring of the target subtitle format with the highest matching degree with the audio and video information in the preset database comprises the following steps:
acquiring application condition information corresponding to a subtitle format in a preset database; wherein the application condition information comprises at least three second parameters respectively corresponding to the first parameters;
determining a first similarity of the first parameter and the second parameter; for example, the first parameter includes a first scene in the audio/video information, the second parameter includes a second scene in the application condition, and a first similarity between the first scene and the second scene is calculated;
determining a second similarity between the application condition information and the audio and video information according to the first similarity; after the first similarity of each group of parameters of the application condition information is determined, calculating a second similarity, for example, calculating the second similarity by weighting and summing each first similarity;
determining the matching degree of the subtitle format and the audio and video information according to the second similarity, and taking the subtitle format with the highest matching degree as a target subtitle format; optionally, the subtitle format in the preset database may include one or at least two pieces of application condition information, and if the subtitle format includes at least two pieces of application condition information, the second similarities of all pieces of application condition information of the subtitle format may be summed to calculate the matching degree.
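The similarity-based selection described above can be sketched as follows: each application condition carries per-parameter first similarities (scene, pitch, timbre, …); the second similarity is their weighted sum; and a format's matching degree sums the second similarities of all its application conditions. Equal weights are an assumption made here; the embodiment leaves the weighting open:

```python
def second_similarity(first_sims, weights=None):
    """Weighted sum of per-parameter first similarities.
    Equal weights by default (an assumption; the weighting is not fixed)."""
    if weights is None:
        weights = {k: 1.0 / len(first_sims) for k in first_sims}
    return sum(first_sims[k] * weights[k] for k in first_sims)

def matching_degree(application_conditions):
    """Sum the second similarities of all application conditions of a format."""
    return sum(second_similarity(cond) for cond in application_conditions)

def pick_target_format(presets):
    """presets maps a format name to its list of application conditions;
    the format with the highest matching degree becomes the target."""
    return max(presets, key=lambda name: matching_degree(presets[name]))
```

With the first similarities of the third example below (scene 90%, pitch 70%, timbre 80% for one condition of format 1, and scene 80%, pitch 60%, timbre 90% for format 2), format 1's two conditions sum to a higher matching degree, so it is selected.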
Specifically, referring to fig. 4, as a third example, fig. 4 shows an example to which the subtitle adding method provided by the embodiment of the present application is applied, and mainly includes the following steps:
step 401, recording a target audio and video, and acquiring audio and video information.
Referring to fig. 5, in step 501, when a user records audio and video, the electronic device detects the scene information of the target audio and video, and sound information such as the timbre, pitch and volume of the user's speech.
And 402, determining a target subtitle format which is most matched with the audio and video information.
In step 502 in fig. 5, the electronic device searches, according to the audio/video information, the application condition information corresponding to each subtitle format in the preset database for the application conditions most similar to the audio/video information, obtains the preset setting whose application conditions are most popular, most common and most likely to be selected, and then applies that setting to the subtitles generated at that time. For example, the subtitle formats matched with the audio/video information include preset subtitle format 1 and preset subtitle format 2. Preset subtitle format 1 includes 2 application conditions, whose first similarities with the audio/video information are as shown in the figure: for application condition 1, the first similarity based on the scene is 90%, the first similarity based on the pitch is 70%, the first similarity based on the timbre is 80%, …; for application condition 3, the first similarity based on the scene is 70%, …. Preset subtitle format 2 includes 1 application condition, application condition 2: the first similarity based on the scene is 80%, the first similarity based on the pitch is 60%, and the first similarity based on the timbre is 90%, …. In step 503, a second similarity, i.e. the weight in fig. 5, is calculated based on the first similarities of each application condition, and the matching degree is calculated based on the weights of all the application conditions of each preset subtitle format, namely:
the score (i.e., matching degree) of subtitle format 1 is preset = weight 1+ weight 3;
the score of the preset subtitle format 2 = weight 2;
and (3) score sorting: the preset caption format 1 is larger than the preset caption format 2. Presetting a subtitle format 1 as a target subtitle format; in step 504, a preset subtitle format 1 is applied to the target audio/video.
For example, a user records a video of a baby playing on a beach and speaks to interact with the baby while recording. If the user has not preset this scene or a similar one, the electronic device searches the preset database for preset data whose scene, pitch, timbre and volume are similar to the current video: for instance, a baby playing on a beach is similar scene 1, a baby playing on yellow tiles is similar scene 2, a baby playing with a little friend on a beach is similar scene 3, and so on; a young woman's timbre 1 is similar timbre 1, a young woman's timbre 2 is similar timbre 2, and so on. The similarity of each similar scene, timbre and volume to the current video is calculated according to an algorithm and used as a weight; the score that each preset font is likely to be selected is then calculated from the weights, and the preset font with the highest score is applied to the current video.
Step 403, after the video recording is finished, the user can edit and adjust the subtitles; during editing, the camera application can also recommend several of the most popular preset settings under the preset conditions.
The user can modify wrongly written characters, adjust punctuation marks, switch with one tap to one of the other most popular preset settings, further switch styles, adjust font sizes, add more special effects, and so on. The user's final setting results are also added to the large database, further enriching it.
Step 404, saving the video with the subtitles in the target subtitle format.
In the embodiment of the application, audio and video information of a target audio and video is acquired; the audio and video information comprises at least one of scene information and sound information; determining a target subtitle format corresponding to the audio and video information; according to the target subtitle format, subtitles are added into the target audio and video, so that the subtitles are added quickly without manual operation of a user; and the subtitles added according to the target subtitle format are matched with the audio and video information, so that the subtitle adding effect is improved, and the personalized requirements of users are met.
With the foregoing description of the subtitle adding method according to the embodiment of the present application, a subtitle adding apparatus according to the embodiment of the present application will be described below with reference to the accompanying drawings.
It should be noted that, in the subtitle adding method provided in the embodiment of the present application, the execution subject may be a subtitle adding apparatus, or a control module in the subtitle adding apparatus for executing the subtitle adding method. In the embodiment of the present application, taking a subtitle adding apparatus executing the subtitle adding method as an example, the subtitle adding method provided in the embodiment of the present application is described.
Referring to fig. 6, an embodiment of the present application further provides a subtitle adding apparatus 600, including:
the information acquisition module 601 is used for acquiring audio and video information of a target audio and video; the audio and video information comprises at least one of scene information and sound information.
Optionally, the target audio-video comprises at least one of a first audio-video and a second audio-video; the first audio and video is the audio and video recorded by the electronic equipment; the second audio and video is the audio and video received or played by the electronic equipment; for convenience of description, in the embodiment of the present application, an audio/video recorded by an electronic device is taken as an example of a target audio/video. In the process of recording a target audio and video by the electronic equipment, the electronic equipment acquires audio and video information of the target audio and video, wherein the audio and video information comprises at least one of scene information and sound information; scene information, i.e. scenes in a target audio/video, such as cities, mountains and waters, gourmets, stages and the like; sound information such as tone, pitch, volume, etc. For example, when the electronic device records audio and video, scene information is extracted from a video picture or audio, and sound information is extracted from the audio.
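The acquisition step described above can be sketched as a toy example, under stated assumptions: the scene classifier is a stand-in for whatever model the device uses, and only volume (as an RMS level) is derived from the audio samples, since real pitch and timbre estimation needs signal processing beyond this illustration.

```python
import math

def extract_sound_info(samples):
    """Toy extraction of one sound attribute named above: volume as the RMS
    level of the audio samples. Pitch/timbre estimation is omitted."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {"volume": rms}

def acquire_av_info(frames, samples, scene_classifier):
    """Audio/video information of the target audio/video: at least one of
    scene information and sound information."""
    return {
        "scene": scene_classifier(frames),  # e.g. "city", "beach", "stage"
        "sound": extract_sound_info(samples),
    }

info = acquire_av_info(
    frames=["frame0"],
    samples=[0.0, 0.5, -0.5, 0.5],
    scene_classifier=lambda frames: "beach",  # stand-in for a real model
)
```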
And a format determining module 602, configured to determine a target subtitle format corresponding to the audio and video information.
After audio and video information of a target audio and video is obtained, a target subtitle format corresponding to the audio and video information is further determined; optionally, if the audio/video information includes one parameter, for example, only includes scene information, the target subtitle format is a subtitle format matched with the scene information; if the audio/video information includes at least two parameters, for example, scene information, a pitch in the sound information, and a timbre in the sound information, the target subtitle format is the subtitle format that has the highest matching degree with all the parameters.
Optionally, the target subtitle format may be a subtitle format in a preset database, for example, a matching condition is set for each subtitle format in the preset database, and the matching condition is used for matching with the audio and video information; the target caption format may also be a user-defined caption format, such as a user-preset caption format for certain scenes and certain sound information.
The subtitle format includes font, size, style, special effect display and other formats.
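The branching described above — a direct match when the audio/video information carries a single parameter, a best-match search when it carries several — can be sketched like this. The mapping and the search function are hypothetical placeholders, not the patent's implementation.

```python
def determine_target_format(av_info, preset_correspondence, best_match_search):
    """Single parameter -> look up the subtitle format it maps to directly;
    multiple parameters -> delegate to a highest-matching-degree search."""
    params = {k: v for k, v in av_info.items() if v is not None}
    if len(params) == 1:
        (key, value), = params.items()
        return preset_correspondence[(key, value)]
    return best_match_search(params)

fmt = determine_target_format(
    {"scene": "beach", "sound": None},  # only scene information present
    preset_correspondence={("scene", "beach"): "playful-bold"},
    best_match_search=lambda params: "best-match",
)
# fmt == "playful-bold": the direct scene-to-format correspondence applies
```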
And a subtitle adding module 603, configured to add subtitles in the target audio and video according to the target subtitle format.
After the target subtitle format is determined, the electronic equipment adds subtitles to the audio and video according to the subtitle format; for example, when the electronic equipment records a target audio/video, the sound in the target audio/video is recognized, the sound is converted into characters, and the characters are added into the target audio/video according to a target subtitle format; or when the electronic equipment receives the target audio and video or plays the target audio and video, recognizing the sound in the target audio and video, converting the sound into characters, and adding the characters into the target audio and video according to a target subtitle format; therefore, the user does not need to add the subtitles manually, and the subtitles are automatically added.
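A minimal sketch of the adding step, assuming speech recognition has already produced timed text segments (the recognizer itself is out of scope here); each segment is wrapped with the attributes of the target subtitle format.

```python
def add_subtitles(segments, target_format):
    """Render recognized speech as styled subtitle entries.
    `segments` stands in for recognizer output: (start_s, end_s, text)."""
    entries = []
    for index, (start, end, text) in enumerate(segments, 1):
        entries.append({
            "index": index,
            "start": start,
            "end": end,
            "text": text,
            # attributes of the target subtitle format (font, size, style, ...)
            **target_format,
        })
    return entries

subs = add_subtitles(
    [(0.0, 1.5, "Look at the waves!")],
    {"font": "Rounded", "size": 32, "style": "bold"},
)
```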
Optionally, in this embodiment of the present application, the format determining module 602 includes:
and the second determining submodule is used for acquiring a target subtitle format which is in a preset database and has the highest matching degree with the audio and video information if the audio and video information comprises the scene information and the sound information.
Optionally, in this embodiment of the application, if the audio/video information includes at least three first parameters, the first parameters are the scene information or the sound information;
the second determination submodule is configured to:
acquiring application condition information corresponding to a subtitle format in a preset database; the application condition information comprises at least three second parameters respectively corresponding to the first parameters;
determining a first similarity of the first parameter and the second parameter;
determining a second similarity between the application condition information and the audio and video information according to the first similarity;
and determining the matching degree of the subtitle format and the audio and video information according to the second similarity, and taking the subtitle format with the highest matching degree as a target subtitle format.
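The four steps listed above can be sketched as follows. The similarity metrics are assumptions, since the text leaves them open: exact match for label parameters, a bounded relative difference for numeric sound parameters, second similarity as the mean of first similarities, and matching degree as the sum of condition weights.

```python
def first_similarity(av_param, cond_param):
    """First similarity between a first parameter (from the audio/video
    information) and its second parameter (from an application condition).
    Metric is an assumption: exact match for labels, bounded relative
    difference for numbers."""
    if isinstance(av_param, str):
        return 1.0 if av_param == cond_param else 0.0
    denom = max(abs(av_param), abs(cond_param), 1e-9)
    return max(0.0, 1.0 - abs(av_param - cond_param) / denom)

def second_similarity(av_info, condition):
    """Second similarity of one application condition with the audio/video
    information: assumed mean of the per-parameter first similarities."""
    sims = [first_similarity(av_info[k], v) for k, v in condition.items()]
    return sum(sims) / len(sims)

def matching_degree(av_info, application_conditions):
    """Matching degree of one subtitle format: sum of the second similarities
    (weights) of all its application conditions."""
    return sum(second_similarity(av_info, c) for c in application_conditions)
```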
Optionally, in the embodiment of the present application, the format determining module 602 includes:
and the first determining sub-module is used for determining a target subtitle format corresponding to the audio and video information according to a preset corresponding relation if the audio and video information comprises the scene information or the sound information.
Under the condition that the audio and video information comprises the sound information, the corresponding relation comprises a corresponding relation between a fourth parameter and a fifth parameter;
the fourth parameter is a parameter of the sound information, and the fifth parameter is a subtitle format.
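A hypothetical instance of such a correspondence, with a sound parameter (a coarse pitch band) as the fourth parameter and a subtitle format name as the fifth; the band thresholds and format names are invented for illustration.

```python
# Hypothetical fourth -> fifth parameter correspondence.
PITCH_BAND_TO_FORMAT = {
    "high": "rounded-playful",
    "medium": "neutral-sans",
    "low": "serif-formal",
}

def pitch_band(pitch_hz):
    """Coarse banding of a pitch value; thresholds are illustrative only."""
    if pitch_hz >= 250:
        return "high"
    if pitch_hz >= 150:
        return "medium"
    return "low"

def format_for_pitch(pitch_hz):
    """Fifth parameter (subtitle format) for a fourth parameter (pitch)."""
    return PITCH_BAND_TO_FORMAT[pitch_band(pitch_hz)]
```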
Optionally, in an embodiment of the present application, the target audio/video includes at least one of a first audio/video and a second audio/video;
the first audio and video is the audio and video recorded by the electronic equipment;
the second audio and video is the audio and video received or played by the electronic equipment.
In the embodiment of the application, the information acquisition module 601 acquires audio and video information of a target audio and video; the audio and video information comprises at least one of scene information and sound information; the format determining module 602 determines a target subtitle format corresponding to the audio and video information; the subtitle adding module 603 adds subtitles in the target audio and video according to the target subtitle format, so that the subtitles can be added quickly without manual operation of a user; and the subtitles added according to the target subtitle format are matched with the audio and video information, so that the subtitle adding effect is improved, and the personalized requirements of users are met.
The subtitle adding apparatus in the embodiment of the present application may be an apparatus, and may also be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), and the like, and the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, and the like; the embodiments of the present application are not particularly limited.
The subtitle adding apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system; embodiments of the present application are not specifically limited.
The subtitle adding device provided in the embodiment of the present application can implement each process implemented by the subtitle adding device in the method embodiments of fig. 1 to fig. 5, and is not described herein again to avoid repetition.
Optionally, an electronic device is further provided in an embodiment of the present application, and includes a processor 710, a memory 709, and a program or an instruction that is stored in the memory 709 and is executable on the processor 710, where the program or the instruction is executed by the processor 710 to implement each process of the foregoing subtitle adding method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 7 is a schematic hardware structure diagram of an electronic device 700 for implementing various embodiments of the present application.
the electronic device 700 includes, but is not limited to: a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, a processor 710, and a power supply 711.
Those skilled in the art will appreciate that the electronic device 700 may also include a power supply (e.g., a battery) for powering the various components, and the power supply may be logically coupled to the processor 710 via a power management system, so that functions such as managing charging, discharging, and power consumption may be performed via the power management system. The electronic device structure shown in fig. 7 does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than those shown, or combine some components, or have a different arrangement of components; details are omitted here.
The processor 710 is configured to obtain audio and video information of a target audio and video; the audio and video information comprises at least one of scene information and sound information;
determining a target subtitle format corresponding to the audio and video information;
and adding subtitles in the target audio and video according to the target subtitle format.
Optionally, the processor 710 is configured to:
and determining a target subtitle format corresponding to the audio and video information according to a preset corresponding relation.
Optionally, if the audio/video information includes at least three first parameters, the first parameters are the scene information or the sound information;
a processor 710 configured to:
acquiring application condition information corresponding to a subtitle format in a preset database; the application condition information comprises at least three second parameters respectively corresponding to the first parameters;
determining a first similarity of the first parameter and the second parameter;
determining a second similarity between the application condition information and the audio and video information according to the first similarity;
and determining the matching degree of the subtitle format and the audio and video information according to the second similarity, and taking the subtitle format with the highest matching degree as a target subtitle format.
Optionally, the processor 710 is configured to:
acquiring a target subtitle format with the highest matching degree with the audio and video information in a preset database;
wherein, under the condition that the audio and video information comprises the sound information, the corresponding relationship comprises the corresponding relationship between a fourth parameter and a fifth parameter;
the fourth parameter is a parameter of the sound information, and the fifth parameter is a subtitle format.
Optionally, the processor 710 is configured to: the target audio/video comprises at least one of a first audio/video and a second audio/video;
the first audio and video is the audio and video recorded by the electronic equipment;
the second audio and video is the audio and video received or played by the electronic equipment. In the embodiment of the application, audio and video information of a target audio and video is acquired; the audio and video information comprises at least one of scene information and sound information; a target subtitle format corresponding to the audio and video information is determined; and subtitles are added in the target audio and video according to the target subtitle format, so that the subtitles can be added quickly without manual operation by the user; and the subtitles added according to the target subtitle format are matched with the audio and video information, so that the subtitle adding effect is improved and the personalized requirements of users are met.
It should be understood that, in the embodiment of the present application, the input unit 704 may include a graphics processing unit (GPU) 7041 and a microphone 7042, and the graphics processor 7041 processes image data of a still picture or a video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 706 may include a display panel 7061, and the display panel 7061 may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 707 includes a touch panel 7071 and other input devices 7072. The touch panel 7071 is also referred to as a touch screen and may include two parts, a touch detection device and a touch controller. Other input devices 7072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 709 may be used to store software programs as well as various data, including but not limited to applications and an operating system. The processor 710 may integrate an application processor, which primarily handles the operating system, user interface, and applications, and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor may not be integrated into the processor 710.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the foregoing subtitle adding method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the foregoing subtitle adding method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as a system-on-chip, or a system-on-chip.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another like element in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; e.g., the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A subtitle adding method, the method comprising:
acquiring audio and video information of a target audio and video; the audio and video information comprises at least one of scene information and sound information;
determining a target subtitle format corresponding to the audio and video information;
adding subtitles in the target audio and video according to the target subtitle format;
if the audio and video information comprises the scene information and the sound information, the determining of the target subtitle format corresponding to the audio and video information comprises the following steps:
acquiring a target subtitle format with the highest matching degree with the audio and video information in a preset database;
if the audio and video information comprises at least three first parameters, the first parameters are the scene information or the sound information;
the acquiring of the target subtitle format with the highest matching degree with the audio and video information in the preset database comprises the following steps:
acquiring application condition information corresponding to a subtitle format in a preset database; the application condition information comprises at least three second parameters respectively corresponding to the first parameters;
determining a first similarity of the first parameter and the second parameter;
determining a second similarity between the application condition information and the audio and video information according to the first similarity;
and determining the matching degree of the subtitle format and the audio and video information according to the second similarity, and taking the subtitle format with the highest matching degree as a target subtitle format.
2. The subtitle adding method according to claim 1, wherein if the audio/video information includes the scene information or the sound information, the determining a target subtitle format corresponding to the audio/video information includes:
determining a target subtitle format corresponding to the audio and video information according to a preset corresponding relation;
wherein, under the condition that the audio and video information comprises the sound information, the corresponding relationship comprises the corresponding relationship between a fourth parameter and a fifth parameter;
the fourth parameter is a parameter of the sound information, and the fifth parameter is a subtitle format.
3. The subtitle adding method according to claim 1, wherein the target audio-video includes at least one of a first audio-video and a second audio-video;
the first audio and video is the audio and video recorded by the electronic equipment;
the second audio and video is the audio and video received or played by the electronic equipment.
4. A subtitle adding apparatus, comprising:
the information acquisition module is used for acquiring the audio and video information of the target audio and video; the audio and video information comprises at least one of scene information and sound information;
the format determining module is used for determining a target subtitle format corresponding to the audio and video information;
the caption adding module is used for adding captions in the target audio and video according to the target caption format;
the format determination module includes:
the second determining submodule is used for acquiring a target subtitle format which is in a preset database and has the highest matching degree with the audio and video information if the audio and video information comprises the scene information and the sound information;
if the audio and video information comprises at least three first parameters, the first parameters are the scene information or the sound information;
the second determination submodule is configured to:
acquiring application condition information corresponding to a subtitle format in a preset database; wherein the application condition information comprises at least three second parameters respectively corresponding to the first parameters;
determining a first similarity of the first parameter and the second parameter;
determining a second similarity between the application condition information and the audio and video information according to the first similarity;
and determining the matching degree of the subtitle format and the audio and video information according to the second similarity, and taking the subtitle format with the highest matching degree as a target subtitle format.
5. The subtitle adding apparatus according to claim 4, wherein the format determining module includes:
the first determining submodule is used for determining a target subtitle format corresponding to the audio and video information according to a preset corresponding relation if the audio and video information comprises the scene information or the sound information;
wherein, under the condition that the audio and video information comprises the sound information, the corresponding relationship comprises the corresponding relationship between a fourth parameter and a fifth parameter;
the fourth parameter is a parameter of the sound information, and the fifth parameter is a subtitle format.
6. The subtitle adding apparatus according to claim 4, wherein the target audio-video includes at least one of a first audio-video and a second audio-video;
the first audio and video is the audio and video recorded by the electronic equipment;
the second audio and video is the audio and video received or played by the electronic equipment.
7. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, which when executed by the processor, implement the steps of the subtitle adding method according to any one of claims 1 to 3.
8. A readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the subtitle adding method according to any one of claims 1 to 3.
CN202011536498.5A 2020-12-22 2020-12-22 Subtitle adding method and device Active CN112653919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011536498.5A CN112653919B (en) 2020-12-22 2020-12-22 Subtitle adding method and device

Publications (2)

Publication Number Publication Date
CN112653919A CN112653919A (en) 2021-04-13
CN112653919B true CN112653919B (en) 2023-03-14

Family

ID=75359467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011536498.5A Active CN112653919B (en) 2020-12-22 2020-12-22 Subtitle adding method and device

Country Status (1)

Country Link
CN (1) CN112653919B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596557B (en) * 2021-07-08 2023-03-21 大连三通科技发展有限公司 Video generation method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109462768A (en) * 2018-10-25 2019-03-12 维沃移动通信有限公司 A kind of caption presentation method and terminal device
CN110198468A (en) * 2019-05-15 2019-09-03 北京奇艺世纪科技有限公司 A kind of video caption display methods, device and electronic equipment
CN110798636A (en) * 2019-10-18 2020-02-14 腾讯数码(天津)有限公司 Subtitle generating method and device and electronic equipment
CN111491184A (en) * 2019-01-25 2020-08-04 北京右划网络科技有限公司 Method and device for generating situational subtitles, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant