CN113794927A - Information display method and device and electronic equipment - Google Patents

Information display method and device and electronic equipment

Info

Publication number
CN113794927A
CN113794927A
Authority
CN
China
Prior art keywords
information
character
video
pieces
segment
Prior art date
Legal status
Pending
Application number
CN202110925397.5A
Other languages
Chinese (zh)
Inventor
刘佳妍
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202110925397.5A
Publication of CN113794927A

Classifications

    • H04N21/4355: Processing of additional data involving reformatting operations of additional data, e.g. HTML pages on a television screen
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26: Speech to text systems
    • G10L17/00: Speaker identification or verification
    • H04N21/4316: Content or additional data rendering for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N21/4884: Data services for displaying subtitles

Abstract

The application discloses an information display method and apparatus and an electronic device, belonging to the field of communication technology, which can solve the problem that electronic devices display subtitles poorly. The method comprises the following steps: acquiring M pieces of text information and M characters included in a first video segment, wherein each piece of text information is information expressed by one character in the first video segment, and M is an integer greater than 1; determining a display mode of each piece of text information corresponding to each character according to the emotion information of that character, wherein the emotion information is determined according to first information related to the character; and, in the process of playing the video picture corresponding to each character in the first video segment, displaying each piece of text information according to its display mode. The method is applied to subtitle display scenarios.

Description

Information display method and device and electronic equipment
Technical Field
The application belongs to the technical field of communication, and particularly relates to an information display method and device and electronic equipment.
Background
With the development of communication technology, video occupies an increasingly large share of users' daily lives. Generally, for most video resources, whether the subtitles are generated through speech recognition or added manually, they are displayed together in the lower area of the screen.
However, when a user with hearing impairment watches a video, the user cannot hear what the people in the video are saying; even if the user sees the subtitles displayed in the lower area of the screen, the emotional color of the speaker cannot be conveyed, which hinders the user's understanding of the video. As such, the effect with which the electronic device displays subtitles (i.e., the text information expressed by the speakers during video playback) is poor.
Disclosure of Invention
The embodiments of the present application aim to provide an information display method, an information display apparatus, and an electronic device, which can solve the problem that electronic devices display subtitles poorly.
In order to solve the technical problem, the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides an information display method, including: acquiring M pieces of text information and M characters included in a first video segment, wherein each piece of text information is information expressed by one character in the first video segment, and M is an integer greater than 1; determining a display mode of each piece of text information corresponding to each character according to the emotion information of that character, the emotion information being determined according to first information related to the character; and displaying each piece of text information according to its display mode in the process of playing the video picture corresponding to each character in the first video segment; wherein the M characters are determined according to a first video picture included in the first video segment, and the first information includes at least one of: voice feature information and face feature information.
In a second aspect, an embodiment of the present application provides an information display apparatus, including an acquisition module, a determination module, and a display module. The acquisition module is configured to acquire M pieces of text information and M characters included in a first video segment, wherein each piece of text information is information expressed by one character in the first video segment, and M is an integer greater than 1. The determination module is configured to determine the display mode of each piece of text information corresponding to each character according to the emotion information of that character, the emotion information being determined according to first information related to the character. The display module is configured to display each piece of text information, in the process of playing the video picture corresponding to each character in the first video segment, according to the display mode determined by the determination module; wherein the M characters are determined according to a first video picture included in the first video segment, and the first information includes at least one of: voice feature information and face feature information.
In a third aspect, embodiments of the present application provide an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method as in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method as in the first aspect described above.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method as in the first aspect.
In the embodiments of the present application, M pieces of text information and M characters included in a first video segment can be acquired, wherein each piece of text information is information expressed by one character in the first video segment and M is an integer greater than 1; a display mode of each piece of text information corresponding to each character is determined according to the emotion information of that character, the emotion information being determined according to first information related to the character; and each piece of text information is displayed according to its display mode in the process of playing the video picture corresponding to each character in the first video segment, wherein the M characters are determined according to a first video picture included in the first video segment and the first information includes at least one of voice feature information and face feature information. According to this scheme, after the M pieces of text information and the M characters included in the video segment are acquired, the display mode of each piece of text information can be determined according to the emotion information of the corresponding character, so that each piece of text information can be displayed synchronously, in its determined display mode, while the video picture corresponding to that character is played. The user can thus perceive, from the display mode of each piece of text information, the emotion with which the character expressed it, which makes the video easier to understand while watching. Compared with the prior art, this avoids the problem that special groups of users (such as people with hearing impairment), who cannot hear what the characters in the video say, are hindered in understanding the video; that is, the effect with which the electronic device displays information (such as subtitles) is improved.
Drawings
Fig. 1 is a schematic diagram of an information display method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of an information display manner provided in an embodiment of the present application;
fig. 3 is a first schematic view of an information display interface according to an embodiment of the present disclosure;
fig. 4 is a second schematic view of an information display interface according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an information display device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 7 is a hardware schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that terms so used are interchangeable under appropriate circumstances, such that the embodiments of the application are capable of operating in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as being preferred or more advantageous than other embodiments or designs. Rather, use of the words "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise specified, "a plurality" means two or more, for example, a plurality of elements means two or more elements, and the like.
The embodiments of the present application provide an information display method and apparatus and an electronic device, in which M pieces of text information and M characters included in a first video segment can be acquired, wherein each piece of text information is information expressed by one character in the first video segment and M is an integer greater than 1; a display mode of each piece of text information corresponding to each character is determined according to the emotion information of that character, the emotion information being determined according to first information related to the character; and each piece of text information is displayed according to its display mode in the process of playing the video picture corresponding to each character in the first video segment, wherein the M characters are determined according to a first video picture included in the first video segment and the first information includes at least one of voice feature information and face feature information. With this scheme, after the M pieces of text information and the M characters included in the video segment are acquired, the display mode of each piece of text information can be determined according to the emotion information of the corresponding character, so that each piece of text information can be displayed synchronously, in its determined display mode, while the video picture corresponding to that character is played. The user can thus perceive, from the display mode of each piece of text information, the emotion with which the character expressed it, which makes the video easier to understand while watching. Compared with the prior art, this avoids the problem that special groups of users (such as people with hearing impairment), who cannot hear what the characters in the video say, are hindered in understanding the video; that is, the effect with which the electronic device displays information (such as subtitles) is improved.
The information display method, the information display device, and the electronic device provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present application provides an information display method, which includes the following steps S101 to S103.
S101, the information display device acquires M pieces of text information and M characters included in a first video segment.
Wherein each piece of text information is information expressed by one character in the first video segment, and M is an integer greater than 1. The M characters are determined according to a first video picture included in the first video segment.
Optionally, the embodiments of the present application may be applied to scenarios in which the first video segment is buffered for playback, that is, scenarios in which a video including the first video segment is about to be played, and the like; in these scenarios, the M pieces of text information and the M characters included in the first video segment may be acquired.
Optionally, the first video segment may be a complete video resource in the electronic device, or a video segment of any video resource; this may be determined according to actual use conditions and is not limited in the embodiments of the present application.
Optionally, in the case that the first video segment does not include a voice segment, the M pieces of text information are obtained before the first video segment is dubbed and may be stored in the electronic device or a server, so that the M pieces of text information may be acquired from the local storage space of the electronic device or from the server; in the case that the first video segment includes a voice segment, the M pieces of text information are determined based on the voice segment, for which reference may be made to the detailed descriptions in the following embodiments, which are not repeated here.
Optionally, the first video picture corresponds to the image data included in the first video segment. The first video picture may contain animate or inanimate objects, for example, flowers, birds, or the sea, or people such as children or the elderly.
Optionally, the "determining M character characters according to the first video picture included in the first video segment" means: when the facial feature information is recognized in the first video picture, the M personalities included in the first video segment may be determined from different facial feature information included in the first video picture. Wherein, different personas correspond to different face feature information.
Further, when the face feature information is not identified in the first video picture, if the first video segment further includes the first voice frequency segment, the M character roles included in the first video segment may be determined according to the voiceprint information of the first voice segment, which may specifically refer to the detailed description in the following embodiments, which is not described herein again.
Optionally, the first video segment further includes a first voice segment, and a playing time stamp of the first voice segment matches a playing time stamp of the first video frame. The S101 may specifically include S101A and S101B described below.
S101A, the information display device performs speech recognition on the first voice segment to obtain the M characters.
Optionally, the first voice segment corresponds to the audio data included in the first video segment. Further, in the embodiments of the present application, the first voice segment comes from a living subject, for example, an old person, a child, a man, or a woman.
Optionally, in the embodiments of the present application, a playing timestamp refers to the start time and the end time of playing a segment of voice or a video picture. Specifically, the time at which speech starts and the time at which it ends in a video segment can be determined through voice endpoint detection; these two times constitute the playing timestamp.
It should be noted that, in the embodiments of the present application, the playing timestamp of the first voice segment matching the playing timestamp of the first video picture means that the two are the same: the playing timestamp of the first audio frame of the first voice segment is the same as the playing timestamp of the first video frame of the first video picture, and the playing timestamp of the last audio frame of the first voice segment is the same as the playing timestamp of the last video frame of the first video picture.
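By way of illustration only, the following Python sketch shows one simple way to obtain such a playing timestamp through energy-based voice endpoint detection. The frame length and energy threshold are assumptions, and a production system would typically use a trained voice activity detection model instead.

```python
import numpy as np

def speech_endpoints(samples: np.ndarray, sample_rate: int,
                     frame_ms: int = 30, energy_thresh: float = 1e-3):
    """Return (start_sec, end_sec) of the voiced span, or None if silent.

    A minimal energy-based voice endpoint detector; the thresholds are
    illustrative assumptions only.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    energies = [float(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2))
                for i in range(n_frames)]
    voiced = [i for i, e in enumerate(energies) if e > energy_thresh]
    if not voiced:
        return None
    sec_per_frame = frame_ms / 1000.0
    # start of the first voiced frame, end of the last voiced frame
    return voiced[0] * sec_per_frame, (voiced[-1] + 1) * sec_per_frame
```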
Specifically, S101A may be implemented through S101a1 and S101a2 described below, or through S101a1 and S101a3 described below; that is, S101a2 and S101a3 are alternatives.
S101a1, the information display device determines M pieces of voiceprint information included in the first voice segment.
Wherein one piece of voiceprint information is used to indicate one character.
Optionally, the voiceprint information may include at least one of: timbre, pitch, rhythm, intensity, and the like, each of which characterizes the sound.
It should be noted that different sound sources may differ in timbre, pitch, rhythm, and intensity; the characters included in the first voice segment can therefore be determined through the voiceprint information.
S101a2, in the case that N pieces of preset voiceprint information include the M pieces of voiceprint information, the information display device selects, from N preset characters and according to the M pieces of voiceprint information, the preset characters corresponding to the M pieces of voiceprint information as the M characters.
Wherein one piece of preset voiceprint information corresponds to one preset character.
It should be noted that N pieces of preset voiceprint information and N preset characters are stored in the electronic device, with one piece of preset voiceprint information corresponding to one preset character. Therefore, when the N pieces of preset voiceprint information include the M pieces of voiceprint information, the M characters corresponding to the M pieces of voiceprint information have spoken before, that is, they are characters that have already appeared in the first voice segment, so the preset characters corresponding to the M pieces of voiceprint information can be selected from the N preset characters as the M characters.
S101a3, in the case that the N pieces of preset voiceprint information do not include the M pieces of voiceprint information, the information display device creates the M characters based on the M pieces of voiceprint information.
Wherein one character corresponds to one piece of voiceprint information.
It should be noted that, since the N pieces of preset voiceprint information and the N preset characters are stored in the electronic device in advance, with one piece of preset voiceprint information corresponding to one preset character, when the N pieces of preset voiceprint information do not include the M pieces of voiceprint information, the M characters corresponding to the M pieces of voiceprint information have not spoken before, that is, they are characters that have not yet appeared in the first voice segment; the M characters can therefore be created according to the M pieces of voiceprint information, that is, the M characters are new characters.
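For illustration, a minimal Python sketch of S101a1 to S101a3 follows. Each piece of voiceprint information is represented as an embedding vector (from any speaker-embedding model) and matched against the stored presets by cosine similarity; the threshold, the naming scheme, and all function names are assumptions rather than part of the disclosure. The final store-back step corresponds to the optional storing of voiceprint information described next.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def resolve_characters(voiceprints: list[np.ndarray],
                       preset_db: dict[str, np.ndarray],
                       match_thresh: float = 0.75) -> list[str]:
    """Map each of the M voiceprints to a preset character (S101a2) or
    create a new character when no preset matches (S101a3)."""
    characters = []
    for emb in voiceprints:
        best_name, best_score = None, match_thresh
        for name, ref in preset_db.items():      # search the N presets
            score = cosine(emb, ref)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:                    # no preset matched: new character
            best_name = f"Speaker_{len(preset_db) + 1}"
            preset_db[best_name] = emb           # store the voiceprint (see below)
        characters.append(best_name)
    return characters
```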
Optionally, after S101a3, the information display method provided in the embodiments of the present application may further include: the information display device stores the M pieces of voiceprint information. In this way, after the voiceprint information stored in the database has been updated, when voiceprint information matching any of the M pieces of voiceprint information appears again, the corresponding character can be identified conveniently.
It can be understood that, since different characters have different voiceprint information, the M characters included in the first video segment can be determined through the M pieces of voiceprint information included in the first voice segment, which facilitates subsequently obtaining the M pieces of text information.
S101B, the information display device divides the first voice segment according to the M characters to obtain the M pieces of text information.
Wherein each piece of text information corresponds to one of the M characters.
Specifically, because each of the M characters has different voiceprint information, in the process of performing speech recognition on the first voice segment, the first voice segment can be divided into M sub-segments according to the different recognized voiceprint information, and the speech included in each sub-segment can be converted into text information, thereby obtaining the M pieces of text information.
It can be understood that, since the M characters included in the first voice segment are obtained by performing speech recognition on the first voice segment, and the first voice segment is divided according to the M characters to obtain the M pieces of text information, the text information corresponding to each character, that is, the information expressed by each character, can be determined.
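The following sketch illustrates S101B under the assumption that the speaker turns, with their characters and playing timestamps, have already been obtained as above; recognize() stands in for any speech-to-text engine and is not a real API.

```python
from dataclasses import dataclass

@dataclass
class SpeechTurn:
    character: str   # resolved via voiceprint matching (S101A)
    start: float     # playing timestamp, in seconds
    end: float

def recognize(audio_path: str, start: float, end: float) -> str:
    """Placeholder for a real speech recognizer."""
    raise NotImplementedError

def text_information(audio_path: str, turns: list[SpeechTurn]) -> list[dict]:
    """S101B: divide the first voice segment by character and convert each
    sub-segment into one piece of text information."""
    return [{"character": t.character,
             "text": recognize(audio_path, t.start, t.end),
             "timestamp": (t.start, t.end)}
            for t in turns]
```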
S102, the information display device determines, according to the emotion information of each character, the display mode of each piece of text information corresponding to that character.
Wherein the emotion information is determined based on the first information related to each character.
Optionally, the emotion information is used to reflect the emotion of each character. Further, the display mode is used to indicate the emotion of each character when expressing the corresponding text information.
Illustratively, the emotion information may include at least one of: joy, anger, sorrow, happiness, and the like.
It should be noted that, because the emotion information of each character determines the display mode of the corresponding text information, when the emotion information of characters differs, the corresponding pieces of text information are displayed in different modes; when the emotion information of characters is the same, the corresponding pieces of text information are displayed in the same mode.
Optionally, the display mode may include at least one of: display color, display shape, display area, display size, and the like, determined according to actual use conditions; this is not limited in the embodiments of the present application.
For example, when the emotion information of a character is a neutral statement, the text information is displayed in the floating-window shape shown in (a) of fig. 2; when the emotion information of a character is happy, in the floating-window shape shown in (b) of fig. 2; when the emotion information of a character is surprised or excited, in the floating-window shape shown in (c) of fig. 2; and when the text is a voice-over narration, in the floating-window shape shown in (d) of fig. 2.
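A table-driven mapping such as the following Python sketch could realize this correspondence; the concrete shapes, colors, and emotion labels are illustrative assumptions mirroring fig. 2, not values taken from the disclosure.

```python
# Emotion label -> floating-window style; entries mirror fig. 2(a)-(d),
# the concrete shapes and colors being assumptions for illustration.
DISPLAY_MODES = {
    "statement": {"shape": "rounded_rect", "color": "#FFFFFF"},  # fig. 2(a)
    "happy":     {"shape": "cloud",        "color": "#FFE066"},  # fig. 2(b)
    "excited":   {"shape": "burst",        "color": "#FF6B6B"},  # fig. 2(c)
    "voiceover": {"shape": "plain_box",    "color": "#CCCCCC"},  # fig. 2(d)
}

def display_mode(emotion: str) -> dict:
    """S102: identical emotion information yields an identical display mode."""
    return DISPLAY_MODES.get(emotion, DISPLAY_MODES["statement"])
```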
Optionally, after S101 and before S102, the information display method provided in the embodiments of the present application may further include: the information display device analyzes the first information related to each character to obtain the emotion information of that character. The first information includes at least one of: voice feature information and face feature information.
Optionally, the voice feature information may include at least one of: the semantics of the speech, the tone of the speech, and the modal particles in the speech.
Illustratively, taking the voice feature information being the semantics of the speech as an example: after speech recognition is performed on the first voice segment, text information is obtained, and key information in the text information is extracted and recognized to obtain the meaning of the text information expressed by the first voice segment, that is, the semantics of the speech. It can be understood that speech with different semantics reflects different emotions in the character expressing it.
Illustratively, taking the voice feature information being the tone of the speech as an example: the tones may include the four Mandarin tones, namely the high-level tone (yinping), the rising tone (yangping), the falling-rising tone (shangsheng), and the falling tone (qusheng). Different tones convey different meanings in spoken language, so the emotional changes of a character can be judged from changes in tone. It can be understood that the tones indicate changes in the pitch of the character's voice while speaking.
Illustratively, taking the voice feature information being the modal particles in the speech as an example: the modal particles may be particles such as the Chinese 哦, 吗, or 吧. When a character uses different modal particles, the emotion of the character also differs.
Optionally, the face feature information is obtained from the first video picture. Specifically, the face feature information may include: the facial features of the face image, the micro-expression information of the face image, and the micro-action information of the face image. The micro-expression information reflects the emotional changes of a character, and the micro-action information mainly refers to the opening and closing of the lips.
Further, when face images are recognized in the first video picture, different face images correspond to different face feature information, so the corresponding character can be determined through the face feature information.
For example, taking the first information being the voice feature information as an example, if the tone of a segment of speech is rising, the emotion information is "excited"; as another example, taking the first information being the face feature information, if the face in a video picture shows a crying expression, the emotion information is "sad".
It can be understood that, in the case that the first information is the voice feature information alone, that is, the first video picture does not include face feature information, the emotion information of each character can be determined through analysis of the voice feature information only; in the case that the first information includes both the voice feature information and the face feature information, that is, the first video picture includes face feature information, the emotion information of each character can be determined jointly through analysis of the voice feature information and the face feature information. Therefore, the emotion information of each character can be obtained in different scenarios, and the accuracy of the determined emotion information is improved.
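By way of illustration, a toy rule-based fusion of the two kinds of first information might look as follows; real systems would use trained classifiers, and every dictionary key and label here is an assumption.

```python
def infer_emotion(voice_features: dict, face_features: dict | None) -> str:
    """Determine emotion information from voice features alone, or jointly
    from voice and face features when the picture contains a face."""
    if face_features is not None:                 # voice + face available
        expr = face_features.get("micro_expression")
        if expr == "crying":
            return "sad"
        if expr == "smiling":
            return "happy"
    if voice_features.get("tone") == "rising":    # e.g. a rising tone
        return "excited"
    return "statement"
```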
S103, in the process of playing the video picture corresponding to each character in the first video segment, the information display device displays each piece of text information according to the display mode of that piece of text information.
Optionally, the "video picture corresponding to each character" in S103 refers to a video picture in which the micro-expression or micro-action of the face image corresponding to that character can be recognized, that is, a picture in which only that character is speaking.
It should be noted that, in the embodiments of the present application, when each piece of text information is displayed on the screen, it may be referred to as a subtitle.
Example 1: take the information display device being a mobile phone and M = 2 as an example. The mobile phone acquires text information 1, text information 2, and the two characters included in the first video segment, the two characters being Speaker_1 and Speaker_2. Then, according to the emotion information of Speaker_1 (happy), the mobile phone determines the display mode of text information 1 corresponding to Speaker_1 as the shape shown in (b) of fig. 2, and according to the emotion information of Speaker_2 (excited), determines the display mode of text information 2 corresponding to Speaker_2 as the shape shown in (c) of fig. 2. Then, in the process of playing the video picture corresponding to Speaker_1 in the first video segment, the mobile phone displays text information 1 in the floating-window shape shown in (b) of fig. 2; in the process of playing the video picture corresponding to Speaker_2, it displays text information 2 in the floating-window shape shown in (c) of fig. 2.
It should be noted that S101 to S103 describe only the information display method used when a certain video segment in a complete video is played. It can be understood that, in the process of playing a complete video, the information display method provided in S101 to S103 may be executed in a loop, so that whenever a character in the video speaks, the content spoken by the character (that is, the subtitle) can be displayed synchronously in a display mode matching the character's current emotion, allowing the video to be viewed better.
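This loop can be pictured with the following self-contained sketch; the segment and picture structure and all helper names are assumptions, standing in for the voiceprint, recognition, emotion, and rendering components described in this document.

```python
from typing import Callable, Iterable

def play_with_subtitles(segments: Iterable,
                        acquire: Callable,    # S101: segment -> (texts, characters)
                        mode_for: Callable,   # S102: character -> display mode
                        render: Callable):    # S103: (picture, text, mode) -> None
    """Execute S101 to S103 in a loop over every segment of a complete video."""
    for segment in segments:
        texts, characters = acquire(segment)          # texts keyed by character
        modes = {c: mode_for(c) for c in characters}
        for picture in segment:                       # assumed iterable of pictures
            c = getattr(picture, "speaking_character", None)
            if c in modes:
                render(picture, texts[c], modes[c])
```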
Alternatively, S103 may be specifically implemented by S103A described below.
S103A, in the process of playing the video picture corresponding to each character in the first video segment, the information display device displays the text information corresponding to each character in a floating manner in a target area of the screen, according to the display mode of each piece of text information.
Wherein, in the case that the M characters do not match the first video picture, the target area is a preset area of the screen; in the case that the M characters match the first video picture, the target area is an area that satisfies a preset condition among the display areas adjacent to the display area of each character on the screen.
Optionally, the preset area may be set by the manufacturer of the electronic device at the factory, or may be customized by the user.
For example, the preset areas may be the display areas located at the four corners of the screen, namely the upper-left, upper-right, lower-left, and lower-right display areas.
Optionally, the M characters not matching the first video picture specifically includes: (1) no face feature information corresponding to the M characters is recognized in the first video picture; (2) the face feature information recognized in the first video picture does not correspond to the M characters. In other words, the M characters act as narrators of the first video picture, that is, the first voice segment is a voice-over commentary on the first video picture.
For example, assume that M = 1: in the course of playing the video picture shown in fig. 3, since the one character is narrating the video picture, the text information "This day, the puppy met his friend" is displayed in a floating manner in the display area 01 located at the upper-right corner of the screen.
Optionally, the M characters matching the first video picture specifically includes: face feature information corresponding to the M characters is recognized in the first video picture.
Optionally, the preset condition may include: containing the least scenery, containing no important scenery, and the like.
For example, in the process of playing the video picture shown in fig. 4, since character 02 matches the video picture, the text information corresponding to character 02, "Wow, you really got into XX school!", is displayed in a floating manner in the display area 03, which is adjacent to the display area of character 02 and contains no important scenery on the screen.
It can be understood that, in the case that the M characters do not match the first video picture, the text information corresponding to each character can be displayed in a floating manner in the preset area of the screen according to its display mode; or, in the case that the M characters match the first video picture, the text information corresponding to each character can be displayed in a floating manner, according to its display mode, in a display area that is adjacent to the display area of that character and satisfies the preset condition. In this way, a reasonable display area can be selected on the screen according to the actual situation for displaying the text information corresponding to each character, making the subtitle display richer and more varied; the style and characteristics of the subtitles become clearer, and the user feels more immersed in the process of watching the video.
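One possible selection of the target area is sketched below in Python; the Rect type, the importance scoring function, and the candidate areas are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

def pick_target_area(characters_matched: bool,
                     preset_area: Rect,
                     adjacent_areas: list[Rect],
                     scenery_importance: Callable[[Rect], float]) -> Rect:
    """S103A: voice-over text goes to the preset corner area; otherwise,
    among the areas adjacent to the speaking character's display area,
    pick the one covering the least important scenery."""
    if not characters_matched:
        return preset_area
    return min(adjacent_areas, key=scenery_importance)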
The embodiments of the present application provide an information display method that is used after the M pieces of text information and the M characters included in a video segment are acquired. Because the display mode of each piece of text information corresponding to each character can be determined according to the emotion information of that character, each piece of text information can be displayed synchronously, in its display mode, in the process of playing the video picture corresponding to the character in the video segment. The user can thus perceive, from the display mode of each piece of text information, the emotion with which the character expressed it, which makes the video easier to understand while watching. Compared with the prior art, this avoids the problem that special groups of users (such as people with hearing impairment), who cannot hear what the characters in the video say, are hindered in understanding the video; that is, the effect with which the electronic device displays information (such as subtitles) is improved.
Optionally, the first video segment further includes a first voice segment, the playing timestamp of the first voice segment matches the playing timestamp of the first video picture, and the M pieces of text information are determined based on the first voice segment. In this case, S101 may also be implemented through S104 to S106 described below.
S104, the information display device performs face image recognition on the first video picture to obtain M pieces of face feature information.
Wherein each character corresponds to one piece of face feature information.
Optionally, for the description of the face feature information, reference may be made to the detailed descriptions in the foregoing embodiments, which are not repeated here.
It can be understood that, since one piece of face feature information may be used to characterize the facial features of one face image in the first video picture, when the M pieces of face feature information are obtained, it can be determined that the first video picture includes M face images. One piece of face feature information is used to indicate one face image, and one face image corresponds to one character; that is, each character corresponds to one piece of face feature information.
S105, the information display device divides the first video picture according to the M pieces of face feature information to obtain M video pictures.
Wherein each video picture includes at least one character.
Specifically, because the face feature information includes the micro-action information and the micro-expression information of the face image, the first video picture can be segmented according to the micro-actions and micro-expressions of different face images in different time periods to obtain the M video pictures.
Further, each of the M video pictures includes multiple consecutive video frames.
S106, the information display device determines a target character according to the first playing timestamp of a target video picture among the M video pictures and the target face feature information in the target video picture, so as to obtain the M characters.
Wherein the second playing timestamp of the voice segment corresponding to the target character in the first voice segment matches the first playing timestamp. The target video picture is any one of the M video pictures, and the target character is the character corresponding to the target face feature information.
Optionally, the face feature information includes facial features, micro-action information, and micro-expression information, so that the playing timestamp of any one of the M video pictures can be determined from the times at which the face feature information appears and disappears in that video picture. Further, because different pieces of face feature information appear and disappear at different times, the playing timestamps of the M video pictures differ from one another.
Optionally, the target video picture may include multiple face images. The target face feature information is the face feature information of the target face image, that is, the face image among them that shows a micro-expression or micro-action.
It should be noted that, since the target video picture is any one of the M video pictures, the M characters can be determined according to the playing timestamp of each video picture and the face feature information in each video picture.
Optionally, after S106, the information display method provided in the embodiments of the present application may further include: the information display device stores the correspondence between the target face feature information in the target video picture and the target character. In this way, the face feature information corresponding to a character can subsequently be matched directly through the character's voiceprint information.
Example 2, continuing from Example 1: the mobile phone performs face image recognition on the first video picture to obtain 2 pieces of face feature information, and divides the first video picture according to the 2 pieces of face feature information to obtain 2 video pictures, namely video picture 1 from second 11 to second 13 and video picture 2 from second 13 to second 15. The mobile phone can determine the character Speaker_1 according to the playing timestamp of video picture 1 and the face feature information of face_A in video picture 1, and determine the character Speaker_2 according to the playing timestamp of video picture 2 and the face feature information of face_B in video picture 2. The timestamp of the voice segment corresponding to Speaker_1 in the first voice segment is between seconds 11 and 13, and the timestamp of the voice segment corresponding to Speaker_2 is between seconds 13 and 15.
In this way, the characters included in the video pictures can be matched with the characters in the voice segment, so that the text information corresponding to each character is synchronized with the corresponding video picture.
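The matching of video pictures to voice segments by overlapping playing timestamps (S106 and Example 2 above) can be sketched as follows; the dictionary keys are illustrative assumptions.

```python
def overlaps(a: tuple, b: tuple) -> bool:
    """True when two (start, end) playing timestamps overlap in time."""
    return a[0] < b[1] and b[0] < a[1]

def match_pictures_to_speech(video_pictures: list[dict],
                             speech_turns: list[dict]) -> dict:
    """S106: assign to each video picture the character whose voice segment's
    playing timestamp matches the picture's playing timestamp."""
    assignment = {}
    for pic in video_pictures:
        for turn in speech_turns:
            if overlaps(pic["timestamp"], turn["timestamp"]):
                assignment[pic["face_id"]] = turn["character"]
    return assignment

# Example 2 in data form: face_A speaks during seconds 11-13, face_B during 13-15.
pictures = [{"face_id": "face_A", "timestamp": (11.0, 13.0)},
            {"face_id": "face_B", "timestamp": (13.0, 15.0)}]
turns = [{"character": "Speaker_1", "timestamp": (11.0, 13.0)},
         {"character": "Speaker_2", "timestamp": (13.0, 15.0)}]
assert match_pictures_to_speech(pictures, turns) == {
    "face_A": "Speaker_1", "face_B": "Speaker_2"}
```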
It should be noted that, in one possible case, when the first video picture includes a face image, S104 to S106 may be executed; in another possible case, when the first video picture includes no face image, S104 to S106 are not executed; in yet another possible case, when the first video picture includes a face image but the face image shows no micro-action or micro-expression, the first voice segment is a voice-over commentary on the first video picture, so S104 to S106 are likewise not executed.
According to the information display method provided in the embodiments of the present application, face image recognition can be performed on the first video picture to obtain M pieces of face feature information, the first video picture can be segmented according to the M pieces of face feature information to obtain M video pictures, and a target character can be determined according to the first playing timestamp of a target video picture among the M video pictures and the target face feature information in the target video picture, so as to obtain the M characters. The characters included in the video pictures can thus be matched with the characters in the voice segment, so that the text information corresponding to each character is synchronized with the corresponding video picture.
It should be noted that, in the information display method provided in the embodiments of the present application, the execution subject may be an information display device (for example, an electronic device or an external device connected to the electronic device), or a control module in the information display device for executing the information display method. In the embodiments of the present application, an information display device executing the information display method is taken as an example to describe the information display device provided herein.
As shown in fig. 5, an embodiment of the present application provides an information display apparatus 200, which includes an acquisition module 201, a determination module 202, and a display module 203. The acquisition module 201 is configured to acquire M pieces of text information and M characters included in a first video segment, wherein each piece of text information is information expressed by one character in the first video segment and M is an integer greater than 1. The determination module 202 is configured to determine the display mode of each piece of text information corresponding to each character according to the emotion information of that character, the emotion information being determined according to first information related to the character. The display module 203 is configured to display each piece of text information, in the process of playing the video picture corresponding to each character in the first video segment, according to the display mode determined by the determination module 202; wherein the M characters are determined according to a first video picture included in the first video segment, and the first information includes at least one of: voice feature information and face feature information.
Optionally, the first video segment further includes a first voice segment, and the playing timestamp of the first voice segment matches the playing timestamp of the first video picture. The acquisition module is specifically configured to perform speech recognition on the first voice segment to obtain the M characters, and to segment the text information corresponding to the first voice segment according to the M characters to obtain the M pieces of text information, wherein each piece of text information corresponds to one of the M characters.
Optionally, the determination module is further configured to determine M pieces of voiceprint information included in the first voice segment. The acquisition module is specifically configured to: in the case that N pieces of preset voiceprint information include the M pieces of voiceprint information, select, from N preset characters and according to the M pieces of voiceprint information, the preset characters corresponding to the M pieces of voiceprint information as the M characters, wherein one piece of preset voiceprint information corresponds to one preset character; or, in the case that the N pieces of preset voiceprint information do not include the M pieces of voiceprint information, create the M characters according to the M pieces of voiceprint information, wherein one character corresponds to one piece of voiceprint information; N is an integer greater than or equal to M.
Optionally, the first video segment further includes a first voice segment, the playing timestamp of the first voice segment matches the playing timestamp of the first video picture, and the M pieces of text information are determined based on the first voice segment. The acquisition module is specifically configured to: perform face image recognition on the first video picture to obtain M pieces of face feature information, each character corresponding to one piece of face feature information; divide the first video picture according to the M pieces of face feature information to obtain M video pictures, each video picture including at least one character; and determine a target character according to the first playing timestamp of a target video picture among the M video pictures and the target face feature information in the target video picture, so as to obtain the M characters, wherein the second playing timestamp of the voice segment corresponding to the target character in the first voice segment matches the first playing timestamp, the target video picture is any one of the M video pictures, and the target character is the character corresponding to the target face feature information.
Optionally, the determination module is further configured to analyze the first information related to each character to obtain the emotion information of that character.
Optionally, the display module is specifically configured to display, in a floating manner and according to the display mode of each piece of text information, the text information corresponding to each character in a target area of the screen, wherein, in the case that the M characters do not match the first video picture, the target area is a preset area of the screen, and in the case that the M characters match the first video picture, the target area is an area that satisfies a preset condition among the display areas adjacent to the display area of each character on the screen.
The embodiments of the present application provide an information display apparatus that is used after the M pieces of text information and the M characters included in a video segment are acquired. Because the display mode of each piece of text information corresponding to each character can be determined according to the emotion information of that character, each piece of text information can be displayed synchronously, in its display mode, in the process of playing the video picture corresponding to the character in the video segment. The user can thus perceive, from the display mode of each piece of text information, the emotion with which the character expressed it, which makes the video easier to understand while watching. Compared with the prior art, this avoids the problem that special groups of users (such as people with hearing impairment), who cannot hear what the characters in the video say, are hindered in understanding the video; that is, the effect with which the electronic device displays information (such as subtitles) is improved.
The information display apparatus in the embodiments of the present application may be a device, or may be a component, integrated circuit, or chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine; the embodiments of the present application are not specifically limited in this respect.
The information display apparatus in the embodiments of the present application may be a device having an operating system. The operating system may be Android, iOS, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The information display device provided in the embodiment of the present application can implement each process implemented by the method embodiments of fig. 1 to fig. 4, and is not described here again to avoid repetition.
Optionally, as shown in fig. 6, an embodiment of the present application further provides an electronic device 300, including a processor 301, a memory 302, and a program or instructions stored in the memory 302 and executable on the processor 301, where the program or instructions, when executed by the processor 301, implement the processes of the foregoing information display method embodiments and can achieve the same technical effects; details are not repeated here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 400 includes, but is not limited to: a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, and a processor 410.
Those skilled in the art will appreciate that the electronic device 400 may further include a power source (e.g., a battery) for supplying power to the various components; the power source may be logically connected to the processor 410 through a power management system, so as to implement functions such as managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 7 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than those shown, combine some components, or arrange the components differently, which is not repeated here.
The processor 410 is configured to acquire M pieces of text information and M characters included in a first video segment, wherein each piece of text information is information expressed by one character in the first video segment and M is an integer greater than 1, and to determine the display mode of each piece of text information corresponding to each character according to the emotion information of that character, the emotion information being determined according to first information related to the character. The display unit 406 is configured to display each piece of text information according to the display mode of that piece of text information in the process of playing the video picture corresponding to each character in the first video segment; wherein the M characters are determined according to a first video picture included in the first video segment, and the first information includes at least one of: voice feature information and face feature information.
Optionally, the first video segment further includes a first voice segment, and a playing time stamp of the first voice segment matches a playing time stamp of the first video picture. The processor 410 is specifically configured to perform voice recognition on the first voice segment to obtain the M character roles, and to segment the text information corresponding to the first voice segment according to the M character roles to obtain the M pieces of text information, where one piece of text information corresponds to one character role among the M character roles.
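A minimal sketch of this segmentation step, assuming the recognizer returns diarized turns as (role, start_s, end_s, text) tuples (the tuple layout is an assumption for illustration):

def segment_text_by_role(diarized_turns):
    """Group the text recognized from the first voice segment so that each
    piece of text information corresponds to one character role."""
    texts_per_role = {}
    for role, start_s, end_s, text in diarized_turns:
        texts_per_role.setdefault(role, []).append((start_s, end_s, text))
    return texts_per_role

# Example: two character roles speaking within one voice segment.
turns = [("role_a", 0.0, 2.1, "Hello there."),
         ("role_b", 2.3, 4.0, "Hi, long time no see.")]
print(segment_text_by_role(turns))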
Optionally, the processor 410 is further configured to determine M pieces of voiceprint information included in the first voice segment; and, when N pieces of preset voiceprint information include the M pieces of voiceprint information, select, from N preset character roles according to the M pieces of voiceprint information, the preset character roles corresponding to the M pieces of voiceprint information as the M character roles, where one piece of preset voiceprint information corresponds to one preset character role; or, when the N pieces of preset voiceprint information do not include the M pieces of voiceprint information, create the M character roles according to the M pieces of voiceprint information, where one character role corresponds to one piece of voiceprint information. N is an integer greater than or equal to M.
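The following hedged sketch illustrates the match-or-create branch, assuming voiceprints are fixed-length embedding vectors compared by cosine similarity against a hypothetical threshold. Note that this embodiment decides at the level of the whole set of M voiceprints, whereas the sketch applies the rule per voiceprint for simplicity:

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_or_create_roles(voiceprints, preset_voiceprints, preset_roles,
                          threshold=0.75):
    """Reuse a preset character role when its voiceprint matches; otherwise
    create a new character role, one per unmatched voiceprint."""
    roles = []
    for vp in voiceprints:
        sims = [cosine(vp, p) for p in preset_voiceprints]
        if sims and max(sims) >= threshold:
            roles.append(preset_roles[sims.index(max(sims))])
        else:
            # No preset match: create a new character role for this voiceprint.
            roles.append(f"role_{len(preset_roles) + len(roles)}")
    return roles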
Optionally, the first video segment further includes a first voice segment, a playing time stamp of the first voice segment matches a playing time stamp of the first video picture, and the M pieces of text information are determined based on the first voice segment. The processor 410 is specifically configured to perform face image recognition on the first video picture to obtain M pieces of face feature information, where one character role corresponds to one piece of face feature information; to segment the first video picture according to the M pieces of face feature information to obtain M video pictures, where one video picture includes at least one character role; and to determine a target character role according to a first playing time stamp of a target video picture among the M video pictures and target face feature information in the target video picture, so as to obtain the M character roles, where a second playing time stamp of the voice segment corresponding to the target character role in the first voice segment matches the first playing time stamp. The target video picture is any one of the M video pictures, and the target character role is the character role corresponding to the target face feature information.
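A sketch of the timestamp-matching step, under the assumption that face tracks and speech turns are given as (id, start, end) intervals and that "matching" play time stamps means maximal overlap; both data shapes are illustrative assumptions:

def match_turns_to_faces(face_tracks, speech_turns):
    """face_tracks: list of (face_id, start_s, end_s) on-screen intervals.
    speech_turns: list of (turn_id, start_s, end_s) voice-segment intervals.
    Returns {turn_id: face_id} pairing each speech turn with the face whose
    display interval overlaps it the most."""
    assignment = {}
    for turn_id, t0, t1 in speech_turns:
        best_face, best_overlap = None, 0.0
        for face_id, f0, f1 in face_tracks:
            overlap = min(t1, f1) - max(t0, f0)
            if overlap > best_overlap:
                best_face, best_overlap = face_id, overlap
        if best_face is not None:
            assignment[turn_id] = best_face
    return assignment

# Example: the second speech turn overlaps face_b's on-screen interval.
print(match_turns_to_faces([("face_a", 0.0, 2.0), ("face_b", 2.0, 4.5)],
                           [("turn_1", 0.2, 1.8), ("turn_2", 2.3, 4.0)]))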
Optionally, the processor 410 is further configured to analyze the first information related to each character role to obtain the emotion information of each character role.
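For illustration, fusing the two cues named as the first information might look like the following sketch. The per-emotion scores and the fusion weights are assumptions, and the upstream models that would produce the scores from voice features and face image features are out of scope here:

def fuse_emotion(voice_scores, face_scores, w_voice=0.6, w_face=0.4):
    """Weighted fusion of per-emotion scores from voice feature information
    and face image feature information; returns the winning emotion label."""
    labels = set(voice_scores) | set(face_scores)
    fused = {lbl: w_voice * voice_scores.get(lbl, 0.0)
                  + w_face * face_scores.get(lbl, 0.0)
             for lbl in labels}
    return max(fused, key=fused.get)

# Example: the voice cue suggests anger more strongly than the face cue.
print(fuse_emotion({"angry": 0.7, "neutral": 0.3},
                   {"angry": 0.4, "neutral": 0.6}))  # -> "angry"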
Optionally, the display unit 406 is specifically configured to display, in a floating manner, the text information corresponding to each character role in a target area of the screen according to the display mode of each piece of text information. When the M character roles are not matched with the first video picture, the target area is located in a preset area of the screen; when the M character roles are matched with the first video picture, the target area is an area that satisfies a preset condition among the display areas adjacent to the display area of each character role on the screen.
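A sketch of this target-area rule, where the screen coordinates, the caption height, and the "place the text just above the face box" heuristic are illustrative assumptions:

def pick_target_area(face_box=None, screen_w=1920, screen_h=1080,
                     caption_h=80):
    """Return (x1, y1, x2, y2) of the floating-display target area."""
    if face_box is None:
        # Roles not matched with the first video picture: preset bottom strip.
        return (0, screen_h - caption_h, screen_w, screen_h)
    # Roles matched: pick an area adjacent to the character role's display
    # area, here just above its face box, clamped to the screen.
    x1, y1, x2, y2 = face_box
    top = max(0, y1 - caption_h)
    return (x1, top, x2, top + caption_h)

print(pick_target_area())                                # preset area
print(pick_target_area(face_box=(600, 200, 900, 560)))   # adjacent to face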
An embodiment of the present application provides an electronic device that acquires M pieces of text information and M character roles included in a video segment. Because the display mode of each piece of text information corresponding to each character role can be determined according to the emotion information of each character role, each piece of text information can be displayed synchronously, according to its display mode, while the video picture corresponding to each character role in the video segment is played. A user can therefore perceive, from the display mode of each piece of text information, the emotion with which each character role expresses the corresponding text, which makes the video easier to understand. Compared with the prior art, this avoids the problem that some users (such as hearing-impaired users) cannot hear what the characters in a video say and therefore have difficulty understanding it; in other words, the information display effect (for example, of subtitles) of the electronic device is improved.
It should be understood that, in the embodiments of the present application, the input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042, where the graphics processing unit 4041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 406 may include a display panel 4061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 407 includes a touch panel 4071, also referred to as a touch screen, and other input devices 4072; the touch panel 4071 may include two parts: a touch detection device and a touch controller. The other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 409 may be used to store software programs as well as various data, including but not limited to application programs and an operating system. The processor 410 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It can be appreciated that the modem processor may alternatively not be integrated into the processor 410.
An embodiment of the present application further provides a readable storage medium, where a program or instruction is stored on the readable storage medium. When executed by a processor, the program or instruction implements each process of the above embodiment of the information display method and can achieve the same technical effect; to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the above information display method embodiment, and the same technical effect can be achieved.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed; the functions may also be performed in a substantially simultaneous manner or in a reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods in the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. An information display method, characterized in that the method comprises:
acquiring M pieces of text information and M character roles included in a first video segment, wherein one piece of text information is information expressed by one character role in the first video segment, and M is an integer greater than 1;
determining a display mode of each piece of text information corresponding to each character role according to emotion information of each character role, wherein the emotion information is determined according to first information related to each character role; and
in the process of playing the video picture corresponding to each character role in the first video segment, displaying each piece of text information according to the display mode of each piece of text information;
wherein the M character roles are determined according to a first video picture included in the first video segment, and the first information comprises at least one of: voice feature information and face image feature information.
2. The method of claim 1, wherein the first video segment further comprises a first voice segment, and a playing time stamp of the first voice segment matches a playing time stamp of the first video picture;
the acquiring M pieces of text information and M character roles included in the first video segment comprises:
performing voice recognition on the first voice segment to obtain the M character roles; and
segmenting the first voice segment according to the M character roles to obtain the M pieces of text information, wherein one piece of text information corresponds to one character role among the M character roles.
3. The method of claim 2, wherein the performing voice recognition on the first voice segment to obtain the M character roles comprises:
determining M pieces of voiceprint information included in the first voice segment;
when N pieces of preset voiceprint information include the M pieces of voiceprint information, selecting, from N preset character roles according to the M pieces of voiceprint information, the preset character roles corresponding to the M pieces of voiceprint information as the M character roles, wherein one piece of preset voiceprint information corresponds to one preset character role; or,
when the N pieces of preset voiceprint information do not include the M pieces of voiceprint information, creating the M character roles according to the M pieces of voiceprint information, wherein one character role corresponds to one piece of voiceprint information;
wherein N is an integer greater than or equal to M.
4. The method of claim 1, wherein the first video segment further comprises a first voice segment, a playing time stamp of the first voice segment matches a playing time stamp of the first video picture, and the M pieces of text information are determined based on the first voice segment;
the acquiring M pieces of text information and M character roles included in the first video segment comprises:
performing face image recognition on the first video picture to obtain M pieces of face feature information, wherein one character role corresponds to one piece of face feature information;
segmenting the first video picture according to the M pieces of face feature information to obtain M video pictures, wherein one video picture comprises at least one character role; and
determining a target character role according to a first playing time stamp of a target video picture among the M video pictures and target face feature information in the target video picture, to obtain the M character roles, wherein a second playing time stamp of a voice segment corresponding to the target character role in the first voice segment matches the first playing time stamp;
wherein the target video picture is any one of the M video pictures, and the target character role is the character role corresponding to the target face feature information.
5. The method of claim 1, wherein the displaying each piece of text information according to the display mode of each piece of text information comprises:
displaying, in a floating manner, the text information corresponding to each character role in a target area of a screen according to the display mode of each piece of text information;
wherein, when the M character roles are not matched with the first video picture, the target area is located in a preset area of the screen; and
when the M character roles are matched with the first video picture, the target area is an area that satisfies a preset condition among display areas adjacent to the display area of each character role on the screen.
6. An information display apparatus, characterized in that the apparatus comprises an acquisition module, a determination module, and a display module;
the acquisition module is configured to acquire M pieces of text information and M character roles included in a first video segment, wherein one piece of text information is information expressed by one character role in the first video segment, and M is an integer greater than 1;
the determination module is configured to determine a display mode of each piece of text information corresponding to each character role according to emotion information of each character role, wherein the emotion information is determined according to first information related to each character role;
the display module is configured to display each piece of text information according to the display mode of each piece of text information determined by the determination module in the process of playing the video picture corresponding to each character role in the first video segment;
wherein the M character roles are determined according to a first video picture included in the first video segment, and the first information comprises at least one of: voice feature information and face image feature information.
7. The apparatus of claim 6, wherein M is greater than 1, the first video segment further comprises a first voice segment, and a playing time stamp of the first voice segment matches a playing time stamp of the first video picture;
the acquisition module is specifically configured to perform voice recognition on the first voice segment to obtain the M character roles, and to segment the first voice segment according to the M character roles to obtain the M pieces of text information, wherein one piece of text information corresponds to one character role among the M character roles.
8. The apparatus of claim 7, wherein
the determination module is further configured to determine M pieces of voiceprint information included in the first voice segment;
the acquisition module is specifically configured to: when N pieces of preset voiceprint information include the M pieces of voiceprint information, select, from N preset character roles according to the M pieces of voiceprint information, the preset character roles corresponding to the M pieces of voiceprint information as the M character roles, wherein one piece of preset voiceprint information corresponds to one preset character role; or,
when the N pieces of preset voiceprint information do not include the M pieces of voiceprint information, create the M character roles according to the M pieces of voiceprint information, wherein one character role corresponds to one piece of voiceprint information;
wherein N is an integer greater than or equal to M.
9. The apparatus of claim 6, wherein the first video segment further comprises a first voice segment, a playing time stamp of the first voice segment matches a playing time stamp of the first video picture, and the M pieces of text information are determined based on the first voice segment;
the acquisition module is specifically configured to perform face image recognition on the first video picture to obtain M pieces of face feature information, wherein one character role corresponds to one piece of face feature information, and to segment the first video picture according to the M pieces of face feature information to obtain M video pictures, wherein one video picture comprises at least one character role;
the determination module is specifically configured to determine a target character role according to a first playing time stamp of a target video picture among the M video pictures and target face feature information in the target video picture, to obtain the M character roles, wherein a second playing time stamp of the voice segment corresponding to the target character role in the first voice segment matches the first playing time stamp;
wherein the target video picture is any one of the M video pictures, and the target character role is the character role corresponding to the target face feature information.
10. The apparatus of claim 6, wherein the display module is specifically configured to display, in a floating manner, the text information corresponding to each character role in a target area of a screen according to the display mode of each piece of text information;
wherein, when the M character roles are not matched with the first video picture, the target area is located in a preset area of the screen; and
when the M character roles are matched with the first video picture, the target area is an area that satisfies a preset condition among display areas adjacent to the display area of each character role on the screen.
11. An electronic device, comprising a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the information display method according to any one of claims 1 to 5.
12. A readable storage medium, on which a program or instruction is stored, wherein the program or instruction, when executed by a processor, implements the steps of the information display method according to any one of claims 1 to 5.
CN202110925397.5A 2021-08-12 2021-08-12 Information display method and device and electronic equipment Pending CN113794927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110925397.5A CN113794927A (en) 2021-08-12 2021-08-12 Information display method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113794927A true CN113794927A (en) 2021-12-14

Family

ID=78875981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110925397.5A Pending CN113794927A (en) 2021-08-12 2021-08-12 Information display method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113794927A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015156443A1 (en) * 2014-04-11 2015-10-15 네무스텍(주) Cartoon-type mobile personal secretary service system
KR20180038318A (en) * 2016-10-06 2018-04-16 주식회사 카카오 System and method for generating caption, and program of content generation
WO2020091431A1 (en) * 2018-11-02 2020-05-07 주식회사 모두앤모두 Subtitle generation system using graphic object

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299953A (en) * 2021-12-29 2022-04-08 湖北微模式科技发展有限公司 Speaker role distinguishing method and system combining mouth movement analysis
CN114299953B (en) * 2021-12-29 2022-08-23 湖北微模式科技发展有限公司 Speaker role distinguishing method and system combining mouth movement analysis

Similar Documents

Publication Publication Date Title
CN110634483B (en) Man-machine interaction method and device, electronic equipment and storage medium
CN107705783B (en) Voice synthesis method and device
CN110941954B (en) Text broadcasting method and device, electronic equipment and storage medium
CN109819313B (en) Video processing method, device and storage medium
CN110288077B (en) Method and related device for synthesizing speaking expression based on artificial intelligence
CN109637518B (en) Virtual anchor implementation method and device
CN106024009B (en) Audio processing method and device
CN108847214B (en) Voice processing method, client, device, terminal, server and storage medium
CN111930994A (en) Video editing processing method and device, electronic equipment and storage medium
CN112367551B (en) Video editing method and device, electronic equipment and readable storage medium
CN109474845B (en) Bullet screen control method, bullet screen processing server and computer readable storage medium
CN111145777A (en) Virtual image display method and device, electronic equipment and storage medium
CN107864410B (en) Multimedia data processing method and device, electronic equipment and storage medium
CN109859298B (en) Image processing method and device, equipment and storage medium thereof
EP4300431A1 (en) Action processing method and apparatus for virtual object, and storage medium
CN109033423A (en) Simultaneous interpretation caption presentation method and device, intelligent meeting method, apparatus and system
CN114157920B (en) Method and device for playing sign language, intelligent television and storage medium
CN113538628A (en) Expression package generation method and device, electronic equipment and computer readable storage medium
CN110781346A (en) News production method, system, device and storage medium based on virtual image
CN112492390A (en) Display device and content recommendation method
CN108847246A (en) A kind of animation method, device, terminal and readable medium
CN113794927A (en) Information display method and device and electronic equipment
CN113806570A (en) Image generation method and generation device, electronic device and storage medium
WO2023045716A1 (en) Video processing method and apparatus, and medium and program product
CN109087644B (en) Electronic equipment, voice assistant interaction method thereof and device with storage function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination