CN112532912A

CN112532912A - Video processing method and device and electronic equipment

Info

Publication number: CN112532912A
Application number: CN202011312608.XA
Authority: CN
Inventors: 牛红霞; 路呈璋; 张爽
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2021-03-19

Abstract

The embodiment of the invention provides a video processing method, a video processing device and electronic equipment. The method comprises the following steps: acquiring video stream data of a video conference to be processed; extracting audio data of each participant from the video stream data; generating participant identifications according to the audio data of the participants; and displaying the participant identifications. Because a participant identification representing each participant is displayed when the video stream data is played, the user can easily tell the participants apart, which provides a better video conference service and improves the user experience.

Description

Video processing method and device and electronic equipment

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a video processing method and apparatus, and an electronic device.

Background

In recent years, recording devices have developed rapidly and have moved from professional fields into general public use. Journalists, students, teachers and other groups commonly need recording devices for recording. In addition, recording of various television programs, movies, music, etc. requires the use of recording devices.

In many scenarios, besides being used for recording, a recording device can also be used to hold video conferences through third-party video conference software. However, the video conference services of most such software are currently imperfect and cannot provide users with a high-quality video conference service.

Disclosure of Invention

The embodiment of the invention provides a video processing method, which displays the participant identification of the participant in the video conference process, thereby providing high-quality video conference service for users.

Correspondingly, the embodiment of the invention also provides a video processing device and electronic equipment, which are used for ensuring the realization and application of the method.

In order to solve the above problem, an embodiment of the present invention discloses a video processing method, which is applied to a recording device, and specifically includes:

acquiring video stream data of a video conference to be processed;

extracting audio data of each participant from the video stream data;

generating participant identifications according to the audio data of the participants;

and displaying the participant identification.

Optionally, the method further comprises:

converting the audio data of each of the parties to text data;

processing the text data into subtitles;

the displaying the participant identification includes:

adding participant identification to the subtitles;

and displaying the subtitles added with the participant identifications.

Optionally, the method further comprises:

identifying identity data from the audio data;

the displaying the subtitles added with the participant identifiers comprises:

and adding, according to the identity data, participant identifications of the corresponding individual participants to the subtitles that already carry the participant identifications of the parties.

Optionally, the method further comprises:

and adding a corresponding visual identification to the participant identification of each party and/or of each individual participant.

Optionally, the acquiring video stream data of the video conference to be processed includes:

acquiring video stream data in a video conference while the video conference is in progress;

and taking the video stream data in the video conference as the video stream data of the video conference to be processed.

Optionally, the acquiring video stream data of the video conference to be processed includes:

after the video conference is finished, acquiring stored video stream data;

and taking the stored video stream data as the video stream data of the video conference to be processed.

Optionally, the method further comprises:

and saving the video stream data, the audio data, the participant identification and/or the subtitle.

Optionally, the identity data is a voiceprint.

The embodiment of the invention also discloses a video processing device, which is applied to recording equipment, and the device comprises:

the video stream data acquisition module is used for acquiring video stream data of a video conference to be processed;

the audio data extraction module is used for extracting the audio data of each participant from the video stream data;

the participant identification generation module is used for generating participant identifications according to the audio data of the participants;

and the participant identification display module is used for displaying the participant identification.

Optionally, the device further comprises: a text data conversion module, configured to convert the audio data of each party into text data and to process the text data into subtitles; the participant identification display module is configured to add the participant identification to the subtitles and to display the subtitles added with the participant identifications.

Optionally, the device further comprises: an identity data identification module, configured to identify identity data from the audio data; the participant identification display module is configured to add, according to the identity data, participant identifications of the corresponding individual participants to the subtitles that already carry the participant identifications of the parties.

Optionally, the device further comprises: a visual identification adding module, configured to add a corresponding visual identification to the participant identification of each party and/or of each individual participant.

Optionally, the video stream data acquiring module is configured to acquire video stream data in a video conference while the video conference is in progress; and taking the video stream data in the video conference as the video stream data of the video conference to be processed.

Optionally, the video stream data acquiring module is configured to acquire stored video stream data after the video conference is ended; and taking the stored video stream data as the video stream data of the video conference to be processed.

Optionally, the device further comprises: a storage module, configured to store the video stream data, the audio data, the participant identification and/or the subtitle.

Optionally, the identity data is a voiceprint.

The embodiment of the invention also discloses a readable storage medium, and when the instructions in the storage medium are executed by a processor of the electronic equipment, the electronic equipment can execute the video processing method according to any one of the embodiments of the invention.

The embodiment of the invention also discloses a sound recording device, which comprises a memory and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs are configured to be executed by one or more processors and comprise instructions for:

acquiring video stream data of a video conference to be processed;

extracting audio data of each participant from the video stream data;

generating participant identifications according to the audio data of the participants;

and displaying the participant identification.

Optionally, the one or more programs further comprise instructions for:

converting the audio data of each of the parties to text data;

processing the text data into subtitles;

the displaying the participant identification includes:

adding participant identification to the subtitles;

and displaying the subtitles added with the participant identifications.

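Optionally, the one or more programs further comprise instructions for:

identifying identity data from the audio data;

the displaying the subtitles added with the participant identifiers comprises:

and adding, according to the identity data, participant identifications of the corresponding individual participants to the subtitles that already carry the participant identifications of the parties.

Optionally, the one or more programs further comprise instructions for:

and adding a corresponding visual identification to the participant identification of each party and/or of each individual participant.

Optionally, the acquiring video stream data of the video conference to be processed includes:

acquiring video stream data in a video conference while the video conference is in progress;

and taking the video stream data in the video conference as the video stream data of the video conference to be processed.

Optionally, the acquiring video stream data of the video conference to be processed includes:

after the video conference is finished, acquiring stored video stream data;

and taking the stored video stream data as the video stream data of the video conference to be processed.

Optionally, the one or more programs further comprise instructions for:

and saving the video stream data, the audio data, the participant identification and/or the subtitle.

Optionally, the identity data is a voiceprint.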

The embodiment of the invention has the following advantages:

In the embodiment of the invention, the audio data of each participant is extracted from the video stream data of the video conference to be processed, the participant identification is generated according to the audio data of each participant, and then the participant identification is displayed. Because a participant identification representing each participant is displayed when the video stream data is played, the user can easily tell the participants apart, which provides a better video conference service and improves the user experience.

Drawings

FIG. 1 is a flow chart of the steps of one embodiment of a video processing method of the present invention;

FIG. 2 is a schematic diagram of video processing during a video conference in accordance with the present invention;

FIG. 3 is a schematic illustration of a video conference detail page of the present invention;

FIG. 4 is a schematic diagram of video processing after a video conference is completed according to the present invention;

FIG. 5 is a block diagram of a video processing apparatus according to an embodiment of the present invention;

FIG. 6 is a block diagram of an electronic device for video processing according to an exemplary embodiment.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

The embodiment of the invention provides a video processing method, which is applied to recording equipment, wherein the recording equipment can be equipment with a recording function, such as a recording pen, translation equipment such as a translation pen, a translator and the like; the embodiments of the present invention are not limited in this regard.

When a video conference is held through the recording equipment, participant identifications representing the participants are displayed on the recording equipment, so that the user can tell the participants apart by their identifications. This provides the user with a high-quality video conference service and improves the user experience.

Referring to fig. 1, a flowchart illustrating steps of an embodiment of a video processing method according to the present invention is shown, which may specifically include the following steps:

Step 102, acquiring video stream data of the video conference to be processed.

In the embodiment of the present invention, the video stream data of the video conference to be processed may be video stream data obtained during the video conference, or may be video stream data stored after the video conference is ended.

Step 104, extracting the audio data of each participant from the video stream data.

The recording equipment is provided with an image acquisition module and an audio acquisition module.

As a specific example of the present invention, the image capturing module may be a camera, and the audio capturing module may be a microphone. When a video conference is carried out, the recording equipment can shoot video data through the camera, acquire audio data through the microphone, and then display and play the video data and the audio data together as video stream data on the recording equipment.

Of course, video stream data sent by other electronic devices participating in the video conference, such as translation devices, recording devices, smartphones, tablet computers, and the like, may also be received on the recording device, and the video stream data may also include video data and audio data.

In the embodiment of the present invention, the audio data of each participant may be extracted from the video stream data. Specifically, the extracted audio data may include the audio data captured by the recording device (my party) and the audio data received from the other electronic devices (the other party, which may be one or more parties).

Step 106, generating participant identifications according to the audio data of the participants.

The participant id is a name or number of a participant for distinguishing different participants in the video conference.

In the embodiment of the invention, the corresponding participant identification can be generated according to the audio data of each participant so as to distinguish the different participants in the video conference. Specifically, the audio data of my party and of the other party may be extracted from the video stream data, and the corresponding participant identification may then be generated according to the audio data. For example, the participant identification "my party" may be generated according to the audio data of my party, and the participant identification "other party" may be generated according to the audio data of the other party. There may be one or more other parties; when there are two or more, the participant identifications "other party 1", "other party 2" and so on may be generated.
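The labelling rule described above can be sketched as follows. This is only an illustration, not the patented implementation: the function name, the idea of keying each audio stream by a source string, and the `local_source` parameter are all assumptions introduced here.

```python
def generate_party_ids(source_ids, local_source):
    """Map each audio source to a participant identification.

    source_ids: audio source keys in the order they joined (hypothetical).
    local_source: key of the audio captured by this recording device.
    """
    remotes = [s for s in source_ids if s != local_source]
    labels = {local_source: "my party"}
    if len(remotes) == 1:
        # A single remote side needs no number.
        labels[remotes[0]] = "other party"
    else:
        # Two or more remote sides are numbered from 1.
        for n, source in enumerate(remotes, start=1):
            labels[source] = f"other party {n}"
    return labels


labels = generate_party_ids(["mic", "remote-a", "remote-b"], local_source="mic")
```

Numbering the remote sides in join order is a design choice of this sketch; the description does not say how the numbers are assigned.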

Step 108, displaying the participant identification.

In embodiments of the present invention, participant identifications may be displayed. Specifically, when video stream data of the video conference is played, the corresponding participant identification may be displayed at a specified position in the video conference interface; for example, the participant identification may be displayed below the played video stream data, or on the right side of the played video stream data.

For example, assume that the video conference is a three-party conference and that the audio data of my party, other party 1 and other party 2 is extracted from the video stream data of the video conference. The participant identifications "my party", "other party 1" and "other party 2" may be generated according to the audio data and then displayed below the video conference interface in the order in which the participants speak. If the speaking order is other party 1, my party, other party 2, my party, the participant identifications are displayed as "other party 1, my party, other party 2, my party".
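The speaking-order display in the three-party example can be reproduced with a short sketch. The `(start_seconds, identifier)` tuple shape and the function name are assumptions made for illustration, and merging consecutive segments from the same speaker is a choice of this sketch rather than something the description specifies.

```python
def speaking_order(segments):
    """Return participant identifications in the order the participants spoke.

    segments: iterable of (start_seconds, identifier) tuples (hypothetical shape).
    Consecutive segments from the same speaker collapse into one entry.
    """
    order = []
    for _, identifier in sorted(segments):
        if not order or order[-1] != identifier:
            order.append(identifier)
    return order


segments = [
    (12.0, "my party"),
    (3.5, "other party 1"),
    (20.1, "other party 2"),
    (31.0, "my party"),
]
line = ", ".join(speaking_order(segments))
```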

In the video processing method, the audio data of each participant is extracted from the video stream data of the video conference to be processed, the participant identification is generated according to the audio data of each participant, and then the participant identification is displayed. Because a participant identification representing each participant is displayed when the video stream data is played, the user can easily tell the participants apart, which provides a better video conference service and improves the user experience.

In an optional embodiment of the present invention, further comprising: converting the audio data of each of the parties to text data; and processing the text data into subtitles.

In the embodiment of the present invention, audio data may be converted into text data, and then the text data may be displayed as subtitles of video stream data.

In the above optional embodiment, the text data may be recognized from the audio data and then processed into subtitles for display, so that the corresponding content can be read while the video stream data or the audio data is played.

In an optional embodiment of the present invention, the step 108 of displaying the participant identifier includes: adding participant identification to the subtitles; and displaying the subtitles added with the participant identifications.

In the embodiment of the invention, after the audio data is converted into text data and processed into subtitles, a corresponding participant identification is added to each subtitle to distinguish the subtitles corresponding to the audio data of different participants. Specifically, the audio data of my party and of the other party may be extracted from the video stream data, converted into text data, and processed into subtitles, and the corresponding participant identifications are then added to the subtitles of my party and of the other party. For example, the participant identification "my party" may be added to the subtitles of my party and the participant identification "other party" to the subtitles of the other party. There may be one or more other parties; when there are two or more, the participant identifications "other party 1", "other party 2" and so on may be added to their subtitles, giving "other party 1: subtitle 1", "other party 2: subtitle 2" and so on.

In the embodiment of the invention, once the participant identification has been added to the subtitles, the subtitles carrying the participant identification can be displayed. Specifically, when video stream data of a video conference is played, the corresponding subtitle may be displayed at a specified position in the video conference interface; for example, the participant identification and the subtitle may be displayed below the played video stream data, or on the right side of the played video stream data.
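A minimal sketch of subtitles that carry a participant identification is given below. The `Subtitle` record, the `render` format and the five-second display window are all assumptions for illustration; the description only requires that the identification be added to the subtitle and shown with it.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    start: float         # seconds from the start of the stream
    identification: str  # e.g. "my party", "other party 1"
    text: str            # text recognized from the audio data


def render(subtitle):
    """Prefix the subtitle text with its participant identification."""
    return f"{subtitle.identification}: {subtitle.text}"


def subtitles_at(subtitles, position, window=5.0):
    """Lines to show at a playback position: those that began within `window` seconds."""
    ordered = sorted(subtitles, key=lambda s: s.start)
    return [render(s) for s in ordered if 0 <= position - s.start < window]


subs = [Subtitle(0.0, "my party", "Hello"), Subtitle(2.0, "other party 1", "Hi")]
visible = subtitles_at(subs, position=3.0)
```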

In an optional embodiment of the invention, the method further comprises: identifying identity data from the audio data; the displaying the subtitle added with the participant identifier comprises: adding, according to the identity data, participant identifications of the corresponding individual participants to the subtitles that already carry the participant identifications of the parties.

The identity data is used for uniquely identifying the corresponding participant. Specifically, the identity data may be a voiceprint, and since the voiceprint has specificity and stability, different speaking participants in the video conference can be identified through the voiceprint. Participant identification is a participant name or participant number or the like used to distinguish different participants in a video conference.

In the embodiment of the invention, after the identity data is identified from the audio data, the corresponding participant identification can be further added to the subtitles according to the different identity data, so that the displayed subtitles include the participant identification. For example, if two distinct pieces of identity data are recognized in one stream of audio data, "participant 1" and "participant 2" may be added to the subtitles corresponding to the respective identity data.
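One way to turn voiceprints into "participant N" identifications is sketched below. It assumes that each voiceprint is available as a numeric feature vector and that two vectors from the same speaker have a cosine similarity above a fixed threshold; neither the vector form, the similarity measure, nor the 0.8 threshold comes from the description.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ParticipantRegistry:
    """Assigns 'participant N' to each distinct voiceprint seen so far."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical match threshold
        self.known = []             # list of (identification, reference vector)

    def identify(self, voiceprint):
        best_id, best_score = None, self.threshold
        for identification, reference in self.known:
            score = cosine(voiceprint, reference)
            if score >= best_score:
                best_id, best_score = identification, score
        if best_id is None:
            # Unseen voiceprint: register a new participant.
            best_id = f"participant {len(self.known) + 1}"
            self.known.append((best_id, list(voiceprint)))
        return best_id


registry = ParticipantRegistry()
first = registry.identify([1.0, 0.0])
second = registry.identify([0.0, 1.0])
again = registry.identify([0.9, 0.1])
```

Here the third voiceprint is close to the first, so it resolves to the same identification instead of creating a new one.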

Optionally, in the embodiment of the present invention, the subtitle of only one participant may be displayed at a time, and when a change in the identity data is recognized, the subtitle corresponding to the new identity data is displayed instead. A change in the identity data indicates that the participant who is currently speaking has changed. For example, if the speaker changes from participant A to participant B, the voiceprint is detected to switch from that of participant A to that of participant B, and the display switches from the participant identification and subtitle of participant A to those of participant B.

In the above optional embodiment, the participant identifications of the parties and of the individual participants, obtained by recognition from the video stream data, are added to the subtitles, so that the user can tell which participant of which party each piece of speech content belongs to. This provides a high-quality video conference service and further improves the user experience.

Further, in a video conference, participants usually express their opinions by speaking, and speaking requires opening and closing the lips. Therefore, in the embodiment of the present invention, portrait images may be extracted from the video stream data and lip feature data identified from them, in order to identify the participant who is currently speaking in the video conference.
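The single-speaker display can be sketched as a small state holder that is cleared whenever the recognized speaker changes. The class and method names are hypothetical, and the sketch assumes that an upstream step has already mapped each voiceprint to a participant identification.

```python
class SingleSpeakerDisplay:
    """Keeps only the current speaker's subtitle on screen."""

    def __init__(self):
        self.identification = None
        self.lines = []

    def update(self, identification, text):
        if identification != self.identification:
            # The voiceprint changed, so drop the previous speaker's subtitle.
            self.identification = identification
            self.lines = []
        self.lines.append(text)
        return f"{self.identification}: {' '.join(self.lines)}"


display = SingleSpeakerDisplay()
shown_1 = display.update("participant A", "Good morning.")
shown_2 = display.update("participant A", "Let us begin.")
shown_3 = display.update("participant B", "Agreed.")
```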

Specifically, portrait recognition is a biometric technology that identifies a person based on portrait feature information. A camera collects an image or video stream containing a portrait, and the face is automatically detected and tracked in the image to obtain the portrait feature information; this family of techniques is also commonly called face recognition. In the embodiment of the invention, portrait recognition is performed on the video stream data to obtain the portrait feature data of the different participants, the portrait image of each participant is determined from the portrait feature data, lip feature data is then identified from the portrait image, and when the lip feature data meets a preset condition, the corresponding participant is determined to be the one currently speaking in the video conference.

In the embodiment of the invention, the portrait image is input into a pre-trained portrait recognition model, which outputs the corresponding lip feature data. Specifically, the preset condition may be that the lip feature data matches preset lip feature data, where the preset lip feature data is generated in advance to characterize a participant who is speaking. When the lip feature data matches the preset lip feature data, it may be determined that the participant corresponding to the portrait image is speaking, and the portrait image may be marked in the displayed video stream data. The marking may be done by box selection, magnification, highlighting, or other means, which is not limited in the embodiment of the present invention. By marking the position of the portrait image in the video stream data and combining the mark with the subtitles carrying the participant identification of the party and/or of the individual participant, the user can easily see who is currently speaking and relate the subtitles to the portrait images in the displayed video stream data.
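The description does not say how lip feature data is compared with the preset lip feature data. The sketch below assumes one simple possibility: both are numeric vectors, and they "match" when their Euclidean distance is within a tolerance. The dictionary keys, the tolerance value and the function names are hypothetical.

```python
import math


def is_speaking(lip_features, preset_features, tolerance=0.25):
    """Assumed preset condition: features lie within `tolerance` of the preset."""
    return math.dist(lip_features, preset_features) <= tolerance


def portraits_to_mark(portraits, preset_features):
    """Return (participant, bounding box) pairs for portraits judged to be speaking.

    portraits: list of dicts with 'participant', 'box' and 'lip_features' keys,
    standing in for the output of a portrait recognition model.
    """
    return [
        (p["participant"], p["box"])
        for p in portraits
        if is_speaking(p["lip_features"], preset_features)
    ]


preset = [0.6, 0.4]
portraits = [
    {"participant": "participant 1", "box": (10, 20, 80, 80), "lip_features": [0.55, 0.45]},
    {"participant": "participant 2", "box": (120, 20, 80, 80), "lip_features": [0.05, 0.02]},
]
marked = portraits_to_mark(portraits, preset)
```

The returned boxes are what a display layer would frame, magnify or highlight.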

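In an optional embodiment of the invention, the method further comprises: adding a corresponding visual identification to the participant identification of each party and/or of each individual participant.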

The visual identification can be in different colors or fonts and other special effects.

In the embodiment of the invention, the participant identification of each party and of each individual participant can be obtained; the former distinguishes the parties, and the latter distinguishes the individual participants within each party.

By way of example, suppose the parties to a video conference are my party, other party 1 and other party 2, where my party includes participant 1 and participant 2, other party 1 includes participant 3 and participant 4, and other party 2 includes participant 5. The participant identification "my party" may be displayed in red, "other party 1" in green, and "other party 2" in pink. Furthermore, the participant identifications "participant 1" and "participant 2" of my party may be displayed in yellow and white respectively, "participant 3" and "participant 4" of other party 1 in brown and blue respectively, and "participant 5" of other party 2 in purple.

Of course, the above visual identifications are only examples. In practical applications, visual identifications may be added only to the participant identifications of the parties (my party, other party 1 and other party 2), or only to the participant identifications of the individual participants, which is not limited in this embodiment of the present invention.

In the above optional embodiment, visual identifications are added to the participant identifications of the parties and/or of the individual participants, so that the user can quickly tell different parties and/or participants apart. The displayed content also becomes more varied, which further improves the user experience.
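The colour example can be written as two lookup tables plus a function that returns styled spans for one subtitle line. The colours follow the example in the description; the span representation, the function name and the default colour for the subtitle text are assumptions.

```python
PARTY_COLORS = {"my party": "red", "other party 1": "green", "other party 2": "pink"}
PARTICIPANT_COLORS = {
    "participant 1": "yellow",
    "participant 2": "white",
    "participant 3": "brown",
    "participant 4": "blue",
    "participant 5": "purple",
}


def styled_caption(party, participant, text, default_color="white"):
    """Return (text, color) spans for one subtitle line."""
    return [
        (party, PARTY_COLORS.get(party, default_color)),
        (participant, PARTICIPANT_COLORS.get(participant, default_color)),
        (text, default_color),
    ]


spans = styled_caption("other party 1", "participant 3", "Shall we start?")
```

Fonts or other effects could be carried the same way by replacing the colour string with a richer style record.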

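In an optional embodiment of the present invention, the step 102 of obtaining video stream data of a video conference to be processed includes: acquiring video stream data in a video conference while the video conference is in progress; and taking the video stream data in the video conference as the video stream data of the video conference to be processed.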

When a user needs to carry out a video conference, the video conference function of the recording device can be started so as to carry out the video conference based on the recording device. Specifically, the recording device may be installed with third-party video conference software, and the user may start the video conference software on the recording device, so that a video conference may be performed based on the video conference software. In the embodiment of the invention, the video stream data can be acquired in real time during the video conference, and then the video stream data is used as the video stream data of the video conference to be processed.

Specifically, referring to fig. 2, during the video conference, video stream data is obtained in real time and displayed on the recording device. The audio data of my party and of the other parties is extracted from the video stream data, converted into text data and processed into subtitles, and the corresponding participant identifications of the parties, such as "my party", "other party 1" and "other party 2", are added to the subtitles. Further, identity data can be identified from the video stream data, and the participant identifications of the corresponding individual participants, such as "participant 1" and "participant 2" of my party, "participant 3" and "participant 4" of other party 1, and "participant 5" of other party 2, can be added to the subtitles according to the different identity data.

In the above optional embodiment, during the video conference, the audio data of each participant is extracted from the video stream data in real time, converted into text data and processed into subtitles, and the participant identifications of the parties and/or of the individual participants are added, so that the user can easily tell the parties and/or participants apart during the video conference.

In an optional embodiment of the present invention, the acquiring video stream data of a video conference to be processed includes:

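after the video conference is finished, acquiring stored video stream data;

and taking the stored video stream data as the video stream data of the video conference to be processed.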

In the embodiment of the invention, at the end of the video conference, the video stream data can be saved, and then the saved video stream data is used as the video stream data of the video conference to be processed when needed.

Specifically, referring to fig. 3, after the video conference is finished, the video stream data is stored in the recording device. The user may then enter the video conference detail page of the recording device and view the video stream data of a given video conference. If the video stream data has not been transcribed, no subtitle is displayed below the video conference detail page for the time being; instead, a transcription key "transcribe immediately" is displayed, and the user can click the key to transcribe the video stream data. During transcription, the audio data of my party and of the other parties is extracted from the video stream data and converted into text data, and the corresponding participant identifications of the parties, such as "my party", "other party 1" and "other party 2", are added to the text data. In addition, identity data may be identified from the video stream data, and the participant identifications of the corresponding individual participants, such as "participant 1" and "participant 2" of my party, "participant 3" and "participant 4" of other party 1, and "participant 5" of other party 2, are added to the text data according to the different identity data. Once transcription succeeds, subtitles corresponding to the video stream data and containing the participant identifications of the parties and/or of the individual participants are obtained, and the subtitles can be displayed below the video stream data, as shown in fig. 4.

In the above optional embodiment, after the video conference is finished, the audio data of each participant can be extracted from the video stream data, converted into text data and processed into subtitles, and the participant identifications of the parties and/or of the individual participants are added, so that the content of the video conference can be reviewed more faithfully and completely afterwards, and what a particular participant said can be recalled.
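The detail-page behaviour, where the "transcribe immediately" key is shown until subtitles exist, can be sketched as follows. The dictionary layout of a saved recording and the `speech_to_text` callable are hypothetical stand-ins; no particular speech recognition engine is implied.

```python
def detail_page(recording):
    """Describe what the detail page shows below the video for one saved conference."""
    if recording.get("subtitles") is None:
        # Not transcribed yet: offer the key instead of subtitles.
        return {"button": "transcribe immediately", "subtitles": []}
    return {"button": None, "subtitles": recording["subtitles"]}


def transcribe(recording, speech_to_text):
    """Fill in identified subtitles; `speech_to_text` maps audio to text (hypothetical)."""
    recording["subtitles"] = [
        f"{identification}: {speech_to_text(audio)}"
        for identification, audio in recording["audio_by_party"]
    ]
    return recording


recording = {
    "audio_by_party": [("my party", b"hello"), ("other party 1", b"hi")],
    "subtitles": None,
}
before = detail_page(recording)
# A stand-in recognizer that simply decodes the bytes, for demonstration only.
after = detail_page(transcribe(recording, lambda audio: audio.decode()))
```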

In an optional embodiment of the invention, the method further comprises: and saving the video stream data, the audio data, the participant identification and/or the subtitle.

In the embodiment of the invention, processing the video stream data yields the audio data, the participant identifications and/or the subtitles, and these may be saved together with the video stream data. For example, during the video conference, the video stream data is processed in real time to obtain the audio data and the participant identifications of the parties and of the individual participants, and the video stream data, the audio data, the participant identifications and/or the subtitles are saved after the video conference is finished. For another example, after the video conference is finished, the video stream data is processed to obtain the subtitles, and the audio data of the video stream data, the participant identifications and/or the subtitles are then saved.

Optionally, in a specific application, the video stream data, the participant identifications and/or the subtitles may be stored together, or may be stored separately with an association relationship established between them. For example, when a video conference needs to be viewed, the corresponding video stream data or audio data is acquired and presented, and a display identifier key is provided in the interface that presents the video stream data or audio data. When the user wants the identifiers displayed, the user can click the key to trigger an identifier acquisition instruction, and the recording device acquires and displays the participant identifications of the parties, the participant identifications of the individual participants and/or the subtitles according to the pre-established association relationship. Conversely, when the video conference needs to be viewed, the participant identifications and/or subtitles may be acquired and displayed first, with a "video display or audio play" key provided in the interface that displays them. When the user wants to display the video stream data or play the audio data, the user can click the key to trigger an audio/video acquisition instruction, and the recording device acquires the video stream data for display or the audio data for playback according to the pre-established association relationship.

In the above optional embodiment, after the video stream data is processed to obtain the audio data, the participant identifiers and/or the subtitles, these results may be stored in association with the video stream data, so that when the video stream data is displayed again, the audio data, the participant identifiers and/or the subtitles do not need to be generated again, which reduces unnecessary system overhead.
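The associated storage described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the file layout, the `index.json` association file, and the function names `save_conference` and `load_annotations` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_conference(root: Path, conference_id: str,
                    video: bytes, annotations: dict) -> None:
    """Store video and annotations separately, linked through an index file."""
    root.mkdir(parents=True, exist_ok=True)
    video_name = f"{conference_id}.video"
    notes_name = f"{conference_id}.annotations.json"
    (root / video_name).write_bytes(video)
    (root / notes_name).write_text(json.dumps(annotations), encoding="utf-8")

    # The index plays the role of the "association relationship" (assumed format).
    index_path = root / "index.json"
    index = {}
    if index_path.exists():
        index = json.loads(index_path.read_text(encoding="utf-8"))
    index[conference_id] = {"video": video_name, "annotations": notes_name}
    index_path.write_text(json.dumps(index), encoding="utf-8")


def load_annotations(root: Path, conference_id: str) -> dict:
    """Fetch stored identifiers and subtitles via the index, without reprocessing."""
    index = json.loads((root / "index.json").read_text(encoding="utf-8"))
    notes_name = index[conference_id]["annotations"]
    return json.loads((root / notes_name).read_text(encoding="utf-8"))


def demo() -> dict:
    """Round-trip one hypothetical conference through a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        annotations = {
            "subtitles": [{"participant": "Participant 1", "text": "Hello"}]
        }
        save_conference(root, "conf-1", b"\x00\x01", annotations)
        return load_annotations(root, "conf-1")
```

In this sketch, a "display identifier" button handler would only need to call `load_annotations`, which is where the saving in system overhead would come from.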

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.

Referring to fig. 5, a block diagram of a video processing apparatus according to an embodiment of the present invention is shown. The apparatus is applied to a recording device and specifically includes the following modules:

a video stream data obtaining module 502, configured to obtain video stream data of a video conference to be processed;

an audio data extraction module 504, configured to extract audio data of each participant from the video stream data;

a participant identifier generating module 506, configured to generate participant identifiers according to the audio data of each participant;

and a participant identifier display module 508, configured to display the participant identifier.
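The division of work among modules 502 to 508 can be pictured with the following sketch. It is a simplified illustration only: the `AudioSegment` and `VideoStream` types, the `source` field used to attribute audio to a participant, and the "Participant N" labels are assumptions made for the example, and real demultiplexing of audio from video is replaced by a stand-in data structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class AudioSegment:
    source: str     # hypothetical tag saying which participant the audio came from
    samples: bytes


@dataclass
class VideoStream:
    segments: List[AudioSegment]   # stand-in for the audio carried in the stream


def obtain_video_stream(segments: Iterable[AudioSegment]) -> VideoStream:
    """Role of module 502: obtain the video stream data to be processed."""
    return VideoStream(list(segments))


def extract_audio(stream: VideoStream) -> Dict[str, List[AudioSegment]]:
    """Role of module 504: group the audio data by participant."""
    grouped: Dict[str, List[AudioSegment]] = {}
    for segment in stream.segments:
        grouped.setdefault(segment.source, []).append(segment)
    return grouped


def generate_identifiers(audio: Dict[str, List[AudioSegment]]) -> Dict[str, str]:
    """Role of module 506: derive one identifier per participant's audio."""
    return {source: f"Participant {i}" for i, source in enumerate(audio, start=1)}


def display_identifiers(identifiers: Dict[str, str],
                        show: Callable[[str], None] = print) -> None:
    """Role of module 508: hand each identifier to a display callback."""
    for label in identifiers.values():
        show(label)
```

Passing the display step a callback keeps the sketch independent of any particular screen or user interface.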

In an optional embodiment of the invention, the apparatus further comprises: a text data conversion module, configured to convert the audio data of each participant into text data and to process the text data into subtitles; the participant identifier display module 508 is configured to add participant identifiers to the subtitles and to display the subtitles to which the participant identifiers have been added.
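As one way to picture a subtitle with a participant identifier added, the sketch below formats a single cue in the widely used SubRip (SRT) layout. The choice of SRT, the bracketed identifier prefix, and the function names are assumptions for illustration; the embodiment does not prescribe a subtitle format.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def format_cue(index: int, start: float, end: float,
               identifier: str, text: str) -> str:
    """Build one subtitle cue whose text is prefixed with the participant identifier."""
    return (
        f"{index}\n"
        f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
        f"[{identifier}] {text}\n"
    )
```

For example, `format_cue(1, 0.0, 3.5, "Participant 1", "Hello everyone")` yields a cue whose text line reads `[Participant 1] Hello everyone`.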

In an optional embodiment of the invention, the apparatus further comprises: an identity data identification module, configured to identify identity data from the audio data; the participant identifier display module 508 is configured to add, according to the identity data, the participant identifiers of the corresponding participants to the subtitles to which participant identifiers have been added.

In an optional embodiment of the invention, the apparatus further comprises: a visual identifier adding module, configured to add corresponding visual identifiers to the participant identifiers.
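A visual identifier could, for instance, be a display color assigned to each participant identifier. The sketch below assumes exactly that; the palette values and the function name `assign_visual_identifiers` are hypothetical, and an icon or avatar would serve equally well.

```python
from itertools import cycle
from typing import Dict, Iterable

# Hypothetical palette; colors repeat if there are more identifiers than entries.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def assign_visual_identifiers(identifiers: Iterable[str]) -> Dict[str, str]:
    """Map each distinct participant identifier to a color, in first-seen order."""
    distinct = list(dict.fromkeys(identifiers))  # drop repeats, keep order
    return dict(zip(distinct, cycle(PALETTE)))
```

Deduplicating first means a participant keeps the same color no matter how many subtitle lines carry that identifier.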

In an optional embodiment of the present invention, the video stream data obtaining module 502 is configured to obtain video stream data in a video conference while the video conference is in progress, and to take the video stream data in the video conference as the video stream data of the video conference to be processed.

In an optional embodiment of the present invention, the video stream data obtaining module 502 is configured to obtain stored video stream data after the video conference is ended, and to take the stored video stream data as the video stream data of the video conference to be processed.
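The two acquisition modes can be sketched as two interchangeable chunk sources feeding the same downstream processing. The queue-based live feed, the `None` end-of-conference sentinel, and the function names are assumptions chosen for illustration.

```python
import io
import queue
from typing import BinaryIO, Iterator, Optional


def live_chunks(feed: "queue.Queue[Optional[bytes]]") -> Iterator[bytes]:
    """Yield chunks as they arrive during the conference; None marks the end."""
    while True:
        chunk = feed.get()
        if chunk is None:
            return
        yield chunk


def stored_chunks(source: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield chunks from video stream data stored after the conference ended."""
    while chunk := source.read(chunk_size):
        yield chunk
```

Because both functions return an iterator of byte chunks, later steps such as audio extraction would not need to know which mode supplied the data.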

In an optional embodiment of the invention, the apparatus further comprises: a storage module, configured to store the video stream data, the audio data, the participant identifiers and/or the subtitles.

In an optional embodiment of the invention, the identity data is a voiceprint.
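The embodiment does not specify how a voiceprint is matched to a participant. One common approach, shown here purely as an assumed illustration, compares a fixed-length voiceprint embedding with enrolled reference embeddings by cosine similarity. The embedding vectors, the threshold of 0.8, and the names `Alice` and `Bob` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding: Sequence[float],
                     enrolled: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the enrolled name with the highest similarity at or above the threshold."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled voiceprints (real embeddings would have many more dimensions).
ENROLLED = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}
```

Returning `None` when no reference clears the threshold lets the caller fall back to a generic participant identifier instead of guessing.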

In the embodiment of the invention, the audio data of each participant is extracted from the video stream data of the video conference to be processed, a participant identifier is generated according to the audio data of each participant, and the participant identifier is then displayed. Because the participant identifier representing each participant is displayed when the video stream data is played, the user can conveniently tell the participants apart, the subtitle viewing experience is improved, and a higher-quality video conference service is provided, thereby improving the user experience.

Because the apparatus embodiment is substantially similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.

Fig. 6 is a block diagram illustrating an architecture of an electronic device 600 for video processing, according to an example embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 6, electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.

The processing component 602 generally controls overall operation of the electronic device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support operation of the electronic device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.

The power component 606 provides power to the various components of the electronic device 600. The power component 606 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 600.

The multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 600 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.

The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the electronic device 600. For example, the sensor component 614 may detect an open/closed state of the electronic device 600 and the relative positioning of components, such as the display and keypad of the electronic device 600. The sensor component 614 may also detect a change in the position of the electronic device 600 or of a component of the electronic device 600, the presence or absence of user contact with the electronic device 600, the orientation or acceleration/deceleration of the electronic device 600, and a change in the temperature of the electronic device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 616 is configured to facilitate communications between the electronic device 600 and other devices in a wired or wireless manner. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions is also provided, such as the memory 604 comprising instructions, and the instructions are executable by the processor 620 of the electronic device 600 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an optional embodiment of the present invention, the electronic device 600 may be a recording device, and the recording device may be a recording pen, a translating pen, a translator, or the like. Also provided is a non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform a video processing method, the method comprising: acquiring video stream data of a video conference to be processed; extracting audio data of each participant from the video stream data; generating participant identifications according to the audio data of the participants; and displaying the participant identification.

Optionally, besides generating the participant identifications according to the audio data of each participant, the method further includes: converting the audio data of each participant into text data; processing the text data into subtitles;

the displaying the participant identification includes: adding participant identification to the subtitles; and displaying the subtitles added with the participant identifications.

Optionally, the method further comprises: identifying identity data from the audio data; the displaying of the subtitles added with the participant identifications comprises: adding, according to the identity data, the participant identifications of the corresponding participants to the subtitles to which participant identifications have been added.

Optionally, the method further comprises: adding corresponding visual identifications to the participant identifications.

Optionally, the acquiring video stream data of the video conference to be processed includes: acquiring video stream data in a video conference while the video conference is in progress; and taking the video stream data in the video conference as the video stream data of the video conference to be processed.

Optionally, the acquiring video stream data of the video conference to be processed includes: after the video conference is finished, acquiring stored video stream data; and taking the stored video stream data as the video stream data of the video conference to be processed.

Optionally, the method further comprises: and saving the video stream data, the audio data, the participant identification and/or the subtitle.

Optionally, the identity data is a voiceprint.

The embodiments in the present specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.

The video processing method, the video processing apparatus and the electronic device provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the foregoing embodiments is intended only to help in understanding the method and core ideas of the present invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and to the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A video processing method, applied to a recording apparatus, the method comprising:

acquiring video stream data of a video conference to be processed;

extracting audio data of each participant from the video stream data;

generating participant identifications according to the audio data of the participants;

and displaying the participant identification.

2. The method of claim 1, further comprising:

converting the audio data of each of the participants to text data;

processing the text data into subtitles;

the displaying the participant identification includes:

adding participant identification to the subtitles;

and displaying the subtitles added with the participant identifications.

3. The method of claim 2, further comprising:

identifying identity data from the audio data;

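the displaying the subtitles added with the participant identifiers comprises:

adding, according to the identity data, participant identifications corresponding to the participants to the subtitles added with the participant identifications.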

4. The method of claim 1, 2 or 3, further comprising:

adding corresponding visual identifications to the participant identifications.

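5. The method of claim 1, wherein the acquiring video stream data of the video conference to be processed comprises:

acquiring video stream data in a video conference while the video conference is in progress;

and taking the video stream data in the video conference as the video stream data of the video conference to be processed.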

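6. The method of claim 1, wherein the acquiring video stream data of the video conference to be processed comprises:

after the video conference is finished, acquiring stored video stream data;

and taking the stored video stream data as the video stream data of the video conference to be processed.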

7. The method of claim 2 or 3, further comprising:

saving the video stream data, the audio data, the participant identification and/or the subtitles.

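8. A video processing apparatus, for use in a recording device, the apparatus comprising:

a video stream data obtaining module, configured to obtain video stream data of a video conference to be processed;

an audio data extraction module, configured to extract audio data of each participant from the video stream data;

a participant identifier generating module, configured to generate participant identifiers according to the audio data of each participant;

and a participant identifier display module, configured to display the participant identifier.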

9. An audio recording apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:

acquiring video stream data of a video conference to be processed;

extracting audio data of each participant from the video stream data;

generating participant identifications according to the audio data of the participants;

and displaying the participant identification.

10. A readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video processing method of any one of claims 1-7.