CN111372119B

CN111372119B - Multimedia data recording method and device and electronic equipment

Info

Publication number: CN111372119B
Application number: CN202010305850.8A
Authority: CN
Inventors: 付玉迪
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2020-04-17
Filing date: 2020-04-17
Publication date: 2022-07-08
Anticipated expiration: 2040-04-17
Also published as: CN111372119A

Abstract

The invention provides a multimedia data recording method, a multimedia data recording device, and an electronic device. The method comprises: receiving a first input of a user; and, in response to the first input, recording first multimedia data and displaying target text information on a recording interface of the first multimedia data, wherein the first multimedia data includes audio information of a target user, and the target text information is obtained from second multimedia data. While the first multimedia data containing the target user's audio information is being recorded, the displayed target text information serves as a prompt for the user. This avoids frequent re-recording caused by forgotten lines and the like, spares the user the tedious step of writing the target text information down manually, and helps improve the success rate of recording the first multimedia data.

Description

Multimedia data recording method and device and electronic equipment

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a multimedia data recording method and apparatus, and an electronic device.

Background

With the development of electronic devices, their functions have become increasingly diverse. Electronic devices now commonly provide audio recording: for example, a user may record audio data through an audio recording application, or record video data containing audio information through a video recording application. When the user wants to use such an application to record content similar to the audio in target audio data or target video data, the user generally has to memorize, or write down in advance, the text of the audio to be recorded, and then record it. If the user relies on memory, lines may be forgotten and the recording often has to be repeated; if the user writes the text down in advance, a tedious transcription step is required. Both approaches are inconvenient for the user.

Disclosure of Invention

The invention provides a multimedia data recording method, a multimedia data recording device, and an electronic device, which can solve the following problems in current multimedia data recording: recordings must be repeated frequently because the user forgets lines and the like, or the operation is cumbersome and inconvenient when the user manually writes down the text content of the audio information.

To solve the above technical problem, the present invention is implemented as follows:

In a first aspect, an embodiment of the present invention provides a multimedia data recording method applied to an electronic device, the method including:

receiving a first input of a user;

in response to the first input, recording first multimedia data and displaying target text information on a recording interface of the first multimedia data;

wherein the first multimedia data includes audio information of a target user, and the target text information is obtained from second multimedia data.

In a second aspect, an embodiment of the present invention further provides a multimedia data recording apparatus, which is applied to an electronic device, and the apparatus includes:

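a receiving module, configured to receive a first input of a user;

a response module, configured to, in response to the first input, record first multimedia data and display target text information on a recording interface of the first multimedia data, wherein the first multimedia data includes audio information of a target user, and the target text information is obtained from second multimedia data.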

In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the multimedia data recording method described above.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the multimedia data recording method described above.

In the embodiments of the present invention, a first input of a user is received; in response to the first input, first multimedia data including audio information of a target user is recorded, and target text information obtained from second multimedia data is displayed on the recording interface of the first multimedia data. Because the target text information is shown on the recording interface while the target user's audio is being recorded, it serves as a prompt. This prevents situations, such as forgetting lines when recording audio or video data, that would otherwise require frequent re-recording, and spares the user the tedious process of writing the target text information down manually. Prompting the user during recording in this way improves the success rate of recording the first multimedia data.

Drawings

Fig. 1 is a flow chart of a multimedia data recording method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of displaying target text information according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a "performance prompter" button according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a video data browsing interface according to an embodiment of the present invention;

Fig. 5 shows examples of the content of target text information according to an embodiment of the present invention;

Fig. 6 shows an example of displaying the content of first sentence text information according to an embodiment of the present invention;

Fig. 7 shows an example of displaying the content of second sentence text information according to an embodiment of the present invention;

Fig. 8 shows an example of displaying the content of third sentence text information according to an embodiment of the present invention;

Fig. 9 shows an example of displaying the content of first sentence text information and second sentence text information according to an embodiment of the present invention;

Fig. 10 shows an example of displaying the content of second sentence text information and third sentence text information according to an embodiment of the present invention;

Fig. 11 is a first schematic diagram of displaying, in a second display mode, characters of the first sentence text information that match the speech content of the entered first audio information, according to an embodiment of the present invention;

Fig. 12 is a second schematic diagram of displaying, in the second display mode, characters of the first sentence text information that match the speech content of the entered first audio information, according to an embodiment of the present invention;

Fig. 13 is a third schematic diagram according to an embodiment of the present invention;

Fig. 14 is a schematic diagram of a target display position corresponding to a gaze position according to an embodiment of the present invention;

Fig. 15 is a schematic diagram of displaying target text information at a target display position according to an embodiment of the present invention;

Fig. 16 is a block diagram of a multimedia data recording apparatus according to an embodiment of the present invention;

Fig. 17 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present invention.

As shown in Fig. 1, an embodiment of the present invention provides a multimedia data recording method applied to an electronic device, the method including:

Step 101: receiving a first input of a user;

Optionally, the first input is used to start the multimedia data recording function; for example, the first input may be an input that launches an audio recording application or a video recording application.

Step 102: in response to the first input, recording first multimedia data and displaying target text information on a recording interface of the first multimedia data.

Optionally, the first multimedia data may be audio data or video data, and the second multimedia data may be audio data, video data, text data, or the like. The second multimedia data may carry the target text information directly, or the target text information may be obtained from it indirectly; for example, speech recognition may be used to recognize the speech content in video or audio data, and the recognized speech content is then converted into text information.

The types of the first multimedia data and the second multimedia data may be the same, for example, in the case that the first multimedia data is audio data, the second multimedia data may also be audio data, and in the case that the first multimedia data is video data, the second multimedia data may also be video data; of course, the types of the first multimedia data and the second multimedia data may be different, for example, in the case that the first multimedia data is audio data, the second multimedia data may be video data, text data, etc., and in the case that the first multimedia data is video data, the second multimedia data may be audio data, text data, etc.

As shown in Fig. 2, in a video recording application, for example, the target text information 21 may be displayed on the video recording interface. The record button 22 is used to start video recording, or to pause or stop an ongoing recording; the thumbnail window 23 displays the most recently captured image data, such as a thumbnail of video data or of a photo, and may also be linked to an image browsing interface so that the user can enter that interface from it.

When the number of characters of the target text information is within a preset range, the entire target text information can be displayed on the video recording interface, so that the user can see all of it at a glance.

When the number of characters of the target text information exceeds the preset range, only part of the target text information may be displayed. For example, the part of the text that matches the speech content of the audio information currently being recorded may be displayed on the video recording interface. As shown in Fig. 2, if the speech content currently being recorded is "today's weather", the matching part of the target text information, "Today's weather is perfect for a spring outing", may be displayed on the video recording interface.

Thus, when the number of characters exceeds the preset range, displaying only part of the target text information keeps an overly long text from occupying a large display area and helps preserve the display effect of the recording interface.

In the above solution, a first input of a user is received; in response, first multimedia data including audio information of a target user is recorded, and target text information obtained from second multimedia data is displayed on the recording interface. Displaying this text during recording prompts the user, which avoids frequent re-recording caused by forgotten lines and the like when recording audio or video data, spares the user from writing the target text information down manually, and improves the success rate of recording the first multimedia data.
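The full-versus-partial display rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `max_chars` stands in for the unspecified "preset range", the target text information is assumed to be pre-split into sentences, and the case-insensitive substring match against the recognized speech is a hypothetical stand-in for whatever matching the device actually performs.

```python
from typing import Sequence


def select_display_text(sentences: Sequence[str], recognized_speech: str,
                        max_chars: int = 40) -> str:
    """Return the text to show on the recording interface.

    sentences: target text information, pre-split into sentences (assumption).
    recognized_speech: speech recognized so far in the current recording.
    max_chars: hypothetical upper bound of the "preset range" of characters.
    """
    full_text = " ".join(sentences)
    # Within the preset range: show the whole target text information.
    if len(full_text) <= max_chars:
        return full_text
    # Otherwise show only the sentence that matches the current speech.
    probe = recognized_speech.strip().lower()
    if probe:
        for sentence in sentences:
            if probe in sentence.lower():
                return sentence
    # Nothing spoken or matched yet: fall back to the first sentence (assumption).
    return sentences[0] if sentences else ""
```

A real device would likely use fuzzy matching tolerant of recognition errors; exact substring matching keeps the sketch short.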

Optionally, when the second multimedia data contains target audio information, the method may further include, before the step of displaying the target text information on the recording interface of the first multimedia data:

extracting the target audio information from the second multimedia data;

and performing speech recognition on the target audio information to obtain the target text information.

The target text information is used to prompt the speech content of the corresponding target user during recording of the first multimedia data.
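The two steps above (extract the target audio information, then run speech recognition on it) can be sketched as a small pipeline. The patent names no audio or speech-recognition library, so both stages are hypothetical callables passed in by the caller; this shows only the order of operations, not a real ASR integration.

```python
from typing import Callable

# Hypothetical interfaces: any demuxer and any speech recognizer could fill these.
AudioExtractor = Callable[[str], bytes]
SpeechRecognizer = Callable[[bytes], str]


def obtain_target_text(media_path: str,
                       extract_audio: AudioExtractor,
                       recognize: SpeechRecognizer) -> str:
    """Extract the target audio information, then convert it to text."""
    audio = extract_audio(media_path)        # step 1: extract target audio information
    if not audio:
        raise ValueError(f"no audio track found in {media_path!r}")
    return recognize(audio).strip()          # step 2: speech recognition -> target text
```

In use, `extract_audio` and `recognize` would wrap whatever media and recognition components the device provides; stubs are enough to exercise the control flow.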

Optionally, the multimedia data recording method of the embodiments of the present invention may be applied to audio or video recording scenarios, in which the target audio information in the second multimedia data can be recognized automatically and the corresponding target text information generated automatically, sparing the user the cumbersome operation of writing the target text information down manually.

For example, a "performance prompter" button may be displayed on the audio or video recording interface, and the user can trigger the function of obtaining target text information by tapping it. When the user taps the "performance prompter" button, the display switches to a multimedia data browsing interface, which may be a video data browsing interface, an audio data browsing interface, or a multimedia data selection interface containing at least one of video data and audio data. The user can then determine the second multimedia data through a selection operation and obtain the corresponding target text information with a single tap.

Once the second multimedia data is determined, the target audio information is extracted from it, and speech recognition is performed on the target audio information to obtain the target text information. This spares the user the cumbersome process of writing the target text information down manually and simplifies operation.

Optionally, the target text information includes text information indicating the speech content, together with at least one of the following: role identification information and pronunciation identification information.

The role identification information is used to distinguish the text of the speech content belonging to different roles. It may be a role name, a pattern, or the like, with each role corresponding to different role identification information, so that the user can tell which text belongs to which role. The pronunciation identification information may include, but is not limited to: tone information, speech rate information, prompts for adjusting breathing, pause durations, and the like.

For example, when the target text information contains only the text of the speech content of one role, it may include only the text indicating the speech content, or may additionally include pronunciation identification information to prompt the user on how to deliver the speech content.

As another example, when the target text information contains the text of the speech content of multiple roles, it may include, in addition to the text indicating the speech content, role identification information for distinguishing the text of each role; it may further include pronunciation identification information to prompt the user on pronunciation when entering the speech content.

As shown in Fig. 3, taking a video recording scenario as an example, a "performance prompter" button 31 may be displayed on the video recording interface. By tapping the button 31, the user switches to a video data browsing interface in which, as shown in Fig. 4, at least one video data thumbnail 41 may be displayed.

Further, the user may select the second multimedia data by tapping a video data thumbnail. Once the second multimedia data is determined, the target audio information is extracted from it, and the speech content of each user in the target audio information is identified using techniques such as speech recognition and voiceprint recognition. Speech recognition recognizes the speech content in the target audio information and converts it into corresponding text; voiceprint recognition distinguishes users by their different acoustic spectra, so that the speech content of each user can be separated. The target text information can then be generated from the target users and their corresponding speech content. As shown in Fig. 5, the target text information 51 includes the text of role A's speech content with the corresponding role identification information A, for example "A: Today's weather is perfect for a spring outing, let's go to the park together", and the text of role B's speech content with the corresponding role identification information B, for example "B: Okay".

Optionally, because the text of each role's speech content is labeled in the target text information, displaying the target text information on the recording interface helps the user tell the different roles' lines apart.
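Once voiceprint recognition has attributed each recognized segment to a speaker, generating role-labeled target text information like example 51 in Fig. 5 could look like the sketch below. The `Segment` input format, the A/B/... labeling by order of first appearance, and merging consecutive segments of the same speaker are all illustrative assumptions; the patent does not specify them.

```python
from dataclasses import dataclass
from string import ascii_uppercase
from typing import Dict, List


@dataclass
class Segment:
    speaker_id: str   # speaker label from voiceprint recognition (assumed format)
    text: str         # speech-recognition output for this segment


def build_role_labeled_text(segments: List[Segment]) -> List[str]:
    """Label roles A, B, ... by first appearance and merge consecutive
    segments from the same speaker into one line."""
    roles: Dict[str, str] = {}
    lines: List[List[str]] = []               # each entry: [role, text]
    for seg in segments:
        if seg.speaker_id not in roles:
            if len(roles) >= len(ascii_uppercase):
                raise ValueError("this sketch supports at most 26 roles")
            roles[seg.speaker_id] = ascii_uppercase[len(roles)]
        role = roles[seg.speaker_id]
        if lines and lines[-1][0] == role:
            lines[-1][1] += ", " + seg.text   # same speaker continues
        else:
            lines.append([role, seg.text])
    return [f"{role}: {text}" for role, text in lines]
```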

Optionally, when recognizing the target audio information, in addition to the users' speech content and the corresponding target users, pronunciation identification information for the speech content may also be recognized; for example, the tone, intonation, speech rate, and time intervals between pieces of speech content may be marked in the target text information. As shown in Fig. 5, the target text information 52 includes role A's speech content, its pronunciation identification information, and role identification information A, for example "A: Today's weather is great (pause 0.3 s), perfect for a spring outing (pause 1 s; calm tone, gentle intonation, slow speech rate)", and role B's speech content, pronunciation identification information, and role identification information B, for example "B: Okay (excited tone, moderate speech rate)".

By marking the speech content, its pronunciation information, the corresponding role, and so on in the target text information 52, the user is prompted with both what to say and how to say it when recording the first multimedia data. The richer prompts help improve the recording success rate.
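One possible way to hold role and pronunciation identification information and render it in the style of example 52 is sketched below. The `Phrase` and `Line` classes and the parenthesized rendering format are hypothetical choices for illustration, not a structure defined in the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phrase:
    text: str
    pause_after_s: Optional[float] = None          # pause before the next phrase
    cues: List[str] = field(default_factory=list)  # e.g. tone, intonation, speech rate


@dataclass
class Line:
    role: str              # role identification information, e.g. "A"
    phrases: List[Phrase]


def render_line(line: Line) -> str:
    """Render one role's line with pronunciation cues in parentheses."""
    parts = []
    for p in line.phrases:
        notes = []
        if p.pause_after_s is not None:
            notes.append(f"pause {p.pause_after_s:g} s")
        notes.extend(p.cues)
        parts.append(f"{p.text} ({'; '.join(notes)})" if notes else p.text)
    return f"{line.role}: " + ", ".join(parts)
```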

Optionally, when recognizing the speech content in the target audio information and converting it into text, the text may also be divided into paragraphs according to the time intervals in each user's speech. For example, if a second speech period of the same user begins more than a predetermined interval (for example, 0.5 seconds) after the end of the first speech period, the two periods are treated as two separate audio segments; the speech in each segment is recognized into its own text, and the recognized text is divided into paragraphs accordingly. As shown in Fig. 5, in the target text information 53, the text of role A's speech content is divided into two paragraphs: "Today's weather is perfect for a spring outing" and "Let's go to the park together". Alternatively, after the target text information is recognized, it may be divided into multiple paragraphs through semantic analysis (for example, using a semantic analysis model built with a learning algorithm).
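The gap-based paragraph division described above can be sketched as follows, using the 0.5 s example threshold. The input format (time-ordered `(start_s, end_s, text)` tuples for one user's recognized speech) is an assumption; the semantic-analysis alternative is not shown.

```python
from typing import List, Optional, Tuple

# (start_s, end_s, text) for one user's recognized speech, in time order (assumed input).
Utterance = Tuple[float, float, str]


def split_paragraphs(utterances: List[Utterance], max_gap_s: float = 0.5) -> List[str]:
    """Start a new paragraph whenever the silence between two consecutive
    utterances of the same user exceeds max_gap_s."""
    paragraphs: List[List[str]] = []
    prev_end: Optional[float] = None
    for start, end, text in utterances:
        if prev_end is None or start - prev_end > max_gap_s:
            paragraphs.append([text])        # gap too long: new paragraph
        else:
            paragraphs[-1].append(text)      # short gap: same paragraph
        prev_end = end
    return [" ".join(p) for p in paragraphs]
```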

Further, when the target text information is divided into multiple paragraphs, each pair of adjacent paragraphs may be annotated with the time interval between their corresponding audio segments in the target audio information, and this interval serves as the pause between the speech contents. For example, the interval may be recorded at the end of the earlier of two adjacent paragraphs to remind the user of the pause, so that the user can adjust speech rate, breathing, and so on when recording the first multimedia data. In addition, tone, intonation, speech rate, and other information may also be marked in the text; the embodiments of the present invention do not limit this.

Optionally, the target text information may be obtained from the second multimedia data after the second multimedia data is selected via the "performance prompter" button on the recording interface of the first multimedia data, or it may be obtained from the second multimedia data in advance, before the first multimedia data is recorded.

Optionally, after the target text information is obtained from the second multimedia data, an editing input by the user on the target text information may be received, and the text content of the target text information is updated in response to the editing input; the updated target text information is then what is displayed subsequently.

For example, when the user wants to modify the text content of the target text information, the user may edit it by adding content, deleting content, reordering text segments, and so on, thereby updating the text content of the target text information.
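Annotating the pause at the end of the earlier of two adjacent paragraphs might look like the sketch below. The `(start_s, end_s, text)` input format, the one-decimal formatting, and clamping overlapping segments to a zero pause are illustrative assumptions.

```python
from typing import List, Tuple

# (start_s, end_s, text) for each text paragraph, in time order (assumed input).
Paragraph = Tuple[float, float, str]


def annotate_pauses(paragraphs: List[Paragraph]) -> List[str]:
    """Append the pause before the next paragraph to the end of the earlier
    paragraph of each adjacent pair; the last paragraph gets no annotation."""
    out = []
    for i, (_, end, text) in enumerate(paragraphs):
        if i + 1 < len(paragraphs):
            gap = max(0.0, paragraphs[i + 1][0] - end)   # clamp overlaps to zero
            out.append(f"{text} (pause {gap:.1f} s)")
        else:
            out.append(text)
    return out
```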
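The three kinds of editing input mentioned (adding, deleting, and reordering text segments) can be sketched as pure functions over a list of segments. The function names and the list-of-strings representation are hypothetical; each returns a new list so that the original target text information is left untouched until the edit is confirmed.

```python
from typing import List


def insert_segment(segments: List[str], index: int, text: str) -> List[str]:
    """Add a text segment at position index."""
    return segments[:index] + [text] + segments[index:]


def delete_segment(segments: List[str], index: int) -> List[str]:
    """Remove the text segment at position index."""
    if not 0 <= index < len(segments):
        raise IndexError("segment index out of range")
    return segments[:index] + segments[index + 1:]


def move_segment(segments: List[str], src: int, dst: int) -> List[str]:
    """Reorder: take the segment at src and place it at dst."""
    items = list(segments)
    items.insert(dst, items.pop(src))
    return items
```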

Optionally, the step of recording the first multimedia data and displaying the target text information on its recording interface may specifically include:

displaying the multiple pieces of text segment information in the target text information on the recording interface one after another, according to their preset order and the display duration corresponding to each piece.

Optionally, during recording of the first multimedia data, in addition to prompting the user's speech with the displayed target text information, showing the text segments one at a time prevents too much text from occupying a large part of the recording interface and degrading the video preview.

The display duration of each text segment may be determined from, for example, the number of characters in the segment and the time interval, in the target audio information of the second multimedia data, between the audio segment corresponding to that text segment and the audio segment of the next text segment. Because the text segments are switched directly in their preset order with preset display durations, there is no need during recording to detect the speech content of the entered audio and match it against the text segments, which reduces, to some extent, the data processing load of the electronic device.
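A minimal sketch of this timer-driven switching, assuming a hypothetical duration formula of a fixed per-character reading time (`secs_per_char`) plus the pause that followed the segment in the source audio. The patent lists these inputs but gives no formula, so the weighting here is purely illustrative.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional, Sequence


def display_durations(segments: Sequence[str], pauses_s: Sequence[float],
                      secs_per_char: float = 0.25) -> List[float]:
    """Duration per segment = reading time for its characters + following pause."""
    if len(segments) != len(pauses_s):
        raise ValueError("need one pause value per segment (use 0 for the last)")
    return [len(text) * secs_per_char + pause for text, pause in zip(segments, pauses_s)]


def segment_at(segments: Sequence[str], durations: Sequence[float],
               t: float) -> Optional[str]:
    """Return the segment shown at recording time t (seconds), switching in
    preset order; None once every segment has been shown."""
    ends = list(accumulate(durations))   # cumulative end time of each segment
    i = bisect_right(ends, t)
    return segments[i] if i < len(segments) else None
```

For example, with `secs_per_char=0.25`, an 8-character segment followed by a 1 s pause stays on screen for 3 s. No speech matching is needed at recording time, which is the processing saving the paragraph above describes.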

Optionally, recording the first multimedia data and displaying the target text information on its recording interface includes:

displaying first text information in a first display mode on the recording interface within a first recording time period of the first multimedia data, wherein the first text information includes at least one sentence of text information contained in the target text information and comprises N characters, N being a positive integer.

Optionally, the first display mode includes but is not limited to: font style, font color, font size, etc. of the first text information.

Optionally, the first text information may be a sentence of text information in the target text information, for example, the first text information may be a first sentence of text information matched with the voice content entered in the first recording time period, so that by displaying only the text information matched with the first recording time period on the recording interface, the text information with excessive content may be prevented from occupying a larger display area, and the text information matched with the first recording time period may also be emphasized.

Optionally, the first text information may be the first sentence text information, and a second sentence text information adjacent to and preceding to the timing of the first text information; alternatively, the first text information may be first sentence text information, and third sentence text information adjacent to and subsequent in timing to the timing of the first text information; or, the first text information may be the first sentence text information, the second sentence text information, and the third sentence text information, so that by displaying the first sentence text information matched with the voice content entered in the first recording time period, and the second sentence text information before the time sequence of the first sentence text information and the third sentence text information after the time sequence of the first sentence text information, a reminding effect is ensured for the first sentence text information while a reminding effect is ensured for the second sentence text information and the third sentence text information, and a user can establish a logical association by combining the first sentence text information and its previous and subsequent contents.

Specifically, the target text information may include a plurality of text messages, as shown in fig. 5, the target text information includes 3 text messages, which are "today's weather is well suited for stepping on, and" we go to the park bar together "and" good ", respectively; if a text message matched with the voice content recorded in the first recording time period is 'we go to the park bar together', only the text message can be displayed on the current recording interface; or the text message before the time sequence of the text message can be displayed at the same time of displaying the text message, namely the text message 'weather is very good for stepping on today', or the text message 'good bar' after the time sequence of the text message can be displayed at the same time of displaying the text message; or it is also possible to display the text message "weather today is well suited for stepping" before its timing and the text message "good bar" after its timing, simultaneously with the display of the text message.
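The four display options just listed (matching sentence alone, with the preceding sentence, with the following sentence, or with both) can be sketched as a small windowing function. The boolean flags and the `(text, is_current)` return shape are illustrative assumptions; the `is_current` flag could drive a distinct display manner such as bold, enlarged characters for the matching sentence.

```python
from typing import List, Tuple


def sentence_window(sentences: List[str], current: int,
                    show_previous: bool = False,
                    show_next: bool = False) -> List[Tuple[str, bool]]:
    """Return (text, is_current) pairs: the sentence matching the current
    speech, optionally with its immediate neighbours."""
    if not 0 <= current < len(sentences):
        raise IndexError("current sentence index out of range")
    start = current - 1 if show_previous and current > 0 else current
    end = current + 1 if show_next and current + 1 < len(sentences) else current
    return [(sentences[i], i == current) for i in range(start, end + 1)]
```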

Specifically, as an implementation: taking the first text message as the first sentence text message matched with the voice content recorded in the first recording time period as an example, as shown in fig. 6, if the first sentence text message 61 is displayed as "today is very good for green running" on the recording interface in the first recording time period; as shown in fig. 7, in a second recording period after the first recording period, a second sentence of text information 71 is displayed as "we go to the park bar together" on the recording interface; as shown in fig. 8, in a third recording implementation segment after the second recording time period, a third sentence of text information 81 is displayed as "good bar" on the recording interface; therefore, the multi-sentence text information of the target text information is gradually displayed according to the lapse of the recording time period, and under the condition that too much text information occupies more display areas, the text information corresponding to the voice information recorded in different recording time periods can be emphasized, so that the targeted reminding function of the corresponding text information is realized.

Specifically, in another implementation, the voice content recorded in the first recording time period matches the first sentence. The recording interface displays the first sentence 91, "The weather today is perfect for a spring outing", together with the second sentence 92 that follows it, "Let's go to the park together". The first sentence 91 and the second sentence 92 are displayed in different manners; for example, the first sentence 91 is displayed in bold, enlarged characters, as shown in fig. 9. Other display manners may of course be used to distinguish the first sentence, and the embodiment of the present invention is not limited thereto.

During a second recording time period after the first recording time period, the voice content recorded in the second recording time period matches the second sentence. The recording interface displays the second sentence 101, "Let's go to the park together", together with the third sentence 102 that follows it, "B: Okay". The second sentence 101 and the third sentence 102 are displayed in different manners; for example, the second sentence 101 is displayed in bold, enlarged characters, as shown in fig. 10. Other display manners may of course be used to distinguish the second sentence, and this embodiment of the present invention is not limited thereto.
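The two-level display of figs. 9 and 10 can be modeled as a list of styled lines. This is only a sketch: `DisplayLine`, `layout_current_and_next` and the 1.25 enlargement factor are illustrative assumptions standing in for "bold, enlarged characters":

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayLine:
    text: str
    bold: bool = False
    scale: float = 1.0  # font size relative to the default size


def layout_current_and_next(sentences, current_index, emphasis_scale=1.25):
    """Emphasise the sentence being recorded and show the following one plainly."""
    if not 0 <= current_index < len(sentences):
        raise IndexError("current_index out of range")
    lines = [DisplayLine(sentences[current_index], bold=True, scale=emphasis_scale)]
    if current_index + 1 < len(sentences):
        # The next sentence keeps the default (first) display manner.
        lines.append(DisplayLine(sentences[current_index + 1]))
    return lines
```

While the last sentence is being recorded, there is no following sentence, so only the emphasized line is returned.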

Optionally, after the step of displaying the first text information in the first display manner on the recording interface of the first multimedia data, the method further includes:

when the voice content of first audio information acquired by an audio acquisition component of the electronic device includes the i-th character of the first text information, updating the first display mode of the i-th character to a second display mode;

when the first display modes of all N characters of the first text information have been updated to the second display mode, updating the first text information to second text information;

wherein i is a positive integer and i is less than or equal to N; the second display mode indicates that the voice content of the first audio information matches the i-th character; the target text information includes the second text information, and the second time sequence of the second text information in the second multimedia data is adjacent to, and after, the first time sequence of the first text information.
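The per-character update and the switch to the second text information can be sketched as a small state machine. For illustration only, this sketch assumes that the speech recognizer yields characters in reading order and that the speaker does not skip characters. `LinePrompter` and its methods are hypothetical:

```python
class LinePrompter:
    """Per-character prompting over a list of sentences (hypothetical helper).

    Characters [0, matched) of the sentence on screen are in the second
    display mode; the remaining characters are still in the first mode.
    """

    def __init__(self, sentences):
        if not sentences or any(len(s) == 0 for s in sentences):
            raise ValueError("need at least one non-empty sentence")
        self.sentences = list(sentences)
        self.index = 0      # sentence currently on screen
        self.matched = 0    # number of its characters already spoken
        self.done = False   # True once the last sentence is fully spoken

    def feed(self, recognized_text):
        """Consume recognised characters in reading order.

        Returns True if at least one sentence was completed by this call.
        """
        completed = False
        for ch in recognized_text:
            if self.done:
                break
            sentence = self.sentences[self.index]
            if ch != sentence[self.matched]:
                continue  # not the next expected character; ignore it
            self.matched += 1
            if self.matched == len(sentence):  # all N characters updated
                completed = True
                if self.index + 1 < len(self.sentences):
                    self.index += 1   # switch to the second text information
                    self.matched = 0
                else:
                    self.done = True
        return completed

    def display_modes(self):
        """Display mode of each character of the sentence on screen."""
        sentence = self.sentences[self.index]
        return ["second" if i < self.matched else "first" for i in range(len(sentence))]
```

Characters that differ from the next expected character are ignored, so an extra spoken character does not block progress. A skipped character does block progress, and that is the situation the threshold-based fuzzy matching of this embodiment addresses.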

When the second multimedia data is video data or audio data, the time sequence of the text information may be the order in which the audio information corresponding to the text information occurs in the second multimedia data. When the second multimedia data is text data, the time sequence of the text information may be the order of the content in the text data.

Optionally, while the first multimedia data is being recorded, the acquired audio information is recognized in real time to obtain the corresponding voice content, which is used to update the display mode of the characters in the first text information. Updating the display mode, that is, switching from the first display mode to the second display mode, may include, but is not limited to, changing at least one of the font style, font color and font size.

For example, as shown in fig. 11, the first sentence, "The weather today is perfect for a spring outing", is displayed during the first recording time period. If the voice content of the first audio information entered by the user during the first recording time period includes the first character of the sentence (in the original Chinese, "今", the first half of the word for "today"), the display manner of that character in the first sentence 1101 is switched from the first display manner to the second display manner. In fig. 11, for example, the character is shown in a bolded and enlarged font. The embodiment of the present invention may also adjust the color, style and other attributes of the font, and is not limited thereto.

Further, after the display mode of the first character ("今") has been switched from the first display mode to the second display mode, if the entered first audio information is detected to include the voice content of the next character ("天", "day"), the display mode of that character in the first sentence 1101 is also switched from the first display mode to the second display mode, as shown in fig. 12. Proceeding in this way, the first audio information is recognized and matched in real time, and the display mode of the characters in the first sentence is updated in real time, so that text whose audio has already been recorded is distinguished from text whose audio has not yet been recorded.

Optionally, when the audio information is recognized to include a character of the first text information, the display mode of that character may be switched from the first display mode to the second display mode, and the display mode of characters not yet recognized in the audio information may be switched from the first display mode to a third display mode. For example, an identifier may be added to distinguish recorded text from unrecorded text by specially marking the display area of each character not yet recognized in the audio information. In fig. 6, for instance, a dot is placed below the not-yet-recognized character for "today" to draw attention to it and reinforce the reminder. The identifier may also take other forms, such as underlining or circling the not-yet-recognized characters, and the embodiment of the present invention is not limited thereto.
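The third display mode of fig. 6 can be approximated in a text-only sketch by drawing a marker line under the characters that have not yet been recognized. `render_with_markers` is a hypothetical helper. The alignment assumes one column per character, which holds for monospaced ASCII but not for full-width CJK glyphs; for those, a real UI would draw the dot under each glyph's bounding box:

```python
def render_with_markers(sentence, matched_count, marker="."):
    """Return (text_line, marker_line) for a sentence on the recording interface.

    The first `matched_count` characters are recognised (second display mode)
    and get no marker; each later non-space character is not yet recognised
    (third display mode) and gets `marker` directly below it.
    """
    if not 0 <= matched_count <= len(sentence):
        raise ValueError("matched_count out of range")
    marks = "".join(
        marker if i >= matched_count and not ch.isspace() else " "
        for i, ch in enumerate(sentence)
    )
    return sentence, marks
```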

Furthermore, when the first display modes of all N characters of the first text information have been updated to the second display mode, that is, when it is determined that the voice content corresponding to the current first text information has been fully recorded, the display is updated from the first text information to the second text information. Optionally, while the audio information is not recognized to contain characters of the second text information, the second text information is displayed in the first display mode. Once the audio information is recognized to contain characters of the second text information, the display mode of those characters is switched from the first display mode to the second display mode. The recognition, matching and updating work in the same way as for the first text information and are not described again here.

Optionally, when the display mode of characters is switched, each recognition step may cover one character, that is, the display mode is updated character by character. Alternatively, it may cover several characters, for example a phrase, so that the display mode is updated phrase by phrase. The embodiment of the present invention is not limited thereto.

Optionally, in a first recording time period of the first multimedia data, after the step of displaying the first text information in the recording interface in the first display manner, the method may further include:

acquiring first audio information collected during the first recording time period of the first multimedia data;

matching the text recognized from the first audio information against the first text information;

and if the text recognized from the first audio information matches the first text information, displaying second text information on the recording interface.

The target text information includes the second text information, and the second time sequence of the second text information in the second multimedia data is adjacent to, and after, the first time sequence of the first text information.

Optionally, if more than T characters in the text recognized from the first audio information match characters in the first text information, the recognized text is determined to match the first text information. For example, when entering the first audio information during the first recording time period, the user may omit or add some characters; in this case, fuzzy matching can be used to judge whether the voice content corresponding to the current first text information has been recorded. Suppose the first text information contains 10 characters and T is 7. If more than 7 characters of the text recognized from the first audio information match, the recognized text is determined to match the first text information, that is, the voice content corresponding to the first text information has been recorded. The second text information, which follows the first text information in time sequence, can then be displayed to remind the user of it.
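The threshold test can be sketched with Python's standard `difflib.SequenceMatcher` by counting the characters that align, in order, between the recognized text and the sentence. Using `SequenceMatcher` is an assumption and not the patent's stated method. The example follows the 10-character sentence with T = 7:

```python
from difflib import SequenceMatcher


def matched_char_count(recognized, reference):
    """Number of characters of `recognized` that align, in order, with `reference`."""
    matcher = SequenceMatcher(None, recognized, reference, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def matches_line(recognized, reference, threshold):
    """Fuzzy match: more than `threshold` (T) characters must align."""
    return matched_char_count(recognized, reference) > threshold


# One character replaced: 9 of 10 characters still align, and 9 > 7.
ok = matches_line("abcdefgxij", "abcdefghij", 7)
```

For character-based scripts such as Chinese, single characters are natural units. For English, passing lists of words instead of strings would make the count less sensitive to shared letters, since `SequenceMatcher` accepts any sequences of hashable items.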

Optionally, after the step of displaying the first text information in the first display manner on the recording interface of the first multimedia data, the method may further include:

when second audio information acquired by the audio acquisition component of the electronic device includes T of the N characters, outputting prompt information, wherein the prompt information indicates that the content of the second audio information does not match the first text information;

when the second audio information acquired by the audio acquisition component of the electronic device includes none of the N characters, displaying third text information on the recording interface of the first multimedia data, wherein the third text information is obtained by converting the second audio information into text, N is greater than 1, and T is less than N.

For example, suppose the first text information contains 10 characters. If the text recognized from the second audio information also contains 10 characters but only 3 of them are the same as characters in the first text information, the recognized text is determined not to match the first text information, and the prompt information is output.
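Combined with the threshold test, the number of characters shared with the first text information yields three outcomes: more than T (matched, so advance), between 1 and T (mismatch, so output a prompt), and 0 (newly added content, so display it as third text information). Treating every count from 1 to T as a mismatch is an interpretation of the text above. `Outcome` and `classify` are hypothetical names, and the count here again relies on `difflib.SequenceMatcher`:

```python
from difflib import SequenceMatcher
from enum import Enum


class Outcome(Enum):
    MATCH = "match"         # advance to the next sentence
    MISMATCH = "mismatch"   # output prompt information
    NEW_CONTENT = "new"     # display the recognised text as third text information


def classify(recognized, reference, threshold):
    """Classify recognised speech against the displayed sentence (T = threshold)."""
    matcher = SequenceMatcher(None, recognized, reference, autojunk=False)
    common = sum(block.size for block in matcher.get_matching_blocks())
    if common > threshold:
        return Outcome.MATCH
    if common == 0:
        return Outcome.NEW_CONTENT
    return Outcome.MISMATCH
```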

Optionally, the prompt information may be a prompt icon displayed on the recording interface, indicating that the content of the second audio information does not match the first text information. It may also be prompt text, for example directly displaying the parts of the second audio information that differ from the first text information, or directly indicating which characters do not match, to remind the user that a wrong line may have been spoken during recording.

Optionally, the prompt information may also be a voice prompt, for example directly announcing that the content of the second audio information does not match the first text information, or announcing the unmatched characters, to alert the user that a wrong line may have been spoken during recording. It may also be another predetermined sound, a blinking indicator light or the like, and the embodiments of the present invention are not limited thereto.

For example, when entering the second audio information during the first recording time period, the user may omit part of the text or deliberately skip part of it. In this case, the text that follows the text corresponding to the content of the currently entered audio can be prompted based on that content. If, after the voice content corresponding to the first sentence has been recorded, the second sentence is skipped and the content of the third sentence is recognized, the third sentence can be displayed directly. This ensures that the displayed text still switches correctly when the user omits or deliberately skips part of the text.

For example, the user may add content when recording the first audio information. When the second audio information acquired by the audio acquisition component of the electronic device includes none of the N characters, the acquired second audio information is determined to be content added by the user. The third text information recognized from the second audio information can then be displayed directly on the recording interface to remind the user of the newly added content. As shown in fig. 13, the first text information 131 is "The weather today is perfect for a spring outing", the second text information 132 is "Let's go to the park together", and the newly added third text information 133 is "There are a lot of people in the park".

Optionally, the third text information may be displayed in a manner different from both the first display manner and the second display manner, so that it is highlighted as newly added content.

Optionally, when the second audio information acquired by the audio acquisition component of the electronic device includes none of the N characters, third text information, obtained by converting the second audio information into text, is inserted into the target text information.

Specifically, the third text information may be inserted at the corresponding time-sequence position in the target text information. If the audio information corresponding to the third text information comes after, and is adjacent to, the audio corresponding to the first text information, the third text information may be inserted into the target text information immediately after the first text information. In this way, the content of the target text information is updated during recording according to the audio entered by the user, and text corresponding to the user's newly added voice content is inserted automatically. The next time multimedia data with the same voice content is recorded, the user can be prompted with lines based on the updated target text information without updating it manually, which saves manual updates when the multimedia data is recorded several times.
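Inserting the newly added line right after the sentence it follows can be sketched as a pure list operation, assuming the target text information is held as a list of sentences. `insert_ad_lib` is a hypothetical name:

```python
def insert_ad_lib(script, after_index, new_line):
    """Return a new script with `new_line` placed right after script[after_index].

    The original list is left unchanged, so the previous version of the
    target text information remains available if the user wants to undo.
    """
    if not 0 <= after_index < len(script):
        raise IndexError("after_index out of range")
    return script[:after_index + 1] + [new_line] + script[after_index + 1:]


script = ["The weather today is perfect for a spring outing",
          "Let's go to the park together"]
updated = insert_ad_lib(script, 0, "There are a lot of people in the park")
```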

As another example, the user may add content when recording the first audio information. If the second audio information acquired by the audio acquisition component of the electronic device includes none of the N characters, that is, the currently entered audio is inconsistent with the first text information, the first text information may remain displayed. The display is updated from the first text information to the second text information only when the text corresponding to the audio entered by the user is detected to match the second text information; wherein the target text information includes the second text information.

For example, suppose the target text information includes first, second and third text information in sequence and the first text information is currently displayed. The content of the audio entered by the user may be fuzzy-matched against the first text information. If the matching degree meets a preset condition, that is, the two match, the second text information is displayed; the matching may be performed as in the embodiments above and is not described again here. If the matching degree does not meet the preset condition, the content is matched against the second text information. If it matches, the third text information is displayed. If it does not, the content is matched against the third text information; if that matches, the fourth text information is prompted, and if not, the content is matched against the fourth text information, and so on. If no text information in the target text information matches the content of the audio entered by the user, prompt information may be output while the first text information remains displayed. Optionally, the prompt information may be output as described in the embodiments above; for example, it may be the voice message "You may have said something wrong. Please say the following line:".
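The forward search described above can be sketched as follows. The sketch assumes a caller-supplied `is_match(heard, sentence)` predicate, such as the threshold test of this embodiment. `next_sentence_to_show` is a hypothetical name, and returning `len(sentences)` to signal that the script is finished is a design choice of this sketch:

```python
def next_sentence_to_show(sentences, current_index, recognized, is_match):
    """Search forward from the displayed sentence for one the speech matches.

    Returns (index_to_display, prompt_needed). If sentence j (j >= current_index)
    is the first match, sentence j + 1 is displayed; an index equal to
    len(sentences) means the whole script has been recorded. If nothing
    matches, the current sentence stays on screen and a prompt is requested.
    """
    for j in range(current_index, len(sentences)):
        if is_match(recognized, sentences[j]):
            return j + 1, False
    return current_index, True


def exact(heard, line):
    # Stand-in predicate for the example; in practice a fuzzy match is used.
    return heard == line


lines = ["A1", "B2", "C3", "D4"]
skipped = next_sentence_to_show(lines, 0, "B2", exact)  # first sentence skipped
```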

Optionally, the next text information to be displayed is determined by matching the content of the currently acquired audio information against the sentences of the target text information one by one. This ensures that the displayed sentences switch correctly even if the user skips a passage, giving greater flexibility in displaying the text information.

Optionally, the step of recording the first multimedia data may further specifically include:

and, while the first multimedia data is being recorded, playing first voice information corresponding to the first text information when no audio information of the target user has been acquired within a preset time period.

Optionally, while recording the first multimedia data, the user may move relative to the electronic device, for example to a position with their back to the device, and then cannot see the recording preview interface of the electronic device. In this case, playing the first voice information corresponding to the first text information reminds the user of the line.

Optionally, while the first voice information is being played, if the gaze position of the target user's eyes is detected to be on the electronic device, playback of the first voice information may be stopped and the first text information remains displayed. Playing the first voice information corresponding to the first text information thus ensures that the user is reminded even when it is inconvenient to look at the screen, and it reduces how often the user needs to look at the screen.
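The timeout and the gaze-based stop can be sketched as a small controller driven by timestamps. Voice activity detection, gaze detection and audio playback are assumed to exist elsewhere. `VoiceReminder` is hypothetical, and refusing to start playback while the gaze is already on the device is an added assumption that keeps playback from restarting immediately after it has been stopped:

```python
class VoiceReminder:
    """Decides whether the current line should be read aloud (hypothetical)."""

    def __init__(self, timeout_s, start_time=0.0):
        self.timeout_s = timeout_s          # the preset time period, in seconds
        self.last_voice_time = start_time   # last time the target user was heard
        self.playing = False

    def on_voice_detected(self, now):
        """Record that audio of the target user was acquired at time `now`."""
        self.last_voice_time = now

    def update(self, now, gaze_on_device):
        """Return True while the first voice information should be playing."""
        if self.playing and gaze_on_device:
            self.playing = False            # stop playback; the text stays displayed
        elif (not self.playing and not gaze_on_device
              and now - self.last_voice_time >= self.timeout_s):
            self.playing = True             # silent too long and not looking
        return self.playing
```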

Optionally, the step of displaying the target text information on the recording interface of the first multimedia data may specifically include:

acquiring the gaze area of the target user's eyes on the electronic device, wherein the target user is the user corresponding to the voice information in the audio information acquired by the electronic device;

and displaying the target text information in a target area of the recording interface, wherein the target area is the area corresponding to the gaze area.

Optionally, during recording, the facial features of the target user can be captured by a camera to determine the gaze area of the target user's eyes on the electronic device. For example, during audio recording, the camera can be turned on to capture the target user's eye features, and the gaze area of the target user's eyes on the electronic device is then determined from the captured eye features. As another example, during video recording, the gaze area of the target user's eyes on the electronic device can be identified from the target user's facial features in the video preview interface.

For example, while recording the first multimedia data, the user may move relative to the electronic device. As shown in fig. 14, when the target user's gaze is focused on a position on the electronic device that corresponds to the target display position 141 in the recording preview interface, the target text information 42 may be moved to and displayed at the target display position 141, as shown in fig. 15. The display position of the target text information thus follows the gaze area of the target user's eyes on the electronic device during recording, which makes it easy for the user to read the displayed target text information.
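In the simplest case, moving the target text information to the gaze area means centering the text box on the estimated gaze point and clamping it to the screen. This sketch assumes the gaze point is already available in screen pixels; gaze estimation itself is out of scope. `Rect` and `text_box_for_gaze` are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def text_box_for_gaze(gaze_x, gaze_y, box_w, box_h, screen_w, screen_h):
    """Centre a box_w x box_h text box on the gaze point, kept fully on screen."""
    if box_w > screen_w or box_h > screen_h:
        raise ValueError("text box larger than the screen")
    x = min(max(gaze_x - box_w // 2, 0), screen_w - box_w)
    y = min(max(gaze_y - box_h // 2, 0), screen_h - box_h)
    return Rect(x, y, box_w, box_h)
```

Clamping keeps the whole line readable when the gaze falls near an edge of the screen, at the cost of the box no longer being exactly centered on the gaze point.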

Optionally, the step of recording the first multimedia data and displaying the target text information on the recording interface of the first multimedia data may further specifically include:

receiving a second input by the user on the target text information while the first multimedia data is being recorded;

updating display information of the target text information in response to the second input, wherein the display information includes at least one of display content and display parameters.

Optionally, while the target text information is displayed, its content, display parameters and the like may be edited in response to user input. For example, editing the content of the target text information may mean adding, deleting or modifying text, and editing its display parameters may mean modifying the font style, font size, font color or font background color. During display, the user can therefore edit the display information of the target text information directly through the second input on it, which makes editing quicker and more convenient.
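Edits made through the second input can be modeled as producing a new value of the display information. This is a sketch using the standard `dataclasses.replace`; the style fields and their defaults are illustrative assumptions, and `apply_second_input` is a hypothetical name:

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TextStyle:
    font_style: str = "regular"
    font_size: float = 16.0
    font_color: str = "#FFFFFF"
    background_color: str = "#000000"


@dataclass(frozen=True)
class PromptText:
    content: str
    style: TextStyle = field(default_factory=TextStyle)


def apply_second_input(text, new_content=None, **style_changes):
    """Return the prompt text after an edit of its content and/or style fields.

    Unknown style field names raise TypeError (raised by dataclasses.replace).
    """
    content = text.content if new_content is None else new_content
    return PromptText(content, replace(text.style, **style_changes))


original = PromptText("Okay")
bigger = apply_second_input(original, font_size=20.0)
reworded = apply_second_input(bigger, new_content="Okay, let's go")
```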

As shown in fig. 16, an embodiment of the present invention further provides a multimedia data recording apparatus 1600 applied to an electronic device, where the apparatus 1600 includes:

a receiving module 1610 configured to receive a first input of a user;

a response module 1620, configured to record first multimedia data in response to the first input, and display target text information on a recording interface of the first multimedia data;

Optionally, the response module 1620 comprises:

a first display sub-module configured to display first text information in a first display mode on the recording interface within a first recording time period of the first multimedia data;

wherein the first text information includes at least one sentence of the target text information; the target text information includes the first text information, the first text information includes N characters, and N is a positive integer.

Optionally, the response module 1620 further comprises:

a first updating sub-module configured to update the first display mode of the i-th character to a second display mode when the voice content of first audio information acquired by the audio acquisition component of the electronic device includes the i-th character of the first text information;

a second updating sub-module configured to update the first text information to second text information when the first display modes of all N characters of the first text information have been updated to the second display mode;

Optionally, the response module 1620 further comprises:

an output sub-module configured to output prompt information when second audio information acquired by the audio acquisition component of the electronic device includes T of the N characters, wherein the prompt information indicates that the content of the second audio information does not match the first text information;

a second display sub-module configured to display third text information on the recording interface of the first multimedia data when the second audio information acquired by the audio acquisition component of the electronic device includes none of the N characters;

wherein the third text information is obtained by converting the second audio information into text, N is greater than 1, and T is less than N.

Optionally, the response module 1620 further comprises:

an insertion sub-module configured to insert third text information, obtained by converting the second audio information into text, into the target text information when the second audio information acquired by the audio acquisition component of the electronic device includes none of the N characters.

Optionally, the response module 1620 comprises:

a playing sub-module configured to play first voice information corresponding to the first text information when no audio information of the target user has been acquired within a preset time period while the first multimedia data is being recorded.

Optionally, the response module 1620 comprises:

an acquisition sub-module configured to acquire the gaze area of the target user's eyes on the electronic device, wherein the target user is the user corresponding to the voice information in the audio information acquired by the electronic device;

and a third display sub-module configured to display the target text information in a target area of the recording interface, wherein the target area is the area corresponding to the gaze area.

Optionally, the response module 1620 comprises:

a receiving sub-module configured to receive a second input by the user on the target text information while the first multimedia data is being recorded;

a response sub-module configured to update the display information of the target text information in response to the second input;

wherein the display information includes at least one of display content and display parameters.

The multimedia data recording apparatus provided by the embodiment of the present invention can implement each process of the above multimedia data recording method embodiment. To avoid repetition, the details are not repeated here.

The apparatus 1600 in the embodiment of the present invention receives a first input of a user and, in response to the first input, records first multimedia data containing audio information of a target user while displaying, on the recording interface of the first multimedia data, target text information obtained from second multimedia data. Because the target text information obtained in advance from the second multimedia data is displayed on the recording interface, the user is reminded of it while recording the first multimedia data. This avoids the repeated and frequent re-recording caused by forgotten lines and similar problems when the user records multimedia data such as audio data or video data, spares the user the cumbersome process of manually writing down the target text information, and thus helps improve the success rate of recording the first multimedia data.

Fig. 17 is a schematic diagram of a hardware structure of an electronic device for implementing various embodiments of the present invention.

The electronic device 1700 includes, but is not limited to: a radio frequency unit 1701, a network module 1702, an audio output unit 1703, an input unit 1704, a sensor 1705, a display unit 1706, a user input unit 1707, an interface unit 1708, a memory 1709, a processor 1710 and a power supply 1711. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 17 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine some components, or arrange the components differently. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted terminal, a wearable device, a pedometer and the like.

A user input unit 1707, configured to receive a first input by a user;

the processor 1710 is configured to, in response to the first input, record first multimedia data and display target text information on a recording interface of the first multimedia data, wherein the first multimedia data includes audio information of a target user, and the target text information is obtained from second multimedia data.

The electronic device provided by the embodiment of the present invention can implement each process of the above multimedia data recording method embodiment. To avoid repetition, the details are not repeated here.

The electronic device 1700 in the embodiment of the present invention receives a first input of a user and, in response to the first input, records first multimedia data containing audio information of a target user while displaying, on the recording interface of the first multimedia data, target text information obtained from second multimedia data. Because the target text information obtained in advance from the second multimedia data is displayed on the recording interface, the user is reminded of it while recording the first multimedia data. This avoids the repeated and frequent re-recording caused by forgotten lines and similar problems when the user records multimedia data such as audio data or video data, spares the user the cumbersome process of manually writing down the target text information, and thus helps improve the success rate of recording the first multimedia data.

It should be understood that, in the embodiment of the present invention, the radio frequency unit 1701 may be configured to receive and transmit signals during message transmission and reception or during a call. Specifically, it receives downlink data from a base station and passes it to the processor 1710 for processing, and it transmits uplink data to the base station. Generally, the radio frequency unit 1701 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer and the like. The radio frequency unit 1701 may also communicate with networks and other devices through a wireless communication system.

The electronic device provides the user with wireless broadband Internet access through the network module 1702, for example helping the user send and receive e-mails, browse web pages, and access streaming media.

The audio output unit 1703 may convert audio data received by the radio frequency unit 1701 or the network module 1702, or stored in the memory 1709, into an audio signal and output it as sound. The audio output unit 1703 may also provide audio output related to a specific function performed by the electronic device 1700 (for example, a call signal reception sound or a message reception sound). The audio output unit 1703 includes a speaker, a buzzer, a receiver, and the like.

The input unit 1704 is used to receive audio or video signals. The input unit 1704 may include a graphics processing unit (GPU) 17041 and a microphone 17042. The graphics processor 17041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The processed image frames may be displayed on the display unit 1706. The image frames processed by the graphics processor 17041 may be stored in the memory 1709 (or another storage medium) or transmitted via the radio frequency unit 1701 or the network module 1702. The microphone 17042 can receive sound and process it into audio data. In phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 1701, and then output.

The electronic device 1700 further includes at least one sensor 1705, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor: the ambient light sensor adjusts the brightness of the display panel 17061 according to the brightness of the ambient light, and the proximity sensor turns off the display panel 17061 and/or the backlight when the electronic device 1700 is moved to the user's ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity. It can be used to recognize the posture of the electronic device (for example, for landscape/portrait screen switching, related games, and magnetometer posture calibration) and for vibration-recognition functions (such as a pedometer or tap detection). The sensors 1705 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described in detail here.
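The gravity-based posture recognition mentioned above can be illustrated with a minimal sketch. It is not part of the disclosed recording method; the axis convention, the `flat_ratio` threshold of 0.8, and the names `Orientation` and `classify_orientation` are assumptions made for illustration, and real platforms usually add filtering and hysteresis before switching the screen layout.

```python
import math
from enum import Enum


class Orientation(Enum):
    PORTRAIT = "portrait"
    LANDSCAPE = "landscape"
    FLAT = "flat"


def classify_orientation(ax: float, ay: float, az: float,
                         flat_ratio: float = 0.8) -> Orientation:
    """Classify how a stationary device is held from one accelerometer sample.

    Assumed (hypothetical) axis convention: x runs along the short edge of the
    screen, y along the long edge, and z out of the screen. At rest the
    accelerometer measures only gravity, so the axis carrying most of the
    gravity vector reveals the posture.
    """
    g = math.sqrt(ax * ax + ay * ay + az * az)
    if g == 0.0:
        raise ValueError("zero reading: device may be in free fall")
    # Gravity mostly along z: the device is lying (nearly) flat.
    if abs(az) / g >= flat_ratio:
        return Orientation.FLAT
    # Otherwise compare the in-plane components to choose the screen layout.
    return Orientation.PORTRAIT if abs(ay) >= abs(ax) else Orientation.LANDSCAPE


print(classify_orientation(0.0, 9.81, 0.0))  # Orientation.PORTRAIT
print(classify_orientation(9.7, 1.2, 0.5))   # Orientation.LANDSCAPE
```

Comparing ratios against the measured gravity magnitude, rather than against a fixed 9.81 m/s², keeps the sketch insensitive to small sensor scale errors.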

The display unit 1706 is used to display information input by the user or information provided to the user. The display unit 1706 may include a display panel 17061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.

The user input unit 1707 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 1707 includes a touch panel 17071 and other input devices 17072. The touch panel 17071, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed on or near the touch panel 17071 with a finger, a stylus, or any other suitable object or accessory). The touch panel 17071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 1710, and receives and executes commands sent by the processor 1710. The touch panel 17071 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 17071, the user input unit 1707 may include other input devices 17072, which may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick; these are not described in detail here.
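The touch controller's conversion of touch information into touch point coordinates can be illustrated with a minimal sketch. It assumes a resistive-style panel whose controller reports raw readings that map linearly onto screen pixels after a two-point calibration; the `Calibration` class, the raw ranges, the 1080 x 2400 screen size, and the function names are hypothetical, and capacitive controllers typically compute coordinates differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Calibration:
    """Raw controller readings observed at the panel edges (hypothetical values)."""
    raw_x_min: int
    raw_x_max: int
    raw_y_min: int
    raw_y_max: int

    def __post_init__(self) -> None:
        if self.raw_x_max <= self.raw_x_min or self.raw_y_max <= self.raw_y_min:
            raise ValueError("calibration ranges must be non-empty")


def _scale(raw: int, lo: int, hi: int, size: int) -> int:
    # Linear interpolation of the raw reading onto [0, size - 1], clamped so that
    # readings slightly outside the calibrated range still land on screen.
    frac = (raw - lo) / (hi - lo)
    frac = min(max(frac, 0.0), 1.0)
    return round(frac * (size - 1))


def raw_to_screen(raw_x: int, raw_y: int, cal: Calibration,
                  width: int, height: int) -> Tuple[int, int]:
    """Convert one raw touch sample into pixel coordinates for the processor."""
    return (_scale(raw_x, cal.raw_x_min, cal.raw_x_max, width),
            _scale(raw_y, cal.raw_y_min, cal.raw_y_max, height))


cal = Calibration(raw_x_min=200, raw_x_max=3900, raw_y_min=150, raw_y_max=3950)
print(raw_to_screen(200, 150, cal, 1080, 2400))    # (0, 0)
print(raw_to_screen(4095, 3950, cal, 1080, 2400))  # (1079, 2399)
```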

Further, the touch panel 17071 may be overlaid on the display panel 17061. When the touch panel 17071 detects a touch operation on or near it, the touch operation is transmitted to the processor 1710 to determine the type of the touch event, and the processor 1710 then provides a corresponding visual output on the display panel 17061 according to the type of the touch event. Although in fig. 17 the touch panel 17071 and the display panel 17061 are shown as two separate components implementing the input and output functions of the electronic device, in some embodiments the touch panel 17071 may be integrated with the display panel 17061 to implement these functions; this is not limited here.

The interface unit 1708 is an interface for connecting an external device to the electronic device 1700. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 1708 may be used to receive input (for example, data information or power) from an external device and transmit the received input to one or more elements within the electronic device 1700, or to transmit data between the electronic device 1700 and the external device.

The memory 1709 may be used to store software programs and various data. The memory 1709 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the mobile phone (such as audio data and a phonebook), and the like. In addition, the memory 1709 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The processor 1710 is the control center of the electronic device. It connects all parts of the electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 1709 and calling the data stored in the memory 1709, thereby monitoring the electronic device as a whole. The processor 1710 may include one or more processing units; preferably, the processor 1710 may integrate an application processor, which mainly handles the operating system, user interface, application programs, and the like, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 1710.

The electronic device 1700 may further include a power source 1711 (such as a battery) that supplies power to the various components. Preferably, the power source 1711 may be logically connected to the processor 1710 through a power management system, so that charging, discharging, and power consumption are managed by the power management system.

In addition, the electronic device 1700 may include some functional modules that are not shown; these are not described in detail here.

Preferably, an embodiment of the present invention further provides an electronic device including a processor 1710, a memory 1709, and a computer program stored in the memory 1709 and executable on the processor 1710. When executed by the processor 1710, the computer program implements each process of the foregoing multimedia data recording method embodiment and achieves the same technical effect; to avoid repetition, details are not described again here.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the foregoing multimedia data recording method embodiment and achieves the same technical effect; to avoid repetition, details are not described again here. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferable implementation. Based on such an understanding, the technical solutions of the present invention, or the part of them that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods described in the embodiments of the present invention.

While the embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to these specific embodiments, which are illustrative rather than restrictive. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A multimedia data recording method, applied to an electronic device, characterized by comprising the following steps:

receiving a first input of a user;

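in response to the first input, recording first multimedia data and displaying target text information on a recording interface of the first multimedia data;

wherein the first multimedia data includes audio information of a target user, and the target text information is obtained from second multimedia data;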

the recording of the first multimedia data and the display of the target text information on the recording interface of the first multimedia data comprise:

displaying first text information in a first display mode on the recording interface within a first recording time period of the first multimedia data;

wherein the first text information comprises at least one sentence of the target text information; that is, the target text information comprises the first text information, the first text information comprises N characters, and N is a positive integer;

after the first text information is displayed in a first display mode on the recording interface of the first multimedia data, the method further comprises the following steps:

in a case where second audio information collected by an audio collection component of the electronic device contains none of the N characters, inserting third text information into the target text information, the third text information being obtained by converting the second audio information into text.

2. The multimedia data recording method according to claim 1, wherein after the first text information is displayed in the first display mode on the recording interface of the first multimedia data, the method further comprises:

3. The multimedia data recording method according to claim 1, wherein after the first text information is displayed in the first display mode on the recording interface of the first multimedia data, the method further comprises:

in a case where second audio information collected by an audio collection component of the electronic device contains none of the N characters, displaying third text information on the recording interface of the first multimedia data;

4. The method of claim 1, wherein the recording the first multimedia data comprises:

5. The method as claimed in claim 1, wherein displaying the target text information on the recording interface of the first multimedia data comprises:

6. The method for recording multimedia data according to claim 1, wherein the recording of the first multimedia data and the displaying of the target text information on the recording interface of the first multimedia data comprises:

updating display information of the target text information in response to the second input;

7. The method for recording multimedia data according to any of claims 1 to 6, wherein the target text information comprises: textual information indicative of the speech content, and at least one of: character identification information and pronunciation identification information.

8. A multimedia data recording apparatus for use in an electronic device, the apparatus comprising:

the receiving module is used for receiving a first input of a user;

the response module comprises:

the inserting sub-module is configured to insert third text information into the target text information in a case where second audio information collected by an audio collection component of the electronic device contains none of the N characters, wherein the third text information is obtained by converting the second audio information into text.

9. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method of recording multimedia data according to any one of claims 1 to 7.
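The conditional insertion step recited in claim 1 can be sketched as follows, assuming the second audio information has already been converted to text by some speech recognizer (not shown). The claims do not specify where the third text information is placed or how characters are matched, so inserting it directly after the current sentence, the word-level `words` splitter used for English, and the names `matched_units` and `insert_if_unmatched` are all hypothetical choices for illustration.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]
_PUNCT = ".,!?;:\"'"


def words(text: str) -> List[str]:
    """Hypothetical unit splitter for English: lower-cased words without punctuation."""
    tokens = (t.strip(_PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def matched_units(first_text: str, transcript: str, tokenize: Tokenizer = words) -> int:
    """Count how many of the N units of the first text information occur in the transcript."""
    heard = set(tokenize(transcript))
    return sum(1 for unit in tokenize(first_text) if unit in heard)


def insert_if_unmatched(target: List[str], index: int, transcript: str,
                        tokenize: Tokenizer = words) -> List[str]:
    """Return a new target text in which the transcript (the "third text information")
    is inserted right after sentence `index` when it shares none of that sentence's units.
    """
    if not tokenize(transcript):
        return list(target)  # nothing recognised: keep the script unchanged
    if matched_units(target[index], transcript, tokenize) == 0:
        return target[:index + 1] + [transcript] + target[index + 1:]
    return list(target)


script = ["Good morning, everyone.", "Today we test the recorder."]
print(insert_if_unmatched(script, 0, "sorry I need a moment"))
# ['Good morning, everyone.', 'sorry I need a moment', 'Today we test the recorder.']
print(insert_if_unmatched(script, 0, "good morning everyone"))
# ['Good morning, everyone.', 'Today we test the recorder.']
```

For Chinese text, where the claim's "characters" are individual characters, passing `tokenize=list` compares the first text information and the transcript character by character instead of word by word.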