WO2022183814A1 - Voice annotation and use method and device for image, electronic device, and storage medium - Google Patents

Voice annotation and use method and device for image, electronic device, and storage medium Download PDF

Info

Publication number
WO2022183814A1
WO2022183814A1 · PCT/CN2021/140547 · CN2021140547W
Authority
WO
WIPO (PCT)
Prior art keywords
voice
image
marked
annotation
voice information
Prior art date
Application number
PCT/CN2021/140547
Other languages
French (fr)
Chinese (zh)
Inventor
彭映
刘昱玥
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2022183814A1 publication Critical patent/WO2022183814A1/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 - Training
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models

Definitions

  • The present application relates to the technical field of image processing, and more particularly to a method for voice annotation and use of an image, a device for voice annotation and use of an image, an electronic device, and a non-volatile computer-readable storage medium.
  • Embodiments of the present application provide a method for voice annotation and use of an image, a device for voice annotation and use of an image, an electronic device, and a non-volatile computer-readable storage medium.
  • The method for voice annotation and use of an image in the embodiments of the present application includes: acquiring an image to be annotated; generating an annotated image according to input voice information and the image to be annotated, where the annotated image includes a voice annotation label and the voice annotation label is displayed in the image to be annotated; associating the voice annotation label with the voice information; and saving the annotated image and the voice information.
  • the apparatus for voice annotation and use of images includes: an acquisition module, a generation module, an association module, and a storage module.
  • the acquisition module is used to acquire the image to be marked;
  • the generation module is used to generate a marked image according to the input voice information and the to-be-marked image, the marked image includes a voice-marked label, and the voice-marked label is displayed on the to-be-marked image.
  • the association module is used for associating the voice annotation label and the voice information; and the storage module is used to save the marked image and the voice information.
  • the electronic device of the embodiment of the present application includes: one or more processors and a memory.
  • The one or more processors are used to: obtain an image to be marked; generate a marked image according to the input voice information and the image to be marked, where the marked image includes a voice annotation label and the voice annotation label is displayed in the image to be marked; and associate the voice annotation label with the voice information.
  • the memory is used for saving the marked image and the voice information.
  • the non-volatile computer-readable storage medium of the embodiment of the present application contains a computer program.
  • When the computer program is executed by one or more processors, it causes the processors to implement the following method for voice annotation and use of an image: acquiring an image to be annotated; generating an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label that is displayed in the image to be annotated; associating the voice annotation label with the voice information; and saving the annotated image and the voice information.
  • FIG. 1 is a schematic flowchart of a method for voice annotation and use of an image according to some embodiments of the present application;
  • FIG. 2 is a schematic diagram of performing voice annotation on an image to be annotated in the method for voice annotation and use of an image according to some embodiments of the present application;
  • FIG. 3 is a schematic diagram of an annotated image in the method for voice annotation and use of an image according to some embodiments of the present application;
  • FIG. 4 is a schematic structural diagram of a device for voice annotation and use of images according to some embodiments of the present application;
  • FIG. 5 is a schematic structural diagram of an electronic device according to some embodiments of the present application;
  • FIGS. 6 to 14 are schematic flowcharts of methods for voice annotation and use of images according to some embodiments of the present application;
  • FIG. 15 is a schematic diagram of playing the voice annotations of an annotated image that exceeds the display area in the method for voice annotation and use of an image according to some embodiments of the present application;
  • FIG. 16 is a schematic diagram of the connection between a non-volatile computer-readable storage medium and a processor according to some embodiments of the present application.
  • The embodiments of the present application provide a method for voice annotation and use of an image.
  • The method for voice annotation and use of an image includes: acquiring an image to be marked; generating a marked image according to the input voice information and the image to be marked, where the marked image includes a voice annotation tag displayed in the image to be marked; associating the voice annotation tag with the voice information; and saving the marked image and the voice information.
  • generating a marked image according to the input voice information and the image to be marked includes: generating a voice marking label according to the voice information; and displaying the voice marking label in the to-be-marked image to generate the marked image.
  • Generating a marked image according to the input voice information and the image to be marked includes: generating a voice annotation label according to the voice information; displaying the voice annotation label in the image to be marked; and processing the voice annotation label to generate the marked image, where the processing includes at least one of playing, deleting, and dragging.
  • saving the marked image and voice information includes: saving the marked image and voice information as a video file.
  • the voice annotation of the image and the method for using the image further include: playing the marked image and voice information.
  • There may be multiple voice annotation tags, and the multiple voice annotation tags have a predetermined playing order. Playing the marked image and the voice information includes: playing the voice information associated with the voice annotation tags in the playing order.
  • Saving the marked image and the voice information includes: saving the marked image as a first format file; saving the voice information as a second format file; and saving the first format file and the second format file separately.
  • the voice annotation of the image and the method of using the image further include: triggering the voice annotation tag to play the voice information associated with the voice annotation tag.
  • The method for voice annotation and use of an image further includes: playing the voice information associated with the voice annotation tags in the display area; after the voice information associated with the voice annotation tags in the display area has been played, scrolling the marked image so that unplayed voice annotation tags enter the display area; and playing the voice information associated with the voice annotation tags that enter the display area.
  • Embodiments of the present application further provide a device for voice annotation and use of images.
  • the device for voice annotation and use of images includes: an acquisition module, a generation module, an association module, and a storage module.
  • the acquisition module is used to acquire the image to be labeled.
  • the generating module is configured to generate a marked image according to the input voice information and the image to be marked, the marked image includes a voice marked label, and the voice marked label is displayed in the to-be-marked image.
  • the association module is used to associate the voice annotation label and voice information.
  • the storage module is used to save the marked images and voice information.
  • Embodiments of the present application further provide an electronic device, where the electronic device includes one or more processors and a memory.
  • The one or more processors are used to: obtain the image to be marked; generate a marked image according to the input voice information and the image to be marked, where the marked image includes a voice annotation tag displayed in the image to be marked; and associate the voice annotation tag with the voice information. The memory is used to save the marked image and the voice information.
  • the one or more processors are further configured to: generate a voice annotation label according to the voice information; and control the display of the voice annotation label in the image to be annotated, so as to generate an annotated image.
  • The one or more processors are further configured to: generate a voice annotation tag according to the voice information; control the voice annotation tag to be displayed in the image to be annotated; and process the voice annotation tag to generate the annotated image, where the processing includes at least one of playing, deleting, and dragging.
  • the memory is also used to save the annotated image and voice information as a video file.
  • the electronic device further includes a display and a speaker, the display is used for displaying the marked image, and the speaker is used for playing the voice information.
  • There may be multiple voice annotation tags, and the multiple voice annotation tags have a predetermined playing order.
  • the speaker is further configured to play the voice information associated with the voice annotation tags according to the playing order.
  • the memory is further used for: saving the marked image as a first format file; saving the voice information as a second format file; and saving the first format file and the second format file separately.
  • the speaker is further configured to play voice information associated with the voice annotation tag according to the triggered voice annotation tag.
  • The speaker is further used to: play the voice information associated with the voice annotation tags in the display area; after the voice information associated with the voice annotation tags in the display area has been played, scroll the marked image so that unplayed voice annotation tags enter the display area; and play the voice information associated with the voice annotation tags that enter the display area.
  • Embodiments of the present application further provide a non-volatile computer-readable storage medium storing a computer program. When the computer program is executed by one or more processors, it implements any of the above methods for voice annotation and use of an image.
  • An embodiment of the present application provides a method for voice annotation and use of an image; the method includes steps 01 to 04 (see FIG. 1).
  • an embodiment of the present application provides an apparatus 10 for voice annotation and use of images.
  • the apparatus 10 for voice annotation and use of images includes an acquisition module 11 , a generation module 12 , an association module 13 and a storage module 14 .
  • The method for voice annotation and use of an image according to the embodiments of the present application can be applied to the device 10 for voice annotation and use of images, wherein the acquisition module 11, the generation module 12, the association module 13, and the storage module 14 are respectively used to execute the methods in 01, 02, 03, and 04.
  • That is, the acquisition module 11 is used to acquire the image P1 to be marked; the generation module 12 is used to generate the marked image P2 according to the input voice information and the image P1 to be marked, where the marked image P2 includes the voice annotation label V and the voice annotation label V is displayed in the image P1 to be marked; the association module 13 is used to associate the voice annotation label V with the voice information; and the storage module 14 is used to save the marked image P2 and the voice information.
  • an embodiment of the present application provides an electronic device 100 .
  • the electronic device 100 includes one or more processors 30 and a memory 50 .
  • The method for voice annotation and use of an image in this embodiment can be applied to the electronic device 100, wherein the one or more processors 30 are used to execute the methods in 01, 02, and 03, and the memory 50 is used to execute the method in 04. That is, the one or more processors 30 are used to: acquire the image P1 to be marked; generate the marked image P2 according to the input voice information and the image P1 to be marked, where the marked image P2 includes the voice annotation label V and the voice annotation label V is displayed in the image P1 to be marked; and associate the voice annotation label V with the voice information.
  • the memory 50 is used to store the marked image P2 and voice information.
  • The electronic device 100 may be a terminal device such as a mobile phone, a notebook computer, a smart watch, or a computer.
  • The device 10 for voice annotation and use of images may be an application program installed in the electronic device 100, for example a screenshot or photo album application; it may also be a functional module within some application program, such as an image editing function. This application only takes the electronic device 100 being a mobile phone as an example for description; when the electronic device 100 is another type of terminal, the situation is similar to that of a mobile phone and is not described in detail.
  • The acquisition module 11 or the one or more processors 30 may acquire the image P1 to be marked by capturing an image and using it as the image P1 to be marked.
  • Alternatively, the acquisition module 11 or the one or more processors 30 may acquire an image from the photo album in the electronic device 100 as the image P1 to be marked.
  • Alternatively, the acquisition module 11 or the one or more processors 30 may take a screenshot of the electronic device 100 and use the captured image as the image P1 to be marked.
  • The acquisition module 11 or the one or more processors 30 may also acquire the image P1 to be marked in other ways, which are not limited here.
  • The generation module 12 or the one or more processors 30 generates the marked image P2 according to the input voice information and the acquired image P1 to be marked; the marked image P2 includes the voice annotation label V, and the voice annotation label V is displayed in the image P1 to be marked. Specifically, the initial display position of the voice annotation label V may be the bottom of the image P1 to be marked.
  • The association module 13 or the one or more processors 30 associates the input voice information with the voice annotation label V. The user may input voice information one or more times through the recording label L, and each piece of input voice information is associated with one voice annotation label V; in this way, the marked image P2 may include multiple voice annotation labels V, thereby realizing multi-voice annotation of the image P1 to be marked.
  • the storage module 14 or the memory 50 saves the marked image P2 and the marked voice information, so that when viewing the marked image P2 again, the voice information in the marked image P2 can be listened to.
  • the information annotation of the image P1 to be annotated is realized by inputting the voice information, which improves the efficiency of image annotation compared with the annotation methods such as text and brushes.
  • the apparatus 10 for voice annotation and use of images may also implement text and brush annotation on the image P1 to be annotated.
  • When performing the text annotation function, the user can input voice information by recording, and the generation module 12 or the one or more processors 30 converts the input voice information into text information and displays it in the image P1 to be marked, so as to generate the marked image P2.
  • Alternatively, the user directly inputs text information to realize text annotation of the image P1 to be marked.
  • In other embodiments, the generation module 12 or the one or more processors 30 converts the input voice information into picture information and displays it in the image P1 to be marked, so as to generate the marked image P2.
  • Alternatively, the user directly inputs drawing information (drawn in the image P1 to be marked) to realize brush annotation of the image P1 to be marked.
  • In this way, the device 10 for voice annotation and use of images can realize not only the voice annotation function for the image P1 to be marked, but also the text and brush annotation functions for the image P1 to be marked; the application scenarios are more diverse, and the user is provided with more annotation options.
  • In some embodiments, step 02 of generating the marked image P2 according to the input voice information and the image P1 to be marked includes: 021, generating a voice annotation label V according to the voice information; and 023, displaying the voice annotation label V in the image P1 to be marked to generate the marked image P2.
  • The generation module 12 is also used to execute the methods in 021 and 023; that is, the generation module 12 is also used to: generate a voice annotation label V according to the voice information; and display the voice annotation label V in the image P1 to be marked, to generate the marked image P2.
  • The one or more processors 30 are also used to execute the methods in 021 and 023; that is, the one or more processors 30 are also used to: generate a voice annotation label V according to the voice information; and control the voice annotation label V to be displayed in the image P1 to be marked, to generate the marked image P2.
  • After obtaining the image P1 to be marked, the user records voice information. Specifically, the generation module 12 or the one or more processors 30 generates a corresponding voice annotation label V according to the input voice information; correspondingly, each time the user enters a piece of voice information, the generation module 12 or the one or more processors 30 generates a voice annotation label V corresponding to that voice information and, at the same time, controls the corresponding voice annotation label V to be displayed in the image P1 to be marked, so as to generate the marked image P2. This ensures that the user can quickly learn the voice information annotated in the marked image P2 when viewing it again.
  • If the voice annotation tag V associated with the input voice information were not displayed in the marked image P2, the user could not determine, after inputting the voice information, whether the relevant voice information had been successfully entered for the image P1 to be marked; or, when viewing the marked image P2 again, the user could not determine whether there is annotated voice information in the marked image P2. The user might then need to perform a second voice annotation on the image, and the efficiency of image annotation would be low.
  • Therefore, the generation module 12 or the one or more processors 30 generates the corresponding voice annotation label V according to the input voice information and controls the voice annotation label V to be displayed in the image P1 to be marked, thereby generating the marked image P2. This makes it convenient for the user to confirm whether the voice information has been successfully entered and to quickly learn the voice information annotated in the marked image P2 when viewing it again, thereby improving the efficiency of image annotation.
  • In some embodiments, step 023 of displaying the voice annotation label V in the image P1 to be marked to generate the marked image P2 includes: 0231, displaying the voice annotation label V in the image P1 to be marked; and 0233, processing the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The generation module 12 is also used to execute the methods in 0231 and 0233; that is, the generation module 12 is also used to: control the voice annotation label V to be displayed in the image P1 to be marked; and process the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The one or more processors 30 are also used to execute the methods in 0231 and 0233; that is, the one or more processors 30 are also used to: control the voice annotation label V to be displayed in the image P1 to be marked; and process the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The user records by long-pressing the recording label L; after releasing, the recording ends and the input of the voice information is completed.
  • At this time, the user can perform at least one of playing, deleting, and dragging on the voice annotation tag V whose recording has ended. For example, the user plays the voice annotation tag V whose recording has ended: the voice annotation tag V then displays an icon animation while the associated voice information is played, which makes it convenient for the user to listen to the entered voice information and determine whether it is accurate. For another example, the user deletes the voice annotation label V whose recording has ended: the user taps to select the voice annotation label V to be deleted, a delete icon appears on the voice annotation label V, and the voice information can be deleted by tapping the delete icon.
  • For another example, the user drags the voice annotation tag V whose recording has ended so that it is displayed at a suitable position in the image P1 to be marked; the user can long-press the voice annotation tag V to drag it. For instance, if there is text information in the image P1 to be marked and a certain line of text or word needs to be annotated, then after the voice information is input for that line of text or word, the voice annotation tag V associated with the voice information can be dragged to the vicinity of that line of text or word, so as to generate the marked image P2.
  • When the user views the marked image P2 again, the user can quickly understand the relevant information annotated by the voice information associated with that voice annotation tag. For another example, the user may play and drag the voice annotation tag V whose recording has ended, or play and delete it, or drag and delete it, or play, drag, and delete it; the specific processing is performed according to the actual situation and is not limited here.
  • In some embodiments, step 02 of generating the marked image P2 according to the input voice information and the image P1 to be marked may further include steps 021, 023, and 025 described below.
  • The generation module 12 is also used to execute the methods in 021, 023, and 025; that is, the generation module 12 is also used to: generate a voice annotation label V according to the voice information; control the voice annotation label V to be displayed in the image P1 to be marked; and process the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The one or more processors 30 are also used to execute the methods in 021, 023, and 025; that is, the one or more processors 30 are also used to: generate a voice annotation label V according to the voice information; control the voice annotation label V to be displayed in the image P1 to be marked; and process the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The user records by long-pressing the recording label L; after the user releases, the recording ends and the input of the voice information is completed. The generation module 12 or the one or more processors 30 then generates the voice annotation label V according to the input voice information and controls the corresponding voice annotation label V to be displayed in the image P1 to be marked, to generate the marked image P2.
  • After that, the user can perform at least one of playing, deleting, and dragging on the voice annotation tag V according to the actual situation.
  • After the user processes the voice annotation label V, the generation module 12 or the one or more processors 30 updates the marked image P2 in real time to ensure that the voice annotation label V in the marked image P2 corresponds to the processed voice annotation label V.
  • In some embodiments, step 04 of saving the marked image P2 and the voice information includes: 041, saving the marked image P2 and the voice information as a single video file.
  • the storage module 14 is further configured to execute the method in 041, that is, the storage module 14 is further configured to save the marked image P2 and the voice information as a video file.
  • the memory 50 is also used to execute the method in 041, that is, the memory 50 is also used to save the marked image P2 and the voice information as a video file.
  • The storage module 14 or the memory 50 combines the associated voice information and the marked image P2 (including the voice annotation tag V associated with the voice information) into a video file format (such as the MPEG, AVI, nAVI, ASF, MOV, or WMV format) and stores it in the electronic device 100. Storing the marked image P2 and the voice information in the electronic device 100 as a single file can save the storage space of the electronic device 100, and the operation is simple when the marked image P2 and the voice information are retrieved again.
  • For example, the storage module 14 or the memory 50 saves the voice information and the marked image P2 in the MPEG (Moving Picture Experts Group) format through a video encapsulation format.
  • In this way, the voice information in the marked image P2 can be viewed through only one video file.
  • In some embodiments, the method for voice annotation and use of an image may further include: 05, playing the marked image P2 and the voice information.
  • The device 10 for voice annotation and use of images may further include a playback module 15, and the playback module 15 is configured to execute the method in 05; that is, the playback module 15 is configured to play the marked image P2 and the voice information.
  • The electronic device 100 of the embodiment of the present application may further include a display 70 and a speaker 90, where the display 70 and the speaker 90 are used to execute the method in 05. That is, the display 70 is used to display the marked image P2, and the speaker 90 is used to play the voice information.
  • The storage module 14 or the memory 50 saves the voice information and the marked image P2 in the format of a video file.
  • When the user views the voice information and the marked image P2 again, the playback module 15, or the display 70 together with the speaker 90, presents the marked image P2 and the voice information. Specifically, when the playback module 15 plays the video, it plays the voice information in the marked image P2 to realize playback of the recorded image annotations; or, the display 70 displays the marked image P2 (including the voice annotation label V) in the video, and the speaker 90 plays the voice information in the video (the marked image P2).
  • In some embodiments, there are multiple voice annotation tags V, and the multiple voice annotation tags V have a predetermined playback order.
  • In this case, step 05 of playing the marked image P2 and the voice information includes: 051, playing the voice information associated with the voice annotation tags V in the playback order.
  • the playing module 15 is further configured to execute the method in 051, that is, the playing module 15 is further configured to play the voice information associated with the voice annotation tag V according to the playing sequence.
  • the speaker 90 is also used to execute the method in 051 . That is, the speaker 90 is also used for playing the voice information associated with the voice annotation tag V in the playing order.
  • The one or more processors 30 control the generated multiple voice annotation tags V to have a predetermined playback order, and when the playback module 15 or the speaker 90 plays the voice information associated with the voice annotation tags V, it plays the voice information according to that predetermined order, ensuring that the voice information in the marked image P2 is played in an orderly manner.
  • For example, the one or more processors 30 may set the playback order of the voice annotation tags V to be associated with the positions of the voice annotation tags V. For example, with multiple voice annotation tags V displayed at the positions shown in FIG. 3, when the user plays the video obtained by combining the marked image P2 and the voice information, the voice annotation tags V can be played from top to bottom, that is, the voice information with durations of 34s, 65s, and 25s is played in sequence.
  • Alternatively, the voice annotation tags V can be played from bottom to top, that is, the voice information with durations of 25s, 65s, and 34s is played in sequence.
  • Alternatively, the voice annotation tags V can be played from left to right, that is, the voice information with durations of 65s, 34s, and 25s is played in sequence.
  • Alternatively, the voice annotation tags V can be played from right to left, that is, the voice information with durations of 25s, 34s, and 65s is played in sequence.
  • For another example, the one or more processors 30 may set the playback order of the voice annotation tags V to be associated with the generation time of the voice annotation tags V. That is, each time the user inputs voice information to annotate the image P1, the one or more processors 30 record the entry time of the corresponding voice information and sort the tags in chronological order of entry time. For example, if the three voice annotation labels V shown in FIG. 3, sorted in chronological order, give the sequence of the 25s, 65s, and 34s voice information, then the playback order of the voice annotation tags V is to play the 25s, 65s, and 34s voice information in sequence; sorted in reverse chronological order, the 34s, 65s, and 25s voice information is played in sequence.
  • For another example, the one or more processors 30 may set the playback order of the voice annotation tags V to be associated with the time axis of the video, so that voice information of different durations is synthesized into different time periods of the video.
  • The one or more processors 30 detect whether there is voice information in the time period of the video currently being played.
  • If so, the one or more processors 30 control the playback module 15 or the speaker 90 to play the voice information of the corresponding period until the video playback ends. In this way, the voice information in the video can be played automatically while the video is playing, and the implementation is simple.
  • The above video storage format allows the marked image P2 and the voice information to be stored in the electronic device 100 as a single file, and the playback of the marked image P2 and the voice information is simple.
  • In some embodiments, step 04 of saving the marked image P2 and the voice information may further include: 043, saving the marked image P2 as a first format file; 045, saving the voice information as a second format file; and 047, saving the first format file and the second format file separately.
  • The storage module 14 is also used to execute the methods in 043, 045, and 047; that is, the storage module 14 is also used to: save the marked image P2 as a first format file; save the voice information as a second format file; and save the first format file and the second format file separately.
  • The memory 50 is also used to execute the methods in 043, 045, and 047; that is, the memory 50 is also used to: save the marked image P2 as a first format file; save the voice information as a second format file; and save the first format file and the second format file separately.
  • That is, the marked image P2 and the voice information may also be saved separately: the storage module 14 or the memory 50 saves the marked image P2 in an image format (such as the JPEG, RAW, PNG, GIF, or PDF format) and saves the voice information in an audio format (such as the MPEG, MPEG-4, MP3, WMA, or FLAC format), and the one or more processors 30 associate the two saved files to ensure that when the marked image P2 and the voice information are played, the played voice information is the voice information annotated on that image.
  • This storage method does not require subsequent processing of the marked image P2 and the voice information, and the storage method is simple.
  • In some embodiments, the method for voice annotation and use of an image may further include: 06, triggering a voice annotation tag V to play the voice information associated with that voice annotation tag V.
  • the playing module 15 is further configured to execute the method in 06, that is, the playing module 15 is further configured to play the voice information associated with the voice annotation tag V according to the triggered voice annotation tag V.
  • the speaker 90 is further configured to execute the method in 06, that is, the speaker 90 is further configured to play the voice information associated with the voice annotation tag V according to the triggered voice annotation tag V.
  • The voice information associated with a voice annotation tag V is played when that voice annotation tag V is triggered. Specifically, for the marked image P2 with voice annotation tags V shown in FIG. 3, the user can tap any voice annotation tag V in the marked image P2, and the playback module 15 or the speaker 90 plays the voice information associated with that tag, ensuring that the user can selectively listen to the voice information in the marked image P2.
  • In some embodiments, the method for voice annotation and use of an image may further include: 07, playing the voice information associated with the voice annotation tags V in the display area 40; 08, after the voice information associated with the voice annotation tags V in the display area 40 has been played, scrolling the marked image P2 so that unplayed voice annotation tags V enter the display area 40; and 09, playing the voice information associated with the voice annotation tags V that enter the display area 40.
  • The playback module 15 is also used to execute the methods in 07, 08, and 09; that is, the playback module 15 is also used to: play the voice information associated with the voice annotation labels V in the display area 40; after the voice information associated with the voice annotation tags V in the display area 40 has been played, scroll the marked image P2 so that unplayed voice annotation tags V enter the display area 40; and play the voice information associated with the voice annotation tags V that enter the display area 40.
  • The speaker 90 is also used to execute the methods in 07, 08, and 09; that is, the speaker 90 is also used to: play the voice information associated with the voice annotation tags V in the display area 40; after that voice information has been played, scroll and display the marked image P2 so that unplayed voice annotation tags V enter the display area 40; and play the voice information associated with the voice annotation tags V that enter the display area 40.
  • The image P1 to be marked acquired by the acquisition module 11 or the one or more processors 30 may be a long image, such as a long image obtained in a panorama mode during photography or a long image obtained by scrolling screenshots. When the display area 40 displays such a long image P1 normally, as shown in FIG. 15, all the information in the image cannot be displayed at once.
  • The marked image P2 then exceeds the display area 40, and when the voice information in the marked image P2 is played, the voice annotation labels V within the display area 40 are played first.
  • After that, the one or more processors 30 control the marked image P2 to scroll automatically from top to bottom to display the image information that has not yet been displayed.
  • The one or more processors 30 detect whether there is a voice annotation label V in the image content entering the display area 40; if there is, they control the playback module 15 or the speaker 90 to play the voice information associated with the voice annotation label V that has entered the display area 40.
  • This playback mode is applicable to both the video playback and the trigger playback described above, which will not be repeated here.
  • an embodiment of the present application further provides a non-volatile computer-readable storage medium 200 including a computer program 201 .
  • When the computer program 201 is executed by the one or more processors 30, it causes the processors 30 to execute the methods described above, for example the methods in 06, 07, 08, and 09.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A voice annotation and use method for an image, a voice annotation and use device (10) for an image, an electronic device (100), and a non-volatile computer readable storage medium (201). The voice annotation and use method for an image comprises: acquiring an image to be annotated (01); generating an annotated image according to input voice information and the image to be annotated, the annotated image comprising a voice annotation tag, and the voice annotation tag being displayed in the image to be annotated (02); associating the voice annotation tag with the voice information (03); and storing the annotated image and the voice information (04). In the voice annotation and use method for an image, the voice annotation of the image to be annotated is achieved by inputting the voice information, thereby improving the image annotation efficiency.

Description

Voice Annotation and Use Method and Device for Image, Electronic Device, and Storage Medium
Priority Information
This application claims priority to and the benefit of the Chinese patent application No. 202110235765.3, filed with the China National Intellectual Property Administration on March 3, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image processing, and more particularly to a method for voice annotation and use of an image, a device for voice annotation and use of an image, an electronic device, and a non-volatile computer-readable storage medium.
Background
With the development of technology, electronic devices such as mobile phones, tablet computers, and computers have become tools for people to obtain information from the outside world. When some important information needs to be preserved, it is often saved in the form of an image, and the image is annotated with information so that the important information in the image can be obtained quickly when the image is viewed again. However, current image annotation can only be performed through text, brushes, and the like, and the annotation efficiency is low.
Summary of the Invention
Embodiments of the present application provide a method for voice annotation and use of an image, a device for voice annotation and use of an image, an electronic device, and a non-volatile computer-readable storage medium.
The method for voice annotation and use of an image in the embodiments of the present application includes: acquiring an image to be annotated; generating an annotated image according to input voice information and the image to be annotated, where the annotated image includes a voice annotation label and the voice annotation label is displayed in the image to be annotated; associating the voice annotation label with the voice information; and saving the annotated image and the voice information.
The device for voice annotation and use of images according to the embodiments of the present application includes: an acquisition module, a generation module, an association module, and a storage module. The acquisition module is used to acquire the image to be annotated; the generation module is used to generate an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label displayed in the image to be annotated; the association module is used to associate the voice annotation label with the voice information; and the storage module is used to save the annotated image and the voice information.
The electronic device of the embodiments of the present application includes one or more processors and a memory. The one or more processors are used to: acquire an image to be annotated; generate an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label displayed in the image to be annotated; and associate the voice annotation label with the voice information. The memory is used to save the annotated image and the voice information.
The non-volatile computer-readable storage medium of the embodiments of the present application contains a computer program. When the computer program is executed by one or more processors, it causes the processors to implement the following method for voice annotation and use of an image: acquiring an image to be annotated; generating an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label displayed in the image to be annotated; associating the voice annotation label with the voice information; and saving the annotated image and the voice information.
Additional aspects and advantages of the present application will be set forth in part in the following description, and in part will become apparent from the following description or may be learned by practice of the present application.
Description of the Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a method for voice annotation and use of an image according to some embodiments of the present application;
FIG. 2 is a schematic diagram of performing voice annotation on an image to be annotated in the method for voice annotation and use of an image according to some embodiments of the present application;
FIG. 3 is a schematic diagram of an annotated image in the method for voice annotation and use of an image according to some embodiments of the present application;
FIG. 4 is a schematic structural diagram of a device for voice annotation and use of images according to some embodiments of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to some embodiments of the present application;
FIGS. 6 to 14 are schematic flowcharts of methods for voice annotation and use of images according to some embodiments of the present application;
FIG. 15 is a schematic diagram of playing the voice annotations of an annotated image that exceeds the display area in the method for voice annotation and use of an image according to some embodiments of the present application;
FIG. 16 is a schematic diagram of the connection between a non-volatile computer-readable storage medium and a processor according to some embodiments of the present application.
Detailed Description
The embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are intended to explain the present application; they should not be construed as limiting the present application.
The following disclosure provides many different embodiments or examples for implementing different structures of the present application. To simplify the disclosure of the present application, the components and arrangements of specific examples are described below. Of course, they are merely examples and are not intended to limit the application. Furthermore, the present application may repeat reference numerals and/or letters in different examples. This repetition is for the purpose of simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or arrangements discussed. In addition, the present application provides examples of various specific processes and materials, but those of ordinary skill in the art will recognize the applicability of other processes and/or the use of other materials.
The embodiments of the present application provide a method for voice annotation and use of an image. The method includes: acquiring an image to be annotated; generating an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation tag displayed in the image to be annotated; associating the voice annotation tag with the voice information; and saving the annotated image and the voice information.
In some embodiments, generating the annotated image according to the input voice information and the image to be annotated includes: generating a voice annotation label according to the voice information; and displaying the voice annotation label in the image to be annotated to generate the annotated image.
In some embodiments, generating the annotated image according to the input voice information and the image to be annotated includes: generating a voice annotation label according to the voice information; displaying the voice annotation label in the image to be annotated; and processing the voice annotation label to generate the annotated image, where the processing includes at least one of playing, deleting, and dragging.
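As an illustration of these steps, the sketch below models a voice annotation label and the three processing operations (play, delete, drag) in plain Kotlin, independent of any particular UI toolkit. It is a minimal sketch, not the patent's implementation: all names (VoiceTag, AnnotatedImage, the file names) are hypothetical, and playback is represented by a callback so the example stays self-contained.

```kotlin
import java.io.File

// Hypothetical model of one voice annotation label (tag) placed on an image.
data class VoiceTag(
    val id: Int,
    val audioFile: File,        // the recorded voice information
    val durationSec: Int,       // e.g. 34, 65, 25 as in FIG. 3
    var x: Int,                 // display position inside the image
    var y: Int
)

// Hypothetical annotated image: the original picture plus its voice tags.
class AnnotatedImage(val imageFile: File) {
    val tags = mutableListOf<VoiceTag>()

    // Steps 021/023: create a tag for newly recorded voice and display it
    // at an initial position (here: the bottom of the image, as described).
    fun addTag(audio: File, durationSec: Int, imageHeight: Int): VoiceTag {
        val tag = VoiceTag(tags.size + 1, audio, durationSec, x = 0, y = imageHeight)
        tags += tag
        return tag
    }

    // The three kinds of processing mentioned for the tag.
    fun play(tag: VoiceTag, player: (File) -> Unit) = player(tag.audioFile)
    fun delete(tag: VoiceTag) { tags.remove(tag) }
    fun drag(tag: VoiceTag, newX: Int, newY: Int) { tag.x = newX; tag.y = newY }
}

fun main() {
    val annotated = AnnotatedImage(File("screenshot.png"))
    val tag = annotated.addTag(File("note1.m4a"), durationSec = 34, imageHeight = 2400)
    annotated.drag(tag, newX = 120, newY = 560)           // move it next to a line of text
    annotated.play(tag) { audio -> println("playing ${audio.name}") }
    println("tags on image: ${annotated.tags.size}")
}
```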
In some embodiments, saving the annotated image and the voice information includes: saving the annotated image and the voice information as a single video file.
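One way such single-file storage could be realized, sketched below, is to mux the still annotated image with the recorded audio using an external ffmpeg binary invoked from Kotlin. This is only an assumption about the implementation (the patent does not name a tool); it requires ffmpeg on the PATH, and the file names are placeholders.

```kotlin
import java.io.File

// Combine a still image and a voice recording into one MP4 file by invoking
// ffmpeg: loop the single frame and stop when the audio ends (-shortest).
fun muxImageAndVoice(image: File, voice: File, output: File): Boolean {
    val cmd = listOf(
        "ffmpeg", "-y",
        "-loop", "1", "-i", image.absolutePath,   // still image as the video track
        "-i", voice.absolutePath,                 // recorded voice as the audio track
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-shortest",
        output.absolutePath
    )
    val process = ProcessBuilder(cmd).inheritIO().start()
    return process.waitFor() == 0
}

fun main() {
    val ok = muxImageAndVoice(File("annotated.png"), File("note1.m4a"), File("annotated.mp4"))
    println(if (ok) "saved as one video file" else "ffmpeg failed")
}
```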
In some embodiments, the method for voice annotation and use of an image further includes: playing the annotated image and the voice information.
In some embodiments, there are multiple voice annotation tags, and the multiple voice annotation tags have a predetermined playing order. Playing the annotated image and the voice information includes: playing the voice information associated with the voice annotation tags in the playing order.
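The predetermined playing order could, for example, be derived from the tags' on-screen positions (top to bottom) or from the time at which each recording was entered, as the detailed description suggests. The self-contained sketch below shows both orderings; the Tag type, coordinates, and timestamps are illustrative assumptions.

```kotlin
// Hypothetical tag with the fields needed to decide a playing order.
data class Tag(val name: String, val y: Int, val enteredAtMillis: Long, val durationSec: Int)

// Order by on-screen position, top to bottom (FIG. 3 example: 34s, 65s, 25s).
fun byPosition(tags: List<Tag>): List<Tag> = tags.sortedBy { it.y }

// Order by the time the voice information was entered (e.g. 25s, 65s, 34s).
fun byEntryTime(tags: List<Tag>): List<Tag> = tags.sortedBy { it.enteredAtMillis }

fun main() {
    val tags = listOf(
        Tag("a", y = 300, enteredAtMillis = 3_000, durationSec = 34),
        Tag("b", y = 900, enteredAtMillis = 2_000, durationSec = 65),
        Tag("c", y = 1500, enteredAtMillis = 1_000, durationSec = 25)
    )
    println(byPosition(tags).map { it.durationSec })   // [34, 65, 25]
    println(byEntryTime(tags).map { it.durationSec })  // [25, 65, 34]
}
```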
In some embodiments, saving the annotated image and the voice information includes: saving the annotated image as a first format file; saving the voice information as a second format file; and saving the first format file and the second format file separately.
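A minimal sketch of this "two separate files" option follows: the image is kept in an image format, the voice in an audio format, and a small sidecar text file records which audio belongs to which tag and where the tag sits, so the files can be re-associated at playback time. The sidecar layout is an assumption; the patent only requires that the two saved files be associated.

```kotlin
import java.io.File

// Save the annotated image and its voice recordings as separate files plus a
// plain-text sidecar that associates each tag with its audio file and position.
fun saveSeparately(
    image: File,
    voices: Map<Int, File>,
    positions: Map<Int, Pair<Int, Int>>,
    dir: File
) {
    dir.mkdirs()
    image.copyTo(File(dir, image.name), overwrite = true)        // first format file (e.g. PNG)
    voices.values.forEach { it.copyTo(File(dir, it.name), overwrite = true) } // second format files (e.g. M4A)
    val sidecar = buildString {
        appendLine("image=${image.name}")
        voices.forEach { (tagId, audio) ->
            val (x, y) = positions.getValue(tagId)
            appendLine("tag=$tagId;audio=${audio.name};x=$x;y=$y")
        }
    }
    File(dir, "annotations.txt").writeText(sidecar)
}

fun main() {
    // Placeholder files so the example runs as-is.
    val image = File("annotated.png").apply { writeBytes(ByteArray(0)) }
    val voice = File("note1.m4a").apply { writeBytes(ByteArray(0)) }
    saveSeparately(image, mapOf(1 to voice), mapOf(1 to (120 to 560)), File("out"))
    println(File("out/annotations.txt").readText())
}
```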
In some embodiments, the method for voice annotation and use of an image further includes: triggering a voice annotation tag to play the voice information associated with that voice annotation tag.
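Triggering a tag could be a simple hit test: when the user taps at some coordinate, find the tag whose bounds contain that point and hand its audio to a player. The sketch below is toolkit-agnostic; the tag bounds and file names are arbitrary assumptions, and actual audio playback is stubbed out.

```kotlin
// Hypothetical tag with a position, size, and an associated audio path.
data class TapTag(val audioPath: String, val x: Int, val y: Int, val w: Int = 160, val h: Int = 64)

// Return the tag (if any) whose on-screen rectangle contains the tap point.
fun hitTest(tags: List<TapTag>, tapX: Int, tapY: Int): TapTag? =
    tags.firstOrNull { tapX in it.x until it.x + it.w && tapY in it.y until it.y + it.h }

fun main() {
    val tags = listOf(TapTag("note1.m4a", 100, 500), TapTag("note2.m4a", 100, 900))
    val hit = hitTest(tags, tapX = 150, tapY = 530)
    // In a real app this would start audio playback; here we just report the choice.
    println(hit?.let { "play ${it.audioPath}" } ?: "no tag at this position")
}
```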
In some embodiments, the method for voice annotation and use of an image further includes: playing the voice information associated with the voice annotation tags in the display area; after the voice information associated with the voice annotation tags in the display area has been played, scrolling the annotated image so that unplayed voice annotation tags enter the display area; and playing the voice information associated with the voice annotation tags that enter the display area.
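For a long annotated image (panorama or scrolling screenshot) that does not fit in the display area, playback can proceed in passes: play the tags currently visible, scroll, detect tags that have newly entered the display area, and play those, until the bottom is reached. The loop below simulates that behaviour in plain Kotlin; scroll step, viewport height, and tag coordinates are illustrative assumptions rather than values from the patent.

```kotlin
// Hypothetical tag on a long image: vertical position plus its audio clip.
data class LongImageTag(val y: Int, val audio: String)

// Simulate scrolling playback: play every tag exactly once, in the order in
// which it becomes visible while the viewport scrolls from top to bottom.
fun playWhileScrolling(tags: List<LongImageTag>, imageHeight: Int, viewportHeight: Int, scrollStep: Int) {
    val played = mutableSetOf<LongImageTag>()
    var top = 0
    while (true) {
        val visible = tags
            .filter { it.y in top until top + viewportHeight && it !in played }
            .sortedBy { it.y }
        for (tag in visible) {
            println("playing ${tag.audio} (y=${tag.y}, viewport top=$top)") // stand-in for audio playback
            played += tag
        }
        if (top + viewportHeight >= imageHeight) break
        top = minOf(top + scrollStep, imageHeight - viewportHeight)  // scroll down
    }
}

fun main() {
    val tags = listOf(LongImageTag(400, "a.m4a"), LongImageTag(1900, "b.m4a"), LongImageTag(3500, "c.m4a"))
    playWhileScrolling(tags, imageHeight = 4000, viewportHeight = 1600, scrollStep = 800)
}
```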
Embodiments of the present application further provide a device for voice annotation and use of images. The device includes: an acquisition module, a generation module, an association module, and a storage module. The acquisition module is used to acquire the image to be annotated. The generation module is used to generate an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label displayed in the image to be annotated. The association module is used to associate the voice annotation label with the voice information. The storage module is used to save the annotated image and the voice information.
Embodiments of the present application further provide an electronic device. The electronic device includes one or more processors and a memory. The one or more processors are used to: acquire the image to be annotated; generate an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation tag displayed in the image to be annotated; and associate the voice annotation tag with the voice information. The memory is used to save the annotated image and the voice information.
In some embodiments, the one or more processors are further configured to: generate a voice annotation label according to the voice information; and control the voice annotation label to be displayed in the image to be annotated, so as to generate the annotated image.
In some embodiments, the one or more processors are further configured to: generate a voice annotation tag according to the voice information; control the voice annotation tag to be displayed in the image to be annotated; and process the voice annotation tag to generate the annotated image, where the processing includes at least one of playing, deleting, and dragging.
In some embodiments, the memory is further used to save the annotated image and the voice information as a single video file.
In some embodiments, the electronic device further includes a display and a speaker; the display is used to display the annotated image, and the speaker is used to play the voice information.
In some embodiments, there are multiple voice annotation tags, the multiple voice annotation tags have a predetermined playing order, and the speaker is further configured to play the voice information associated with the voice annotation tags in the playing order.
In some embodiments, the memory is further used to: save the annotated image as a first format file; save the voice information as a second format file; and save the first format file and the second format file separately.
In some embodiments, the speaker is further configured to play the voice information associated with a voice annotation tag when that voice annotation tag is triggered.
In some embodiments, the speaker is further used to: play the voice information associated with the voice annotation tags in the display area; after that voice information has been played, scroll the annotated image so that unplayed voice annotation tags enter the display area; and play the voice information associated with the voice annotation tags that enter the display area.
Embodiments of the present application further provide a non-volatile computer-readable storage medium storing a computer program. When the computer program is executed by one or more processors, it implements any of the above methods for voice annotation and use of an image.
Referring to FIG. 1 to FIG. 3, an embodiment of the present application provides a method for voice annotation and use of an image, which includes:
01: acquiring an image to be annotated P1;
02: generating an annotated image P2 according to input voice information and the image to be annotated P1, the annotated image P2 including a voice annotation tag V displayed in the image to be annotated P1;
03: associating the voice annotation tag V with the voice information; and
04: saving the annotated image P2 and the voice information.
Referring to FIG. 4, an embodiment of the present application provides a device 10 for voice annotation and use of an image. The device 10 includes an acquisition module 11, a generation module 12, an association module 13, and a storage module 14. The method for voice annotation and use of an image according to the embodiments of the present application can be applied to the device 10, in which the acquisition module 11, the generation module 12, the association module 13, and the storage module 14 are used to perform the methods in 01, 02, 03, and 04, respectively. That is, the acquisition module 11 is configured to acquire the image to be annotated P1; the generation module 12 is configured to generate the annotated image P2 according to the input voice information and the image to be annotated P1, the annotated image P2 including the voice annotation tag V displayed in the image to be annotated P1; the association module 13 is configured to associate the voice annotation tag V with the voice information; and the storage module 14 is configured to save the annotated image P2 and the voice information.
Referring to FIG. 5, an embodiment of the present application provides an electronic device 100. The electronic device 100 includes one or more processors 30 and a memory 50. The method for voice annotation and use of an image according to the embodiments of the present application can be applied to the electronic device 100, in which the one or more processors 30 are used to perform the methods in 01, 02, and 03, and the memory 50 is used to perform the method in 04. That is, the one or more processors 30 are configured to: acquire the image to be annotated P1; generate the annotated image P2 according to the input voice information and the image to be annotated P1, the annotated image P2 including the voice annotation tag V displayed in the image to be annotated P1; and associate the voice annotation tag V with the voice information. The memory 50 is configured to save the annotated image P2 and the voice information.
With the development of electronic devices such as mobile phones, tablet computers, and computers, these devices have gradually become important tools for obtaining information from the outside world. When important information needs to be retained, it is often saved in the form of an image, and the saved image is then annotated, for example, to explain or interpret the text and symbols in the image, to mark the date on which the image was taken, or to correct information in the image, so that the important information in the image can be retrieved quickly when the image is viewed again. However, current image annotation can only be performed with text, a brush, and the like, and the annotation efficiency is low. The method for voice annotation and use of an image of the present application annotates the image to be annotated P1 by inputting voice information, which improves the efficiency of image annotation compared with traditional annotation methods such as text and brushes.
Referring to FIG. 4 and FIG. 5, specifically, the electronic device 100 may be a terminal device such as a mobile phone, a notebook computer, a smart watch, or a computer. The device 10 for voice annotation and use of an image may be an application program installed in the electronic device 100, for example, a screenshot or photo-album application, or a functional module within an application, such as an image editing function. The present application is described only with the example in which the electronic device 100 is a mobile phone; the situation in which the electronic device 100 is another type of terminal is similar and is not described in detail.
In one embodiment, the acquisition module 11 or the one or more processors 30 may acquire the image to be annotated P1 by capturing a photograph. In another embodiment, the acquisition module 11 or the one or more processors 30 may acquire an image from an album in the electronic device 100 as the image to be annotated P1. In yet another embodiment, the acquisition module 11 or the one or more processors 30 may acquire the image to be annotated P1 by taking a screenshot on the electronic device 100. Of course, the acquisition module 11 or the one or more processors 30 may acquire the image to be annotated P1 in other ways, which is not limited here.
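As a loose illustration of how these alternative acquisition paths could feed a single annotation pipeline, the following sketch models them behind one entry point. It is only a sketch under assumed names (ImageSource and acquireImageToAnnotate are invented here) and is not code from the application.

```kotlin
// Minimal sketch of the acquisition step (01); all names are hypothetical.
import java.io.File

sealed class ImageSource {
    data class Camera(val capturedFile: File) : ImageSource()      // a newly captured photo
    data class Album(val pickedFile: File) : ImageSource()          // an image picked from the album
    data class Screenshot(val screenshotFile: File) : ImageSource() // a screenshot of the current screen
}

// Returns the file that serves as the image to be annotated P1,
// regardless of which acquisition path produced it.
fun acquireImageToAnnotate(source: ImageSource): File = when (source) {
    is ImageSource.Camera -> source.capturedFile
    is ImageSource.Album -> source.pickedFile
    is ImageSource.Screenshot -> source.screenshotFile
}
```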
Referring to FIG. 2, after entering the voice annotation interface of the image to be annotated P1, the user long-presses the recording tag L to record and thereby input voice information; when the user releases, the recording ends and the input of the voice information is complete. The generation module 12 or the one or more processors 30 generates the annotated image P2 according to the input voice information and the acquired image to be annotated P1. The annotated image P2 includes a voice annotation tag V displayed in the image to be annotated P1; specifically, the initial display position of the voice annotation tag V may be at the bottom of the image to be annotated P1. The association module 13 or the one or more processors 30 associates the input voice information with the voice annotation tag V. The user may input voice information one or more times through the recording tag L, and each input of voice information is associated with one voice annotation tag V; in this way, the annotated image P2 may include multiple voice annotation tags V, realizing multi-voice annotation of the image to be annotated P1. After the voice annotation is completed, the storage module 14 or the memory 50 saves the annotated image P2 and the annotated voice information, so that the voice information in the annotated image P2 can be listened to when the annotated image P2 is viewed again. In the method for voice annotation and use of an image of the present application, the image to be annotated P1 is annotated by recording voice information, which improves the efficiency of image annotation compared with annotation by text, brush, and the like.
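To make the record-then-associate flow concrete, the sketch below models one possible data layout: each completed recording produces a clip, a tag V anchored at the bottom of the image by default, and a stored link between the two, so several recordings naturally yield several tags. All class and field names are assumptions made for illustration, not the application's actual data structures.

```kotlin
// Minimal sketch of steps 02-03: one voice annotation tag per recorded clip.
// All names are hypothetical.
import java.util.UUID

data class VoiceClip(val id: String, val audioPath: String, val durationSec: Int)

data class VoiceTag(
    val id: String,
    val clipId: String,        // association between the tag V and its voice information
    var x: Float,              // position of the tag inside the image
    var y: Float,
    val createdAtMs: Long
)

class AnnotatedImage(val imagePath: String) {
    val tags = mutableListOf<VoiceTag>()

    // Called when a long-press recording ends: create a tag, anchor it at the
    // bottom of the image by default, and associate it with the new clip.
    fun addRecording(clip: VoiceClip, imageHeight: Float): VoiceTag {
        val tag = VoiceTag(
            id = UUID.randomUUID().toString(),
            clipId = clip.id,
            x = 0f,
            y = imageHeight,               // default: bottom of the image to be annotated
            createdAtMs = System.currentTimeMillis()
        )
        tags += tag                        // multiple recordings yield multiple tags
        return tag
    }
}
```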
In the embodiments of the present application, the device 10 for voice annotation and use of an image can also annotate the image to be annotated P1 with text or a brush. For example, when performing the text annotation function, the user may input voice information by recording, and the generation module 12 or the one or more processors 30 converts the input voice information into text information displayed in the image to be annotated P1 to generate the annotated image P2; alternatively, the user may directly input text information to annotate the image to be annotated P1 with text. As another example, when performing the brush annotation function, the user may input voice information by recording, and the generation module 12 or the one or more processors 30 converts the input voice information into drawing information displayed in the image to be annotated P1 to generate the annotated image P2; alternatively, the user may directly input drawing information (draw in the image to be annotated P1) to annotate the image to be annotated P1 with a brush. That is, the device 10 for voice annotation and use of an image according to the embodiments of the present application can realize not only the voice annotation function but also the text and brush annotation functions for the image to be annotated P1, covering more diverse application scenarios and providing the user with more annotation options.
Referring to FIG. 2 and FIG. 6, in some embodiments, 02: generating the annotated image P2 according to the input voice information and the image to be annotated P1 includes:
021: generating a voice annotation tag according to the voice information; and
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2.
Referring to FIG. 4, the generation module 12 is further configured to perform the methods in 021 and 023, that is, the generation module 12 is further configured to generate a voice annotation tag according to the voice information, and to control the voice annotation tag V to be displayed in the image to be annotated P1 to generate the annotated image P2.
Referring to FIG. 5, the one or more processors 30 are further configured to perform the methods in 021 and 023, that is, the one or more processors 30 are further configured to generate a voice annotation tag according to the voice information, and to control the voice annotation tag V to be displayed in the image to be annotated P1 to generate the annotated image P2.
In one embodiment, after the image to be annotated P1 is acquired, the user inputs voice information by recording. Specifically, the generation module 12 or the one or more processors 30 generates a corresponding voice annotation tag V according to the input voice information: each time the user inputs a piece of voice information, the generation module 12 or the one or more processors 30 generates a voice annotation tag corresponding to that voice information and controls the corresponding voice annotation tag V to be displayed in the image to be annotated P1 to generate the annotated image P2, ensuring that when the user views the annotated image P2 again, the voice information annotated in it can be found quickly. If no voice annotation tag V associated with the input voice information were displayed in the annotated image P2, the user could not confirm, after inputting the voice information, whether the relevant voice information had been successfully recorded into the image to be annotated P1; likewise, when viewing the annotated image P2 again, the user could not tell whether annotated voice information exists in the annotated image P2, which would force the user to annotate the image to be annotated P1 a second time and would lower the efficiency of image annotation. In the method for voice annotation and use of an image of the present application, the generation module 12 or the one or more processors 30 generates the corresponding voice annotation tag V according to the input voice information and controls the voice annotation tag V to be displayed in the image to be annotated P1, thereby generating the annotated image P2. This makes it easy for the user to confirm that the voice information has been recorded successfully and to quickly find the voice information annotated in the annotated image P2 when viewing it again, improving the efficiency of image annotation.
Referring to FIG. 2 and FIG. 7, in some embodiments, 023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2 includes:
0231: displaying the voice annotation tag V in the image to be annotated P1; and
0233: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Referring to FIG. 4, the generation module 12 is further configured to perform the methods in 0231 and 0233, that is, the generation module 12 is further configured to: control the voice annotation tag V to be displayed in the image to be annotated P1; and process the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Referring to FIG. 5, the one or more processors 30 are further configured to perform the methods in 0231 and 0233, that is, the one or more processors 30 are further configured to: control the voice annotation tag V to be displayed in the image to be annotated P1; and process the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Further, the user records by long-pressing the recording tag L; when the user releases, the recording ends and the input of the voice information is complete, and the generation module 12 or the one or more processors 30 controls the corresponding voice annotation tag V to be displayed in the image to be annotated P1 according to the input voice information. At this point, the user may play, delete, or drag the voice annotation tag V whose recording has ended. For example, the user plays the voice annotation tag V: the voice annotation tag V displays an animated icon while the associated voice information is played, allowing the user to audition the recorded voice information and judge whether it is accurate and whether the sound is clear. As another example, the user deletes the voice annotation tag V: the user taps to select the tag to be deleted, a delete icon appears on the voice annotation tag V, and tapping the delete icon removes the tag. As yet another example, the user drags the voice annotation tag V to place it at a suitable position in the image to be annotated P1; specifically, the user may long-press the voice annotation tag V to drag it. For instance, if the image to be annotated P1 contains text and a particular line of text or a particular word needs to be annotated, the user can, after inputting voice information for that line or word, drag the voice annotation tag V associated with that voice information next to it to generate the annotated image P2; when the user views the annotated image P2 again, the information annotated by the voice information associated with that voice annotation tag can be understood quickly. The user may also combine these operations, for example playing and dragging, playing and deleting, dragging and deleting, or playing, dragging, and deleting the voice annotation tag V; the specific processing is performed according to the actual situation and is not limited here.
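A minimal sketch of these three tag operations is given below. The tag structure, the AudioPlayer abstraction, and all names are assumptions for illustration only; a real editor would also redraw the image and persist the updated tag positions.

```kotlin
// Sketch of the tag-processing step (play / delete / drag); names are hypothetical.
data class Tag(val id: String, var x: Float, var y: Float, val audioPath: String)

// The audio back end is abstracted away; only the operations on the tag list are shown.
interface AudioPlayer { fun play(audioPath: String) }

class TagEditor(private val tags: MutableList<Tag>, private val player: AudioPlayer) {

    // Tap a tag to audition the recording that was just entered.
    fun play(tag: Tag) = player.play(tag.audioPath)

    // Select a tag and confirm deletion to remove it and its association.
    fun delete(tag: Tag) { tags.removeAll { it.id == tag.id } }

    // Long-press and drag a tag next to the text it explains.
    fun drag(tag: Tag, newX: Float, newY: Float) {
        tag.x = newX
        tag.y = newY
    }
}
```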
Referring to FIG. 2 and FIG. 8, in some embodiments, 02: generating the annotated image P2 according to the input voice information and the image to be annotated P1 may further include:
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2; and
025: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Referring to FIG. 4, the generation module 12 is further configured to perform the methods in 021, 023, and 025, that is, the generation module 12 is further configured to: generate a voice annotation tag according to the voice information; control the voice annotation tag V to be displayed in the image to be annotated P1; and process the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Referring to FIG. 5, the one or more processors 30 are further configured to perform the methods in 021, 023, and 025, that is, the one or more processors 30 are further configured to: generate a voice annotation tag according to the voice information; control the voice annotation tag V to be displayed in the image to be annotated P1; and process the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
In another embodiment, the user records by long-pressing the recording tag L; when the user releases, the recording ends and the input of the voice information is complete, and the generation module 12 or the one or more processors 30 generates the voice annotation tag V according to the input voice information and controls the corresponding voice annotation tag V to be displayed in the image to be annotated P1 to generate the annotated image P2. The user may then play, delete, or drag the voice annotation tag V as required; the specific implementation is the same as described above and is not repeated here. After the user finishes processing the voice annotation tag V, the generation module 12 or the one or more processors 30 updates the annotated image P2 in real time to ensure that the voice annotation tag V in the annotated image P2 matches the processed voice annotation tag V.
Referring to FIG. 3 and FIG. 9, in some embodiments, 04: saving the annotated image P2 and the voice information includes:
041: saving the annotated image P2 and the voice information as a single video file.
Referring to FIG. 4, the storage module 14 is further configured to perform the method in 041, that is, the storage module 14 is further configured to save the annotated image P2 and the voice information as a single video file.
Referring to FIG. 5, the memory 50 is further configured to perform the method in 041, that is, the memory 50 is further configured to save the annotated image P2 and the voice information as a single video file.
In one embodiment, the storage module 14 or the memory 50 post-processes the associated voice information and the annotated image P2 (including the voice annotation tags V associated with the voice information) and saves them, merged, in the electronic device 100 in a video file format (such as MPEG, AVI, nAVI, ASF, MOV, or WMV). Saving the annotated image P2 and the voice information in the electronic device 100 as a single file saves storage space of the electronic device 100, and recalling the annotated image P2 and the voice information later is simple. For example, the storage module 14 or the memory 50 saves the voice information and the annotated image P2 as an MPEG (Moving Picture Experts Group) file through video encapsulation; when the user views the annotated image P2 and the related voice information, only one video file is needed to view the voice information in the annotated image P2.
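One plausible way to perform this merge offline is to feed the flattened annotated image and the recorded audio to a muxer. The sketch below shells out to an ffmpeg binary, which is an assumption about the environment rather than the application's actual encoder; a phone implementation would more likely use a platform media codec, and multiple clips would first be concatenated or placed at different periods of the timeline.

```kotlin
// Sketch of step 041: merge the annotated image and one audio track into a single
// video file. Assumes an ffmpeg binary is available; names are illustrative only.
import java.io.File

fun muxImageAndAudio(annotatedImage: File, audio: File, output: File): Boolean {
    val cmd = listOf(
        "ffmpeg",
        "-loop", "1", "-i", annotatedImage.absolutePath,  // still image as the video track
        "-i", audio.absolutePath,                          // recorded voice information
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-shortest",                                       // stop when the audio ends
        "-pix_fmt", "yuv420p",
        output.absolutePath
    )
    val exit = ProcessBuilder(cmd).inheritIO().start().waitFor()
    return exit == 0
}
```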
Referring to FIG. 3 and FIG. 10, in some embodiments, the method for voice annotation and use of an image may further include:
05: playing the annotated image P2 and the voice information.
Referring to FIG. 4, the device 10 for voice annotation and use of an image according to the embodiments of the present application may further include a playback module 15, which is further configured to perform the method in 05, that is, the playback module 15 is further configured to play the annotated image P2 and the voice information.
Referring to FIG. 5, the electronic device 100 according to the embodiments of the present application may further include a display 70 and a speaker 90, which are used to perform the method in 05. That is, the display 70 is used to display the annotated image P2, and the speaker 90 is used to play the voice information.
In the embodiments of the present application, the storage module 14 or the memory 50 saves the voice information and the annotated image P2 in the format of a single video file. When the user views the voice information and the annotated image P2 again, they can be viewed through the playback module 15, or through the display 70 and the speaker 90. Specifically, when the playback module 15 plays the video, it plays the voice information in the annotated image P2, realizing audible recording and playback of the image annotation; alternatively, the display 70 displays the annotated image P2 (including the voice annotation tags V) in the video, and the speaker 90 plays the voice information in the video (the annotated image P2).
Referring to FIG. 3 and FIG. 11, in some embodiments, there are multiple voice annotation tags V with a predetermined playing order, and 05: playing the annotated image P2 and the voice information includes:
051: playing the voice information associated with the voice annotation tags V according to the playing order.
Referring to FIG. 4, the playback module 15 is further configured to perform the method in 051, that is, the playback module 15 is further configured to play the voice information associated with the voice annotation tags V according to the playing order.
Referring to FIG. 5, the speaker 90 is further configured to perform the method in 051, that is, the speaker 90 is further configured to play the voice information associated with the voice annotation tags V according to the playing order.
Specifically, the multiple voice annotation tags V generated under the control of the one or more processors 30 have a predetermined playing order. When the playback module 15 or the speaker 90 plays the voice information associated with the voice annotation tags V, it does so according to the predetermined playing order, ensuring that the voice information in the annotated image P2 is played in an orderly manner.
In one embodiment, the one or more processors 30 may associate the playing order of the voice annotation tags V with their positions. For example, with the multiple voice annotation tags V displayed at the positions shown in FIG. 3, when the user plays the video obtained by merging and saving the annotated image P2 and the voice information, the voice annotation tags V may be played from top to bottom, that is, the voice information with durations of 34s, 65s, and 25s is played in sequence. As another example, the voice annotation tags V may be played from bottom to top, that is, the voice information with durations of 25s, 65s, and 34s is played in sequence. As yet another example, the voice annotation tags V may be played from left to right, that is, the voice information with durations of 65s, 34s, and 25s is played in sequence. As still another example, the voice annotation tags V may be played from right to left, that is, the voice information with durations of 25s, 34s, and 65s is played in sequence.
In another embodiment, the one or more processors 30 may associate the playing order of the voice annotation tags V with their generation times, that is, each time the user inputs voice information to annotate the image to be annotated P1, the one or more processors 30 record the time at which the corresponding voice information was entered and sort the tags chronologically by entry time. For example, among the voice annotation tags V shown in FIG. 3, sorting the three voice annotation tags V chronologically yields the voice information sequence of 25s, 65s, and 34s. When the user plays the video obtained by merging and saving the annotated image P2 and the voice information, the playing order of the voice annotation tags V is the voice information of 25s, 65s, and 34s in sequence; alternatively, the playing order is the voice information of 34s, 65s, and 25s in sequence.
In yet another embodiment, the one or more processors 30 may associate the playing order of the voice annotation tags V with the time axis of the video. When the annotated image (including the voice annotation tags V) and the voice information are synthesized into a video, voice information of different durations is synthesized into different periods of the video. When the user plays the video obtained by merging and saving the annotated image P2 and the voice information, the one or more processors 30 detect whether voice information exists in the video being played; when voice information exists in the period containing the current playback moment, the one or more processors 30 control the playback module 15 or the speaker 90 to play the voice information of the corresponding period until the video playback ends. In this way, the voice information in the video is played automatically while the video plays, and the implementation is simple.
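The ordering strategies above can be reduced to choosing a sort key (tag position or creation time) or, for the timeline variant, to looking up which clip covers the current playback position. The sketch below illustrates that choice with invented names and is not tied to the application's implementation.

```kotlin
// Sketch of the ordering strategies used by step 051; all names are hypothetical.
data class PlayableTag(val x: Float, val y: Float, val createdAtMs: Long, val audioPath: String)

enum class PlayOrder { TOP_TO_BOTTOM, LEFT_TO_RIGHT, BY_CREATION_TIME }

fun orderedForPlayback(tags: List<PlayableTag>, order: PlayOrder): List<PlayableTag> =
    when (order) {
        PlayOrder.TOP_TO_BOTTOM    -> tags.sortedBy { it.y }           // topmost tag first
        PlayOrder.LEFT_TO_RIGHT    -> tags.sortedBy { it.x }           // leftmost tag first
        PlayOrder.BY_CREATION_TIME -> tags.sortedBy { it.createdAtMs } // earliest recording first
    }

// Timeline variant: at each playback instant, find the clip whose period
// contains the current video position, if any.
data class TimedClip(val startMs: Long, val endMs: Long, val audioPath: String)

fun clipAt(positionMs: Long, clips: List<TimedClip>): TimedClip? =
    clips.firstOrNull { positionMs in it.startMs until it.endMs }
```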
The video storage format described above stores the annotated image P2 and the voice information in the electronic device 100 as a single file, and playback of the annotated image P2 and the voice information is simple.
Referring to FIG. 3 and FIG. 12, in some embodiments, 04: saving the annotated image P2 and the voice information may further include:
043: saving the annotated image P2 as a file in a first format;
045: saving the voice information as a file in a second format; and
047: saving the first-format file and the second-format file separately.
Referring to FIG. 4, the storage module 14 is further configured to perform the methods in 043, 045, and 047, that is, the storage module 14 is further configured to: save the annotated image P2 as a file in a first format; save the voice information as a file in a second format; and save the first-format file and the second-format file separately.
Referring to FIG. 5, the memory 50 is further configured to perform the methods in 043, 045, and 047, that is, the memory 50 is further configured to: save the annotated image P2 as a file in a first format; save the voice information as a file in a second format; and save the first-format file and the second-format file separately.
In the embodiments of the present application, the annotated image P2 and the voice information may also be saved separately, that is, the storage module 14 or the memory 50 saves the annotated image P2 in an image format (such as JPEG, RAW, PNG, GIF, or PDF) and saves the voice information in an audio format (such as MPEG, MPEG-4, MP3, WMA, or FLAC), and the one or more processors 30 associate the two saved files, ensuring that when the annotated image P2 and the voice information are played, the voice information played is the voice information annotated on that image. This saving method requires no subsequent processing of the annotated image P2 and the voice information, and the storage method is simple.
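A simple way to keep the two separately saved files linked is a small sidecar record written next to them. The sketch below uses a plain-text sidecar whose format and names are purely illustrative assumptions; the association could equally live in a database or in image metadata.

```kotlin
// Sketch of steps 043-047: image and audio saved as separate files, with the
// association recorded in a sidecar file. All names are hypothetical.
import java.io.File

fun saveSeparately(imageFile: File, clipsByTagId: Map<String, File>, sidecar: File) {
    // e.g. imageFile = annotated.jpg (first format), clips = tagId -> recording.mp3 (second format)
    val lines = mutableListOf("image=${imageFile.name}")
    for ((tagId, audio) in clipsByTagId) lines += "tag:$tagId=${audio.name}"
    sidecar.writeText(lines.joinToString("\n"))  // read back later to re-associate tags and clips
}
```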
Referring to FIG. 3 and FIG. 13, in some embodiments, the method for voice annotation and use of an image may further include:
06: triggering a voice annotation tag V to play the voice information associated with that voice annotation tag V.
Referring to FIG. 4, the playback module 15 is further configured to perform the method in 06, that is, the playback module 15 is further configured to play, according to the triggered voice annotation tag V, the voice information associated with that voice annotation tag V.
Referring to FIG. 5, the speaker 90 is further configured to perform the method in 06, that is, the speaker 90 is further configured to play, according to the triggered voice annotation tag V, the voice information associated with that voice annotation tag V.
Further, when the annotated image P2 and the voice information are saved separately, the design is such that triggering a voice annotation tag V plays the voice information associated with that voice annotation tag V. Specifically, for the annotated image P2 and the voice annotation tags V shown in FIG. 3, the user may tap any voice annotation tag V in the annotated image P2, and the playback module 15 or the speaker 90 plays the voice information associated with that voice annotation tag V, so that the user can selectively listen to the voice information in the annotated image P2.
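Triggered playback amounts to hit-testing the tap position against the on-screen tag bounds and playing the clip of the first tag hit. The sketch below shows that logic with hypothetical coordinate and player types; it is not the application's actual event handling.

```kotlin
// Sketch of step 06: tap a displayed tag to play its associated clip; names are hypothetical.
data class DisplayedTag(val x: Float, val y: Float, val width: Float, val height: Float, val audioPath: String)

interface Player { fun play(audioPath: String) }

fun onTagTapped(tapX: Float, tapY: Float, tags: List<DisplayedTag>, player: Player) {
    val hit = tags.firstOrNull {
        tapX in it.x..(it.x + it.width) && tapY in it.y..(it.y + it.height)
    }
    hit?.let { player.play(it.audioPath) }  // only the triggered tag's voice information is played
}
```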
Referring to FIG. 14 and FIG. 15, in some embodiments, when the annotated image P2 exceeds the display area 40, the method for voice annotation and use of an image may further include:
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
Referring to FIG. 4, the playback module 15 is further configured to perform the methods in 07, 08, and 09, that is, the playback module 15 is further configured to: play the voice information associated with the voice annotation tags V within the display area 40; after the voice information associated with the voice annotation tags V within the display area 40 has been played, scroll the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and play the voice information associated with the voice annotation tags V that enter the display area 40.
Referring to FIG. 5, the speaker 90 is further configured to perform the methods in 07, 08, and 09, that is, the speaker 90 is further configured to: play the voice information associated with the voice annotation tags V within the display area 40; after the voice information associated with the voice annotation tags V within the display area 40 has been played, scroll the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and play the voice information associated with the voice annotation tags V that enter the display area 40.
In practice, the image to be annotated P1 acquired by the acquisition module 11 or the one or more processors 30 may be a long image, such as a long image captured in panorama mode or obtained by a scrolling screenshot; when the display area 40 displays such a long image to be annotated P1 normally, not all of the information in the image can be shown. As shown in FIG. 15, when the user plays the annotated image P2 and the voice information and the annotated image P2 exceeds the display area 40, the voice information associated with the voice annotation tags V within the display area 40 is played first. After that voice information has been played, the one or more processors 30 control the annotated image P2 to scroll automatically from top to bottom to display the image information not yet shown. After the previously hidden part of the image enters the display area 40, the one or more processors 30 detect whether a voice annotation tag V exists in the part of the image that has entered the display area; if so, the playback module 15 or the speaker 90 is controlled to play the voice information associated with the voice annotation tags V that have entered the display area 40. The video playback and triggered playback modes described above both apply here and are not repeated.
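For a long screenshot, the playback loop can walk the tags from top to bottom, scrolling whenever the next unplayed tag lies outside the display area 40 before playing it. The sketch below abstracts the scrolling and audio back ends behind interfaces; all names are assumptions and do not describe the application's actual implementation.

```kotlin
// Sketch of steps 07-09 for a long image; names and interfaces are hypothetical.
data class ScrollTag(val yInImage: Float, val audioPath: String)

interface Viewport {
    val top: Float            // current scroll offset in image coordinates
    val height: Float         // height of the display area 40
    fun scrollTo(offset: Float)
}

interface Audio { fun playBlocking(audioPath: String) }

fun playLongImage(tags: List<ScrollTag>, viewport: Viewport, audio: Audio) {
    for (tag in tags.sortedBy { it.yInImage }) {
        val visible = tag.yInImage in viewport.top..(viewport.top + viewport.height)
        if (!visible) {
            // Scroll so that the unplayed tag enters the display area 40.
            viewport.scrollTo(tag.yInImage - viewport.height / 2)
        }
        audio.playBlocking(tag.audioPath)   // play the tag now in (or scrolled into) view
    }
}
```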
Referring to FIG. 16, an embodiment of the present application further provides a non-volatile computer-readable storage medium 200 containing a computer program 201. When the computer program 201 is executed by one or more processors 30, the processors 30 are caused to perform the methods in 01, 02, 021, 023, 0231, 0233, 025, 03, 04, 041, 043, 045, 047, 05, 051, 06, 07, 08, and 09.
Referring to FIG. 1 and FIG. 2, for example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
02: generating an annotated image P2 according to input voice information and the image to be annotated P1, the annotated image P2 including a voice annotation tag V displayed in the image to be annotated P1;
03: associating the voice annotation tag V with the voice information; and
04: saving the annotated image P2 and the voice information.
As another example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2;
025: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging;
03: associating the voice annotation tag V with the voice information;
041: saving the annotated image P2 and the voice information as a single video file;
051: playing the voice information associated with the voice annotation tags V according to the playing order;
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
As yet another example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2;
03: associating the voice annotation tag V with the voice information;
043: saving the annotated image P2 as a file in a first format;
045: saving the voice information as a file in a second format;
047: saving the first-format file and the second-format file separately;
051: playing the voice information associated with the voice annotation tags V according to the playing order;
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
As a further example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2;
025: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging;
03: associating the voice annotation tag V with the voice information;
041: saving the annotated image P2 and the voice information as a single video file;
06: triggering a voice annotation tag V to play the voice information associated with that voice annotation tag V;
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
As a still further example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2;
025: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging;
03: associating the voice annotation tag V with the voice information;
043: saving the annotated image P2 as a file in a first format;
045: saving the voice information as a file in a second format;
047: saving the first-format file and the second-format file separately;
06: triggering a voice annotation tag V to play the voice information associated with that voice annotation tag V;
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
In the description of this specification, reference to the terms "embodiment", "example", and the like means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic expressions of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present application. The scope of the present application is defined by the claims and their equivalents.

Claims (20)

  1. A method for voice annotation and use of an image, comprising:
    acquiring an image to be annotated;
    generating an annotated image according to input voice information and the image to be annotated, the annotated image comprising a voice annotation tag, the voice annotation tag being displayed in the image to be annotated;
    associating the voice annotation tag with the voice information; and
    saving the annotated image and the voice information.
  2. The method for voice annotation and use of an image according to claim 1, wherein generating the annotated image according to the input voice information and the image to be annotated comprises:
    generating the voice annotation tag according to the voice information; and
    displaying the voice annotation tag in the image to be annotated to generate the annotated image.
  3. The method for voice annotation and use of an image according to claim 1, wherein generating the annotated image according to the input voice information and the image to be annotated comprises:
    generating the voice annotation tag according to the voice information;
    displaying the voice annotation tag in the image to be annotated; and
    processing the voice annotation tag to generate the annotated image, the processing comprising at least one of playing, deleting, and dragging.
  4. The method for voice annotation and use of an image according to claim 1, wherein saving the annotated image and the voice information comprises:
    saving the annotated image and the voice information as a single video file.
  5. The method for voice annotation and use of an image according to claim 4, further comprising:
    playing the annotated image and the voice information.
  6. The method for voice annotation and use of an image according to claim 5, wherein the voice annotation tag comprises multiple voice annotation tags having a predetermined playing order, and playing the annotated image and the voice information comprises:
    playing the voice information associated with the voice annotation tags according to the playing order.
  7. The method for voice annotation and use of an image according to claim 1, wherein saving the annotated image and the voice information comprises:
    saving the annotated image as a file in a first format;
    saving the voice information as a file in a second format; and
    saving the first-format file and the second-format file separately.
  8. The method for voice annotation and use of an image according to claim 7, further comprising:
    triggering the voice annotation tag to play the voice information associated with the voice annotation tag.
  9. The method for voice annotation and use of an image according to any one of claims 1 to 8, further comprising:
    playing the voice information associated with the voice annotation tags within a display area;
    after the voice information associated with the voice annotation tags within the display area has been played, scrolling the annotated image so that unplayed voice annotation tags enter the display area; and
    playing the voice information associated with the voice annotation tags that enter the display area.
  10. A device for voice annotation and use of an image, comprising:
    an acquisition module configured to acquire an image to be annotated;
    a generation module configured to generate an annotated image according to input voice information and the image to be annotated, the annotated image comprising a voice annotation tag, the voice annotation tag being displayed in the image to be annotated;
    an association module configured to associate the voice annotation tag with the voice information; and
    a storage module configured to save the annotated image and the voice information.
  11. 一种电子装置,其特征在于,包括:An electronic device, characterized in that, comprising:
    一个或多个处理器,一个或多个所述处理器用于获取待标注图像;根据输入的语音信息及所述待标注图像生成已标注图像,所述已标注图像包括语音标注标签,所述语音标注标签显示于所述待标注图像中;及关联所述语音标注标签及所述语音信息;及One or more processors, one or more of the processors are used to obtain the image to be marked; according to the input voice information and the image to be marked, a marked image is generated, the marked image includes a voice marking label, and the voice An annotation tag is displayed in the to-be-annotated image; and the voice annotation tag is associated with the voice information; and
    存储器,所述存储器用于保存所述已标注图像及所述语音信息。a memory, where the memory is used for saving the marked image and the voice information.
  12. 根据权利要求11所述的电子装置,其特征在于,一个或多个所述处理器还用于:The electronic device of claim 11, wherein one or more of the processors are further configured to:
    根据所述语音信息生成语音标注标签;generating a voice annotation label according to the voice information;
    控制在所述待标注图像中显示所述语音标注标签,以生成所述已标注图像。The voice annotation label is controlled to be displayed in the to-be-annotated image to generate the annotated image.
  13. 根据权利要求11所述的电子装置,其特征在于,一个或多个所述处理器还用于:The electronic device of claim 11, wherein one or more of the processors are further configured to:
    根据所述语音信息生成语音标注标签;generating a voice annotation label according to the voice information;
    控制在所述待标注图像中显示所述语音标注标签;及controlling the display of the voice annotation label in the to-be-annotated image; and
    对所述语音标注标签进行处理,以生成所述已标注图像,所述处理包括播放、删除、拖拽中的至少一个。The voice annotation tag is processed to generate the marked image, the processing including at least one of playing, deleting, and dragging.
  14. 根据权利要求11所述的电子装置,其特征在于,所述存储器还用于将所述已标注图像及所述语音信息保存为一个视频文件。The electronic device according to claim 11, wherein the memory is further configured to save the marked image and the voice information as a video file.
  15. The electronic device according to claim 11, further comprising a display and a speaker, wherein the display is configured to display the annotated image and the speaker is configured to play the voice information.
  16. The electronic device according to claim 15, wherein there are a plurality of voice annotation labels, the plurality of voice annotation labels have a predetermined playing order, and the speaker is further configured to play the voice information associated with the voice annotation labels in the playing order.
  17. The electronic device according to claim 11, wherein the memory is further configured to:
    save the annotated image as a first format file;
    save the voice information as a second format file; and
    save the first format file and the second format file separately.
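A small sketch of the two-separate-files storage of claim 17, assuming an image file, an audio file, and a plain JSON side-car that remembers which audio belongs to which label; the formats and names are illustrative, not taken from the patent.

```python
import json
import pathlib

out = pathlib.Path("annotations")
out.mkdir(exist_ok=True)

# First format file: the annotated image; second format file: the voice
# recording. Placeholder bytes stand in for real encoder output.
(out / "photo_001.png").write_bytes(b"<png bytes>")
(out / "photo_001_label_1.m4a").write_bytes(b"<aac bytes>")

# Side-car index associating the voice annotation label with its audio file,
# so the label can later be triggered to play the right recording (claim 18).
(out / "photo_001.json").write_text(json.dumps(
    {"image": "photo_001.png",
     "labels": [{"id": 1, "pos": [0.5, 0.5], "audio": "photo_001_label_1.m4a"}]},
    indent=2))
```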
  18. The electronic device according to claim 17, wherein the speaker is further configured to play, in response to the voice annotation label being triggered, the voice information associated with the voice annotation label.
  19. The electronic device according to any one of claims 11 to 17, wherein the speaker is further configured to:
    play the voice information associated with a voice annotation label located in a display area;
    after the voice information associated with the voice annotation label in the display area has been played, and the annotated image has been scrolled so that an unplayed voice annotation label enters the display area; and
    play the voice information associated with the voice annotation label that enters the display area.
  20. A non-volatile computer-readable storage medium storing a computer program which, when executed by one or more processors, implements the method for voice annotation and use of an image according to any one of claims 1 to 9.
PCT/CN2021/140547 2021-03-03 2021-12-22 Voice annotation and use method and device for image, electronic device, and storage medium WO2022183814A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110235765.3A CN115101057A (en) 2021-03-03 2021-03-03 Voice annotation of image, using method and device thereof, electronic device and storage medium
CN202110235765.3 2021-03-03

Publications (1)

Publication Number Publication Date
WO2022183814A1 (en)

Family

ID=83155001

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140547 WO2022183814A1 (en) 2021-03-03 2021-12-22 Voice annotation and use method and device for image, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN115101057A (en)
WO (1) WO2022183814A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030144843A1 (en) * 2001-12-13 2003-07-31 Hewlett-Packard Company Method and system for collecting user-interest information regarding a picture
CN107223246A (en) * 2017-03-20 2017-09-29 深圳前海达闼云端智能科技有限公司 Image labeling method, device and electronic equipment
CN108320318A (en) * 2018-01-15 2018-07-24 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN110046271A (en) * 2019-03-22 2019-07-23 中国科学院西安光学精密机械研究所 A kind of remote sensing images based on vocal guidance describe method
CN111355912A (en) * 2020-02-17 2020-06-30 江苏济楚信息技术有限公司 Law enforcement recording method and system
CN111629156A (en) * 2019-02-28 2020-09-04 北京字节跳动网络技术有限公司 Image special effect triggering method and device and hardware device
CN112383734A (en) * 2020-10-29 2021-02-19 岭东核电有限公司 Video processing method, video processing device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115101057A (en) 2022-09-23

Similar Documents

Publication Publication Date Title
US11023666B2 (en) Narrative-based media organizing system for transforming and merging graphical representations of digital media within a work area
US11627001B2 (en) Collaborative document editing
JP4453738B2 (en) File transfer method, apparatus, and program
US9122886B2 (en) Track changes permissions
US9230356B2 (en) Document collaboration effects
US9542366B2 (en) Smart text in document chat
WO2019047508A1 (en) Method for processing e-book comment information, electronic device and storage medium
KR20180002702A (en) Bookmark management technology for media files
WO2019007213A1 (en) Data processing method and apparatus, and terminal device
KR20160016810A (en) Automatic isolation and selection of screenshots from an electronic content repository
US20170046350A1 (en) Media organization
WO2021098263A1 (en) Application program sharing method and apparatus, electronic device and readable medium
WO2022183814A1 (en) Voice annotation and use method and device for image, electronic device, and storage medium
WO2023184745A1 (en) Data labeling method and apparatus, electronic device, and storage medium
TWI299466B (en) System and method for providing presentation files for an embedded system
CN113315691B (en) Video processing method and device and electronic equipment
TW201430728A (en) Icon generating system and method for generating icon
US20140250055A1 (en) Systems and Methods for Associating Metadata With Media Using Metadata Placeholders
CN105205069B (en) Cache opening method and device based on paging file
US11210639B2 (en) Electronic dynamic calendar system, operation method and computer readable storage medium
WO2023016364A1 (en) Video processing method and apparatus, and device and storage medium
WO2020050055A1 (en) Document creation assistance device, document creation assistance system, and program
TW201539399A (en) Digital notes for distance learning system
TW201505432A (en) Electronic apparatus and method for annotating media file thereof
TW201333729A (en) Display system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21928885

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21928885

Country of ref document: EP

Kind code of ref document: A1