WO2022183814A1 - Voice annotation and use method and device for image, electronic device, and storage medium - Google Patents

Voice annotation and use method and device for image, electronic device, and storage medium Download PDF

Info

Publication number
WO2022183814A1
WO2022183814A1 · PCT/CN2021/140547 · CN2021140547W
Authority
WO
WIPO (PCT)
Prior art keywords
voice
image
marked
annotation
voice information
Prior art date
Application number
PCT/CN2021/140547
Other languages
French (fr)
Chinese (zh)
Inventor
彭映
刘昱玥
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2022183814A1 publication Critical patent/WO2022183814A1/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 - Training
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models

Definitions

  • The present application relates to the technical field of image processing, and more particularly to a method for voice annotation and use of an image, a device for voice annotation and use of an image, an electronic device, and a non-volatile computer-readable storage medium.
  • Embodiments of the present application provide a method for voice annotation and use of an image, a device for voice annotation and use of an image, an electronic device, and a non-volatile computer-readable storage medium.
  • The method for voice annotation and use of an image in the embodiments of the present application includes: acquiring an image to be annotated; generating an annotated image according to input voice information and the image to be annotated, where the annotated image includes a voice annotation label and the voice annotation label is displayed in the image to be annotated; associating the voice annotation label with the voice information; and saving the annotated image and the voice information.
  • the apparatus for voice annotation and use of images includes: an acquisition module, a generation module, an association module, and a storage module.
  • the acquisition module is used to acquire the image to be marked;
  • the generation module is used to generate a marked image according to the input voice information and the to-be-marked image, the marked image includes a voice-marked label, and the voice-marked label is displayed on the to-be-marked image.
  • the association module is used for associating the voice annotation label and the voice information; and the storage module is used to save the marked image and the voice information.
  • the electronic device of the embodiment of the present application includes: one or more processors and a memory.
  • The one or more processors are used to: obtain an image to be marked; generate a marked image according to the input voice information and the image to be marked, where the marked image includes a voice annotation label and the voice annotation label is displayed in the image to be marked; and associate the voice annotation label with the voice information.
  • the memory is used for saving the marked image and the voice information.
  • the non-volatile computer-readable storage medium of the embodiment of the present application contains a computer program.
  • When the computer program is executed by one or more processors, it causes the processors to implement the following method for voice annotation and use of an image: acquiring an image to be annotated; generating an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label that is displayed in the image to be annotated; associating the voice annotation label with the voice information; and saving the annotated image and the voice information.
  • FIG. 1 is a schematic flowchart of a method for voice annotation and use of an image according to some embodiments of the present application;
  • FIG. 2 is a schematic diagram of performing voice annotation on an image to be annotated in the method for voice annotation and use of an image according to some embodiments of the present application;
  • FIG. 3 is a schematic diagram of an annotated image in the method for voice annotation and use of an image according to some embodiments of the present application;
  • FIG. 4 is a schematic structural diagram of a device for voice annotation and use of images according to some embodiments of the present application;
  • FIG. 5 is a schematic structural diagram of an electronic device according to some embodiments of the present application;
  • FIGS. 6 to 14 are schematic flowcharts of methods for voice annotation and use of images according to some embodiments of the present application;
  • FIG. 15 is a schematic diagram of playing the voice annotations of an annotated image that exceeds the display area in the method for voice annotation and use of an image according to some embodiments of the present application;
  • FIG. 16 is a schematic diagram of the connection between a non-volatile computer-readable storage medium and a processor according to some embodiments of the present application.
  • The embodiments of the present application provide a method for voice annotation and use of an image.
  • The method for voice annotation and use of an image includes: acquiring an image to be marked; generating a marked image according to the input voice information and the image to be marked, where the marked image includes a voice annotation tag displayed in the image to be marked; associating the voice annotation tag with the voice information; and saving the marked image and the voice information.
  • generating a marked image according to the input voice information and the image to be marked includes: generating a voice marking label according to the voice information; and displaying the voice marking label in the to-be-marked image to generate the marked image.
  • Generating a marked image according to the input voice information and the image to be marked includes: generating a voice annotation label according to the voice information; displaying the voice annotation label in the image to be marked; and processing the voice annotation label to generate the marked image, where the processing includes at least one of playing, deleting, and dragging.
  • saving the marked image and voice information includes: saving the marked image and voice information as a video file.
  • the voice annotation of the image and the method for using the image further include: playing the marked image and voice information.
  • There may be multiple voice annotation tags, and the multiple voice annotation tags have a predetermined playing order. Playing the marked image and the voice information includes: playing the voice information associated with the voice annotation tags in the playing order.
  • Saving the marked image and the voice information includes: saving the marked image as a first format file; saving the voice information as a second format file; and saving the first format file and the second format file separately.
  • the voice annotation of the image and the method of using the image further include: triggering the voice annotation tag to play the voice information associated with the voice annotation tag.
  • The method for voice annotation and use of an image further includes: playing the voice information associated with the voice annotation tags in the display area; after the voice information associated with the voice annotation tags in the display area has been played, scrolling the marked image so that unplayed voice annotation tags enter the display area; and playing the voice information associated with the voice annotation tags that enter the display area.
  • Embodiments of the present application further provide a device for voice annotation and use of images.
  • the device for voice annotation and use of images includes: an acquisition module, a generation module, an association module, and a storage module.
  • the acquisition module is used to acquire the image to be labeled.
  • the generating module is configured to generate a marked image according to the input voice information and the image to be marked, the marked image includes a voice marked label, and the voice marked label is displayed in the to-be-marked image.
  • the association module is used to associate the voice annotation label and voice information.
  • the storage module is used to save the marked images and voice information.
  • Embodiments of the present application further provide an electronic device, where the electronic device includes one or more processors and a memory.
  • The one or more processors are used to: obtain the image to be marked; generate a marked image according to the input voice information and the image to be marked, where the marked image includes a voice annotation tag displayed in the image to be marked; and associate the voice annotation tag with the voice information. The memory is used to save the marked image and the voice information.
  • the one or more processors are further configured to: generate a voice annotation label according to the voice information; and control the display of the voice annotation label in the image to be annotated, so as to generate an annotated image.
  • The one or more processors are further configured to: generate a voice annotation tag according to the voice information; control the voice annotation tag to be displayed in the image to be annotated; and process the voice annotation tag to generate the annotated image, where the processing includes at least one of playing, deleting, and dragging.
  • the memory is also used to save the annotated image and voice information as a video file.
  • the electronic device further includes a display and a speaker, the display is used for displaying the marked image, and the speaker is used for playing the voice information.
  • There may be multiple voice annotation tags, and the multiple voice annotation tags have a predetermined playing order.
  • the speaker is further configured to play the voice information associated with the voice annotation tags according to the playing order.
  • the memory is further used for: saving the marked image as a first format file; saving the voice information as a second format file; and saving the first format file and the second format file separately.
  • the speaker is further configured to play voice information associated with the voice annotation tag according to the triggered voice annotation tag.
  • The speaker is further used to: play the voice information associated with the voice annotation tags in the display area; after the voice information associated with the voice annotation tags in the display area has been played, scroll the marked image so that unplayed voice annotation tags enter the display area; and play the voice information associated with the voice annotation tags that enter the display area.
  • Embodiments of the present application further provide a non-volatile computer-readable storage medium storing a computer program. When the computer program is executed by one or more processors, it implements any of the above methods for voice annotation and use of an image.
  • An embodiment of the present application provides a method for voice annotation and use of an image; the method includes steps 01 to 04 (see FIG. 1).
  • an embodiment of the present application provides an apparatus 10 for voice annotation and use of images.
  • the apparatus 10 for voice annotation and use of images includes an acquisition module 11 , a generation module 12 , an association module 13 and a storage module 14 .
  • The method for voice annotation and use of an image according to the embodiments of the present application can be applied to the device 10 for voice annotation and use of images, wherein the acquisition module 11, the generation module 12, the association module 13, and the storage module 14 are respectively used to execute the methods in 01, 02, 03, and 04.
  • That is, the acquisition module 11 is used to acquire the image P1 to be marked; the generation module 12 is used to generate the marked image P2 according to the input voice information and the image P1 to be marked, where the marked image P2 includes the voice annotation label V and the voice annotation label V is displayed in the image P1 to be marked; the association module 13 is used to associate the voice annotation label V with the voice information; and the storage module 14 is used to save the marked image P2 and the voice information.
  • an embodiment of the present application provides an electronic device 100 .
  • the electronic device 100 includes one or more processors 30 and a memory 50 .
  • The method for voice annotation and use of an image in this embodiment can be applied to the electronic device 100, wherein the one or more processors 30 are used to execute the methods in 01, 02, and 03, and the memory 50 is used to execute the method in 04. That is, the one or more processors 30 are used to: acquire the image P1 to be marked; generate the marked image P2 according to the input voice information and the image P1 to be marked, where the marked image P2 includes the voice annotation label V and the voice annotation label V is displayed in the image P1 to be marked; and associate the voice annotation label V with the voice information.
  • the memory 50 is used to store the marked image P2 and voice information.
  • The electronic device 100 may be a terminal device such as a mobile phone, a notebook computer, a smart watch, or a computer.
  • The device 10 for voice annotation and use of images may be an application program installed in the electronic device 100, for example a screenshot or photo album application; it may also be a functional module within some application program, such as an image editing function. This application only takes the electronic device 100 being a mobile phone as an example for description; when the electronic device 100 is another type of terminal, the situation is similar to that of a mobile phone and is not described in detail.
  • The acquisition module 11 or the one or more processors 30 may acquire the image P1 to be marked by capturing an image and using it as the image P1 to be marked.
  • Alternatively, the acquisition module 11 or the one or more processors 30 may acquire an image from the photo album in the electronic device 100 as the image P1 to be marked.
  • Alternatively, the acquisition module 11 or the one or more processors 30 may take a screenshot of the electronic device 100 and use the captured image as the image P1 to be marked.
  • The acquisition module 11 or the one or more processors 30 may also acquire the image P1 to be marked in other ways, which are not limited here.
  • The generation module 12 or the one or more processors 30 generates the marked image P2 according to the input voice information and the acquired image P1 to be marked; the marked image P2 includes the voice annotation label V, and the voice annotation label V is displayed in the image P1 to be marked. Specifically, the initial display position of the voice annotation label V may be the bottom of the image P1 to be marked.
  • The association module 13 or the one or more processors 30 associates the input voice information with the voice annotation label V. The user may input voice information one or more times through the recording label L, and each piece of input voice information is associated with one voice annotation label V; in this way, the marked image P2 may include multiple voice annotation labels V, thereby realizing multi-voice annotation of the image P1 to be marked.
  • the storage module 14 or the memory 50 saves the marked image P2 and the marked voice information, so that when viewing the marked image P2 again, the voice information in the marked image P2 can be listened to.
  • the information annotation of the image P1 to be annotated is realized by inputting the voice information, which improves the efficiency of image annotation compared with the annotation methods such as text and brushes.
  • the apparatus 10 for voice annotation and use of images may also implement text and brush annotation on the image P1 to be annotated.
  • When performing the text annotation function, the user can input voice information by recording, and the generation module 12 or the one or more processors 30 converts the input voice information into text information and displays it in the image P1 to be marked, so as to generate the marked image P2.
  • Alternatively, the user directly inputs text information to realize text annotation of the image P1 to be marked.
  • In other embodiments, the generation module 12 or the one or more processors 30 converts the input voice information into picture information and displays it in the image P1 to be marked, so as to generate the marked image P2.
  • Alternatively, the user directly inputs drawing information (drawn in the image P1 to be marked) to realize brush annotation of the image P1 to be marked.
  • In this way, the device 10 for voice annotation and use of images can realize not only the voice annotation function for the image P1 to be marked, but also the text and brush annotation functions for the image P1 to be marked; the application scenarios are more diverse, and the user is provided with more annotation options.
  • In some embodiments, step 02 of generating the marked image P2 according to the input voice information and the image P1 to be marked includes: 021, generating a voice annotation label V according to the voice information; and 023, displaying the voice annotation label V in the image P1 to be marked to generate the marked image P2.
  • The generation module 12 is also used to execute the methods in 021 and 023; that is, the generation module 12 is also used to: generate a voice annotation label V according to the voice information; and display the voice annotation label V in the image P1 to be marked, to generate the marked image P2.
  • The one or more processors 30 are also used to execute the methods in 021 and 023; that is, the one or more processors 30 are also used to: generate a voice annotation label V according to the voice information; and control the voice annotation label V to be displayed in the image P1 to be marked, to generate the marked image P2.
  • After obtaining the image P1 to be marked, the user records voice information. Specifically, the generation module 12 or the one or more processors 30 generates a corresponding voice annotation label V according to the input voice information; correspondingly, each time the user enters a piece of voice information, the generation module 12 or the one or more processors 30 generates a voice annotation label V corresponding to that voice information and, at the same time, controls the corresponding voice annotation label V to be displayed in the image P1 to be marked, so as to generate the marked image P2. This ensures that the user can quickly learn the voice information annotated in the marked image P2 when viewing it again.
  • If the voice annotation tag V associated with the input voice information were not displayed in the marked image P2, the user could not determine, after inputting the voice information, whether the relevant voice information had been successfully entered for the image P1 to be marked; or, when viewing the marked image P2 again, the user could not determine whether there is annotated voice information in the marked image P2. The user might then need to perform a second voice annotation on the image, and the efficiency of image annotation would be low.
  • Therefore, the generation module 12 or the one or more processors 30 generates the corresponding voice annotation label V according to the input voice information and controls the voice annotation label V to be displayed in the image P1 to be marked, thereby generating the marked image P2. This makes it convenient for the user to confirm whether the voice information has been successfully entered and to quickly learn the voice information annotated in the marked image P2 when viewing it again, thereby improving the efficiency of image annotation.
  • In some embodiments, step 023 of displaying the voice annotation label V in the image P1 to be marked to generate the marked image P2 includes: 0231, displaying the voice annotation label V in the image P1 to be marked; and 0233, processing the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The generation module 12 is also used to execute the methods in 0231 and 0233; that is, the generation module 12 is also used to: control the voice annotation label V to be displayed in the image P1 to be marked; and process the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The one or more processors 30 are also used to execute the methods in 0231 and 0233; that is, the one or more processors 30 are also used to: control the voice annotation label V to be displayed in the image P1 to be marked; and process the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The user records by long-pressing the recording label L; after releasing, the recording ends and the input of the voice information is completed.
  • At this time, the user can perform at least one of playing, deleting, and dragging on the voice annotation tag V whose recording has ended. For example, the user plays the voice annotation tag V whose recording has ended: the voice annotation tag V then displays an icon animation while the associated voice information is played, which makes it convenient for the user to listen to the entered voice information and determine whether it is accurate. For another example, the user deletes the voice annotation label V whose recording has ended: the user taps to select the voice annotation label V to be deleted, a delete icon appears on the voice annotation label V, and the voice information can be deleted by tapping the delete icon.
  • For another example, the user drags the voice annotation tag V whose recording has ended so that it is displayed at a suitable position in the image P1 to be marked; the user can long-press the voice annotation tag V to drag it. For instance, if there is text information in the image P1 to be marked and a certain line of text or word needs to be annotated, then after the voice information is input for that line of text or word, the voice annotation tag V associated with the voice information can be dragged to the vicinity of that line of text or word, so as to generate the marked image P2.
  • When the user views the marked image P2 again, the user can quickly understand the relevant information annotated by the voice information associated with that voice annotation tag. For another example, the user may play and drag the voice annotation tag V whose recording has ended, or play and delete it, or drag and delete it, or play, drag, and delete it; the specific processing is performed according to the actual situation and is not limited here.
  • In some embodiments, step 02 of generating the marked image P2 according to the input voice information and the image P1 to be marked may further include steps 021, 023, and 025 described below.
  • The generation module 12 is also used to execute the methods in 021, 023, and 025; that is, the generation module 12 is also used to: generate a voice annotation label V according to the voice information; control the voice annotation label V to be displayed in the image P1 to be marked; and process the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The one or more processors 30 are also used to execute the methods in 021, 023, and 025; that is, the one or more processors 30 are also used to: generate a voice annotation label V according to the voice information; control the voice annotation label V to be displayed in the image P1 to be marked; and process the voice annotation label V to generate the marked image P2, where the processing includes at least one of playing, deleting, and dragging.
  • The user records by long-pressing the recording label L; after the user releases, the recording ends and the input of the voice information is completed. The generation module 12 or the one or more processors 30 then generates the voice annotation label V according to the input voice information and controls the corresponding voice annotation label V to be displayed in the image P1 to be marked, to generate the marked image P2.
  • After that, the user can perform at least one of playing, deleting, and dragging on the voice annotation tag V according to the actual situation.
  • After the user processes the voice annotation label V, the generation module 12 or the one or more processors 30 updates the marked image P2 in real time to ensure that the voice annotation label V in the marked image P2 corresponds to the processed voice annotation label V.
  • In some embodiments, step 04 of saving the marked image P2 and the voice information includes: 041, saving the marked image P2 and the voice information as a single video file.
  • the storage module 14 is further configured to execute the method in 041, that is, the storage module 14 is further configured to save the marked image P2 and the voice information as a video file.
  • the memory 50 is also used to execute the method in 041, that is, the memory 50 is also used to save the marked image P2 and the voice information as a video file.
  • The storage module 14 or the memory 50 combines the associated voice information and the marked image P2 (including the voice annotation tag V associated with the voice information) into a video file format (such as the MPEG, AVI, nAVI, ASF, MOV, or WMV format) and stores it in the electronic device 100. Storing the marked image P2 and the voice information in the electronic device 100 as a single file can save the storage space of the electronic device 100, and the operation is simple when the marked image P2 and the voice information are retrieved again.
  • For example, the storage module 14 or the memory 50 saves the voice information and the marked image P2 in the MPEG (Moving Picture Experts Group) format through a video encapsulation format.
  • In this way, the voice information in the marked image P2 can be viewed through only one video file.
  • In some embodiments, the method for voice annotation and use of an image may further include: 05, playing the marked image P2 and the voice information.
  • The device 10 for voice annotation and use of images may further include a playback module 15, and the playback module 15 is configured to execute the method in 05; that is, the playback module 15 is configured to play the marked image P2 and the voice information.
  • The electronic device 100 of the embodiment of the present application may further include a display 70 and a speaker 90, where the display 70 and the speaker 90 are used to execute the method in 05. That is, the display 70 is used to display the marked image P2, and the speaker 90 is used to play the voice information.
  • The storage module 14 or the memory 50 saves the voice information and the marked image P2 in the format of a video file.
  • When the user views the voice information and the marked image P2 again, the playback module 15, or the display 70 together with the speaker 90, presents the marked image P2 and the voice information. Specifically, when the playback module 15 plays the video, it plays the voice information in the marked image P2 to realize playback of the recorded image annotations; or, the display 70 displays the marked image P2 (including the voice annotation label V) in the video, and the speaker 90 plays the voice information in the video (the marked image P2).
  • In some embodiments, there are multiple voice annotation tags V, and the multiple voice annotation tags V have a predetermined playback order.
  • In this case, step 05 of playing the marked image P2 and the voice information includes: 051, playing the voice information associated with the voice annotation tags V in the playback order.
  • the playing module 15 is further configured to execute the method in 051, that is, the playing module 15 is further configured to play the voice information associated with the voice annotation tag V according to the playing sequence.
  • the speaker 90 is also used to execute the method in 051 . That is, the speaker 90 is also used for playing the voice information associated with the voice annotation tag V in the playing order.
  • The one or more processors 30 control the generated multiple voice annotation tags V to have a predetermined playback order, and when the playback module 15 or the speaker 90 plays the voice information associated with the voice annotation tags V, it plays the voice information according to that predetermined order, ensuring that the voice information in the marked image P2 is played in an orderly manner.
  • For example, the one or more processors 30 may set the playback order of the voice annotation tags V to be associated with the positions of the voice annotation tags V. For example, with multiple voice annotation tags V displayed at the positions shown in FIG. 3, when the user plays the video obtained by combining the marked image P2 and the voice information, the voice annotation tags V can be played from top to bottom, that is, the voice information with durations of 34s, 65s, and 25s is played in sequence.
  • Alternatively, the voice annotation tags V can be played from bottom to top, that is, the voice information with durations of 25s, 65s, and 34s is played in sequence.
  • Alternatively, the voice annotation tags V can be played from left to right, that is, the voice information with durations of 65s, 34s, and 25s is played in sequence.
  • Alternatively, the voice annotation tags V can be played from right to left, that is, the voice information with durations of 25s, 34s, and 65s is played in sequence.
  • For another example, the one or more processors 30 may set the playback order of the voice annotation tags V to be associated with the generation time of the voice annotation tags V. That is, each time the user inputs voice information to annotate the image P1, the one or more processors 30 record the entry time of the corresponding voice information and sort the tags in chronological order of entry time. For example, if the three voice annotation labels V shown in FIG. 3, sorted in chronological order, give the sequence of the 25s, 65s, and 34s voice information, then the playback order of the voice annotation tags V is to play the 25s, 65s, and 34s voice information in sequence; sorted in reverse chronological order, the 34s, 65s, and 25s voice information is played in sequence.
  • For another example, the one or more processors 30 may set the playback order of the voice annotation tags V to be associated with the time axis of the video, so that voice information of different durations is synthesized into different time periods of the video.
  • The one or more processors 30 detect whether there is voice information in the time period of the video currently being played.
  • If so, the one or more processors 30 control the playback module 15 or the speaker 90 to play the voice information of the corresponding period until the video playback ends. In this way, the voice information in the video can be played automatically while the video is playing, and the implementation is simple.
  • The above video storage format allows the marked image P2 and the voice information to be stored in the electronic device 100 as a single file, and the playback of the marked image P2 and the voice information is simple.
  • In some embodiments, step 04 of saving the marked image P2 and the voice information may further include: 043, saving the marked image P2 as a first format file; 045, saving the voice information as a second format file; and 047, saving the first format file and the second format file separately.
  • The storage module 14 is also used to execute the methods in 043, 045, and 047; that is, the storage module 14 is also used to: save the marked image P2 as a first format file; save the voice information as a second format file; and save the first format file and the second format file separately.
  • The memory 50 is also used to execute the methods in 043, 045, and 047; that is, the memory 50 is also used to: save the marked image P2 as a first format file; save the voice information as a second format file; and save the first format file and the second format file separately.
  • That is, the marked image P2 and the voice information may also be saved separately: the storage module 14 or the memory 50 saves the marked image P2 in an image format (such as the JPEG, RAW, PNG, GIF, or PDF format) and saves the voice information in an audio format (such as the MPEG, MPEG-4, MP3, WMA, or FLAC format), and the one or more processors 30 associate the two saved files to ensure that when the marked image P2 and the voice information are played, the played voice information is the voice information annotated on that image.
  • This storage method does not require subsequent processing of the marked image P2 and the voice information, and the storage method is simple.
  • In some embodiments, the method for voice annotation and use of an image may further include: 06, triggering a voice annotation tag V to play the voice information associated with that voice annotation tag V.
  • the playing module 15 is further configured to execute the method in 06, that is, the playing module 15 is further configured to play the voice information associated with the voice annotation tag V according to the triggered voice annotation tag V.
  • the speaker 90 is further configured to execute the method in 06, that is, the speaker 90 is further configured to play the voice information associated with the voice annotation tag V according to the triggered voice annotation tag V.
  • The voice information associated with a voice annotation tag V is played when that voice annotation tag V is triggered. Specifically, for the marked image P2 with voice annotation tags V shown in FIG. 3, the user can tap any voice annotation tag V in the marked image P2, and the playback module 15 or the speaker 90 plays the voice information associated with that tag, ensuring that the user can selectively listen to the voice information in the marked image P2.
  • In some embodiments, the method for voice annotation and use of an image may further include: 07, playing the voice information associated with the voice annotation tags V in the display area 40; 08, after the voice information associated with the voice annotation tags V in the display area 40 has been played, scrolling the marked image P2 so that unplayed voice annotation tags V enter the display area 40; and 09, playing the voice information associated with the voice annotation tags V that enter the display area 40.
  • The playback module 15 is also used to execute the methods in 07, 08, and 09; that is, the playback module 15 is also used to: play the voice information associated with the voice annotation labels V in the display area 40; after the voice information associated with the voice annotation tags V in the display area 40 has been played, scroll the marked image P2 so that unplayed voice annotation tags V enter the display area 40; and play the voice information associated with the voice annotation tags V that enter the display area 40.
  • The speaker 90 is also used to execute the methods in 07, 08, and 09; that is, the speaker 90 is also used to: play the voice information associated with the voice annotation tags V in the display area 40; after that voice information has been played, scroll and display the marked image P2 so that unplayed voice annotation tags V enter the display area 40; and play the voice information associated with the voice annotation tags V that enter the display area 40.
  • The image P1 to be marked acquired by the acquisition module 11 or the one or more processors 30 may be a long image, such as a long image obtained in a panorama mode during photography or a long image obtained by scrolling screenshots. When the display area 40 displays such a long image P1 normally, as shown in FIG. 15, all the information in the image cannot be displayed at once.
  • The marked image P2 then exceeds the display area 40, and when the voice information in the marked image P2 is played, the voice annotation labels V within the display area 40 are played first.
  • After that, the one or more processors 30 control the marked image P2 to scroll automatically from top to bottom to display the image information that has not yet been displayed.
  • The one or more processors 30 detect whether there is a voice annotation label V in the image content entering the display area 40; if there is, they control the playback module 15 or the speaker 90 to play the voice information associated with the voice annotation label V that has entered the display area 40.
  • This playback mode is applicable to both the video playback and the trigger playback described above, which will not be repeated here.
  • an embodiment of the present application further provides a non-volatile computer-readable storage medium 200 including a computer program 201 .
  • When the computer program 201 is executed by the one or more processors 30, it causes the processors 30 to execute the methods described above, for example the methods in 06, 07, 08, and 09.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A voice annotation and use method for an image, a voice annotation and use device (10) for an image, an electronic device (100), and a non-volatile computer readable storage medium (201). The voice annotation and use method for an image comprises: acquiring an image to be annotated (01); generating an annotated image according to input voice information and the image to be annotated, the annotated image comprising a voice annotation tag, and the voice annotation tag being displayed in the image to be annotated (02); associating the voice annotation tag with the voice information (03); and storing the annotated image and the voice information (04). In the voice annotation and use method for an image, the voice annotation of the image to be annotated is achieved by inputting the voice information, thereby improving the image annotation efficiency.

Description

Voice Annotation and Use Method and Device for Image, Electronic Device, and Storage Medium
Priority Information
This application claims priority to and the benefit of the Chinese patent application No. 202110235765.3, filed with the China National Intellectual Property Administration on March 3, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image processing, and more particularly to a method for voice annotation and use of an image, a device for voice annotation and use of an image, an electronic device, and a non-volatile computer-readable storage medium.
Background
With the development of technology, electronic devices such as mobile phones, tablet computers, and computers have become tools for people to obtain information from the outside world. When some important information needs to be preserved, it is often saved in the form of an image, and the image is annotated with information so that the important information in the image can be obtained quickly when the image is viewed again. However, current image annotation can only be performed through text, brushes, and the like, and the annotation efficiency is low.
Summary of the Invention
Embodiments of the present application provide a method for voice annotation and use of an image, a device for voice annotation and use of an image, an electronic device, and a non-volatile computer-readable storage medium.
The method for voice annotation and use of an image in the embodiments of the present application includes: acquiring an image to be annotated; generating an annotated image according to input voice information and the image to be annotated, where the annotated image includes a voice annotation label and the voice annotation label is displayed in the image to be annotated; associating the voice annotation label with the voice information; and saving the annotated image and the voice information.
The device for voice annotation and use of images according to the embodiments of the present application includes: an acquisition module, a generation module, an association module, and a storage module. The acquisition module is used to acquire the image to be annotated; the generation module is used to generate an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label displayed in the image to be annotated; the association module is used to associate the voice annotation label with the voice information; and the storage module is used to save the annotated image and the voice information.
The electronic device of the embodiments of the present application includes one or more processors and a memory. The one or more processors are used to: acquire an image to be annotated; generate an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label displayed in the image to be annotated; and associate the voice annotation label with the voice information. The memory is used to save the annotated image and the voice information.
The non-volatile computer-readable storage medium of the embodiments of the present application contains a computer program. When the computer program is executed by one or more processors, it causes the processors to implement the following method for voice annotation and use of an image: acquiring an image to be annotated; generating an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label displayed in the image to be annotated; associating the voice annotation label with the voice information; and saving the annotated image and the voice information.
Additional aspects and advantages of the present application will be set forth in part in the following description, and in part will become apparent from the following description or may be learned by practice of the present application.
Description of the Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a method for voice annotation and use of an image according to some embodiments of the present application;
FIG. 2 is a schematic diagram of performing voice annotation on an image to be annotated in the method for voice annotation and use of an image according to some embodiments of the present application;
FIG. 3 is a schematic diagram of an annotated image in the method for voice annotation and use of an image according to some embodiments of the present application;
FIG. 4 is a schematic structural diagram of a device for voice annotation and use of images according to some embodiments of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to some embodiments of the present application;
FIGS. 6 to 14 are schematic flowcharts of methods for voice annotation and use of images according to some embodiments of the present application;
FIG. 15 is a schematic diagram of playing the voice annotations of an annotated image that exceeds the display area in the method for voice annotation and use of an image according to some embodiments of the present application;
FIG. 16 is a schematic diagram of the connection between a non-volatile computer-readable storage medium and a processor according to some embodiments of the present application.
Detailed Description
The embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are intended to explain the present application; they should not be construed as limiting the present application.
The following disclosure provides many different embodiments or examples for implementing different structures of the present application. To simplify the disclosure of the present application, the components and arrangements of specific examples are described below. Of course, they are merely examples and are not intended to limit the application. Furthermore, the present application may repeat reference numerals and/or letters in different examples. This repetition is for the purpose of simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or arrangements discussed. In addition, the present application provides examples of various specific processes and materials, but those of ordinary skill in the art will recognize the applicability of other processes and/or the use of other materials.
The embodiments of the present application provide a method for voice annotation and use of an image. The method includes: acquiring an image to be annotated; generating an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation tag displayed in the image to be annotated; associating the voice annotation tag with the voice information; and saving the annotated image and the voice information.
In some embodiments, generating the annotated image according to the input voice information and the image to be annotated includes: generating a voice annotation label according to the voice information; and displaying the voice annotation label in the image to be annotated to generate the annotated image.
In some embodiments, generating the annotated image according to the input voice information and the image to be annotated includes: generating a voice annotation label according to the voice information; displaying the voice annotation label in the image to be annotated; and processing the voice annotation label to generate the annotated image, where the processing includes at least one of playing, deleting, and dragging.
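As an illustration of these steps, the sketch below models a voice annotation label and the three processing operations (play, delete, drag) in plain Kotlin, independent of any particular UI toolkit. It is a minimal sketch, not the patent's implementation: all names (VoiceTag, AnnotatedImage, the file names) are hypothetical, and playback is represented by a callback so the example stays self-contained.

```kotlin
import java.io.File

// Hypothetical model of one voice annotation label (tag) placed on an image.
data class VoiceTag(
    val id: Int,
    val audioFile: File,        // the recorded voice information
    val durationSec: Int,       // e.g. 34, 65, 25 as in FIG. 3
    var x: Int,                 // display position inside the image
    var y: Int
)

// Hypothetical annotated image: the original picture plus its voice tags.
class AnnotatedImage(val imageFile: File) {
    val tags = mutableListOf<VoiceTag>()

    // Steps 021/023: create a tag for newly recorded voice and display it
    // at an initial position (here: the bottom of the image, as described).
    fun addTag(audio: File, durationSec: Int, imageHeight: Int): VoiceTag {
        val tag = VoiceTag(tags.size + 1, audio, durationSec, x = 0, y = imageHeight)
        tags += tag
        return tag
    }

    // The three kinds of processing mentioned for the tag.
    fun play(tag: VoiceTag, player: (File) -> Unit) = player(tag.audioFile)
    fun delete(tag: VoiceTag) { tags.remove(tag) }
    fun drag(tag: VoiceTag, newX: Int, newY: Int) { tag.x = newX; tag.y = newY }
}

fun main() {
    val annotated = AnnotatedImage(File("screenshot.png"))
    val tag = annotated.addTag(File("note1.m4a"), durationSec = 34, imageHeight = 2400)
    annotated.drag(tag, newX = 120, newY = 560)           // move it next to a line of text
    annotated.play(tag) { audio -> println("playing ${audio.name}") }
    println("tags on image: ${annotated.tags.size}")
}
```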
In some embodiments, saving the annotated image and the voice information includes: saving the annotated image and the voice information as a single video file.
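One way such single-file storage could be realized, sketched below, is to mux the still annotated image with the recorded audio using an external ffmpeg binary invoked from Kotlin. This is only an assumption about the implementation (the patent does not name a tool); it requires ffmpeg on the PATH, and the file names are placeholders.

```kotlin
import java.io.File

// Combine a still image and a voice recording into one MP4 file by invoking
// ffmpeg: loop the single frame and stop when the audio ends (-shortest).
fun muxImageAndVoice(image: File, voice: File, output: File): Boolean {
    val cmd = listOf(
        "ffmpeg", "-y",
        "-loop", "1", "-i", image.absolutePath,   // still image as the video track
        "-i", voice.absolutePath,                 // recorded voice as the audio track
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-shortest",
        output.absolutePath
    )
    val process = ProcessBuilder(cmd).inheritIO().start()
    return process.waitFor() == 0
}

fun main() {
    val ok = muxImageAndVoice(File("annotated.png"), File("note1.m4a"), File("annotated.mp4"))
    println(if (ok) "saved as one video file" else "ffmpeg failed")
}
```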
In some embodiments, the method for voice annotation and use of an image further includes: playing the annotated image and the voice information.
In some embodiments, there are multiple voice annotation tags, and the multiple voice annotation tags have a predetermined playing order. Playing the annotated image and the voice information includes: playing the voice information associated with the voice annotation tags in the playing order.
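The predetermined playing order could, for example, be derived from the tags' on-screen positions (top to bottom) or from the time at which each recording was entered, as the detailed description suggests. The self-contained sketch below shows both orderings; the Tag type, coordinates, and timestamps are illustrative assumptions.

```kotlin
// Hypothetical tag with the fields needed to decide a playing order.
data class Tag(val name: String, val y: Int, val enteredAtMillis: Long, val durationSec: Int)

// Order by on-screen position, top to bottom (FIG. 3 example: 34s, 65s, 25s).
fun byPosition(tags: List<Tag>): List<Tag> = tags.sortedBy { it.y }

// Order by the time the voice information was entered (e.g. 25s, 65s, 34s).
fun byEntryTime(tags: List<Tag>): List<Tag> = tags.sortedBy { it.enteredAtMillis }

fun main() {
    val tags = listOf(
        Tag("a", y = 300, enteredAtMillis = 3_000, durationSec = 34),
        Tag("b", y = 900, enteredAtMillis = 2_000, durationSec = 65),
        Tag("c", y = 1500, enteredAtMillis = 1_000, durationSec = 25)
    )
    println(byPosition(tags).map { it.durationSec })   // [34, 65, 25]
    println(byEntryTime(tags).map { it.durationSec })  // [25, 65, 34]
}
```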
In some embodiments, saving the annotated image and the voice information includes: saving the annotated image as a first format file; saving the voice information as a second format file; and saving the first format file and the second format file separately.
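A minimal sketch of this "two separate files" option follows: the image is kept in an image format, the voice in an audio format, and a small sidecar text file records which audio belongs to which tag and where the tag sits, so the files can be re-associated at playback time. The sidecar layout is an assumption; the patent only requires that the two saved files be associated.

```kotlin
import java.io.File

// Save the annotated image and its voice recordings as separate files plus a
// plain-text sidecar that associates each tag with its audio file and position.
fun saveSeparately(
    image: File,
    voices: Map<Int, File>,
    positions: Map<Int, Pair<Int, Int>>,
    dir: File
) {
    dir.mkdirs()
    image.copyTo(File(dir, image.name), overwrite = true)        // first format file (e.g. PNG)
    voices.values.forEach { it.copyTo(File(dir, it.name), overwrite = true) } // second format files (e.g. M4A)
    val sidecar = buildString {
        appendLine("image=${image.name}")
        voices.forEach { (tagId, audio) ->
            val (x, y) = positions.getValue(tagId)
            appendLine("tag=$tagId;audio=${audio.name};x=$x;y=$y")
        }
    }
    File(dir, "annotations.txt").writeText(sidecar)
}

fun main() {
    // Placeholder files so the example runs as-is.
    val image = File("annotated.png").apply { writeBytes(ByteArray(0)) }
    val voice = File("note1.m4a").apply { writeBytes(ByteArray(0)) }
    saveSeparately(image, mapOf(1 to voice), mapOf(1 to (120 to 560)), File("out"))
    println(File("out/annotations.txt").readText())
}
```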
In some embodiments, the method for voice annotation and use of an image further includes: triggering a voice annotation tag to play the voice information associated with that voice annotation tag.
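Triggering a tag could be a simple hit test: when the user taps at some coordinate, find the tag whose bounds contain that point and hand its audio to a player. The sketch below is toolkit-agnostic; the tag bounds and file names are arbitrary assumptions, and actual audio playback is stubbed out.

```kotlin
// Hypothetical tag with a position, size, and an associated audio path.
data class TapTag(val audioPath: String, val x: Int, val y: Int, val w: Int = 160, val h: Int = 64)

// Return the tag (if any) whose on-screen rectangle contains the tap point.
fun hitTest(tags: List<TapTag>, tapX: Int, tapY: Int): TapTag? =
    tags.firstOrNull { tapX in it.x until it.x + it.w && tapY in it.y until it.y + it.h }

fun main() {
    val tags = listOf(TapTag("note1.m4a", 100, 500), TapTag("note2.m4a", 100, 900))
    val hit = hitTest(tags, tapX = 150, tapY = 530)
    // In a real app this would start audio playback; here we just report the choice.
    println(hit?.let { "play ${it.audioPath}" } ?: "no tag at this position")
}
```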
In some embodiments, the method for voice annotation and use of an image further includes: playing the voice information associated with the voice annotation tags in the display area; after the voice information associated with the voice annotation tags in the display area has been played, scrolling the annotated image so that unplayed voice annotation tags enter the display area; and playing the voice information associated with the voice annotation tags that enter the display area.
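For a long annotated image (panorama or scrolling screenshot) that does not fit in the display area, playback can proceed in passes: play the tags currently visible, scroll, detect tags that have newly entered the display area, and play those, until the bottom is reached. The loop below simulates that behaviour in plain Kotlin; scroll step, viewport height, and tag coordinates are illustrative assumptions rather than values from the patent.

```kotlin
// Hypothetical tag on a long image: vertical position plus its audio clip.
data class LongImageTag(val y: Int, val audio: String)

// Simulate scrolling playback: play every tag exactly once, in the order in
// which it becomes visible while the viewport scrolls from top to bottom.
fun playWhileScrolling(tags: List<LongImageTag>, imageHeight: Int, viewportHeight: Int, scrollStep: Int) {
    val played = mutableSetOf<LongImageTag>()
    var top = 0
    while (true) {
        val visible = tags
            .filter { it.y in top until top + viewportHeight && it !in played }
            .sortedBy { it.y }
        for (tag in visible) {
            println("playing ${tag.audio} (y=${tag.y}, viewport top=$top)") // stand-in for audio playback
            played += tag
        }
        if (top + viewportHeight >= imageHeight) break
        top = minOf(top + scrollStep, imageHeight - viewportHeight)  // scroll down
    }
}

fun main() {
    val tags = listOf(LongImageTag(400, "a.m4a"), LongImageTag(1900, "b.m4a"), LongImageTag(3500, "c.m4a"))
    playWhileScrolling(tags, imageHeight = 4000, viewportHeight = 1600, scrollStep = 800)
}
```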
Embodiments of the present application further provide a device for voice annotation and use of images. The device includes: an acquisition module, a generation module, an association module, and a storage module. The acquisition module is used to acquire the image to be annotated. The generation module is used to generate an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation label displayed in the image to be annotated. The association module is used to associate the voice annotation label with the voice information. The storage module is used to save the annotated image and the voice information.
Embodiments of the present application further provide an electronic device. The electronic device includes one or more processors and a memory. The one or more processors are used to: acquire the image to be annotated; generate an annotated image according to the input voice information and the image to be annotated, where the annotated image includes a voice annotation tag displayed in the image to be annotated; and associate the voice annotation tag with the voice information. The memory is used to save the annotated image and the voice information.
In some embodiments, the one or more processors are further configured to: generate a voice annotation label according to the voice information; and control the voice annotation label to be displayed in the image to be annotated, so as to generate the annotated image.
In some embodiments, the one or more processors are further configured to: generate a voice annotation tag according to the voice information; control the voice annotation tag to be displayed in the image to be annotated; and process the voice annotation tag to generate the annotated image, where the processing includes at least one of playing, deleting, and dragging.
In some embodiments, the memory is further used to save the annotated image and the voice information as a single video file.
In some embodiments, the electronic device further includes a display and a speaker; the display is used to display the annotated image, and the speaker is used to play the voice information.
In some embodiments, there are multiple voice annotation tags, the multiple voice annotation tags have a predetermined playing order, and the speaker is further configured to play the voice information associated with the voice annotation tags in the playing order.
In some embodiments, the memory is further used to: save the annotated image as a first format file; save the voice information as a second format file; and save the first format file and the second format file separately.
In some embodiments, the speaker is further configured to play the voice information associated with a voice annotation tag when that voice annotation tag is triggered.
In some embodiments, the speaker is further used to: play the voice information associated with the voice annotation tags in the display area; after that voice information has been played, scroll the annotated image so that unplayed voice annotation tags enter the display area; and play the voice information associated with the voice annotation tags that enter the display area.
Embodiments of the present application further provide a non-volatile computer-readable storage medium storing a computer program. When the computer program is executed by one or more processors, it implements any of the above methods for voice annotation and use of an image.
Referring to FIG. 1 to FIG. 3, an embodiment of the present application provides a method for voice annotation and use of an image, which includes:
01: acquiring an image to be annotated P1;
02: generating an annotated image P2 according to input voice information and the image to be annotated P1, the annotated image P2 including a voice annotation tag V displayed in the image to be annotated P1;
03: associating the voice annotation tag V with the voice information; and
04: saving the annotated image P2 and the voice information.
Referring to FIG. 4, an embodiment of the present application provides a device 10 for voice annotation and use of an image. The device 10 includes an acquisition module 11, a generation module 12, an association module 13, and a storage module 14. The method for voice annotation and use of an image according to the embodiments of the present application can be applied to the device 10, in which the acquisition module 11, the generation module 12, the association module 13, and the storage module 14 are used to perform the methods in 01, 02, 03, and 04, respectively. That is, the acquisition module 11 is configured to acquire the image to be annotated P1; the generation module 12 is configured to generate the annotated image P2 according to the input voice information and the image to be annotated P1, the annotated image P2 including the voice annotation tag V displayed in the image to be annotated P1; the association module 13 is configured to associate the voice annotation tag V with the voice information; and the storage module 14 is configured to save the annotated image P2 and the voice information.
Referring to FIG. 5, an embodiment of the present application provides an electronic device 100. The electronic device 100 includes one or more processors 30 and a memory 50. The method for voice annotation and use of an image according to the embodiments of the present application can be applied to the electronic device 100, in which the one or more processors 30 are used to perform the methods in 01, 02, and 03, and the memory 50 is used to perform the method in 04. That is, the one or more processors 30 are configured to: acquire the image to be annotated P1; generate the annotated image P2 according to the input voice information and the image to be annotated P1, the annotated image P2 including the voice annotation tag V displayed in the image to be annotated P1; and associate the voice annotation tag V with the voice information. The memory 50 is configured to save the annotated image P2 and the voice information.
With the development of electronic devices such as mobile phones, tablet computers, and computers, these devices have gradually become important tools for obtaining information from the outside world. When important information needs to be retained, it is often saved in the form of an image, and the saved image is then annotated, for example, to explain or interpret the text and symbols in the image, to mark the date on which the image was taken, or to correct information in the image, so that the important information in the image can be retrieved quickly when the image is viewed again. However, current image annotation can only be performed with text, a brush, and the like, and the annotation efficiency is low. The method for voice annotation and use of an image of the present application annotates the image to be annotated P1 by inputting voice information, which improves the efficiency of image annotation compared with traditional annotation methods such as text and brushes.
Referring to FIG. 4 and FIG. 5, specifically, the electronic device 100 may be a terminal device such as a mobile phone, a notebook computer, a smart watch, or a computer. The device 10 for voice annotation and use of an image may be an application program installed in the electronic device 100, for example, a screenshot or photo-album application, or a functional module within an application, such as an image editing function. The present application is described only with the example in which the electronic device 100 is a mobile phone; the situation in which the electronic device 100 is another type of terminal is similar and is not described in detail.
In one embodiment, the acquisition module 11 or the one or more processors 30 may acquire the image to be annotated P1 by capturing a photograph. In another embodiment, the acquisition module 11 or the one or more processors 30 may acquire an image from an album in the electronic device 100 as the image to be annotated P1. In yet another embodiment, the acquisition module 11 or the one or more processors 30 may acquire the image to be annotated P1 by taking a screenshot on the electronic device 100. Of course, the acquisition module 11 or the one or more processors 30 may acquire the image to be annotated P1 in other ways, which is not limited here.
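As a loose illustration of how these alternative acquisition paths could feed a single annotation pipeline, the following sketch models them behind one entry point. It is only a sketch under assumed names (ImageSource and acquireImageToAnnotate are invented here) and is not code from the application.

```kotlin
// Minimal sketch of the acquisition step (01); all names are hypothetical.
import java.io.File

sealed class ImageSource {
    data class Camera(val capturedFile: File) : ImageSource()      // a newly captured photo
    data class Album(val pickedFile: File) : ImageSource()          // an image picked from the album
    data class Screenshot(val screenshotFile: File) : ImageSource() // a screenshot of the current screen
}

// Returns the file that serves as the image to be annotated P1,
// regardless of which acquisition path produced it.
fun acquireImageToAnnotate(source: ImageSource): File = when (source) {
    is ImageSource.Camera -> source.capturedFile
    is ImageSource.Album -> source.pickedFile
    is ImageSource.Screenshot -> source.screenshotFile
}
```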
Referring to FIG. 2, after entering the voice annotation interface of the image to be annotated P1, the user long-presses the recording tag L to record and thereby input voice information; when the user releases, the recording ends and the input of the voice information is complete. The generation module 12 or the one or more processors 30 generates the annotated image P2 according to the input voice information and the acquired image to be annotated P1. The annotated image P2 includes a voice annotation tag V displayed in the image to be annotated P1; specifically, the initial display position of the voice annotation tag V may be at the bottom of the image to be annotated P1. The association module 13 or the one or more processors 30 associates the input voice information with the voice annotation tag V. The user may input voice information one or more times through the recording tag L, and each input of voice information is associated with one voice annotation tag V; in this way, the annotated image P2 may include multiple voice annotation tags V, realizing multi-voice annotation of the image to be annotated P1. After the voice annotation is completed, the storage module 14 or the memory 50 saves the annotated image P2 and the annotated voice information, so that the voice information in the annotated image P2 can be listened to when the annotated image P2 is viewed again. In the method for voice annotation and use of an image of the present application, the image to be annotated P1 is annotated by recording voice information, which improves the efficiency of image annotation compared with annotation by text, brush, and the like.
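To make the record-then-associate flow concrete, the sketch below models one possible data layout: each completed recording produces a clip, a tag V anchored at the bottom of the image by default, and a stored link between the two, so several recordings naturally yield several tags. All class and field names are assumptions made for illustration, not the application's actual data structures.

```kotlin
// Minimal sketch of steps 02-03: one voice annotation tag per recorded clip.
// All names are hypothetical.
import java.util.UUID

data class VoiceClip(val id: String, val audioPath: String, val durationSec: Int)

data class VoiceTag(
    val id: String,
    val clipId: String,        // association between the tag V and its voice information
    var x: Float,              // position of the tag inside the image
    var y: Float,
    val createdAtMs: Long
)

class AnnotatedImage(val imagePath: String) {
    val tags = mutableListOf<VoiceTag>()

    // Called when a long-press recording ends: create a tag, anchor it at the
    // bottom of the image by default, and associate it with the new clip.
    fun addRecording(clip: VoiceClip, imageHeight: Float): VoiceTag {
        val tag = VoiceTag(
            id = UUID.randomUUID().toString(),
            clipId = clip.id,
            x = 0f,
            y = imageHeight,               // default: bottom of the image to be annotated
            createdAtMs = System.currentTimeMillis()
        )
        tags += tag                        // multiple recordings yield multiple tags
        return tag
    }
}
```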
In the embodiments of the present application, the device 10 for voice annotation and use of an image can also annotate the image to be annotated P1 with text or a brush. For example, when performing the text annotation function, the user may input voice information by recording, and the generation module 12 or the one or more processors 30 converts the input voice information into text information displayed in the image to be annotated P1 to generate the annotated image P2; alternatively, the user may directly input text information to annotate the image to be annotated P1 with text. As another example, when performing the brush annotation function, the user may input voice information by recording, and the generation module 12 or the one or more processors 30 converts the input voice information into drawing information displayed in the image to be annotated P1 to generate the annotated image P2; alternatively, the user may directly input drawing information (draw in the image to be annotated P1) to annotate the image to be annotated P1 with a brush. That is, the device 10 for voice annotation and use of an image according to the embodiments of the present application can realize not only the voice annotation function but also the text and brush annotation functions for the image to be annotated P1, covering more diverse application scenarios and providing the user with more annotation options.
Referring to FIG. 2 and FIG. 6, in some embodiments, 02: generating the annotated image P2 according to the input voice information and the image to be annotated P1 includes:
021: generating a voice annotation tag according to the voice information; and
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2.
Referring to FIG. 4, the generation module 12 is further configured to perform the methods in 021 and 023, that is, the generation module 12 is further configured to generate a voice annotation tag according to the voice information, and to control the voice annotation tag V to be displayed in the image to be annotated P1 to generate the annotated image P2.
Referring to FIG. 5, the one or more processors 30 are further configured to perform the methods in 021 and 023, that is, the one or more processors 30 are further configured to generate a voice annotation tag according to the voice information, and to control the voice annotation tag V to be displayed in the image to be annotated P1 to generate the annotated image P2.
In one embodiment, after the image to be annotated P1 is acquired, the user inputs voice information by recording. Specifically, the generation module 12 or the one or more processors 30 generates a corresponding voice annotation tag V according to the input voice information: each time the user inputs a piece of voice information, the generation module 12 or the one or more processors 30 generates a voice annotation tag corresponding to that voice information and controls the corresponding voice annotation tag V to be displayed in the image to be annotated P1 to generate the annotated image P2, ensuring that when the user views the annotated image P2 again, the voice information annotated in it can be found quickly. If no voice annotation tag V associated with the input voice information were displayed in the annotated image P2, the user could not confirm, after inputting the voice information, whether the relevant voice information had been successfully recorded into the image to be annotated P1; likewise, when viewing the annotated image P2 again, the user could not tell whether annotated voice information exists in the annotated image P2, which would force the user to annotate the image to be annotated P1 a second time and would lower the efficiency of image annotation. In the method for voice annotation and use of an image of the present application, the generation module 12 or the one or more processors 30 generates the corresponding voice annotation tag V according to the input voice information and controls the voice annotation tag V to be displayed in the image to be annotated P1, thereby generating the annotated image P2. This makes it easy for the user to confirm that the voice information has been recorded successfully and to quickly find the voice information annotated in the annotated image P2 when viewing it again, improving the efficiency of image annotation.
Referring to FIG. 2 and FIG. 7, in some embodiments, 023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2 includes:
0231: displaying the voice annotation tag V in the image to be annotated P1; and
0233: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Referring to FIG. 4, the generation module 12 is further configured to perform the methods in 0231 and 0233, that is, the generation module 12 is further configured to: control the voice annotation tag V to be displayed in the image to be annotated P1; and process the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Referring to FIG. 5, the one or more processors 30 are further configured to perform the methods in 0231 and 0233, that is, the one or more processors 30 are further configured to: control the voice annotation tag V to be displayed in the image to be annotated P1; and process the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Further, the user records by long-pressing the recording tag L; when the user releases, the recording ends and the input of the voice information is complete, and the generation module 12 or the one or more processors 30 controls the corresponding voice annotation tag V to be displayed in the image to be annotated P1 according to the input voice information. At this point, the user may play, delete, or drag the voice annotation tag V whose recording has ended. For example, the user plays the voice annotation tag V: the voice annotation tag V displays an animated icon while the associated voice information is played, allowing the user to audition the recorded voice information and judge whether it is accurate and whether the sound is clear. As another example, the user deletes the voice annotation tag V: the user taps to select the tag to be deleted, a delete icon appears on the voice annotation tag V, and tapping the delete icon removes the tag. As yet another example, the user drags the voice annotation tag V to place it at a suitable position in the image to be annotated P1; specifically, the user may long-press the voice annotation tag V to drag it. For instance, if the image to be annotated P1 contains text and a particular line of text or a particular word needs to be annotated, the user can, after inputting voice information for that line or word, drag the voice annotation tag V associated with that voice information next to it to generate the annotated image P2; when the user views the annotated image P2 again, the information annotated by the voice information associated with that voice annotation tag can be understood quickly. The user may also combine these operations, for example playing and dragging, playing and deleting, dragging and deleting, or playing, dragging, and deleting the voice annotation tag V; the specific processing is performed according to the actual situation and is not limited here.
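A minimal sketch of these three tag operations is given below. The tag structure, the AudioPlayer abstraction, and all names are assumptions for illustration only; a real editor would also redraw the image and persist the updated tag positions.

```kotlin
// Sketch of the tag-processing step (play / delete / drag); names are hypothetical.
data class Tag(val id: String, var x: Float, var y: Float, val audioPath: String)

// The audio back end is abstracted away; only the operations on the tag list are shown.
interface AudioPlayer { fun play(audioPath: String) }

class TagEditor(private val tags: MutableList<Tag>, private val player: AudioPlayer) {

    // Tap a tag to audition the recording that was just entered.
    fun play(tag: Tag) = player.play(tag.audioPath)

    // Select a tag and confirm deletion to remove it and its association.
    fun delete(tag: Tag) { tags.removeAll { it.id == tag.id } }

    // Long-press and drag a tag next to the text it explains.
    fun drag(tag: Tag, newX: Float, newY: Float) {
        tag.x = newX
        tag.y = newY
    }
}
```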
Referring to FIG. 2 and FIG. 8, in some embodiments, 02: generating the annotated image P2 according to the input voice information and the image to be annotated P1 may further include:
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2; and
025: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Referring to FIG. 4, the generation module 12 is further configured to perform the methods in 021, 023, and 025, that is, the generation module 12 is further configured to: generate a voice annotation tag according to the voice information; control the voice annotation tag V to be displayed in the image to be annotated P1; and process the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
Referring to FIG. 5, the one or more processors 30 are further configured to perform the methods in 021, 023, and 025, that is, the one or more processors 30 are further configured to: generate a voice annotation tag according to the voice information; control the voice annotation tag V to be displayed in the image to be annotated P1; and process the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging.
In another embodiment, the user records by long-pressing the recording tag L; when the user releases, the recording ends and the input of the voice information is complete, and the generation module 12 or the one or more processors 30 generates the voice annotation tag V according to the input voice information and controls the corresponding voice annotation tag V to be displayed in the image to be annotated P1 to generate the annotated image P2. The user may then play, delete, or drag the voice annotation tag V as required; the specific implementation is the same as described above and is not repeated here. After the user finishes processing the voice annotation tag V, the generation module 12 or the one or more processors 30 updates the annotated image P2 in real time to ensure that the voice annotation tag V in the annotated image P2 matches the processed voice annotation tag V.
Referring to FIG. 3 and FIG. 9, in some embodiments, 04: saving the annotated image P2 and the voice information includes:
041: saving the annotated image P2 and the voice information as a single video file.
Referring to FIG. 4, the storage module 14 is further configured to perform the method in 041, that is, the storage module 14 is further configured to save the annotated image P2 and the voice information as a single video file.
Referring to FIG. 5, the memory 50 is further configured to perform the method in 041, that is, the memory 50 is further configured to save the annotated image P2 and the voice information as a single video file.
In one embodiment, the storage module 14 or the memory 50 post-processes the associated voice information and the annotated image P2 (including the voice annotation tags V associated with the voice information) and saves them, merged, in the electronic device 100 in a video file format (such as MPEG, AVI, nAVI, ASF, MOV, or WMV). Saving the annotated image P2 and the voice information in the electronic device 100 as a single file saves storage space of the electronic device 100, and recalling the annotated image P2 and the voice information later is simple. For example, the storage module 14 or the memory 50 saves the voice information and the annotated image P2 as an MPEG (Moving Picture Experts Group) file through video encapsulation; when the user views the annotated image P2 and the related voice information, only one video file is needed to view the voice information in the annotated image P2.
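One plausible way to perform this merge offline is to feed the flattened annotated image and the recorded audio to a muxer. The sketch below shells out to an ffmpeg binary, which is an assumption about the environment rather than the application's actual encoder; a phone implementation would more likely use a platform media codec, and multiple clips would first be concatenated or placed at different periods of the timeline.

```kotlin
// Sketch of step 041: merge the annotated image and one audio track into a single
// video file. Assumes an ffmpeg binary is available; names are illustrative only.
import java.io.File

fun muxImageAndAudio(annotatedImage: File, audio: File, output: File): Boolean {
    val cmd = listOf(
        "ffmpeg",
        "-loop", "1", "-i", annotatedImage.absolutePath,  // still image as the video track
        "-i", audio.absolutePath,                          // recorded voice information
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-shortest",                                       // stop when the audio ends
        "-pix_fmt", "yuv420p",
        output.absolutePath
    )
    val exit = ProcessBuilder(cmd).inheritIO().start().waitFor()
    return exit == 0
}
```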
Referring to FIG. 3 and FIG. 10, in some embodiments, the method for voice annotation and use of an image may further include:
05: playing the annotated image P2 and the voice information.
Referring to FIG. 4, the device 10 for voice annotation and use of an image according to the embodiments of the present application may further include a playback module 15, which is further configured to perform the method in 05, that is, the playback module 15 is further configured to play the annotated image P2 and the voice information.
Referring to FIG. 5, the electronic device 100 according to the embodiments of the present application may further include a display 70 and a speaker 90, which are used to perform the method in 05. That is, the display 70 is used to display the annotated image P2, and the speaker 90 is used to play the voice information.
In the embodiments of the present application, the storage module 14 or the memory 50 saves the voice information and the annotated image P2 in the format of a single video file. When the user views the voice information and the annotated image P2 again, they can be viewed through the playback module 15, or through the display 70 and the speaker 90. Specifically, when the playback module 15 plays the video, it plays the voice information in the annotated image P2, realizing audible recording and playback of the image annotation; alternatively, the display 70 displays the annotated image P2 (including the voice annotation tags V) in the video, and the speaker 90 plays the voice information in the video (the annotated image P2).
Referring to FIG. 3 and FIG. 11, in some embodiments, there are multiple voice annotation tags V with a predetermined playing order, and 05: playing the annotated image P2 and the voice information includes:
051: playing the voice information associated with the voice annotation tags V according to the playing order.
Referring to FIG. 4, the playback module 15 is further configured to perform the method in 051, that is, the playback module 15 is further configured to play the voice information associated with the voice annotation tags V according to the playing order.
Referring to FIG. 5, the speaker 90 is further configured to perform the method in 051, that is, the speaker 90 is further configured to play the voice information associated with the voice annotation tags V according to the playing order.
Specifically, the multiple voice annotation tags V generated under the control of the one or more processors 30 have a predetermined playing order. When the playback module 15 or the speaker 90 plays the voice information associated with the voice annotation tags V, it does so according to the predetermined playing order, ensuring that the voice information in the annotated image P2 is played in an orderly manner.
In one embodiment, the one or more processors 30 may associate the playing order of the voice annotation tags V with their positions. For example, with the multiple voice annotation tags V displayed at the positions shown in FIG. 3, when the user plays the video obtained by merging and saving the annotated image P2 and the voice information, the voice annotation tags V may be played from top to bottom, that is, the voice information with durations of 34s, 65s, and 25s is played in sequence. As another example, the voice annotation tags V may be played from bottom to top, that is, the voice information with durations of 25s, 65s, and 34s is played in sequence. As yet another example, the voice annotation tags V may be played from left to right, that is, the voice information with durations of 65s, 34s, and 25s is played in sequence. As still another example, the voice annotation tags V may be played from right to left, that is, the voice information with durations of 25s, 34s, and 65s is played in sequence.
In another embodiment, the one or more processors 30 may associate the playing order of the voice annotation tags V with their generation times, that is, each time the user inputs voice information to annotate the image to be annotated P1, the one or more processors 30 record the time at which the corresponding voice information was entered and sort the tags chronologically by entry time. For example, among the voice annotation tags V shown in FIG. 3, sorting the three voice annotation tags V chronologically yields the voice information sequence of 25s, 65s, and 34s. When the user plays the video obtained by merging and saving the annotated image P2 and the voice information, the playing order of the voice annotation tags V is the voice information of 25s, 65s, and 34s in sequence; alternatively, the playing order is the voice information of 34s, 65s, and 25s in sequence.
In yet another embodiment, the one or more processors 30 may associate the playing order of the voice annotation tags V with the time axis of the video. When the annotated image (including the voice annotation tags V) and the voice information are synthesized into a video, voice information of different durations is synthesized into different periods of the video. When the user plays the video obtained by merging and saving the annotated image P2 and the voice information, the one or more processors 30 detect whether voice information exists in the video being played; when voice information exists in the period containing the current playback moment, the one or more processors 30 control the playback module 15 or the speaker 90 to play the voice information of the corresponding period until the video playback ends. In this way, the voice information in the video is played automatically while the video plays, and the implementation is simple.
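The ordering strategies above can be reduced to choosing a sort key (tag position or creation time) or, for the timeline variant, to looking up which clip covers the current playback position. The sketch below illustrates that choice with invented names and is not tied to the application's implementation.

```kotlin
// Sketch of the ordering strategies used by step 051; all names are hypothetical.
data class PlayableTag(val x: Float, val y: Float, val createdAtMs: Long, val audioPath: String)

enum class PlayOrder { TOP_TO_BOTTOM, LEFT_TO_RIGHT, BY_CREATION_TIME }

fun orderedForPlayback(tags: List<PlayableTag>, order: PlayOrder): List<PlayableTag> =
    when (order) {
        PlayOrder.TOP_TO_BOTTOM    -> tags.sortedBy { it.y }           // topmost tag first
        PlayOrder.LEFT_TO_RIGHT    -> tags.sortedBy { it.x }           // leftmost tag first
        PlayOrder.BY_CREATION_TIME -> tags.sortedBy { it.createdAtMs } // earliest recording first
    }

// Timeline variant: at each playback instant, find the clip whose period
// contains the current video position, if any.
data class TimedClip(val startMs: Long, val endMs: Long, val audioPath: String)

fun clipAt(positionMs: Long, clips: List<TimedClip>): TimedClip? =
    clips.firstOrNull { positionMs in it.startMs until it.endMs }
```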
The video storage format described above stores the annotated image P2 and the voice information in the electronic device 100 as a single file, and playback of the annotated image P2 and the voice information is simple.
Referring to FIG. 3 and FIG. 12, in some embodiments, 04: saving the annotated image P2 and the voice information may further include:
043: saving the annotated image P2 as a file in a first format;
045: saving the voice information as a file in a second format; and
047: saving the first-format file and the second-format file separately.
Referring to FIG. 4, the storage module 14 is further configured to perform the methods in 043, 045, and 047, that is, the storage module 14 is further configured to: save the annotated image P2 as a file in a first format; save the voice information as a file in a second format; and save the first-format file and the second-format file separately.
Referring to FIG. 5, the memory 50 is further configured to perform the methods in 043, 045, and 047, that is, the memory 50 is further configured to: save the annotated image P2 as a file in a first format; save the voice information as a file in a second format; and save the first-format file and the second-format file separately.
In the embodiments of the present application, the annotated image P2 and the voice information may also be saved separately, that is, the storage module 14 or the memory 50 saves the annotated image P2 in an image format (such as JPEG, RAW, PNG, GIF, or PDF) and saves the voice information in an audio format (such as MPEG, MPEG-4, MP3, WMA, or FLAC), and the one or more processors 30 associate the two saved files, ensuring that when the annotated image P2 and the voice information are played, the voice information played is the voice information annotated on that image. This saving method requires no subsequent processing of the annotated image P2 and the voice information, and the storage method is simple.
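A simple way to keep the two separately saved files linked is a small sidecar record written next to them. The sketch below uses a plain-text sidecar whose format and names are purely illustrative assumptions; the association could equally live in a database or in image metadata.

```kotlin
// Sketch of steps 043-047: image and audio saved as separate files, with the
// association recorded in a sidecar file. All names are hypothetical.
import java.io.File

fun saveSeparately(imageFile: File, clipsByTagId: Map<String, File>, sidecar: File) {
    // e.g. imageFile = annotated.jpg (first format), clips = tagId -> recording.mp3 (second format)
    val lines = mutableListOf("image=${imageFile.name}")
    for ((tagId, audio) in clipsByTagId) lines += "tag:$tagId=${audio.name}"
    sidecar.writeText(lines.joinToString("\n"))  // read back later to re-associate tags and clips
}
```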
Referring to FIG. 3 and FIG. 13, in some embodiments, the method for voice annotation and use of an image may further include:
06: triggering a voice annotation tag V to play the voice information associated with that voice annotation tag V.
Referring to FIG. 4, the playback module 15 is further configured to perform the method in 06, that is, the playback module 15 is further configured to play, according to the triggered voice annotation tag V, the voice information associated with that voice annotation tag V.
Referring to FIG. 5, the speaker 90 is further configured to perform the method in 06, that is, the speaker 90 is further configured to play, according to the triggered voice annotation tag V, the voice information associated with that voice annotation tag V.
Further, when the annotated image P2 and the voice information are saved separately, the design is such that triggering a voice annotation tag V plays the voice information associated with that voice annotation tag V. Specifically, for the annotated image P2 and the voice annotation tags V shown in FIG. 3, the user may tap any voice annotation tag V in the annotated image P2, and the playback module 15 or the speaker 90 plays the voice information associated with that voice annotation tag V, so that the user can selectively listen to the voice information in the annotated image P2.
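Triggered playback amounts to hit-testing the tap position against the on-screen tag bounds and playing the clip of the first tag hit. The sketch below shows that logic with hypothetical coordinate and player types; it is not the application's actual event handling.

```kotlin
// Sketch of step 06: tap a displayed tag to play its associated clip; names are hypothetical.
data class DisplayedTag(val x: Float, val y: Float, val width: Float, val height: Float, val audioPath: String)

interface Player { fun play(audioPath: String) }

fun onTagTapped(tapX: Float, tapY: Float, tags: List<DisplayedTag>, player: Player) {
    val hit = tags.firstOrNull {
        tapX in it.x..(it.x + it.width) && tapY in it.y..(it.y + it.height)
    }
    hit?.let { player.play(it.audioPath) }  // only the triggered tag's voice information is played
}
```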
Referring to FIG. 14 and FIG. 15, in some embodiments, when the annotated image P2 exceeds the display area 40, the method for voice annotation and use of an image may further include:
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
Referring to FIG. 4, the playback module 15 is further configured to perform the methods in 07, 08, and 09, that is, the playback module 15 is further configured to: play the voice information associated with the voice annotation tags V within the display area 40; after the voice information associated with the voice annotation tags V within the display area 40 has been played, scroll the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and play the voice information associated with the voice annotation tags V that enter the display area 40.
Referring to FIG. 5, the speaker 90 is further configured to perform the methods in 07, 08, and 09, that is, the speaker 90 is further configured to: play the voice information associated with the voice annotation tags V within the display area 40; after the voice information associated with the voice annotation tags V within the display area 40 has been played, scroll the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and play the voice information associated with the voice annotation tags V that enter the display area 40.
In practice, the image to be annotated P1 acquired by the acquisition module 11 or the one or more processors 30 may be a long image, such as a long image captured in panorama mode or obtained by a scrolling screenshot; when the display area 40 displays such a long image to be annotated P1 normally, not all of the information in the image can be shown. As shown in FIG. 15, when the user plays the annotated image P2 and the voice information and the annotated image P2 exceeds the display area 40, the voice information associated with the voice annotation tags V within the display area 40 is played first. After that voice information has been played, the one or more processors 30 control the annotated image P2 to scroll automatically from top to bottom to display the image information not yet shown. After the previously hidden part of the image enters the display area 40, the one or more processors 30 detect whether a voice annotation tag V exists in the part of the image that has entered the display area; if so, the playback module 15 or the speaker 90 is controlled to play the voice information associated with the voice annotation tags V that have entered the display area 40. The video playback and triggered playback modes described above both apply here and are not repeated.
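For a long screenshot, the playback loop can walk the tags from top to bottom, scrolling whenever the next unplayed tag lies outside the display area 40 before playing it. The sketch below abstracts the scrolling and audio back ends behind interfaces; all names are assumptions and do not describe the application's actual implementation.

```kotlin
// Sketch of steps 07-09 for a long image; names and interfaces are hypothetical.
data class ScrollTag(val yInImage: Float, val audioPath: String)

interface Viewport {
    val top: Float            // current scroll offset in image coordinates
    val height: Float         // height of the display area 40
    fun scrollTo(offset: Float)
}

interface Audio { fun playBlocking(audioPath: String) }

fun playLongImage(tags: List<ScrollTag>, viewport: Viewport, audio: Audio) {
    for (tag in tags.sortedBy { it.yInImage }) {
        val visible = tag.yInImage in viewport.top..(viewport.top + viewport.height)
        if (!visible) {
            // Scroll so that the unplayed tag enters the display area 40.
            viewport.scrollTo(tag.yInImage - viewport.height / 2)
        }
        audio.playBlocking(tag.audioPath)   // play the tag now in (or scrolled into) view
    }
}
```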
Referring to FIG. 16, an embodiment of the present application further provides a non-volatile computer-readable storage medium 200 containing a computer program 201. When the computer program 201 is executed by one or more processors 30, the processors 30 are caused to perform the methods in 01, 02, 021, 023, 0231, 0233, 025, 03, 04, 041, 043, 045, 047, 05, 051, 06, 07, 08, and 09.
Referring to FIG. 1 and FIG. 2, for example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
02: generating an annotated image P2 according to input voice information and the image to be annotated P1, the annotated image P2 including a voice annotation tag V displayed in the image to be annotated P1;
03: associating the voice annotation tag V with the voice information; and
04: saving the annotated image P2 and the voice information.
As another example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2;
025: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging;
03: associating the voice annotation tag V with the voice information;
041: saving the annotated image P2 and the voice information as a single video file;
051: playing the voice information associated with the voice annotation tags V according to the playing order;
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
As yet another example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2;
03: associating the voice annotation tag V with the voice information;
043: saving the annotated image P2 as a file in a first format;
045: saving the voice information as a file in a second format;
047: saving the first-format file and the second-format file separately;
051: playing the voice information associated with the voice annotation tags V according to the playing order;
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
As a further example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2;
025: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging;
03: associating the voice annotation tag V with the voice information;
041: saving the annotated image P2 and the voice information as a single video file;
06: triggering a voice annotation tag V to play the voice information associated with that voice annotation tag V;
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
As a still further example, when the computer program 201 is executed by the one or more processors 30, the processors 30 are caused to perform the following method:
01: acquiring an image to be annotated P1;
021: generating a voice annotation tag according to the voice information;
023: displaying the voice annotation tag V in the image to be annotated P1 to generate the annotated image P2;
025: processing the voice annotation tag V to generate the annotated image P2, the processing including at least one of playing, deleting, and dragging;
03: associating the voice annotation tag V with the voice information;
043: saving the annotated image P2 as a file in a first format;
045: saving the voice information as a file in a second format;
047: saving the first-format file and the second-format file separately;
06: triggering a voice annotation tag V to play the voice information associated with that voice annotation tag V;
07: playing the voice information associated with the voice annotation tags V within the display area 40;
08: after the voice information associated with the voice annotation tags V within the display area 40 has been played, scrolling the annotated image P2 so that unplayed voice annotation tags V enter the display area 40; and
09: playing the voice information associated with the voice annotation tags V that enter the display area 40.
In the description of this specification, reference to the terms "embodiment", "example", and the like means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic expressions of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present application. The scope of the present application is defined by the claims and their equivalents.

Claims (20)

  1. A method for voice annotation and use of an image, comprising:
    acquiring an image to be annotated;
    generating an annotated image according to input voice information and the image to be annotated, the annotated image comprising a voice annotation tag, the voice annotation tag being displayed in the image to be annotated;
    associating the voice annotation tag with the voice information; and
    saving the annotated image and the voice information.
  2. The method for voice annotation and use of an image according to claim 1, wherein generating the annotated image according to the input voice information and the image to be annotated comprises:
    generating the voice annotation tag according to the voice information; and
    displaying the voice annotation tag in the image to be annotated to generate the annotated image.
  3. The method for voice annotation and use of an image according to claim 1, wherein generating the annotated image according to the input voice information and the image to be annotated comprises:
    generating the voice annotation tag according to the voice information;
    displaying the voice annotation tag in the image to be annotated; and
    processing the voice annotation tag to generate the annotated image, the processing comprising at least one of playing, deleting, and dragging.
  4. The method for voice annotation and use of an image according to claim 1, wherein saving the annotated image and the voice information comprises:
    saving the annotated image and the voice information as a single video file.
  5. The method for voice annotation and use of an image according to claim 4, further comprising:
    playing the annotated image and the voice information.
  6. The method for voice annotation and use of an image according to claim 5, wherein the voice annotation tag comprises multiple voice annotation tags having a predetermined playing order, and playing the annotated image and the voice information comprises:
    playing the voice information associated with the voice annotation tags according to the playing order.
  7. The method for voice annotation and use of an image according to claim 1, wherein saving the annotated image and the voice information comprises:
    saving the annotated image as a file in a first format;
    saving the voice information as a file in a second format; and
    saving the first-format file and the second-format file separately.
  8. The method for voice annotation and use of an image according to claim 7, further comprising:
    triggering the voice annotation tag to play the voice information associated with the voice annotation tag.
  9. The method for voice annotation and use of an image according to any one of claims 1 to 8, further comprising:
    playing the voice information associated with the voice annotation tags within a display area;
    after the voice information associated with the voice annotation tags within the display area has been played, scrolling the annotated image so that unplayed voice annotation tags enter the display area; and
    playing the voice information associated with the voice annotation tags that enter the display area.
  10. A device for voice annotation and use of an image, comprising:
    an acquisition module configured to acquire an image to be annotated;
    a generation module configured to generate an annotated image according to input voice information and the image to be annotated, the annotated image comprising a voice annotation tag, the voice annotation tag being displayed in the image to be annotated;
    an association module configured to associate the voice annotation tag with the voice information; and
    a storage module configured to save the annotated image and the voice information.
  11. 一种电子装置,其特征在于,包括:An electronic device, characterized in that, comprising:
    一个或多个处理器,一个或多个所述处理器用于获取待标注图像;根据输入的语音信息及所述待标注图像生成已标注图像,所述已标注图像包括语音标注标签,所述语音标注标签显示于所述待标注图像中;及关联所述语音标注标签及所述语音信息;及One or more processors, one or more of the processors are used to obtain the image to be marked; according to the input voice information and the image to be marked, a marked image is generated, the marked image includes a voice marking label, and the voice An annotation tag is displayed in the to-be-annotated image; and the voice annotation tag is associated with the voice information; and
    存储器,所述存储器用于保存所述已标注图像及所述语音信息。a memory, where the memory is used for saving the marked image and the voice information.
  12. 根据权利要求11所述的电子装置,其特征在于,一个或多个所述处理器还用于:The electronic device of claim 11, wherein one or more of the processors are further configured to:
    根据所述语音信息生成语音标注标签;generating a voice annotation label according to the voice information;
    控制在所述待标注图像中显示所述语音标注标签,以生成所述已标注图像。The voice annotation label is controlled to be displayed in the to-be-annotated image to generate the annotated image.
  13. 根据权利要求11所述的电子装置,其特征在于,一个或多个所述处理器还用于:The electronic device of claim 11, wherein one or more of the processors are further configured to:
    根据所述语音信息生成语音标注标签;generating a voice annotation label according to the voice information;
    控制在所述待标注图像中显示所述语音标注标签;及controlling the display of the voice annotation label in the to-be-annotated image; and
    对所述语音标注标签进行处理,以生成所述已标注图像,所述处理包括播放、删除、拖拽中的至少一个。The voice annotation tag is processed to generate the marked image, the processing including at least one of playing, deleting, and dragging.
  14. 根据权利要求11所述的电子装置,其特征在于,所述存储器还用于将所述已标注图像及所述语音信息保存为一个视频文件。The electronic device according to claim 11, wherein the memory is further configured to save the marked image and the voice information as a video file.
  15. The electronic device according to claim 11, further comprising a display and a speaker, wherein the display is configured to display the annotated image and the speaker is configured to play the voice information.
  16. The electronic device according to claim 15, wherein there are a plurality of voice annotation labels, the plurality of voice annotation labels have a predetermined playing order, and the speaker is further configured to play the voice information associated with the voice annotation labels in the playing order.
  17. The electronic device according to claim 11, wherein the memory is further configured to:
    save the annotated image as a first format file;
    save the voice information as a second format file; and
    save the first format file and the second format file separately.
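A small sketch of the two-separate-files storage of claim 17, assuming an image file, an audio file, and a plain JSON side-car that remembers which audio belongs to which label; the formats and names are illustrative, not taken from the patent.

```python
import json
import pathlib

out = pathlib.Path("annotations")
out.mkdir(exist_ok=True)

# First format file: the annotated image; second format file: the voice
# recording. Placeholder bytes stand in for real encoder output.
(out / "photo_001.png").write_bytes(b"<png bytes>")
(out / "photo_001_label_1.m4a").write_bytes(b"<aac bytes>")

# Side-car index associating the voice annotation label with its audio file,
# so the label can later be triggered to play the right recording (claim 18).
(out / "photo_001.json").write_text(json.dumps(
    {"image": "photo_001.png",
     "labels": [{"id": 1, "pos": [0.5, 0.5], "audio": "photo_001_label_1.m4a"}]},
    indent=2))
```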
  18. The electronic device according to claim 17, wherein the speaker is further configured to play, in response to the voice annotation label being triggered, the voice information associated with the voice annotation label.
  19. The electronic device according to any one of claims 11 to 17, wherein the speaker is further configured to:
    play the voice information associated with a voice annotation label located in a display area;
    after the voice information associated with the voice annotation label in the display area has been played, and the annotated image has been scrolled so that an unplayed voice annotation label enters the display area; and
    play the voice information associated with the voice annotation label that enters the display area.
  20. A non-volatile computer-readable storage medium storing a computer program which, when executed by one or more processors, implements the method for voice annotation and use of an image according to any one of claims 1 to 9.
PCT/CN2021/140547 2021-03-03 2021-12-22 Voice annotation and use method and device for image, electronic device, and storage medium WO2022183814A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110235765.3A CN115101057A (en) 2021-03-03 2021-03-03 Voice annotation of image, using method and device thereof, electronic device and storage medium
CN202110235765.3 2021-03-03

Publications (1)

Publication Number Publication Date
WO2022183814A1 (en)

Family

ID=83155001

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140547 WO2022183814A1 (en) 2021-03-03 2021-12-22 Voice annotation and use method and device for image, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN115101057A (en)
WO (1) WO2022183814A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030144843A1 (en) * 2001-12-13 2003-07-31 Hewlett-Packard Company Method and system for collecting user-interest information regarding a picture
CN107223246A (en) * 2017-03-20 2017-09-29 深圳前海达闼云端智能科技有限公司 Image labeling method, device and electronic equipment
CN108320318A (en) * 2018-01-15 2018-07-24 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN110046271A (en) * 2019-03-22 2019-07-23 中国科学院西安光学精密机械研究所 A kind of remote sensing images based on vocal guidance describe method
CN111355912A (en) * 2020-02-17 2020-06-30 江苏济楚信息技术有限公司 Law enforcement recording method and system
CN111629156A (en) * 2019-02-28 2020-09-04 北京字节跳动网络技术有限公司 Image special effect triggering method and device and hardware device
CN112383734A (en) * 2020-10-29 2021-02-19 岭东核电有限公司 Video processing method, video processing device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115101057A (en) 2022-09-23

Similar Documents

Publication Publication Date Title
US11023666B2 (en) Narrative-based media organizing system for transforming and merging graphical representations of digital media within a work area
US11627001B2 (en) Collaborative document editing
JP4453738B2 (en) File transfer method, apparatus, and program
US9122886B2 (en) Track changes permissions
US9230356B2 (en) Document collaboration effects
US9542366B2 (en) Smart text in document chat
WO2019047508A1 (en) Method for processing e-book comment information, electronic device and storage medium
KR20180002702A (en) Bookmark management technology for media files
WO2019007213A1 (en) Data processing method and apparatus, and terminal device
KR20160016810A (en) Automatic isolation and selection of screenshots from an electronic content repository
US20170046350A1 (en) Media organization
WO2021098263A1 (en) Application program sharing method and apparatus, electronic device and readable medium
WO2022183814A1 (en) Voice annotation and use method and device for image, electronic device, and storage medium
WO2023184745A1 (en) Data labeling method and apparatus, electronic device, and storage medium
TWI299466B (en) System and method for providing presentation files for an embedded system
CN113315691B (en) Video processing method and device and electronic equipment
TW201430728A (en) Icon generating system and method for generating icon
US20140250055A1 (en) Systems and Methods for Associating Metadata With Media Using Metadata Placeholders
CN105205069B (en) Cache opening method and device based on paging file
US11210639B2 (en) Electronic dynamic calendar system, operation method and computer readable storage medium
WO2023016364A1 (en) Video processing method and apparatus, and device and storage medium
WO2020050055A1 (en) Document creation assistance device, document creation assistance system, and program
TW201539399A (en) Digital notes for distance learning system
TW201505432A (en) Electronic apparatus and method for annotating media file thereof
TW201333729A (en) Display system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21928885

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21928885

Country of ref document: EP

Kind code of ref document: A1