CN113487705A

CN113487705A - Image annotation method, terminal and storage medium

Info

Publication number: CN113487705A
Application number: CN202110796593.7A
Authority: CN
Inventors: 朱峰结; 袁佳鹏
Original assignee: Shanghai Chuanying Information Technology Co Ltd
Current assignee: Shanghai Chuanying Information Technology Co Ltd
Priority date: 2021-07-14
Filing date: 2021-07-14
Publication date: 2021-10-08

Abstract

The application discloses an image annotation method, a terminal and a storage medium. The image annotation method comprises the following steps: providing at least one annotation mode; in response to at least one of a first operation, image information and scene information, acquiring an annotation to be added to the image; and integrating the annotation into the image according to a template. The application offers varied annotation modes and multiple ways of triggering annotation, which makes annotation more fine-grained, user-friendly and easy to use, and improves the user experience.

Description

Image annotation method, terminal and storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular, to an image annotation method, a terminal, and a storage medium.

Background

In some implementations, the user can perform operations such as labeling or annotating an image.

In the course of conceiving and implementing the present application, the inventors found that at least the following problems exist: most existing annotation schemes support only text annotation of images, the annotation modes are not varied enough, and the annotation forms are not fine-grained or user-friendly enough, so that annotation is hard to use and the user experience is poor.

The foregoing description is provided for general background information and is not admitted to be prior art.

Disclosure of Invention

In view of the above technical problems, the present application provides an image annotation method, a terminal and a storage medium. The annotation modes are varied, annotation can be triggered in multiple ways, and annotation is more fine-grained, user-friendly and easy to use, which improves the user experience.

In order to solve the above technical problem, the present application provides an image annotation method, including:

S11, providing at least one annotation mode;

S12, in response to at least one of a first operation, image information and scene information, acquiring an annotation to be added to the image;

S13, integrating the annotation into the image according to a template.
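As a non-limiting illustration, steps S11 to S13 above can be sketched as a small pipeline. The names `ANNOTATION_MODES`, `Image`, `acquire_annotation`, `integrate` and the `"overlay"` template are hypothetical and are not taken from the application; the sketch only shows how the three steps could hand data to one another.

```python
from dataclasses import dataclass, field

# S11: the annotation modes offered to the user (identifiers are illustrative).
ANNOTATION_MODES = ("mood", "graffiti", "text", "audio_video")


@dataclass
class Image:
    name: str
    annotations: list = field(default_factory=list)


def acquire_annotation(mode, payload):
    """S12: build the annotation to be added, for a mode offered in S11."""
    if mode not in ANNOTATION_MODES:
        raise ValueError(f"unknown annotation mode: {mode}")
    return {"type": mode, "payload": payload}


def integrate(image, annotation, template="overlay"):
    """S13: attach the annotation to the image according to a template."""
    image.annotations.append({**annotation, "template": template})
    return image


photo = integrate(Image("cafe.jpg"), acquire_annotation("text", "coffee"))
print(photo.annotations)
```

Keeping S12 and S13 as separate functions mirrors the way the method separates obtaining an annotation from deciding how it is merged into the image.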

Optionally, the annotation mode comprises at least one of:

a mood selection mode, optionally used for adding mood annotations to the image;

a graffiti mode, optionally used for adding graffiti annotations to the image;

a text annotation mode, optionally used for adding text annotations to the image;

an audio-video annotation mode, optionally used for adding audio-video annotations to the image.

Optionally, the first operation may be at least one of:

sliding on the screen along a preset direction, touching the screen at multiple points, pressing the screen for longer than a preset time, and clicking the screen more than a preset number of times.
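The four kinds of first operation listed above could, for example, be checked as follows. This is a minimal sketch: the `Gesture` structure and the preset values (`PRESET_DIRECTION`, `PRESET_SECONDS`, `PRESET_CLICKS`) are assumptions chosen for illustration, not values given in the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gesture:
    direction: Optional[str] = None  # set for slides, e.g. "up"
    touch_points: int = 1
    press_seconds: float = 0.0
    click_count: int = 1


# Hypothetical preset values.
PRESET_DIRECTION = "up"
PRESET_SECONDS = 1.0
PRESET_CLICKS = 1


def is_first_operation(g):
    """Return True if the gesture matches at least one listed trigger."""
    return (
        g.direction == PRESET_DIRECTION        # slide along the preset direction
        or g.touch_points > 1                  # multi-point touch
        or g.press_seconds > PRESET_SECONDS    # press longer than the preset time
        or g.click_count > PRESET_CLICKS       # more clicks than the preset number
    )


print(is_first_operation(Gesture(click_count=2)))
```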

Optionally, the step of S12 includes:

identifying image information and/or scene information;

detecting whether the image information and/or the scene information contains preset feature information;

if so, entering the annotation mode corresponding to the preset feature information; and/or,

if not, not entering the annotation mode corresponding to the preset feature information.

Optionally, the preset feature information may be at least one of:

object information, person information, face information, scene information.
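The detection branch of step S12 can be illustrated with a lookup from detected feature information to an annotation mode. The mapping `FEATURE_TO_MODE` is purely hypothetical (the application does not state which feature corresponds to which mode), and the recognition step itself is assumed to have already produced a list of feature names.

```python
# Hypothetical mapping from preset feature information to an annotation mode.
FEATURE_TO_MODE = {
    "face": "mood",
    "person": "text",
    "object": "graffiti",
    "scene": "audio_video",
}


def select_mode(detected_features):
    """Return the mode for the first detected preset feature, else None.

    None means no preset feature information was found, so no
    annotation mode is entered.
    """
    for feature in detected_features:
        if feature in FEATURE_TO_MODE:
            return FEATURE_TO_MODE[feature]
    return None


print(select_mode(["sky", "face"]))
print(select_mode(["sky"]))
```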

Optionally, the manner of acquiring the annotation in step S12 includes at least one of:

acquiring a corresponding annotation according to an annotation instruction;

generating an annotation according to an editing operation;

acquiring a corresponding annotation according to the image information and/or the scene information.

Optionally, the step of S13 includes:

displaying the annotation on the image;

saving the annotated image as a new image;

setting the annotation to a hidden mode, wherein optionally, in the hidden mode the image is displayed by default and the annotation is displayed in response to a preset operation;

and importing the annotated image into a preset template and storing it.

Optionally, the hidden mode comprises at least one hidden layer, and optionally, the hidden layers are used for displaying annotations of different types.
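One possible reading of the hidden mode with hidden layers is sketched below: annotations are grouped into one layer per annotation type, nothing is shown by default, and a preset operation reveals a given layer. The class `AnnotatedImage` and its method names are assumptions made for illustration only.

```python
class AnnotatedImage:
    """Image whose annotations live in hidden layers, one per annotation type."""

    def __init__(self, name):
        self.name = name
        self.hidden_layers = {}  # annotation type -> list of annotations
        self.revealed = set()    # types whose layer is currently shown

    def add_hidden(self, kind, annotation):
        self.hidden_layers.setdefault(kind, []).append(annotation)

    def on_preset_operation(self, kind):
        """Reveal the hidden layer holding annotations of the given type."""
        self.revealed.add(kind)

    def visible_annotations(self):
        # By default nothing is revealed, so only the bare image is shown.
        return [
            a
            for kind in sorted(self.revealed)
            for a in self.hidden_layers.get(kind, [])
        ]


img = AnnotatedImage("cafe.jpg")
img.add_hidden("text", "coffee")
img.add_hidden("mood", "smiling face")
print(img.visible_annotations())
img.on_preset_operation("text")
print(img.visible_annotations())
```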

Optionally, after the step of S13, the image annotation method further includes:

storing the annotation on a preset device according to the type of the annotation;

storing the image on a preset device according to preset information of the image;

optionally, the preset information may be at least one of the following:

a preset tag, number of views, shooting time, receiving time, storage path, and occupied storage capacity.

Optionally, the preset device comprises a local device and/or a server.
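The storage step could be sketched as a routing decision between local storage and a server. The application does not specify the routing rule, so everything concrete here is an assumption: that audio and video annotations go to the server, and that large or rarely viewed images are moved off the local device. The thresholds `min_views` and `max_local_bytes` are likewise hypothetical.

```python
# Assumption: bulky annotation types are kept on the server.
SERVER_ANNOTATION_TYPES = {"audio", "video"}


def store_for_annotation(annotation_type):
    """Choose a preset device according to the annotation type."""
    return "server" if annotation_type in SERVER_ANNOTATION_TYPES else "local"


def store_for_image(view_count, size_bytes, min_views=5, max_local_bytes=5_000_000):
    """Choose a preset device from two kinds of preset information.

    Assumed rule: images that are large or rarely viewed go to the server.
    """
    if size_bytes > max_local_bytes or view_count < min_views:
        return "server"
    return "local"


print(store_for_annotation("text"), store_for_annotation("video"))
print(store_for_image(view_count=12, size_bytes=800_000))
```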

The present application further provides a terminal, including a memory and a processor, wherein the memory stores a program which, when executed by the processor, implements the steps of the image annotation method described above.

The present application further provides a computer storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the image annotation method as described above.

As described above, the application provides at least one annotation mode, and the corresponding annotation mode is entered in response to at least one of the first operation, the image information and the scene information. The annotation modes are therefore varied, annotation can be triggered in multiple ways, and annotation is more fine-grained, user-friendly and easy to use, which improves the user experience. In addition, annotations and images are classified and stored locally and/or on a server according to the annotation type or the preset information, so that local storage space is optimized and balanced.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments are briefly described below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.

Fig. 1 is a schematic hardware structure diagram of a mobile terminal implementing various embodiments of the present application;

Fig. 2 is a communication network system architecture diagram according to an embodiment of the present application;

Fig. 3 is a schematic diagram of an application scenario provided in an embodiment of the present application;

Fig. 4 is a flowchart illustrating an image annotation method according to an embodiment of the present application;

Fig. 5 is a display interface diagram of a mood selection mode provided in an embodiment of the present application;

Fig. 6 is a display interface diagram of a graffiti mode provided in an embodiment of the present application;

Fig. 7 is a display interface diagram of a text annotation mode provided in an embodiment of the present application;

Fig. 8 is a display interface diagram of an audio-video annotation mode provided in an embodiment of the present application;

Fig. 9 is a flowchart illustrating another image annotation method according to an embodiment of the present application;

Fig. 10 is a flowchart illustrating a further image annotation method according to an embodiment of the present application;

Fig. 11 is a flowchart illustrating an image preview method according to an embodiment of the present application;

Fig. 12 is an operation diagram of displaying a hidden annotation according to an embodiment of the present application.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings. With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element. Further, similarly named components, features, or elements in different embodiments of the disclosure may have the same meaning or different meanings; the particular meaning should be determined by their interpretation in the embodiment or further by the context of the embodiment.

It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope herein. The word "if" as used herein may be interpreted as "when …" or "upon …" or "in response to a determination", depending on the context. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or," "and/or," "including at least one of the following," and the like, as used herein, are to be construed as inclusive, meaning any one or any combination. For example, "includes at least one of: A, B, C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". As a further example, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition will occur only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
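The inclusive reading of "at least one of" defined above corresponds to every non-empty combination of the listed items. As an illustrative aside, the seven combinations for A, B and C can be enumerated as follows; the helper name `at_least_one_of` is hypothetical.

```python
from itertools import combinations


def at_least_one_of(items):
    """Enumerate every non-empty combination of the listed items."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


for combo in at_least_one_of(["A", "B", "C"]):
    print(" and ".join(combo))
```

For n listed items this yields 2^n - 1 combinations, which is 7 for A, B and C.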

It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and may be performed in other orders unless explicitly stated herein. Moreover, at least some of the steps in the figures may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, in different orders, and may be performed alternately or at least partially with respect to other steps or sub-steps of other steps.

The word "if", as used herein, may be interpreted as "when …" or "upon …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.

It should be noted that step numbers such as S15 and S16 are used herein for the purpose of more clearly and briefly describing the corresponding content, and do not constitute a substantial limitation on the sequence, and those skilled in the art may perform S16 first and then S15 in specific implementation, which should be within the scope of the present application.

It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application, and have no specific meaning in themselves. Thus, "module", "component" or "unit" may be used mixedly.

The mobile terminal may be implemented in various forms. For example, the mobile terminal described in the present application may include mobile terminals such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and fixed terminals such as a Digital TV, a desktop computer, and the like.

The following description will be given taking a mobile terminal as an example, and it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for mobile purposes.

Referring to fig. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present application, the mobile terminal 100 may include: RF (Radio Frequency) unit 101, WiFi module 102, audio output unit 103, A/V (audio/video) input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, processor 110, and power supply 111. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 1 is not intended to be limiting of mobile terminals, which may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

The following describes each component of the mobile terminal in detail with reference to fig. 1:

The radio frequency unit 101 may be configured to receive and transmit signals during information transmission and reception or during a call. Specifically, it receives downlink information from a base station and passes it to the processor 110 for processing, and it transmits uplink data to the base station. Typically, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with a network and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing Long Term Evolution), and TDD-LTE (Time Division Duplexing Long Term Evolution).

WiFi belongs to short-distance wireless transmission technology, and the mobile terminal can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 102, and provides wireless broadband internet access for the user. Although fig. 1 shows the WiFi module 102, it is understood that it does not belong to the essential constitution of the mobile terminal, and may be omitted entirely as needed within the scope not changing the essence of the invention.

The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.

The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 may receive sound in a phone call mode, a recording mode, a voice recognition mode, or the like, and may process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format transmittable to a mobile communication base station via the radio frequency unit 101 and output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting audio signals.

The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor that may adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 1061 and/or the backlight when the mobile terminal 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.

The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Optionally, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect a touch operation performed by a user on or near the touch panel 1071 (e.g., an operation performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory), and drive a corresponding connection device according to a predetermined program. The touch panel 1071 may include two parts: a touch detection device and a touch controller. Optionally, the touch detection device detects the touch position of the user, detects a signal caused by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and can receive and execute commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave types. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Optionally, the other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

Alternatively, the touch panel 1071 may cover the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although the touch panel 1071 and the display panel 1061 are shown in fig. 1 as two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.

The interface unit 108 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and external devices.

The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area, and optionally, the program storage area may store an operating system, an application program (such as a sound playing function, an image playing function, and the like) required by at least one function, and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.

The processor 110 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the mobile terminal. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, optionally, the application processor mainly handles operating systems, user interfaces, application programs, etc., and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.

The mobile terminal 100 may further include a power supply 111 (e.g., a battery) for supplying power to various components, and preferably, the power supply 111 may be logically connected to the processor 110 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system.

Although not shown in fig. 1, the mobile terminal 100 may further include a bluetooth module or the like, which is not described in detail herein.

In order to facilitate understanding of the embodiments of the present application, a communication network system on which the mobile terminal of the present application is based is described below.

Referring to fig. 2, fig. 2 is an architecture diagram of a communication Network system according to an embodiment of the present disclosure, where the communication Network system is an LTE system of a universal mobile telecommunications technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are in communication connection in sequence.

Optionally, the UE201 may be the terminal 100 described above, and is not described herein again.

The E-UTRAN202 includes eNodeB2021 and other eNodeBs 2022, among others. Optionally, the eNodeB2021 may be connected with the other eNodeBs 2022 through a backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide the UE201 access to the EPC203.

The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, and a PCRF (Policy and Charging Rules Function) 2036, and the like. Optionally, the MME2031 is a control node that handles signaling between the UE201 and the EPC203, providing bearer and connection management. HSS2032 is used to provide registers to manage functions such as home location register (not shown) and holds subscriber specific information about service characteristics, data rates, etc. All user data may be sent through SGW2034, PGW2035 may provide IP address assignment for UE201 and other functions, and PCRF2036 is a policy and charging control policy decision point for traffic data flow and IP bearer resources, which selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).

The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.

Although the LTE system is described as an example, it should be understood by those skilled in the art that the present application is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems.

Based on the above mobile terminal hardware structure and communication network system, various embodiments of the present application are provided.

For the convenience of understanding, a specific application scenario of the embodiment of the present application is described below with reference to fig. 3.

Fig. 3 is a schematic diagram of an application scenario provided in an embodiment of the present application. Referring to fig. 3, the scenario includes a mobile terminal, and an image (which may be a picture or a video) is displayed on the display interface of the mobile terminal. The user can trigger annotation of the image through a preset operation (e.g., double-clicking the center of the image). When the terminal receives the preset operation, the display interface pops up an annotation box containing a text prompt such as "Please enter" (other similar prompts are also possible). The user can enter characters, symbols and the like in the annotation box, for example "coffee", and click "Done" to add the text annotation to the image. If the user does not want to add an annotation to the image, the user can click "Cancel" to exit annotation.

Besides text annotation, the embodiments of the application also provide mood annotation, graffiti annotation, audio-video annotation and the like, so the annotation modes are varied. Moreover, annotation can be triggered in multiple ways, which makes it more fine-grained, user-friendly and easy to use, and improves the user experience.
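The interaction in this scenario can be sketched as a replay of user interface events. The event names (`"double_click"`, `"input"`, `"done"`, `"cancel"`) and the function `annotate_session` are hypothetical and only model the sequence described for fig. 3: a preset operation opens the box, input is collected, and the user either confirms or cancels.

```python
def annotate_session(events):
    """Replay UI events and return the text annotation added, or None.

    events is a list of (kind, value) pairs; value is only used for "input".
    """
    box_open = False
    text = ""
    for kind, value in events:
        if kind == "double_click" and not box_open:
            box_open = True          # preset operation pops up the box
        elif box_open and kind == "input":
            text += value            # characters or symbols typed in the box
        elif box_open and kind == "done":
            return text or None      # confirm: the annotation is added
        elif box_open and kind == "cancel":
            return None              # cancel: exit without annotating
    return None


print(annotate_session([("double_click", None), ("input", "coffee"), ("done", None)]))
```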

The technical means shown in the present application will be described in detail below with reference to specific examples. It should be noted that the following embodiments may exist alone or in combination with each other, and description of the same or similar contents is not repeated in different embodiments.

Fig. 4 is a schematic flowchart of an image annotation method according to an embodiment of the present application. Referring to fig. 4, the image annotation method may include:

S11, providing at least one annotation mode.

The executing entity of the embodiment of the application can be a mobile terminal, or a control device arranged in the mobile terminal. Optionally, the control device may be implemented by software, or by a combination of software and hardware.

As described above, the display interface of the mobile terminal displays an image to which the user can add annotations. Optionally, at least one annotation mode may be pre-stored in the mobile terminal and provided for the user to select. The annotation mode controls the type of annotation added to the image. Optionally, the type of the annotation may include a mood annotation, a graffiti annotation, a text annotation, an audio-video annotation, and the like. Correspondingly, the annotation mode can include at least one of:

a mood selection mode, used for adding mood annotations to the image;

a graffiti mode, used for adding graffiti annotations to the image;

a text annotation mode, used for adding text annotations to the image;

an audio-video annotation mode, used for adding audio-video annotations to the image.

Optionally, the audiovisual annotation comprises an audio annotation or a video annotation.
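The correspondence between annotation modes and annotation types described above can be written down as a small lookup table. The enum `AnnotationMode`, the table `ANNOTATION_TYPES` and the helper `mode_allows` are illustrative names only; the one structural point taken from the text is that the audio-video mode covers two annotation types, audio and video.

```python
from enum import Enum


class AnnotationMode(Enum):
    MOOD = "mood selection mode"
    GRAFFITI = "graffiti mode"
    TEXT = "text annotation mode"
    AUDIO_VIDEO = "audio-video annotation mode"


# Each mode controls which type(s) of annotation may be added.
ANNOTATION_TYPES = {
    AnnotationMode.MOOD: ("mood",),
    AnnotationMode.GRAFFITI: ("graffiti",),
    AnnotationMode.TEXT: ("text",),
    AnnotationMode.AUDIO_VIDEO: ("audio", "video"),
}


def mode_allows(mode, annotation_type):
    """Return True if the given mode can add this type of annotation."""
    return annotation_type in ANNOTATION_TYPES[mode]


print(mode_allows(AnnotationMode.AUDIO_VIDEO, "video"))
print(mode_allows(AnnotationMode.MOOD, "text"))
```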

Fig. 5 is a display interface diagram of a mood selection mode according to an embodiment of the present application. Referring to fig. 5, when the mood selection mode is entered, preset mood annotations, including a sun, a moon, a smiling face, a sad face, etc., pop up on the display interface. The user can click any mood annotation, drag it to any position on the image, and click "Done" to add the mood annotation to the image.

Fig. 6 is a display interface diagram of a graffiti mode according to an embodiment of the present application. Referring to fig. 6, when the graffiti mode is entered, graffiti pens of different thicknesses pop up on one side of the display interface. Double-clicking a graffiti pen also allows a color to be selected. The user can click a graffiti pen and doodle on the image; once the user clicks to finish, the graffiti annotation is added to the image.

Fig. 7 is a display interface diagram of an annotation mode according to an embodiment of the present application. Referring to fig. 7, when the annotation mode is entered, an annotation box pops up on the display interface with the text prompt "Please enter". The user can enter characters, symbols, and the like in the annotation box and click to finish, whereupon the text annotation is added to the image. Optionally, the display interface may also pop up preset templates that already contain text; the user only needs to select one of the templates, or edit a preset template, and then drag it to any position.

Fig. 8 is a display interface diagram of an audio/video annotation mode according to an embodiment of the present application. Referring to fig. 8, when the audio/video annotation mode is entered, the display interface shows a prompt to click to record audio or video. Optionally, audio annotation or video annotation is selected according to the click duration or the number of clicks. For example, clicking once enters audio annotation: a microphone and a "click to record" prompt pop up on the display interface, and the user can click or long-press the microphone to start recording. When the recording is finished, the user releases the microphone and clicks to finish, and the audio annotation is added to the image. Clicking twice enters video annotation: a record button and a "click to record" prompt pop up on the display interface, and the user can click or long-press the record button to start recording video. When the recording is finished, the user releases the button and clicks to finish, and the video annotation is added to the image.

S12, in response to at least one of a first operation, image information, and scene information, acquiring the annotation added to the image.

In some embodiments, the first operation may be at least one of: sliding on the screen along a preset direction, multi-point touching of the screen, pressing the screen for longer than a preset duration, and clicking the screen more than a preset number of times. Sliding on the screen along a preset direction includes, but is not limited to: sliding up, sliding down, sliding left, and sliding right. Multi-point touching of the screen includes, but is not limited to: two-finger touch and three-finger touch. Pressing the screen for longer than a preset duration includes, but is not limited to: long-pressing the screen for 2 s or 3 s. Clicking the screen more than a preset number of times includes, but is not limited to: double-clicking the screen, and clicking the screen 3 times in succession.
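A minimal sketch of how the first operation might be recognized from a touch gesture is given below. The `Gesture` record, its field names, and the default thresholds (a 2 s press, two clicks, an upward slide) are assumptions made for illustration; the application only states that preset values exist. Treating a value equal to the preset as a match is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Gesture:
    """Hypothetical description of one touch gesture."""
    kind: str              # "slide", "touch", or "click"
    direction: str = ""    # slide direction: "up", "down", "left", "right"
    fingers: int = 1       # number of simultaneous touch points
    duration_s: float = 0.0
    count: int = 1         # number of consecutive clicks


def is_first_operation(gesture, preset_direction="up",
                       preset_duration_s=2.0, preset_count=2):
    """Return True if the gesture matches one of the listed first operations."""
    if gesture.kind == "slide":
        return gesture.direction == preset_direction
    if gesture.kind == "touch":
        # Multi-point touch: two or more fingers.
        return gesture.fingers >= 2
    if gesture.kind == "click":
        long_press = gesture.duration_s >= preset_duration_s
        multi_click = gesture.count >= preset_count
        return long_press or multi_click
    return False
```

For example, `is_first_operation(Gesture("click", count=2))` is true under these defaults, while a single short click is not treated as a first operation.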

In some embodiments, when entering the corresponding annotation mode in response to the image information and/or the scene information, referring to fig. 9, the step S12 may include:

S121, identifying the image information and/or the scene information;

S122, detecting whether the image information and/or the scene information contains preset feature information. If the image information and/or the scene information contains the preset feature information, step S123 is executed; and/or, if the image information and/or the scene information does not contain the preset feature information, the method returns to step S121;

S123, entering the annotation mode corresponding to the preset feature information.

The image information may be feature information contained in the image, for example, a human face contained in the image. The scene information may be information about the scene or environment in which the terminal is located when the image is annotated, for example, that the terminal is participating in a video conference; the scene information may also be information about the scene or environment presented by the content of the annotated image, for example, when materials are organized after a meeting has ended, the image depicts the meeting scene.

The image information and/or the scene information may include preset feature information. When the detected image information and/or scene information includes preset feature information, the annotation mode corresponding to that preset feature information is entered. In some embodiments, the preset feature information may be at least one of: object information, person information, face information, scene information.

For example, if the image information includes object information, such as the coffee contained in the image of fig. 3, the annotation mode is entered in response to the object information, and the text annotation "coffee" is added to the image. Optionally, if the user does not want to add a text annotation to the image, the user may exit the annotation mode and then enter the desired annotation mode through one of the first operations.

For another example, if the image information includes person information or face information, the mood selection mode is entered in response to the person information or face information, so that the mood or facial expression of the person can be recorded. Optionally, if the user does not want to add a mood annotation to the image, the user may exit the mood selection mode and then enter the desired annotation mode through one of the first operations.

For another example, if the scene information includes scenic-area information, that is, the terminal is located in a certain scenic area when the image is annotated, the audio/video annotation mode is entered in response to the scene information, so that a voice note such as "July 2, 2021, Xiaodan visited a certain scenic area" or a video of the scenery can be recorded. Optionally, if the user does not want to add an audio/video annotation to the image, the user may exit the audio/video annotation mode and then enter the desired annotation mode through one of the first operations.

For another example, if it is detected that the terminal is participating in a video conference, the audio/video annotation mode cannot be entered while the other annotation modes remain available, which prevents the addition of an audio/video annotation from disturbing the conference.

For another example, if the scene presented by the content of the annotated image is a conference scene, that is, the image shows the conference venue and its participants, the graffiti mode is entered in response to the scene information, and the participants or the venue can be described by doodling, for example by writing a name beside each participant.
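The examples above can be sketched as a lookup from detected feature information to an annotation mode, covering steps S122 and S123. The feature keys, mode names, and the `select_mode` function are hypothetical; recognition itself (step S121) is assumed to have already produced the feature lists, and the first-match ordering is an assumption.

```python
# Hypothetical mapping from preset feature information to an annotation mode,
# following the examples in the text.
FEATURE_TO_MODE = {
    "object": "annotation",            # e.g. coffee -> text annotation
    "person": "mood_selection",
    "face": "mood_selection",
    "scenic_area": "audio_video",
    "conference_scene": "graffiti",    # e.g. write names beside participants
}


def select_mode(image_features, scene_features, in_video_conference=False):
    """Return the mode for the first preset feature found, or None.

    None means no preset feature information was detected, in which case
    the method would return to the identification step (S121).
    """
    for feature in list(image_features) + list(scene_features):
        mode = FEATURE_TO_MODE.get(feature)
        if mode is None:
            continue
        # While the terminal is in a video conference, the audio/video
        # annotation mode is not entered; other modes remain available.
        if mode == "audio_video" and in_video_conference:
            continue
        return mode
    return None
```

The user can still leave the automatically selected mode and choose another one through a first operation, as the text describes; that path is not modeled here.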

After the user adds an annotation to the image and clicks to finish, the terminal can acquire the annotation added to the image by the user.

Optionally, the method for the terminal to obtain the annotation includes at least one of the following three methods:

(1) Acquiring a corresponding annotation according to a marking instruction.

The marking instruction may include the operation performed by the user when adding the annotation. For example, after the mood selection mode is entered, preset mood annotations pop up on the display interface, and the user clicks the smiling-face annotation and drags it to any position on the image. The terminal thereby obtains a mood selection instruction, acquires the corresponding smiling-face annotation according to the mood selection instruction, and adds it to the corresponding position on the image. Optionally, the marking instruction may also be a graffiti instruction, a text annotation instruction, a voice annotation instruction, or a video annotation instruction; the implementation process is the same as for the mood selection instruction and is not repeated here.

(2) Generating an annotation according to an editing operation.

The editing operation may include doodling, text entry, and the like, or may be editing based on an existing graffiti template or text template. The terminal can generate the annotation according to the editing operation.

(3) Acquiring a corresponding annotation according to the image information and/or the scene information.

The terminal can also automatically acquire the corresponding annotation according to the image information and/or the scene information. For example, if the image contains a smiling face, the smiling-face annotation can be placed first among the preset mood annotations that pop up on the display interface, so that the user can add the annotation quickly. Alternatively, when the terminal detects that the image contains a smiling face, the smiling-face annotation is automatically added to the image. For another example, when the terminal is participating in a video conference, a text annotation such as "July 2, 2021 department meeting" is automatically added to the image.
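As a small illustration of the third method, the reordering of preset mood annotations so that the one matching the detected image content comes first might look as follows. The function name `order_mood_presets` and the string identifiers are hypothetical.

```python
def order_mood_presets(presets, detected):
    """Return the presets with the detected mood moved to the front.

    If the detected mood is not among the presets, the order is unchanged.
    """
    if detected in presets:
        return [detected] + [p for p in presets if p != detected]
    return list(presets)
```

With presets `["sun", "moon", "smiling_face", "sad_face"]` and a detected smiling face, the smiling-face annotation is offered first and the remaining presets keep their relative order.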

S13, integrating the annotation into the image according to the template.

The terminal integrates the annotation added to the image into the image. The means of integration includes at least one of:

(1) Displaying the annotation on the image.

Optionally, the added mood annotation, graffiti annotation, text annotation, and audio/video annotation are displayed on the image. For example, referring to the right image in fig. 3, the text annotation "coffee" is displayed on the image. Optionally, the audio/video annotation may be displayed on the image as a small mark or tag; clicking the mark or tag plays the audio or video without obstructing the user's view of the image. This means of integration is equivalent to saving the annotated image over the original image.

(2) Saving the image with the annotation as a new image.

Optionally, on the basis of integration means (1), the image displaying the mood annotation, graffiti annotation, text annotation, and audio/video annotation is saved as a new image. This means of integration does not affect the original image: after saving, both the original image and the new image with the added annotation exist.

(3) Setting the annotation to a hidden mode, wherein the hidden mode is used for displaying only the image by default and displaying the annotation in response to a preset operation.

Optionally, in some scenarios, the user does not want the added annotation to be seen by other users, or does not want it to interfere with viewing the original image; in that case the added annotation may be set to the hidden mode. In the hidden mode, the annotation is not displayed on the image by default and is displayed only in response to a preset operation, so that the privacy of the user can be protected.

Optionally, the preset operations include, but are not limited to: double clicking on an image, zooming out an image with a gesture, zooming in an image with a gesture, etc.

(4) Importing the image with the annotation into a template and saving it.

Optionally, the image with the mood annotation, graffiti annotation, text annotation, and audio/video annotation is imported into a template for editing and saving. For example, based on the image itself, the user may add suitable background music and filters in the template to form a video, and then save it.
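The hidden mode of integration means (3) can be sketched as a small state holder: annotations stay invisible until one of the preset operations is received. The class `AnnotatedImage`, the operation names, and the `revealed` flag are assumptions introduced for this sketch only.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the preset operations listed in the text.
PRESET_OPERATIONS = {"double_click", "zoom_out_gesture", "zoom_in_gesture"}


@dataclass
class AnnotatedImage:
    """An image plus its annotations and hidden-mode state."""
    name: str
    annotations: list = field(default_factory=list)
    hidden: bool = False      # True when the annotations are in hidden mode
    revealed: bool = False    # set once a preset operation has been received


def handle_operation(image, operation):
    """Reveal hidden annotations when a preset operation is received."""
    if image.hidden and operation in PRESET_OPERATIONS:
        image.revealed = True


def visible_annotations(image):
    """Annotations to draw: none while hidden and not yet revealed."""
    if image.hidden and not image.revealed:
        return []
    return list(image.annotations)
```

With `hidden=True`, opening the image shows only the picture; after `handle_operation(image, "double_click")` the annotations are returned for display.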

In some embodiments, referring to fig. 10, after step S13, the image annotation method further includes:

S14, saving the annotation to a preset device according to the type of the annotation.

Optionally, the types of annotation may include mood annotations, graffiti annotations, text annotations, audio/video annotations, and the like. The preset device may include local storage and/or a server. Annotations can be automatically saved locally and/or to the server according to their type. For example, mood annotations, graffiti annotations, and text annotations, which occupy little storage, can be saved locally, while audio/video annotations, which occupy more storage, can be saved to the server, thereby saving storage space on the terminal.

Optionally, the type of the annotation may also include the annotation time. The annotation can be saved locally and/or to the server according to the annotation time. For example, when an annotation is added, the terminal records the time at which it was added; after one week (the specific period can be set by the user), the terminal automatically uploads the annotation to the server, or prompts the user to upload it manually, and deletes the locally saved annotation, thereby saving storage space on the terminal. Of course, the locally saved annotation may instead be kept, with the copy uploaded to the server serving only as a backup.
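The two storage rules of step S14 (by annotation type and by annotation time) might be sketched as follows. The type strings, the function names, and the one-week default are taken from or modeled on the example in the text; everything else, including treating exactly one week as due, is an assumption.

```python
from datetime import datetime, timedelta

# Types the text describes as occupying more storage.
LARGE_TYPES = {"audio", "video"}


def storage_for_type(annotation_type):
    """Return "server" for audio/video annotations, otherwise "local"."""
    return "server" if annotation_type in LARGE_TYPES else "local"


def due_for_upload(added_at, now, retention=timedelta(weeks=1)):
    """Return True once the annotation has been kept locally for the
    retention period (one week by default, user-configurable)."""
    return now - added_at >= retention
```

Whether the local copy is deleted after upload or kept alongside a server backup is left to the caller, since the text allows both.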

Optionally, the hidden mode includes at least one hidden layer, and the hidden layers are used for saving and displaying annotations of different types. Because annotations are saved locally and/or to the server according to their type, annotations saved in different locations (local or server) can be saved and displayed in different hidden layers. For example, mood annotations, graffiti annotations, and text annotations, which occupy little storage, are saved in a first hidden layer, and audio/video annotations, which occupy more storage, are saved in a second hidden layer, so that the user can tell where an annotation is stored from the hidden layer in which it appears.

S15, saving the image to a preset device according to preset information of the image.

Optionally, the preset information of the image includes at least one of: a preset tag, a browsing count, a shooting time, a receiving time, a storage path, and an occupied storage capacity.

Optionally, the preset tag may be a tag set by the user. For example, the user may specify that, 3 days after an image is saved, the terminal automatically evaluates the image and chooses to keep it locally or upload it to the server according to a preset judgment condition. The preset judgment condition may be the browsing count, shooting time, receiving time, storage path, occupied storage capacity, or another judgment condition.

Optionally, the browsing count may be the number of times the user has browsed the image. For example, if the terminal detects that the user has browsed a certain image many times (more than a preset number of times), the image is saved locally; and/or, if the image has been browsed few times (fewer than the preset number of times), the image is saved to the server. For another example, if the terminal detects that, over the last few days, the user has frequently browsed images of the same annotation type (for example, images annotated while the terminal was participating in a video conference, or images whose text annotations mention a conference), that image and the other images of the same annotation type are saved locally.

Optionally, the shooting time may be the time at which the image was shot. For example, the terminal saves images shot within the last year locally and automatically uploads images shot more than one year ago to the server, thereby saving storage space on the terminal.

Optionally, the receiving time may be the time at which the image was received. For example, the terminal receives an image sent by another device and saves it locally. After one month, the terminal automatically uploads the received image to the server, thereby saving storage space on the terminal.

Optionally, the storage path may be a storage tag set by the user for the image, indicating whether the image is to be saved locally or to the server. For example, when saving images in batches, the user can set a tag for saving the images to the server, so that the images are uploaded to the server in a batch, which improves efficiency.

Optionally, the occupied storage capacity may be the amount of storage occupied by the image. For example, the terminal saves images that occupy little storage locally and saves images that occupy more storage to the server, thereby saving storage space on the terminal.
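Three of the judgment conditions for step S15 can be sketched as independent rules, each returning either "local" or "server". All function names and default thresholds (10 views, 365 days, 5 MiB) are hypothetical; the application leaves the preset values open, and how a value exactly equal to a preset is handled is also an assumption here.

```python
from datetime import datetime, timedelta


def by_browse_count(count, preset_count=10):
    """Frequently browsed images stay local; others go to the server."""
    return "local" if count > preset_count else "server"


def by_shooting_time(shot_at, now, keep_local_for=timedelta(days=365)):
    """Images shot within the last year stay local; older ones are uploaded."""
    return "local" if now - shot_at <= keep_local_for else "server"


def by_size(size_bytes, preset_bytes=5 * 1024 * 1024):
    """Small images stay local; large ones go to the server."""
    return "local" if size_bytes <= preset_bytes else "server"
```

A terminal could apply whichever rule the user's preset tag selects, or combine several; the text does not prescribe a combination.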

Fig. 11 is a flowchart illustrating an image previewing method according to an embodiment of the present application. Referring to fig. 11, the image preview method includes:

S21, opening the image;

S22, in response to a second operation, displaying the corresponding annotation.

With reference to the above embodiments, optionally, in some scenarios, the user does not want the added annotation to be seen by other users, or does not want it to interfere with viewing the original image; in that case the added annotation may be set to the hidden mode. The hidden mode is used to display only the image by default and to display the annotation in response to a preset operation.

When the terminal opens the image, that is, displays the image on its display interface, the hidden annotation cannot be seen. Only when the terminal responds to the second operation is the corresponding annotation displayed on the display interface, so that the privacy of the user can be protected.

In some embodiments, annotations are saved locally and/or to a server according to the type of the annotation. Optionally, the types of annotation may include mood annotations, graffiti annotations, text annotations, audio/video annotations, and the like, and may also include the annotation time. The process of saving annotations locally and/or to the server according to their type is described above and is not repeated here.

Optionally, annotations saved in different locations (local or server) may be saved in different hidden layers. For example, mood annotations, graffiti annotations, and text annotations, which occupy little storage, are saved in a first hidden layer, and audio/video annotations, which occupy more storage, are saved in a second hidden layer, so that the user can tell where an annotation is stored from the hidden layer in which it appears.

Correspondingly, the step S22 includes:

in response to the second operation, displaying the annotations saved locally and/or on the server.

Optionally, the second operation includes, but is not limited to: double-clicking the image, zooming out of the image with a gesture, zooming in on the image with a gesture, and the like. For example, mood annotations, graffiti annotations, and text annotations, which occupy little storage, are saved locally, and audio/video annotations, which occupy more storage, are saved on the server. In response to a second operation such as double-clicking the image, the terminal displays the mood, graffiti, and text annotations saved locally; or, in response to a gesture that zooms out of the image, the terminal displays the audio/video annotations saved on the server.

Referring to fig. 12, the terminal display interface displays an image, and the user double-clicks the image to display the locally stored character label "coffee".
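Step S22 can be sketched as a lookup from the second operation to the hidden layer whose annotations are shown. The operation names, the `OPERATION_TO_LAYER` table, and the dictionary layout of `layers` are assumptions chosen to mirror the example above (double-click for locally saved annotations, zoom-out gesture for annotations on the server).

```python
# Hypothetical assignment of second operations to storage locations.
OPERATION_TO_LAYER = {
    "double_click": "local",
    "zoom_out_gesture": "server",
}


def annotations_for(operation, layers):
    """Return the annotations revealed by the given second operation.

    `layers` maps a storage location ("local" or "server") to the list of
    annotations kept in the corresponding hidden layer. Operations that are
    not second operations reveal nothing.
    """
    location = OPERATION_TO_LAYER.get(operation)
    if location is None:
        return []
    return list(layers.get(location, []))
```

For the image of fig. 12, `annotations_for("double_click", {"local": ["coffee"], "server": []})` yields the locally saved text annotation.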

The embodiment of the present application further provides a terminal, where the terminal includes a memory and a processor, and the memory stores a program, and the program is executed by the processor to implement the steps of the method in any of the above embodiments.

An embodiment of the present application further provides a computer-readable storage medium, where an image processing program is stored on the computer-readable storage medium, and when the image processing program is executed by a processor, the image processing program implements the steps of the method in any of the above embodiments.

In the embodiments of the terminal and the computer-readable storage medium provided in the present application, all technical features of the embodiments of the image annotation method or the image preview method are included, and the expanding and explaining contents of the specification are basically the same as those of the embodiments of the method, and are not described herein again.

Embodiments of the present application also provide a computer program product, which includes computer program code, when the computer program code runs on a computer, the computer is caused to execute the method in the above various possible embodiments.

Embodiments of the present application further provide a chip, which includes a memory and a processor, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a device in which the chip is installed executes the method in the above various possible embodiments.

It is to be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.

The units in the device in the embodiment of the application can be merged, divided and deleted according to actual needs.

In the present application, the same or similar terms, concepts, technical solutions, and/or application scenarios are generally described in detail only at their first occurrence. For brevity, the detailed description is generally not repeated at later occurrences; for any such content that is not described in detail later, reference may be made to the earlier detailed description.

In the present application, each embodiment is described with emphasis, and reference may be made to the description of other embodiments for parts that are not described or illustrated in any embodiment.

The technical features of the technical solution of the present application may be arbitrarily combined, and for brevity of description, all possible combinations of the technical features in the embodiments are not described, however, as long as there is no contradiction between the combinations of the technical features, the scope of the present application should be considered as being described in the present application.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the method of each embodiment of the present application.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, memory Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims

1. An image annotation method, comprising:

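S11, providing at least one annotation mode;

S12, in response to at least one of a first operation, image information, and scene information, acquiring an annotation added to an image;

S13, integrating the annotation into the image according to a template.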

2. The method of claim 1, wherein the annotation mode comprises at least one of: a mood selection mode, a graffiti mode, an annotation mode, and an audio/video annotation mode.

3. The method of claim 1, wherein the step S12 includes:

identifying the image information and/or the scene information;

detecting whether the image information and/or the scene information contains preset feature information;

and if so, entering the annotation mode corresponding to the preset feature information.

4. The method according to any one of claims 1 to 3, wherein the manner of acquiring the annotation in the step S12 includes at least one of:

acquiring a corresponding annotation according to a marking instruction;

generating an annotation according to an editing operation;

acquiring a corresponding annotation according to the image information and/or the scene information.

5. The method according to any one of claims 1 to 3, wherein the step S13 includes:

displaying the annotation on the image;

saving the image with the annotation as a new image;

setting the annotation to a hidden mode, wherein the hidden mode is used for displaying the image by default and displaying the annotation in response to a preset operation;

and importing the image with the annotation into a template and saving it.

6. The method of claim 5, wherein the hidden mode comprises at least one hidden layer for displaying different types of annotations.

7. The image annotation method according to any one of claims 1 to 3, wherein after the step S13, the method further comprises:

saving the annotation to a preset device according to the type of the annotation;

and saving the image to a preset device according to preset information of the image.

8. The method of claim 7, wherein the preset device comprises local storage and/or a server.

9. A terminal, characterized in that the terminal comprises: memory, processor, wherein the memory has stored thereon a program which, when executed by the processor, carries out the steps of the image annotation method according to any one of claims 1 to 8.

10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the image annotation method according to any one of claims 1 to 8.